Roger (and everyone),

I agree with your description of keywords as trees, and that they can therefore be described as instances of SGML DTDs. So let us see where we can go from that analysis. To be more specific, all the keywords to be sorted are instances of one DTD.

You then ask: "How are structured documents sorted?" The answer is that, in the normal everyday sense of users of structured documents, collections of structured documents are not sorted. This is of course not strictly true, but when they are sorted (e.g. in a catalogue of documents) only a small part of the full document (or none of it) is needed. But we do need to sort keywords.

However, it does not necessarily follow that we need a theory of sorting structured documents. There are two ways in which our problem may differ from this general theory:

1. We may not need to deal with every possible DTD. SGML allows very complex DTDs, and it could be that the DTDs needed for the structure of keywords form a subclass. This could make our problem easier.

2. We are not (at least to begin with) interested in general orderings of keywords but only in ones that make sense for a "human-usable index". To illustrate what I mean, suppose that we were not aiming at "human-usability" but were only ordering for machine readability: we are then in familiar territory where there is a well-developed theory (hash tables, etc.), but it is useless for our purposes. This could make the problem easier, but I suspect that it makes it harder: for a start, we need to know what "human-usable" means for an index (and, of course, eventually we need to consider how usable xindy is for producing such an index).

Thus, whilst you have not answered my questions, you have convinced me that they are still worth asking, even if we need to ask more general questions later. But maybe they should be put into the context of first deciding what sorts of ordering are useful for human-readable indexes?
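One way to make the two points above concrete is the following minimal Python sketch. It assumes the simplest possible subclass of keyword DTDs, where each keyword is just the path from the root of its tree to a leaf (main entry, subentry, ...), held as a tuple of strings. The names `fold`, `human_key`, `machine_order` and `human_order` are hypothetical, and the "human-usable" rule used here (ignore accents and case first, use the raw text only to break ties) is one illustrative guess, not a definition of human-usability.

```python
import unicodedata
from typing import List, Sequence, Tuple

# Hypothetical representation: a keyword is the path from the root of its
# tree to a leaf, e.g. ("apple", "green") for main entry "apple" with
# subentry "green". This stands in for an instance of one very simple DTD.
Keyword = Tuple[str, ...]


def fold(text: str) -> str:
    """Reduce a string to the form a reader would treat as 'the same word'."""
    # NFKD splits a precomposed letter such as "É" into "E" plus a combining
    # accent; dropping the combining marks and casefolding makes "Éclair"
    # and "eclair" compare equal in the first pass.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def human_key(keyword: Keyword) -> Tuple[Tuple[str, ...], Keyword]:
    """Two-pass key: compare the folded text level by level first, and fall
    back to the raw text only to break ties, so the order stays total."""
    return (tuple(fold(level) for level in keyword), keyword)


def machine_order(keywords: Sequence[Keyword]) -> List[Keyword]:
    # Plain code-point order: well defined and fast, but "Zebra" < "apple"
    # < "Éclair", which is not what a reader of an index expects.
    return sorted(keywords)


def human_order(keywords: Sequence[Keyword]) -> List[Keyword]:
    # Shorter paths sort before longer ones with the same prefix, so a main
    # entry comes before its own subentries.
    return sorted(keywords, key=human_key)


if __name__ == "__main__":
    sample = [("zebra",), ("Apple", "green"), ("apple",),
              ("Éclair",), ("apple", "Red"), ("eclair",)]
    print(machine_order(sample))
    print(human_order(sample))
```

With this sample, the machine order is perfectly well defined, yet it lists the subentry "Apple, green" before the main entry "apple" and files "Éclair" after "zebra". The two-pass key keeps the three apple entries together under their main entry and files both spellings of eclair side by side.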
Maybe I am saying that we need to address exactly what you find hard:

> In fact, to me it seems to be hard to find semi-solutions to this
> problem that address only some of the problems but not all of them.

and I am suggesting one way (I hope there are others) to start down this hard road.

I think that you will have to expand this bit before I can comment further:

> Another aspect is that the sorting process may be structured
> differently. It can be described in terms of an acyclic graph having
> as is edges the specification of the sorting process that has to be
> applied.
>
>    lang=chinese
> o-------------->o-----+----->o strokes=1
> !                     +----->o strokes=2
> ! lang=others         +...
> +-------------->o
> ....
>
> This is an idea that came into my mind just when I was typing. I'm not
> yet sure about this aspect. But it may be a natural way of defining
> sorting processes and reusing paths in the graph, which seems to be
> quite useful in practice. Here another problem occurs. Are categories or
> enumerations of attributes still useful as it was introduced in the
> define-letter declaration?

It sounds very interesting, but I am unsure what the constituents of your acyclic graphs are.

I had thought about bringing in the sorting of ideographs (there are various methods), but I decided to stick with alphabetic words since I have a better intuition as to what is needed in practice for these. It would certainly be useful to ask people who know about non-alphabetic writing systems whether the methods for sorting them do introduce yet more concepts that we have not considered: I am not sure whether I hope they do or hope they do not :-). Is there anyone out there listening who can help us here?
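Since the constituents of the graph are exactly the open question, the following is only one possible reading, sketched in Python under stated assumptions: nodes are plain objects; each edge tests one keyword attribute against a value, or matches anything (which is how "lang=others" is modelled here); a keyword follows the first matching edge at each node; and the indices of the edges it takes become the primary sort key, with the keyword's text as the secondary key. `Node`, `route`, `sort_key`, `build_example_graph` and the attribute names are all hypothetical. Because edges hold references to nodes, two edges may point at the same node, which is one guess at what "reusing paths" could mean.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical: a keyword is a set of attributes plus its printable text.
Keyword = Dict[str, str]


@dataclass(eq=False)  # identity comparison, so shared nodes stay distinct objects
class Node:
    """A node in the sorting graph. Outgoing edges are tried in order; each
    edge tests attribute == value, or matches anything when value is None."""
    name: str
    edges: List[Tuple[str, Optional[str], "Node"]] = field(default_factory=list)

    def add(self, attr: str, value: Optional[str], target: "Node") -> "Node":
        self.edges.append((attr, value, target))
        return self


def route(root: Node, keyword: Keyword) -> Tuple[int, ...]:
    """Follow the first matching edge at each node and record its index.
    The graph is assumed acyclic, so the walk always terminates."""
    path: List[int] = []
    node = root
    while True:
        for index, (attr, value, target) in enumerate(node.edges):
            if value is None or keyword.get(attr) == value:
                path.append(index)
                node = target
                break
        else:
            # No outgoing edge matched (or the node is a leaf): stop here.
            return tuple(path)


def sort_key(root: Node, keyword: Keyword) -> Tuple[Tuple[int, ...], str]:
    # Group first (by edge order in the graph), then plain text within a group.
    return (route(root, keyword), keyword["text"])


def build_example_graph() -> Node:
    """The graph from the quoted sketch: lang=chinese branches on stroke
    count; every other language falls through to a single node."""
    strokes1, strokes2 = Node("strokes=1"), Node("strokes=2")
    chinese = Node("chinese").add("strokes", "1", strokes1).add("strokes", "2", strokes2)
    others = Node("others")
    return Node("root").add("lang", "chinese", chinese).add("lang", None, others)


if __name__ == "__main__":
    root = build_example_graph()
    keywords = [
        {"lang": "english", "text": "index"},
        {"lang": "chinese", "strokes": "2", "text": "人"},
        {"lang": "chinese", "strokes": "1", "text": "一"},
    ]
    for kw in sorted(keywords, key=lambda k: sort_key(root, k)):
        print(route(root, kw), kw["text"])
```

Under this key, a keyword that stops early (say a Chinese keyword whose stroke count has no edge) sorts ahead of the deeper groups under the same node; whether that is the right behaviour is the kind of question the graph idea would still have to answer.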
Best wishes
chris