The locale approach can be used to describe the sorting schemes
of many languages. However, using it in conjunction with xindy has
several disadvantages.
locale
into xindy. But defining a new set of rules would have a significant
overhead. A new table must first be created and compiled with
localedef into a form readable by the library. This is in
contrast to the dynamic paradigm of xindy and therefore in my
opinion not feasible.
LC_COLLATE is rather
complex and in no way declarative, which was initially a design
goal of xindy. Additionally, it solves the problem of pure alphabets,
but dealing with markup in the keyword is still an open issue. The
markup often cannot be pre-processed, since it may only play a
role in one of the later sorting runs. It actually depends on the
user's needs.
Based on these observations my current proposal is a mixture of the
locale approach and the current implementation in xindy.
sort-rule can be extended
with an additional argument :level taking the number of the level
into which the sort rule is to be put. Additionally there must be a
specification on how each run is to be sorted (forward, backward).
These rules may still contain regular expression substitutions which
may come into consideration at any level as necessary.
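To make the idea more concrete, here is a minimal Python sketch of how rules tagged with a level could be turned into a multi-level sort key. This is not xindy syntax and not an existing implementation: the names SortRule, level_key and multi_level_key are made up for illustration, and I assume that "backward" means comparing the key from its end, which I model by reversing the string.

```python
import re
from dataclasses import dataclass


@dataclass
class SortRule:
    pattern: str      # regular expression to match
    replacement: str  # substitution text, may use \1 etc.
    level: int        # stands in for the proposed :level argument


def level_key(keyword, rules, level, backward=False):
    """Apply all rules of one level, in order, to build that level's key."""
    key = keyword
    for rule in rules:
        if rule.level == level:
            key = re.sub(rule.pattern, rule.replacement, key)
    # Assumption: a backward run compares the key starting from its end.
    return key[::-1] if backward else key


def multi_level_key(keyword, rules, directions):
    """One key per level; directions[i] is 'forward' or 'backward'."""
    return tuple(
        level_key(keyword, rules, level, direction == "backward")
        for level, direction in enumerate(directions, start=1)
    )


# Level 1 drops the markup and maps A to a; level 2 has no rules,
# so it falls back to the unmodified keyword.
rules = [
    SortRule(r"\\(?:tt|it)\{(.*)\}", r"\1", level=1),
    SortRule("A", "a", level=1),
]
directions = ["forward", "forward"]
keywords = ["arm", r"\it{arm}", "Arm"]
ordered = sorted(keywords, key=lambda k: multi_level_key(k, rules, directions))
print(ordered)
```

All three keywords get the same level-1 key, so the order is decided by the second level alone. A full rule set would of course need one case-folding rule per letter, not only the one for A.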
I'll give an example of how powerful this approach can be.
Assume we have the following keywords to sort:
\tt{ARM}
\it{arm}
Arm
arm
Armbrust
armselig
Since we want to sort case-independently at the first level of comparison, we can use the following rule set:
A -> a
\tt{(.*)} -> \1 :again
\it{(.*)} -> \1 :again
This obtains the following result:
Arm, arm, \it{arm}, \tt{ARM}
Armbrust
armselig
The intended sorting rule says that the keywords containing markup should come before the others. Thus we must define a rule set expressing this sort order:
\tt{(.*)} -> 0\1 :again
\it{(.*)} -> 1\1 :again
A -> 2a
a -> 2a
Now we have prefixed the letters to obtain a further relative sorting order:
\tt{ARM}
\it{arm}
Arm, arm
The last step is now to obtain a total order. We do not specify any further rules, since we sort according to the position in the ISO Latin alphabet, with A coming before a, obtaining
Arm
arm
Thus, we have gradually refined the partial order into a total one.
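The three refinement steps above can be sketched in Python as a tuple of per-level keys. This is only an illustration under my own assumptions: the helper names (strip_markup, level1, level2, level3, sort_key) are hypothetical, and the numeric markup classes play the part of the 0/1/2 prefixes used in the second rule set.

```python
import re

# Matches a keyword wrapped in \tt{...} or \it{...}.
MARKUP = re.compile(r"\\(tt|it)\{(.*)\}")


def strip_markup(keyword):
    """Remove a tt or it markup wrapper, if any."""
    match = MARKUP.fullmatch(keyword)
    return match.group(2) if match else keyword


def level1(keyword):
    # Case-independent comparison of the bare text.
    return strip_markup(keyword).lower()


def level2(keyword):
    # Markup class: \tt first, then \it, then unmarked keywords.
    match = MARKUP.fullmatch(keyword)
    return {"tt": 0, "it": 1}[match.group(1)] if match else 2


def level3(keyword):
    # No rules: plain code-point order, so "A" sorts before "a".
    return keyword


def sort_key(keyword):
    return (level1(keyword), level2(keyword), level3(keyword))


keywords = ["armselig", "arm", "Armbrust", "Arm", r"\it{arm}", r"\tt{ARM}"]
result = sorted(keywords, key=sort_key)
for keyword in result:
    print(keyword)
```

Each later level is consulted only when all earlier levels compare equal, which is exactly the gradual refinement of the partial order into a total one.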
The advantage is that we are still able to use declarative
descriptions such as
\tt{(.*)} -> \1
to match many keywords
at once.
I have several questions about this scheme and I'm interested in your opinion.
2arm stuff for example which is actually only a work-around
to re-incorporate tokens in a very unclean manner). But is it
necessary to include tokens?