The LC_COLLATE category definition in a locale source file establishes the relative order between collating elements in the locale that is compiled from that source by the LOCALDEF utility.
The collation sequence definition is used by regular expressions, pattern matching, and sorting and collating functions.
Collating rules consist of a list of collating order statements in the LC_COLLATE category definition. These statements are ordered from lowest to highest, so the first statement to appear in the source identifies the element that is collated first.
The <NUL> character is considered lower than any other character. The ellipsis symbol ("...") is a special collation order statement. It specifies that a sequence of characters collates according to their encoded character values. All characters with values higher than the value of the <collating-identifier> on the preceding line, and lower than the value of the <collating-identifier> on the following line, are placed in the collation order at that point, in ascending order according to their encoded character values.
The use of the ellipsis symbol ties the definition to a specific coded character set and may preclude the definition from being portable among implementations.
The ellipsis symbol must be on a line by itself and cannot be the first or last line. The lines that precede and follow it must not specify a weight.
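The ellipsis rule can be modelled in a few lines of Python. This is only an illustrative sketch, not how LOCALDEF is implemented: the function name `expand_ellipsis` is hypothetical, each statement is reduced to a single character, and Python's Unicode code points stand in for the encoded character values of the coded character set.

```python
def expand_ellipsis(statements):
    """Replace each "..." entry with the characters whose encoded values
    lie strictly between the neighbouring entries (illustrative model)."""
    expanded = []
    for i, entry in enumerate(statements):
        if entry != "...":
            expanded.append(entry)
            continue
        # The ellipsis cannot be the first or last line.
        if i == 0 or i == len(statements) - 1:
            raise ValueError("ellipsis cannot be the first or last statement")
        # Assumption: ord() stands in for the encoded character value.
        low, high = ord(statements[i - 1]), ord(statements[i + 1])
        expanded.extend(chr(code) for code in range(low + 1, high))
    return expanded


order = expand_ellipsis(["a", "...", "e", "A"])
print(order)  # ['a', 'b', 'c', 'd', 'e', 'A']
```

Because the expansion depends on encoded values, the same source would yield a different sequence under a coded character set that orders these characters differently.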
A collating order statement describes how a collating identifier is weighted.
Each <collating-identifier> consists of a character, <collating-element>, <collating-symbol>, or the special symbol UNDEFINED. The order in which collating elements are specified determines the character collation sequence, such that each collating element is considered lower than the elements following it. Weights are expressed as characters, <collating-symbol>s, <collating-element>s, or the special symbol IGNORE. A single character, a <collating-symbol>, or a <collating-element> represents the relative position of that character or symbol in the character collation sequence, rather than the character or characters themselves. Thus, rather than being assigned an absolute value, a weight is expressed using the relative "order value" that a collating element receives from its position in the character collation sequence.
A <collating-element> specifies multicharacter collating elements, and indicates that the character sequence specified by the <collating-element> is to be collated as a unit and in the relative order specified by its place.
A <collating-symbol> can define a position in the relative order for use in weights. Do not use a <collating-symbol> to specify a weight.
The special symbol UNDEFINED is interpreted as including all characters not specified explicitly. Such characters are inserted in the character collation order at the point indicated by the symbol, and in ascending order according to their encoded character values. If no UNDEFINED symbol is specified, and the current coded character set contains characters not specified in the collation order, the LOCALDEF utility issues a warning and places such characters at the end of the character collation order.
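The placement of unspecified characters can be sketched as follows. This is a simplified model under stated assumptions: `build_order` is a hypothetical name, statements are single characters or the literal string "UNDEFINED", the coded character set is passed in as a plain string, and `ord()` stands in for the encoded character value. The message written to standard error only imitates the idea of a warning; it is not LOCALDEF output.

```python
import sys


def build_order(statements, charset):
    """Return the collation order after placing characters that the
    statements do not mention (illustrative model only)."""
    explicit = [s for s in statements if s != "UNDEFINED"]
    # Unspecified characters, in ascending order of encoded value.
    missing = sorted(set(charset) - set(explicit), key=ord)

    if "UNDEFINED" not in statements:
        if missing:
            print("warning: unspecified characters placed last", file=sys.stderr)
        return explicit + missing

    result = []
    for s in statements:
        if s == "UNDEFINED":
            result.extend(missing)  # insert at the point of the symbol
        else:
            result.append(s)
    return result


print(build_order(["a", "b", "UNDEFINED", "c"], "abcxyz"))
# ['a', 'b', 'x', 'y', 'z', 'c']
print(build_order(["a", "b", "c"], "abcxyz"))
# ['a', 'b', 'c', 'x', 'y', 'z']
```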
The syntax for a collation order statement is:
<collating-identifier> <weight1>;<weight2>;...;<weightn>
The operands for each collating identifier define the primary, secondary, and subsequent relative weights for that identifier. Two collating identifiers are collated by comparing their relative primary weights. If the primary weights are equal, the comparison is repeated for each successive weight level until the weights differ or the weight levels are exhausted. Two or more collating elements can be assigned the same weight. If two collating identifiers have the same primary weight, they belong to the same equivalence class.
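The level-by-level comparison can be illustrated with a small Python model. The weight table below is invented for the example (two levels, with case deciding only at the secondary level), and the names `WEIGHTS`, `collation_key`, and `same_equivalence_class` are hypothetical. Extending the comparison from single identifiers to whole strings, by comparing all primary weights before any secondary weights, is an assumption of this sketch.

```python
# Hypothetical table: element -> (primary, secondary) relative order values.
WEIGHTS = {
    "a": (1, 1),
    "A": (1, 2),
    "b": (2, 1),
    "B": (2, 2),
}
LEVELS = 2


def collation_key(text):
    """One tuple of weights per level, primary level first. Tuple
    comparison then moves to the next level only on a tie."""
    return tuple(
        tuple(WEIGHTS[ch][level] for ch in text) for level in range(LEVELS)
    )


def same_equivalence_class(x, y):
    """Elements with the same primary weight share an equivalence class."""
    return WEIGHTS[x][0] == WEIGHTS[y][0]


words = ["Ba", "ab", "Ab", "aB"]
print(sorted(words, key=collation_key))   # ['ab', 'aB', 'Ab', 'Ba']
print(same_equivalence_class("a", "A"))   # True
```

Here "ab", "aB", and "Ab" tie at the primary level, so their order is decided by the secondary weights.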
The special symbol IGNORE, when used as a weight, indicates that when strings are compared using the weights at the level where IGNORE is specified, the collating element is ignored, as if the string did not contain it. In regular expressions and pattern matching, all characters whose primary weight is IGNORE form an equivalence class.
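A minimal sketch of the effect of IGNORE, using an invented two-level table in which the hyphen is ignored at the primary level but carries a secondary weight. The `IGNORE` marker, the table, and the function names are assumptions made for illustration only.

```python
IGNORE = None  # hypothetical marker for the IGNORE weight

# Hypothetical table: element -> (primary, secondary) weights.
WEIGHTS = {
    "a": (1, 1),
    "b": (2, 2),
    "-": (IGNORE, 3),
}


def level_key(text, level):
    """Weights of text at one level, skipping elements whose weight at
    that level is IGNORE, as if they were absent from the string."""
    weights = (WEIGHTS[ch][level] for ch in text)
    return tuple(w for w in weights if w is not IGNORE)


def collation_key(text):
    return (level_key(text, 0), level_key(text, 1))


print(level_key("a-b", 0) == level_key("ab", 0))    # True
print(sorted(["a-b", "ab"], key=collation_key))     # ['ab', 'a-b']
```

The two strings are indistinguishable at the primary level, and only the secondary weight of the hyphen separates them.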
All characters specified by an ellipsis are assigned unique weights, equal to the relative order of the characters. Characters specified by an explicit or implicit UNDEFINED special symbol are assigned the same primary weight (they belong to the same equivalence class).
One-to-many mapping is indicated by specifying two or more concatenated characters or symbolic names. For example, if the character "<ezset>" is given the string "<s><s>" as a weight, comparisons are performed as if all occurrences of the character <ezset> were replaced by <s><s> (assuming <s> has the collating weight <s>). If it is desirable to define <ezset> and <s><s> as an equivalence class, then a collating element must be defined for the string "ss".
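The one-to-many case can be modelled by letting one character contribute more than one weight. In this sketch the `ORDER` and `WEIGHTS` tables and the function `primary_key` are hypothetical, and the character U+00DF is used as a stand-in for the symbolic name <ezset>.

```python
# Hypothetical relative order values for a few elements.
ORDER = {"r": 1, "s": 2, "t": 3}

# Each character maps to the sequence of elements that supply its weight.
# "\u00df" (standing in for <ezset>) is weighted as <s><s>.
WEIGHTS = {
    "r": ("r",),
    "s": ("s",),
    "t": ("t",),
    "\u00df": ("s", "s"),
}


def primary_key(text):
    """Expand every character into its weight sequence, then translate
    each weight into its relative order value."""
    return tuple(ORDER[sym] for ch in text for sym in WEIGHTS[ch])


print(primary_key("\u00df") == primary_key("ss"))   # True
print(primary_key("r\u00dft"))                      # (1, 2, 2, 3)
```

The sketch covers only string comparison. It does not model equivalence classes in regular expressions, which, as stated above, require a collating element for "ss".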
If no weight is specified, the collating identifier is interpreted as itself.
For example, the order statement
<a> <a>
is equivalent to
<a>
Example: LC_COLLATE Locale Category Definition