LC_COLLATE Collating Keywords

The LC_COLLATE category definition in a locale source file establishes the relative order between collating elements in the locale that is compiled from that source by the LOCALDEF utility. The LC_COLLATE keywords establish a collation sequence that assigns each element one or more collation values.

The following keywords are recognized in a collation sequence definition:

  • copy
  • collating-element
  • collating-symbol
  • substitute
  • order_start
  • order_end

    copy
    Specifies the name of an existing locale to be used as the source for the definition of this category. If this keyword is specified, no other keyword shall be present in this category. If the locale is not found, an error is reported and no locale output is created. The copy keyword cannot specify a locale that also specifies the copy keyword for the same category.

    collating-element
    Defines a collating-element symbol representing a multicharacter collating element. This keyword is optional.

    In addition to the collating elements in the character set, the collating-element keyword can be used to define multicharacter collating elements. The syntax is:

      "collating-element %s from %s\n", <collating-element>, <string>

    The <collating-element> operand must be a symbolic name enclosed between angle brackets (< and >). It must not duplicate any symbolic name in the current charmap file (if any) or any other symbolic name defined in this collation definition. The <string> operand is a string of two or more characters that collate as a single entity. A <collating-element> defined with this keyword is recognized only within the LC_COLLATE category.

    For example:

    collating-element <ch> from "<c><h>"
    collating-element <e-acute> from "<acute><e>"
    collating-element <ll> from "ll"
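
    The effect of multicharacter collating elements on how a string is divided up before comparison can be illustrated with a minimal Python sketch. It is not part of LOCALDEF; the table MULTI_CHAR_ELEMENTS and the function split_collating_elements are hypothetical names, and longest-match-first scanning is an assumption made for the illustration.

```python
# Hypothetical table of multicharacter collating elements, mirroring
# definitions such as: collating-element <ch> from "<c><h>"
MULTI_CHAR_ELEMENTS = {"ch": "<ch>", "ll": "<ll>"}


def split_collating_elements(text, multi=MULTI_CHAR_ELEMENTS):
    """Split text into collating elements, preferring the longest match."""
    longest = max((len(s) for s in multi), default=1)
    elements = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, down to length 2.
        for size in range(min(longest, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in multi:
                elements.append(multi[chunk])
                i += size
                break
        else:
            # No multicharacter element starts here: take one character.
            elements.append(text[i])
            i += 1
    return elements
```

    With this table, "cello" is treated as the four elements c, e, <ll>, o rather than five single characters, so <ll> can be given its own position in the collation order.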

    collating-symbol
    Defines a collating symbol for use in collation order statements.

    The collating-symbol keyword defines a symbolic name that can be associated with a relative position in the character order sequence. While such a symbolic name does not represent any collating element, it can be used as a weight. This keyword is optional.

    This construct can define symbols for use in collation sequence statements, between the order_start and order_end keywords.

    The syntax is:

      "collating-symbol %s\n", <collating-symbol>

    The <collating-symbol> operand must be a symbolic name enclosed between angle brackets (< and >). It must not duplicate any symbolic name in the current charmap file (if any) or any other symbolic name defined in this collation definition. A <collating-symbol> defined with this keyword is recognized only within the LC_COLLATE category.

    For example:

    collating-symbol <UPPER_CASE>
    collating-symbol <HIGH>
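
    As a rough illustration of how a collating symbol can serve as a weight without representing any character, the following Python sketch assigns each letter a primary weight and a symbolic secondary weight. The ORDER list, the WEIGHTS table, the symbol <LOWER_CASE>, and the function sort_key are all hypothetical and chosen only for this example.

```python
# Hypothetical order: the position of each name in this list is its weight.
# <LOWER_CASE> and <UPPER_CASE> are collating symbols; they hold a position
# in the order but match no input character.
ORDER = ["<LOWER_CASE>", "<UPPER_CASE>", "a", "b"]
POSITION = {name: index for index, name in enumerate(ORDER)}

# Each element gets (primary, secondary) weights, named symbolically.
WEIGHTS = {
    "a": ("a", "<LOWER_CASE>"),
    "A": ("a", "<UPPER_CASE>"),
    "b": ("b", "<LOWER_CASE>"),
    "B": ("b", "<UPPER_CASE>"),
}


def sort_key(text):
    """Build a key: all primary weights, then all secondary weights."""
    primary = [POSITION[WEIGHTS[ch][0]] for ch in text]
    secondary = [POSITION[WEIGHTS[ch][1]] for ch in text]
    return (primary, secondary)
```

    Sorting with this key keeps "a" and "A" together at the primary level and uses the symbols only to break the tie, placing the lowercase form first.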

    substitute
    Defines a substring substitution in a string to be collated. This keyword is optional. The syntax is:

      "substitute %s with %s\n", <regular-expr>, <replacement>

    The first operand is treated as a basic regular expression. The replacement operand consists of zero or more characters and regular expression back-references. A back-reference is a backslash followed by a single digit from 1 to 9 (\1 through \9). If the backslash is followed by two or three digits, the sequence is interpreted as an octal constant.

    When strings are collated according to a collation definition containing substitute statements, the collation behaves as if occurrences of substrings matching the basic regular expression are replaced by the replacement string before the strings are compared based on the specified collation sequence. Ranges in the regular expression are interpreted according to the current character collation sequence, and character classes are interpreted according to the character classification specified by the LC_CTYPE environment variable at collation time. If more than one substitute statement is present in the collation definition, the collation process behaves as if the substitute statements are applied to the strings in the order in which they occur in the source definition. Substitutions specified by substitute statements are processed before any substitutions for one-to-many mappings.
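
    The "as if" behavior described above can be sketched in Python. This is only an approximation: Python's re module stands in for POSIX basic regular expressions, plain code-point order stands in for the real collation sequence, and the SUBSTITUTIONS rules and both function names are hypothetical.

```python
import re

# Hypothetical substitute rules, listed in source order. Python regular
# expression syntax is used here in place of basic regular expressions.
SUBSTITUTIONS = [
    (r"Mc([A-Z])", r"Mac\1"),  # \1 is a back-reference to the first group
    (r"-", r""),               # an empty replacement removes the match
]


def apply_substitutions(text, rules=SUBSTITUTIONS):
    """Apply every rule in the order given, each to the previous result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def collation_key(text):
    # The comparison sees only the substituted string. Code-point order is
    # an assumption standing in for the locale's collation sequence.
    return apply_substitutions(text)
```

    Because the rules run in source order, "Mc-Kay" becomes "McKay" rather than "MacKay": the first rule finds no capital letter directly after "Mc", and the hyphen is removed only afterward.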

    The support of the substitute keyword is an IBM VisualAge for C++ extension to the POSIX standard.

    order_start
    Defines collating rules. This statement is followed by one or more collation order statements, which assign character collation values and collation weights to collating elements.

    The order_start keyword must precede collation order entries. It defines the number of weights for this collation sequence definition and other collation rules.

    The syntax of the order_start keyword is:

      order_start <sort-rule1>;<sort-rule2>;...;<sort-rulen>

    The operands of the order_start keyword are optional. If present, the operands define rules to be applied when strings are compared. The number of operands defines how many weights each element is assigned; if no operands are present, one forward operand is assumed. The first operand defines the rules applied when comparing strings using the first (primary) weight, the second operand defines the rules applied when comparing strings using the second weight, and so on. Operands are separated by semicolons (;). Each operand consists of one or more collation directives separated by commas (,). If the number of operands exceeds the limit of 6, the LOCALDEF utility issues a warning message.
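
    How one operand per weight level shapes a comparison can be sketched in Python. The sketch assumes two directives: forward, which the text above names as the default, and backward, which is taken from the POSIX locale definition format and is an assumption here. The WEIGHTS table and the compare function are hypothetical.

```python
# Hypothetical weights: (primary, secondary) for each character.
# Accented letters share a primary weight with their base letter.
WEIGHTS = {
    "c": (1, 0),
    "e": (2, 0), "é": (2, 1),
    "o": (3, 0), "ô": (3, 1),
    "t": (4, 0),
}


def compare(a, b, rules, weights=WEIGHTS):
    """Compare two strings level by level; return -1, 0, or 1.

    rules holds one directive per weight level, in the same order as the
    semicolon-separated operands of order_start.
    """
    for level, rule in enumerate(rules):
        wa = [weights[ch][level] for ch in a]
        wb = [weights[ch][level] for ch in b]
        if rule == "backward":
            # Scan this level from the end of the string toward the start.
            wa.reverse()
            wb.reverse()
        if wa != wb:
            return -1 if wa < wb else 1
    return 0
```

    Under these assumptions, the rules forward;forward place "coté" before "côte", while forward;backward place "côte" before "coté", because the secondary weights are then examined starting from the last character.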

    The order_start keyword supports the following directives:

    order_end
    Terminates the collating order entries.
