The Unicode class provides the property information for a Unicode character. This information, provided implicitly by the Unicode character encoding standard, includes the script to which the character belongs (for example, symbols or control characters), as well as semantic information such as whether the character is a digit, or whether it is uppercase, lowercase, or uncased. The current implementation is based on the Unicode Standard 2.0.14.
Do not derive your own classes from this class.
Getting Character Property Information
Use the functions in this group to get the Unicode character's script, character property, and direction property information.
public:
static EDirectionProperty characterDirection(UniChar uc)
For example, 0x0041 (letter A) has the kLeftToRight directional property.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static EUnicodeScript script(UniChar uc)
For example, 0x03A9 (Omega) is of the Greek script.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static ECharacterProperty type(UniChar uc)
For example, 0x0030 (Zero) has kDecimalNumber (Numeric) type.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Getting the Unicode Version
Use the function in this group to identify the version of the Unicode standard used for this implementation.
public:
static double currentVersion()
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Identifying Special Characters
Use the functions in this group to identify special characters. For example, you can test a character to determine if it is a graphic character, a control character, a punctuation mark, and so on.
public:
static int digitValue(UniChar uc)
Finds the numeric value of a Unicode character that represents a decimal digit. For example, '0', '1', and '2' return 0, 1, and 2. This function allows non-Latin decimal digits, such as the Arabic-Indic digits at U+0660 to U+0669, to be used in numbers.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
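The digit-to-value mapping described for digitValue can be illustrated with a small stand-alone sketch. It does not call the Unicode class. The names UniCharSketch, digitValueSketch, and parseDecimalSketch are hypothetical, the 16-bit width of the character type is an assumption, and returning -1 for a non-digit is a choice made for this example only, since the documentation does not state what digitValue returns in that case. Only two digit ranges are covered.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the library's UniChar type (assumed to be a 16-bit code unit).
using UniCharSketch = std::uint16_t;

// Hypothetical helper, not part of the Unicode class: returns the decimal
// value of a digit from one of two contiguous ranges, or -1 otherwise.
inline int digitValueSketch(UniCharSketch uc) {
    // ASCII digits U+0030..U+0039.
    if (uc >= 0x0030 && uc <= 0x0039) return uc - 0x0030;
    // Arabic-Indic digits U+0660..U+0669.
    if (uc >= 0x0660 && uc <= 0x0669) return uc - 0x0660;
    return -1;  // assumption: signal "not a digit" with -1
}

// Accumulates a number from leading digits and stops at the first non-digit.
inline long parseDecimalSketch(const UniCharSketch* text, int length) {
    long value = 0;
    for (int i = 0; i < length; ++i) {
        const int d = digitValueSketch(text[i]);
        if (d < 0) break;
        value = value * 10 + d;
    }
    return value;
}

// Usage: digits from different scripts can be mixed in one number.
inline void digitValueExample() {
    const UniCharSketch mixed[] = {0x0034, 0x0662};  // '4', ARABIC-INDIC DIGIT TWO
    assert(parseDecimalSketch(mixed, 2) == 42);
}
```

Because the parser works on digit values rather than on character codes, it never needs to know which script a digit came from.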
public:
static bool isASCII(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isClosePunctuation(UniChar uc)
Checks if a character is a closing punctuation mark. For example, right parenthesis, or closing single quotation mark.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isControl(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isDigit(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isGraphic(UniChar uc)
Checks if a character is graphical. For example, special control characters.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isHexDigit(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isInSet(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isOpenPunctuation(UniChar uc)
Checks if a character is an opening punctuation mark. For example, left parenthesis, or opening single quotation mark.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isPrint(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isPunctuation(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isSymbol(UniChar uc)
Checks if a character is a symbol. For example, math symbols such as the division sign.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isXDigit(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static UniChar matchPunctuation(UniChar searchChar)
Finds the matching punctuation for the Unicode character. For example, left parenthesis for the right parenthesis, or vice versa.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
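The pairing behavior described for matchPunctuation can be sketched with a small lookup table. This is an illustration under stated assumptions, not the library's implementation: UniCharSketch, PunctuationPairSketch, kPairsSketch, and matchPunctuationSketch are hypothetical names, the table holds only a handful of pairs, and returning the input unchanged for an unpaired character is a choice made here because the documentation does not say what the real function does in that case.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the library's UniChar type (assumed to be a 16-bit code unit).
using UniCharSketch = std::uint16_t;

// One opening/closing pair.
struct PunctuationPairSketch {
    UniCharSketch open;
    UniCharSketch close;
};

// A few well-known pairs; a full table would cover many more.
static const PunctuationPairSketch kPairsSketch[] = {
    {0x0028, 0x0029},  // ( )
    {0x005B, 0x005D},  // [ ]
    {0x007B, 0x007D},  // { }
    {0x2018, 0x2019},  // left and right single quotation marks
    {0x201C, 0x201D},  // left and right double quotation marks
};

// Hypothetical helper: maps an opening mark to its closing mark and back.
inline UniCharSketch matchPunctuationSketch(UniCharSketch searchChar) {
    for (const PunctuationPairSketch& p : kPairsSketch) {
        if (searchChar == p.open) return p.close;
        if (searchChar == p.close) return p.open;
    }
    return searchChar;  // assumption: unpaired characters map to themselves
}

// Usage: the lookup works in both directions.
inline void matchPunctuationExample() {
    assert(matchPunctuationSketch(0x0028) == 0x0029);  // ( gives )
    assert(matchPunctuationSketch(0x0029) == 0x0028);  // ) gives (
}
```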
Obtaining Information About the Character
Use the functions in this group to test whether a character is a line separator, a paragraph separator, a space, an invisible character, or a trailing invisible character.
public:
static bool isASpace(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isInvisible(UniChar uc)
Checks if a character is an invisible character. For example, white space or line terminator.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isLineOrParagraphSeparator(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isLineSeparator(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isParagraphSeparator(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isTrailingInvisible(UniChar uc)
Checks if a character is a trailing invisible character. For example, non-break white spaces.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
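As a rough illustration of the separator tests in this group, the following stand-alone sketch checks for the two dedicated Unicode separator characters, LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029). It does not call the Unicode class, and all names ending in Sketch are hypothetical. Whether the library functions also treat other characters, such as carriage return or line feed, as separators is not stated here, so this sketch may be narrower than the real behavior.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the library's UniChar type (assumed to be a 16-bit code unit).
using UniCharSketch = std::uint16_t;

// Hypothetical helpers that recognize only the dedicated separator characters.
inline bool isLineSeparatorSketch(UniCharSketch uc) { return uc == 0x2028; }
inline bool isParagraphSeparatorSketch(UniCharSketch uc) { return uc == 0x2029; }
inline bool isLineOrParagraphSeparatorSketch(UniCharSketch uc) {
    return isLineSeparatorSketch(uc) || isParagraphSeparatorSketch(uc);
}

// Counts the segments a buffer splits into at line or paragraph separators.
// An empty buffer has zero segments; otherwise it is separators + 1.
inline int countSegmentsSketch(const UniCharSketch* text, int length) {
    if (length <= 0) return 0;
    int segments = 1;
    for (int i = 0; i < length; ++i) {
        if (isLineOrParagraphSeparatorSketch(text[i])) ++segments;
    }
    return segments;
}

// Usage: "A", LINE SEPARATOR, "B", PARAGRAPH SEPARATOR, "C" has three segments.
inline void separatorExample() {
    const UniCharSketch text[] = {0x0041, 0x2028, 0x0042, 0x2029, 0x0043};
    assert(countSegmentsSketch(text, 5) == 3);
}
```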
Testing the Properties of Base Form Characters and Diacritics
Use the functions in this group to determine if a character is a base form letter, a number, or a diacritic, and, if it is a letter, whether it is uppercase, lowercase, or uncased.
public:
static bool isAlpha(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isAlphaNumeric(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isBaseForm(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isDiacritic(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isLower(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isUncased(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static bool isUpper(UniChar uc)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
enum ECharacterProperty { kNonCharacter=0,
kFirstLetter=1,
kUppercaseLetter=1,
kLowercaseLetter=2,
kTitlecaseLetter=3,
kModifierLetter=4,
kOtherLetter=5,
kLastLetter=5,
kFirstMark=6,
kNonSpacingMark=6,
kEnclosingMark=7,
kCombiningSpacingMark=8,
kLastMark=8,
kFirstNumber=9,
kDecimalNumber=9,
kLetterNumber=10,
kOtherNumber=11,
kLastNumber=11,
kFirstSeparator=12,
kSpaceSeparator=12,
kLineSeparator=13,
kParagraphSeparator=14,
kLastSeparator=14,
kControlCharacter=15,
kFormatCharacter=16,
kPrivateUseCharacter=17,
kSurrogate=18,
kFirstPunctuation=19,
kDashPunctuation=19,
kOpenPunctuation=20,
kClosePunctuation=21,
kConnectorPunctuation=22,
kOtherPunctuation=23,
kLastPunctuation=23,
kFirstSymbol=24,
kMathSymbol=24,
kCurrencySymbol=25,
kModifierSymbol=26,
kOtherSymbol=27,
kLastSymbol=27,
kCharacterPropertiesCount=28,
kUpperCase=kUppercaseLetter,
kCompositeUpperCase=kUppercaseLetter,
kLowerCase=kLowercaseLetter,
kCompositeLowerCase=kLowercaseLetter,
kUncased=kOtherLetter,
kCompositeUncased=kOtherLetter,
kModifier=kModifierLetter,
kPresentationModifier=kFormatCharacter,
kDiacritic=kNonSpacingMark,
kFirstDigit=kFirstNumber,
kDecimalDigit=kDecimalNumber,
kNonDecimalDigit=kOtherNumber,
kLastDigit=kLastNumber,
kGeneralTechnicalSymbol=kOtherSymbol,
kFirstWhite=kFirstSeparator,
kWhiteSpace=kSpaceSeparator,
kLineTerminator=kLineSeparator,
kParagraphTerminator=kParagraphSeparator,
kPadSpace=kSpaceSeparator,
kLastWhite=kLastSeparator,
kControl=kControlCharacter,
kUnknownType=kNonCharacter,
kLastType=kLastSymbol }
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
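The kFirst... and kLast... enumerators in ECharacterProperty bracket each group of related values, which suggests that group membership can be tested with a simple range comparison. The sketch below copies a subset of the values listed above into a stand-in enum (ECharacterPropertySketch, a hypothetical name) so that it compiles on its own. The helper functions are also hypothetical, and whether the library tests groups this way internally is an assumption.

```cpp
#include <cassert>

// Subset of the ECharacterProperty values listed above, with the same numbers.
enum ECharacterPropertySketch {
    kNonCharacter = 0,
    kFirstLetter = 1, kUppercaseLetter = 1, kLowercaseLetter = 2,
    kTitlecaseLetter = 3, kModifierLetter = 4, kOtherLetter = 5, kLastLetter = 5,
    kFirstNumber = 9, kDecimalNumber = 9, kLetterNumber = 10,
    kOtherNumber = 11, kLastNumber = 11,
    kFirstPunctuation = 19, kOpenPunctuation = 20, kLastPunctuation = 23,
    kFirstSymbol = 24, kMathSymbol = 24, kLastSymbol = 27
};

// Hypothetical helpers: each group is a contiguous range of enumerator values.
inline bool isLetterTypeSketch(ECharacterPropertySketch t) {
    return t >= kFirstLetter && t <= kLastLetter;
}
inline bool isNumberTypeSketch(ECharacterPropertySketch t) {
    return t >= kFirstNumber && t <= kLastNumber;
}
inline bool isPunctuationTypeSketch(ECharacterPropertySketch t) {
    return t >= kFirstPunctuation && t <= kLastPunctuation;
}
inline bool isSymbolTypeSketch(ECharacterPropertySketch t) {
    return t >= kFirstSymbol && t <= kLastSymbol;
}

// Usage: kDecimalNumber falls in the number group, not the letter group.
inline void characterPropertyExample() {
    assert(isNumberTypeSketch(kDecimalNumber));
    assert(!isLetterTypeSketch(kDecimalNumber));
}
```

With the real class, the value passed to such a helper would come from the type function.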
enum EDirectionProperty { kLeftToRight=0,
kRightToLeft=1,
kEuropeanNumber=2,
kEuropeanNumberSeparator=3,
kEuropeanNumberTerminator=4,
kArabicNumber=5,
kCommonNumberSeparator=6,
kBlockSeparator=7,
kSegmentSeparator=8,
kWhiteSpaceNeutral=9,
kOtherNeutral=10 }
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
enum EUnicodeBounds { kLowBoundUnicode=0x0000,
kLowBoundASCII=kLowBoundUnicode,
kLowBoundLatinOne=kLowBoundASCII,
kHighBoundASCII=0x007F,
kHighBoundLatinOne=0x00FF,
kLowBoundHan=0x4E00,
kHighBoundHan=0x9FA5,
kLowBoundHangulSyllable=0xAC00,
kHighBoundHangulSyllable=0xD7A3,
kLowBoundUserZone=0xE000,
kHighBoundUserZone=0xF8FF,
kLowBoundDefinedUserZone=kHighBoundUserZone,
kLowBoundCompatibilityZone1=kHighBoundUserZone,
kHighBoundCompatibilityZone1=0xFEFE,
kLowBoundCompatibilityZone2=0xFF00,
kHighBoundCompatibilityZone2=0xFFEF,
kHighBoundUnicode=0xFFFF }
EUnicodeBounds enumerates the following constants:
kLowBoundHan = 0x4E00, // lower limit of currently defined Han range
kHighBoundHan = 0x9FA5, // upper limit of currently defined Han range
kLowBoundHangulSyllable = 0xAC00, // lower limit of currently defined precomposed Hangul syllable range
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
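The bounds in EUnicodeBounds lend themselves to inclusive range checks. The following sketch copies a few of the constants above into a stand-in enum so that it compiles without the library. EUnicodeBoundsSketch, UniCharSketch, and the helper functions are hypothetical names, and the 16-bit width of the character type is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the library's UniChar type (assumed to be a 16-bit code unit).
using UniCharSketch = std::uint16_t;

// A few of the EUnicodeBounds constants listed above, with the same values.
enum EUnicodeBoundsSketch {
    kLowBoundHan = 0x4E00,
    kHighBoundHan = 0x9FA5,
    kLowBoundHangulSyllable = 0xAC00,
    kHighBoundHangulSyllable = 0xD7A3,
    kLowBoundUserZone = 0xE000,
    kHighBoundUserZone = 0xF8FF
};

// Hypothetical helpers: inclusive checks against the low and high bounds.
inline bool isHanSketch(UniCharSketch uc) {
    return uc >= kLowBoundHan && uc <= kHighBoundHan;
}
inline bool isHangulSyllableSketch(UniCharSketch uc) {
    return uc >= kLowBoundHangulSyllable && uc <= kHighBoundHangulSyllable;
}
inline bool isUserZoneSketch(UniCharSketch uc) {
    return uc >= kLowBoundUserZone && uc <= kHighBoundUserZone;
}

// Usage: U+4E2D lies inside the Han range, U+0041 does not.
inline void unicodeBoundsExample() {
    assert(isHanSketch(0x4E2D));
    assert(!isHanSketch(0x0041));
}
```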
enum EUnicodeScript { kBasicLatin,
kLatin1Supplement,
kLatinExtendedA,
kLatinExtendedB,
kIPAExtension,
kSpacingModifier,
kCombiningDiacritical,
kGreek,
kCyrillic,
kArmenian,
kHebrew,
kArabic,
kDevanagari,
kBengali,
kGurmukhi,
kGujarati,
kOriya,
kTamil,
kTelugu,
kKannada,
kMalayalam,
kThai,
kLao,
kTibetan,
kGeorgian,
kHangulJamo,
kLatinExtendedAdditional,
kGreekExtended,
kGeneralPunctuation,
kSuperSubScript,
kCurrencySymbolScript,
kSymbolCombiningMark,
kLetterlikeSymbol,
kNumberForm,
kArrow,
kMathOperator,
kMiscTechnical,
kControlPicture,
kOpticalCharacter,
kEnclosedAlphanumeric,
kBoxDrawing,
kBlockElement,
kGeometricShape,
kMiscSymbol,
kDingbat,
kCJKSymbolPunctuation,
kHiragana,
kKatakana,
kBopomofo,
kHangulCompatibilityJamo,
kKanbun,
kEnclosedCJKLetterMonth,
kCJKCompatibility,
kCJKUnifiedIdeograph,
kHangulSyllable,
kHighSurrogate,
kHighPrivateUseSurrogate,
kLowSurrogate,
kPrivateUse,
kCJKCompatibilityIdeograph,
kAlphabeticPresentation,
kArabicPresentationA,
kCombiningHalfMark,
kCJKCompatibilityForm,
kSmallFormVariant,
kArabicPresentationB,
kNoScript,
kHalfwidthFullwidthForm,
kScriptCount }
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |