IUnicode

The Unicode class provides the property information for a Unicode character.

The Unicode character information, provided implicitly by the Unicode character encoding standard, includes the script to which the character belongs (for example, symbols or control characters), as well as semantic information such as whether a character is a digit, or whether it is uppercase, lowercase, or uncased. The current implementation is based on the Unicode Standard 2.0.14.

Do not derive your own classes from this class.


IUnicode - Member Functions and Data by Group

Getting Character Property Information

Use the functions in this group to get the Unicode character's script, character property, and direction property information.


characterDirection
public:
static EDirectionProperty characterDirection(UniChar uc)
Returns the linguistic direction property of a character.

For example, 0x0041 (letter A) has the kLeftToRight directional property.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


script
public:
static EUnicodeScript script(UniChar uc)
Returns the script property of a Unicode character.

For example, 0x03A9 (Omega) is of the Greek script.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


type
public:
static ECharacterProperty type(UniChar uc)
Returns the character type property of a Unicode character.

For example, 0x0030 (Zero) has kDecimalNumber (Numeric) type.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Getting the Unicode Version

Use the function in this group to identify the version of the Unicode standard used for this implementation.


currentVersion
public:
static double currentVersion()
Returns the version of the Unicode standard this implementation is based on.

Return
the Unicode standard version number.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Identifying Special Characters

Use the functions in this group to identify special characters. For example, you can test a character to determine if it is a graphic character, a control character, a punctuation mark, and so on.


digitValue
public:
static int digitValue(UniChar uc)
Finds the numeric value of the Unicode character which represents a decimal digit.

For example, the characters '0', '1', and '2' return 0, 1, and 2. This function allows non-Roman decimal digits, such as the Arabic-Indic digits at U+0660 to U+0669, to be used in numbers.

uc
the Unicode character

Return
the numerical value of a Unicode character.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
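
The mapping described for digitValue can be illustrated with a small stand-alone sketch. It does not call IUnicode. The namespace digit_sketch, the char16_t stand-in for UniChar, the choice of digit runs, and the -1 result for non-digits are all assumptions made for illustration; the reference does not state what the library returns for a character that is not a decimal digit.

```cpp
namespace digit_sketch {

// Stand-in for the library's UniChar; a 16-bit code unit is an assumption.
using UniChar = char16_t;

// Returns 0-9 for a decimal digit in one of the runs listed below, or -1
// otherwise. The -1 result is this sketch's own convention.
constexpr int digitValue(UniChar uc) {
    // First code point of each ten-character digit run this sketch covers.
    const UniChar zeros[] = {
        0x0030,  // ASCII digits
        0x0660,  // Arabic-Indic digits
        0x0966,  // Devanagari digits
    };
    for (UniChar zero : zeros) {
        if (uc >= zero && uc <= zero + 9) return uc - zero;
    }
    return -1;
}

// Accumulates leading decimal digits, in any covered script, into one number.
constexpr long parseNumber(const UniChar* text, int length) {
    long value = 0;
    for (int i = 0; i < length; ++i) {
        const int digit = digitValue(text[i]);
        if (digit < 0) break;  // stop at the first non-digit
        value = value * 10 + digit;
    }
    return value;
}

}  // namespace digit_sketch
```

With this sketch, a buffer holding U+0661, U+0662, and the ASCII digit '5' would parse to a single number, which is the kind of mixed-script input the description has in mind.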


isASCII
public:
static bool isASCII(UniChar uc)
Checks if a character is an ASCII character.

uc
the Unicode character

Return
true if the Unicode character is an ASCII character, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isClosePunctuation
public:
static bool isClosePunctuation(UniChar uc)
Checks if a character is a closing punctuation mark.

For example, a right parenthesis or a closing single quotation mark.

uc
the Unicode character

Return
true if the Unicode character is a closing punctuation mark, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isControl
public:
static bool isControl(UniChar uc)
Checks if a character is a control character.

uc
the Unicode character

Return
true if the Unicode character is a control character, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isDigit
public:
static bool isDigit(UniChar uc)
Checks if a character is numeric.

uc
the Unicode character

Return
true if the Unicode character is numeric, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isGraphic
public:
static bool isGraphic(UniChar uc)
Checks if a character is graphical.

For example, special control characters.

uc
the Unicode character

Return
true if the Unicode character is graphical, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isHexDigit
public:
static bool isHexDigit(UniChar uc)
Checks if a character is a hex digit.

uc
the Unicode character

Return
true if the Unicode character is a hex digit, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isInSet
public:
static bool isInSet(UniChar uc)
Checks if a character is a valid Unicode character.

uc
the Unicode character

Return
true if the Unicode character is a valid Unicode character, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isOpenPunctuation
public:
static bool isOpenPunctuation(UniChar uc)
Checks if a character is an opening punctuation mark.

For example, a left parenthesis or an opening single quotation mark.

uc
the Unicode character

Return
true if the Unicode character is an opening punctuation mark, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isPrint
public:
static bool isPrint(UniChar uc)
Checks if a character is printable.

uc
the Unicode character

Return
true if the Unicode character is printable, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isPunctuation
public:
static bool isPunctuation(UniChar uc)
Checks if a character is a punctuation mark.

uc
the Unicode character

Return
true if the Unicode character is a punctuation mark, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isSymbol
public:
static bool isSymbol(UniChar uc)
Checks if a character is a symbol.

For example, math symbols such as the division sign.

uc
the Unicode character

Return
true if the Unicode character is a symbol, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isXDigit
public:
static bool isXDigit(UniChar uc)
Checks if a character is a hex digit.

uc
the Unicode character

Return
true if the Unicode character is a hex digit, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


matchPunctuation
public:
static UniChar matchPunctuation(UniChar searchChar)
Finds the matching punctuation for the Unicode character.

For example, the left parenthesis for the right parenthesis, or vice versa.

searchChar
the Unicode character

Return
the matching punctuation.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
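
The pairing behavior described for matchPunctuation can be sketched as a lookup over open/close pairs. This is a stand-alone illustration, not the library's implementation: the namespace punctuation_sketch, the char16_t stand-in for UniChar, the particular pairs in the table, and the choice to return an unpaired character unchanged are all assumptions.

```cpp
namespace punctuation_sketch {

// Stand-in for the library's UniChar; a 16-bit code unit is an assumption.
using UniChar = char16_t;

struct Pair {
    UniChar open;
    UniChar close;
};

// Returns the partner of a paired punctuation mark. Characters that are not
// in the table are returned unchanged, which is this sketch's own convention.
constexpr UniChar matchPunctuation(UniChar searchChar) {
    const Pair pairs[] = {
        {0x0028, 0x0029},  // parentheses
        {0x005B, 0x005D},  // square brackets
        {0x007B, 0x007D},  // curly brackets
        {0x2018, 0x2019},  // single quotation marks
        {0x201C, 0x201D},  // double quotation marks
    };
    for (const Pair& pair : pairs) {
        if (searchChar == pair.open) return pair.close;
        if (searchChar == pair.close) return pair.open;
    }
    return searchChar;
}

}  // namespace punctuation_sketch
```

A table keeps the lookup symmetric: the same entry answers both "what closes this?" and "what opens this?", which matches the "or vice versa" in the description.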


Obtaining Information About the Character

Use the functions in this group to test whether a character is a line separator, a paragraph separator, a space, an invisible character, or a trailing invisible character.


isASpace
public:
static bool isASpace(UniChar uc)
Checks if a character is a space character.

uc
the Unicode character

Return
true if the Unicode character is a space character, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isInvisible
public:
static bool isInvisible(UniChar uc)
Checks if a character is an invisible character.

For example, white space or a line terminator.

uc
the Unicode character

Return
true if the Unicode character is an invisible character, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isLineOrParagraphSeparator
public:
static bool isLineOrParagraphSeparator(UniChar uc)
Checks if a character is a line or paragraph separator.

uc
the Unicode character

Return
true if the Unicode character is a line or paragraph separator, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isLineSeparator
public:
static bool isLineSeparator(UniChar uc)
Checks if a character is a line separator.

uc
the Unicode character

Return
true if the Unicode character is a line separator, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isParagraphSeparator
public:
static bool isParagraphSeparator(UniChar uc)
Checks if a character is a paragraph separator.

uc
the Unicode character

Return
true if the Unicode character is a paragraph separator, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
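
The relationship between the three separator tests above can be sketched with the two dedicated Unicode code points, U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR. The sketch is stand-alone and does not call IUnicode. The namespace separator_sketch, the char16_t stand-in for UniChar, and the countSegments helper are assumptions; whether the library also treats characters such as carriage return or line feed as separators is not stated in this reference.

```cpp
namespace separator_sketch {

// Stand-in for the library's UniChar; a 16-bit code unit is an assumption.
using UniChar = char16_t;

constexpr UniChar kLineSeparatorChar = 0x2028;       // LINE SEPARATOR
constexpr UniChar kParagraphSeparatorChar = 0x2029;  // PARAGRAPH SEPARATOR

constexpr bool isLineSeparator(UniChar uc) {
    return uc == kLineSeparatorChar;
}

constexpr bool isParagraphSeparator(UniChar uc) {
    return uc == kParagraphSeparatorChar;
}

// The combined test is the union of the two narrower tests.
constexpr bool isLineOrParagraphSeparator(UniChar uc) {
    return isLineSeparator(uc) || isParagraphSeparator(uc);
}

// Hypothetical helper: counts the text segments delimited by either separator.
constexpr int countSegments(const UniChar* text, int length) {
    int segments = 1;
    for (int i = 0; i < length; ++i) {
        if (isLineOrParagraphSeparator(text[i])) ++segments;
    }
    return segments;
}

}  // namespace separator_sketch
```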


isTrailingInvisible
public:
static bool isTrailingInvisible(UniChar uc)
Checks if a character is a trailing invisible character.

For example, non-breaking white space.

uc
the Unicode character

Return
true if the Unicode character is a trailing invisible character, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Testing the Properties of Base Form Characters and Diacritics

Use the functions in this group to determine if a character is a base form letter, a number, or a diacritic, and, if it is a letter, whether it is uppercase, lowercase, or uncased.


isAlpha
public:
static bool isAlpha(UniChar uc)
Checks if a character is alphabetic.

uc
the Unicode character

Return
true if the Unicode character is alphabetic, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isAlphaNumeric
public:
static bool isAlphaNumeric(UniChar uc)
Checks if a character is an alphabetic letter or a number.

uc
the Unicode character

Return
true if the Unicode character is an alphabetic letter or a number, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isBaseForm
public:
static bool isBaseForm(UniChar uc)
Checks if a character is a base form letter.

uc
the Unicode character

Return
true if the Unicode character is a base form letter, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isDiacritic
public:
static bool isDiacritic(UniChar uc)
Checks if a character is a diacritic mark.

uc
the Unicode character

Return
true if the Unicode character is a diacritic mark, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isLower
public:
static bool isLower(UniChar uc)
Checks if a character is lowercase.

uc
the Unicode character

Return
true if the Unicode character is lowercase, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isUncased
public:
static bool isUncased(UniChar uc)
Checks if a character is an uncased letter.

uc
the Unicode character

Return
true if the Unicode character is uncased, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


isUpper
public:
static bool isUpper(UniChar uc)
Checks if a character is an uppercase letter.

uc
the Unicode character

Return
true if the Unicode character is uppercase, false otherwise.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


IUnicode - Enumerations


ECharacterProperty
enum ECharacterProperty { kNonCharacter=0, 
                          kFirstLetter=1, 
                          kUppercaseLetter=1, 
                          kLowercaseLetter=2, 
                          kTitlecaseLetter=3, 
                          kModifierLetter=4, 
                          kOtherLetter=5, 
                          kLastLetter=5, 
                          kFirstMark=6, 
                          kNonSpacingMark=6, 
                          kEnclosingMark=7, 
                          kCombiningSpacingMark=8, 
                          kLastMark=8, 
                          kFirstNumber=9, 
                          kDecimalNumber=9, 
                          kLetterNumber=10, 
                          kOtherNumber=11, 
                          kLastNumber=11, 
                          kFirstSeparator=12, 
                          kSpaceSeparator=12, 
                          kLineSeparator=13, 
                          kParagraphSeparator=14, 
                          kLastSeparator=14, 
                          kControlCharacter=15, 
                          kFormatCharacter=16, 
                          kPrivateUseCharacter=17, 
                          kSurrogate=18, 
                          kFirstPunctuation=19, 
                          kDashPunctuation=19, 
                          kOpenPunctuation=20, 
                          kClosePunctuation=21, 
                          kConnectorPunctuation=22, 
                          kOtherPunctuation=23, 
                          kLastPunctuation=23, 
                          kFirstSymbol=24, 
                          kMathSymbol=24, 
                          kCurrencySymbol=25, 
                          kModifierSymbol=26, 
                          kOtherSymbol=27, 
                          kLastSymbol=27, 
                          kCharacterPropertiesCount=28, 
                          kUpperCase=kUppercaseLetter, 
                          kCompositeUpperCase=kUppercaseLetter, 
                          kLowerCase=kLowercaseLetter, 
                          kCompositeLowerCase=kLowercaseLetter, 
                          kUncased=kOtherLetter, 
                          kCompositeUncased=kOtherLetter, 
                          kModifier=kModifierLetter, 
                          kPresentationModifier=kFormatCharacter, 
                          kDiacritic=kNonSpacingMark, 
                          kFirstDigit=kFirstNumber, 
                          kDecimalDigit=kDecimalNumber, 
                          kNonDecimalDigit=kOtherNumber, 
                          kLastDigit=kLastNumber, 
                          kGeneralTechnicalSymbol=kOtherSymbol, 
                          kFirstWhite=kFirstSeparator, 
                          kWhiteSpace=kSpaceSeparator, 
                          kLineTerminator=kLineSeparator, 
                          kParagraphTerminator=kParagraphSeparator, 
                          kPadSpace=kSpaceSeparator, 
                          kLastWhite=kLastSeparator, 
                          kControl=kControlCharacter, 
                          kUnknownType=kNonCharacter, 
                          kLastType=kLastSymbol }
Enumerates character properties.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
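
The kFirst.../kLast... enumerators in ECharacterProperty bracket each category, so a category test can be written as a range comparison on the property value. The following is a minimal sketch of that idea. It copies a subset of the enumerator values from the listing above into a hypothetical namespace property_sketch; the helper functions are illustrative, and the reference does not say whether the IUnicode predicates are implemented this way.

```cpp
namespace property_sketch {

// Subset of ECharacterProperty, with values copied from the listing above.
enum ECharacterProperty {
    kNonCharacter = 0,
    kFirstLetter = 1, kUppercaseLetter = 1, kLowercaseLetter = 2,
    kTitlecaseLetter = 3, kModifierLetter = 4, kOtherLetter = 5,
    kLastLetter = 5,
    kFirstNumber = 9, kDecimalNumber = 9, kLastNumber = 11,
    kControlCharacter = 15,
    kFirstPunctuation = 19, kOpenPunctuation = 20, kLastPunctuation = 23,
    kFirstSymbol = 24, kMathSymbol = 24, kLastSymbol = 27
};

// True if p lies in the closed range [first, last].
constexpr bool inRange(ECharacterProperty p, ECharacterProperty first,
                       ECharacterProperty last) {
    return p >= first && p <= last;
}

constexpr bool isLetter(ECharacterProperty p) {
    return inRange(p, kFirstLetter, kLastLetter);
}

constexpr bool isNumber(ECharacterProperty p) {
    return inRange(p, kFirstNumber, kLastNumber);
}

constexpr bool isPunctuation(ECharacterProperty p) {
    return inRange(p, kFirstPunctuation, kLastPunctuation);
}

constexpr bool isSymbol(ECharacterProperty p) {
    return inRange(p, kFirstSymbol, kLastSymbol);
}

constexpr bool isLetterOrNumber(ECharacterProperty p) {
    return isLetter(p) || isNumber(p);
}

}  // namespace property_sketch
```

In real use the argument would come from a call such as IUnicode::type(uc); the sketch takes the property value directly so that it stays independent of the library.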


EDirectionProperty
enum EDirectionProperty { kLeftToRight=0, 
                          kRightToLeft=1, 
                          kEuropeanNumber=2, 
                          kEuropeanNumberSeparator=3, 
                          kEuropeanNumberTerminator=4, 
                          kArabicNumber=5, 
                          kCommonNumberSeparator=6, 
                          kBlockSeparator=7, 
                          kSegmentSeparator=8, 
                          kWhiteSpaceNeutral=9, 
                          kOtherNeutral=10 }
Enumerates values that specify the linguistic direction property of a character.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
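
The EDirectionProperty values are ordered so that they fall into three contiguous groups. As a hedged illustration, the sketch below classifies a direction value as strong, weak, or neutral. That grouping comes from the Unicode bidirectional algorithm, not from this reference, and the namespace direction_sketch, the Strength type, and the strengthOf helper are hypothetical names introduced here.

```cpp
namespace direction_sketch {

// Values copied from the EDirectionProperty listing above.
enum EDirectionProperty {
    kLeftToRight = 0,
    kRightToLeft = 1,
    kEuropeanNumber = 2,
    kEuropeanNumberSeparator = 3,
    kEuropeanNumberTerminator = 4,
    kArabicNumber = 5,
    kCommonNumberSeparator = 6,
    kBlockSeparator = 7,
    kSegmentSeparator = 8,
    kWhiteSpaceNeutral = 9,
    kOtherNeutral = 10
};

// Grouping used by the Unicode bidirectional algorithm (an outside reference,
// not something this class documents).
enum class Strength { kStrong, kWeak, kNeutral };

constexpr Strength strengthOf(EDirectionProperty direction) {
    return direction <= kRightToLeft            ? Strength::kStrong
           : direction <= kCommonNumberSeparator ? Strength::kWeak
                                                 : Strength::kNeutral;
}

}  // namespace direction_sketch
```

In real use the argument would come from IUnicode::characterDirection(uc); for example, the letter A (0x0041) is documented above as kLeftToRight, which this sketch classifies as strong.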


EUnicodeBounds
enum EUnicodeBounds { kLowBoundUnicode=0x0000, 
                      kLowBoundASCII=kLowBoundUnicode, 
                      kLowBoundLatinOne=kLowBoundASCII, 
                      kHighBoundASCII=0x007F, 
                      kHighBoundLatinOne=0x00FF, 
                      kLowBoundHan=0x4E00, 
                      kHighBoundHan=0x9FA5, 
                      kLowBoundHangulSyllable=0xAC00, 
                      kHighBoundHangulSyllable=0xD7A3, 
                      kLowBoundUserZone=0xE000, 
                      kHighBoundUserZone=0xF8FF, 
                      kLowBoundDefinedUserZone=kHighBoundUserZone, 
                      kLowBoundCompatibilityZone1=kHighBoundUserZone, 
                      kHighBoundCompatibilityZone1=0xFEFE, 
                      kLowBoundCompatibilityZone2=0xFF00, 
                      kHighBoundCompatibilityZone2=0xFFEF, 
                      kHighBoundUnicode=0xFFFF }
Enumerates the boundary code points of selected Unicode ranges. These constants may be replaced with UniChar values at some point in the future.

kLowBoundHan and kHighBoundHan are the lower and upper limits of the currently defined Han range. kLowBoundHangulSyllable is the lower limit of the currently defined precomposed Hangul syllable range.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
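
The bounds in EUnicodeBounds lend themselves to simple closed-range tests. The sketch below copies a subset of the values from the listing above into a hypothetical namespace bounds_sketch and builds range checks on top of them. The helper names and the char16_t stand-in for UniChar are assumptions, and the sketch does not claim to reproduce how IUnicode::isASCII is implemented.

```cpp
namespace bounds_sketch {

// Stand-in for the library's UniChar; a 16-bit code unit is an assumption.
using UniChar = char16_t;

// Subset of EUnicodeBounds, with values copied from the listing above.
enum EUnicodeBounds {
    kLowBoundASCII = 0x0000,
    kHighBoundASCII = 0x007F,
    kLowBoundHan = 0x4E00,
    kHighBoundHan = 0x9FA5,
    kLowBoundHangulSyllable = 0xAC00,
    kHighBoundHangulSyllable = 0xD7A3,
    kLowBoundUserZone = 0xE000,
    kHighBoundUserZone = 0xF8FF
};

// True if uc lies in the closed range [low, high].
constexpr bool inBounds(UniChar uc, EUnicodeBounds low, EUnicodeBounds high) {
    return uc >= low && uc <= high;
}

constexpr bool isASCII(UniChar uc) {
    return inBounds(uc, kLowBoundASCII, kHighBoundASCII);
}

constexpr bool isHan(UniChar uc) {
    return inBounds(uc, kLowBoundHan, kHighBoundHan);
}

constexpr bool isHangulSyllable(UniChar uc) {
    return inBounds(uc, kLowBoundHangulSyllable, kHighBoundHangulSyllable);
}

constexpr bool isUserZone(UniChar uc) {
    return inBounds(uc, kLowBoundUserZone, kHighBoundUserZone);
}

}  // namespace bounds_sketch
```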


EUnicodeScript
enum EUnicodeScript { kBasicLatin, 
                      kLatin1Supplement, 
                      kLatinExtendedA, 
                      kLatinExtendedB, 
                      kIPAExtension, 
                      kSpacingModifier, 
                      kCombiningDiacritical, 
                      kGreek, 
                      kCyrillic, 
                      kArmenian, 
                      kHebrew, 
                      kArabic, 
                      kDevanagari, 
                      kBengali, 
                      kGurmukhi, 
                      kGujarati, 
                      kOriya, 
                      kTamil, 
                      kTelugu, 
                      kKannada, 
                      kMalayalam, 
                      kThai, 
                      kLao, 
                      kTibetan, 
                      kGeorgian, 
                      kHangulJamo, 
                      kLatinExtendedAdditional, 
                      kGreekExtended, 
                      kGeneralPunctuation, 
                      kSuperSubScript, 
                      kCurrencySymbolScript, 
                      kSymbolCombiningMark, 
                      kLetterlikeSymbol, 
                      kNumberForm, 
                      kArrow, 
                      kMathOperator, 
                      kMiscTechnical, 
                      kControlPicture, 
                      kOpticalCharacter, 
                      kEnclosedAlphanumeric, 
                      kBoxDrawing, 
                      kBlockElement, 
                      kGeometricShape, 
                      kMiscSymbol, 
                      kDingbat, 
                      kCJKSymbolPunctuation, 
                      kHiragana, 
                      kKatakana, 
                      kBopomofo, 
                      kHangulCompatibilityJamo, 
                      kKanbun, 
                      kEnclosedCJKLetterMonth, 
                      kCJKCompatibility, 
                      kCJKUnifiedIdeograph, 
                      kHangulSyllable, 
                      kHighSurrogate, 
                      kHighPrivateUseSurrogate, 
                      kLowSurrogate, 
                      kPrivateUse, 
                      kCJKCompatibilityIdeograph, 
                      kAlphabeticPresentation, 
                      kArabicPresentationA, 
                      kCombiningHalfMark, 
                      kCJKCompatibilityForm, 
                      kSmallFormVariant, 
                      kArabicPresentationB, 
                      kNoScript, 
                      kHalfwidthFullwidthForm, 
                      kScriptCount }
Enumerates all the scripts implemented in Unicode 2.0.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
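
Because kScriptCount is the last enumerator, it equals the number of script values and can size a per-script table. The sketch below shows that idiom with a deliberately abbreviated enumeration. The namespace script_sketch, the EScriptSubset enum (whose numeric values differ from those of EUnicodeScript), the scriptOf stand-in for IUnicode::script, and the tally helper are all hypothetical; only the block ranges for Basic Latin, Greek, and Cyrillic are taken from the Unicode standard.

```cpp
namespace script_sketch {

// Stand-in for the library's UniChar; a 16-bit code unit is an assumption.
using UniChar = char16_t;

// Abbreviated stand-in for EUnicodeScript. The trailing kScriptCount entry
// plays the same role as in the full enumeration: it counts the entries.
enum EScriptSubset { kBasicLatin, kGreek, kCyrillic, kNoScript, kScriptCount };

// Simplified stand-in for IUnicode::script covering three Unicode blocks.
constexpr EScriptSubset scriptOf(UniChar uc) {
    return uc <= 0x007F                     ? kBasicLatin
           : (uc >= 0x0370 && uc <= 0x03FF) ? kGreek
           : (uc >= 0x0400 && uc <= 0x04FF) ? kCyrillic
                                            : kNoScript;
}

// One counter per script, sized by the count sentinel.
struct ScriptHistogram {
    int counts[kScriptCount] = {};
};

// Counts how many characters of the text fall into each script.
constexpr ScriptHistogram tally(const UniChar* text, int length) {
    ScriptHistogram histogram{};
    for (int i = 0; i < length; ++i) {
        ++histogram.counts[scriptOf(text[i])];
    }
    return histogram;
}

}  // namespace script_sketch
```

Sizing the table with the sentinel means that adding a script to the enumeration grows the table automatically, without a separate constant to keep in sync.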


IUnicode - Inherited Member Functions and Data

Inherited Public Functions

Inherited Public Data

Inherited Protected Functions

Inherited Protected Data