ITranscoder

The main class that provides character encoding conversion to and from Unicode.

ITranscoder provides character encoding conversion to and from Unicode. It is a platform-independent abstract class that defines the high-level bidirectional conversion protocols. It also contains static Transcoder object factory functions, exception character handling functions, query functions, and two protected pure virtual functions that must be implemented in derived classes such as IWin32Transcoder, IISO8859_1Transcoder, UTF8Transcoder, etc. The ITranscoder::createTranscoder creation function takes the name of a supported character set, as listed below:

     Win32 platform supports the following names:
         GB-2312
         ISO-8859-1
         ISO-8859-2
         ISO-8859-7
         ISO-8859-8
         ISO-8859-9
         KSC-5601
         MSCP-10000
         MSCP-1250
         MSCP-1251
         MSCP-1252
         MSCP-1253
         MSCP-1254
         MSCP-1255
         MSCP-1256
         MSCP-437
         MSCP-850
         MSCP-936
         Shift-JIS
         US-ASCII
         UTF-8

     OS2 platform supports the following names:
         CNS-11643.1986
         EUC
         GB-2312
         IBM-437
         IBM-850
         IBM-950
         ISO-8859-1
         ISO-8859-2
         ISO-8859-3
         ISO-8859-4
         ISO-8859-5
         ISO-8859-6
         ISO-8859-7
         ISO-8859-8
         ISO-8859-9
         KSC-5601
         Shift-JIS
         US-ASCII
         UTF-8

The Transcoder Framework provides two kinds of interfaces: a high-level, easy-to-use API and a pointer-based API. The high-level API converts between a simple text string and a Unicode string, taking only two string parameters, an IString and an IText. The pointer-based API provides low-level conversion functionality that is flexible enough to recover from a conversion failure when an error occurs during transcoding.

A subclass of ITranscoder must be implemented by overriding two protected pure virtual functions, doToUnicode and doFromUnicode, and other subclass-specific query functions.


ITranscoder - Member Functions and Data by Group

Constructors & Destructor

Use the constructors and destructor in this group to create and destroy objects of class ITranscoder.


[view class]
~ITranscoder
public:
virtual ~ITranscoder()
Destructor for deleting an ITranscoder object.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
ITranscoder

Constructors for ITranscoder.


Overload 1
Copy constructor for ITranscoder.
protected:
ITranscoder(const ITranscoder& source)

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 2
Default constructor for ITranscoder.
protected:
ITranscoder()

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Assignment Operator

Use this operator to replace the current ITranscoder object with the given one.


[view class]
operator =
protected:
ITranscoder& operator =(const ITranscoder& right)
Assignment operator for ITranscoder.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Converting To and From Unicode Using Pointer-Based Functions

Use the low-level conversion functions in this group to convert a Unicode text string to another encoding or a text string in another encoding to Unicode.


[view class]
fromUnicode
Converts from a Unicode string to a foreign code set string.

Converts from a Unicode string to a foreign code set string. The high-level (non-pointer-based) form of this function implicitly resets the internal transcoding state each time it is called.


Overload 1
public:
virtual result fromUnicode( const UniChar* from, const UniChar* from_end, const UniChar *& from_next, char* to, char* to_limit, char *& to_next )

from
Beginning of the given Unicode text.
from_end
End of the given Unicode text.
from_next
Pointer pointing one beyond the last UniChar character successfully converted.
to
Beginning of the output conversion buffer.
to_limit
End of the output conversion buffer.
to_next
Pointer pointing one beyond the last foreign code set character successfully converted.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 2
public:
virtual result fromUnicode( const UniChar* from, const UniChar* from_end, const UniChar *& from_next, IString& to )

from
Beginning of the given Unicode text.
from_end
End of the given Unicode text.
from_next
Pointer pointing one beyond the last UniChar character successfully converted.
to
The output text in foreign code set.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 3
public:
virtual result fromUnicode(const IText& from, IString& to)

from
The given Unicode text.
to
The output text in foreign code set.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 4
public:
virtual result fromUnicode( const IText& from, char* to, char* to_limit, char *& to_next )

from
The given Unicode text.
to
Beginning of the output conversion buffer.
to_limit
End of the output conversion buffer.
to_next
Pointer pointing one beyond the last foreign code set character successfully converted.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
toUnicode
Converts from a foreign code set string to a Unicode string.

Converts from a foreign code set string to a Unicode string. Characters are translated in the range [from, from_end), placing the results in sequential position starting at "to." It converts no more than (from_end - from) characters, and stores no more than (to_limit - to) characters. If it encounters a character it cannot convert, the "exception character" is handled according to the "Unmapped Behavior" for this Transcoder. For instance, the conversion stops if it encounters a character it cannot convert, and the unmapped behavior was set to kStop. It always leaves the from_next and to_next pointers pointing one beyond the last character successfully converted.
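
The pointer contract described above can be sketched with a small stand-in. This is not the library's implementation: latin1ToUnicode and convertInChunks are hypothetical names, UniChar is assumed to be a 16-bit code unit (char16_t here), and ISO-8859-1 is used only because each byte maps directly to one Unicode code point. The loop shows how a caller might resume from from_next after a partial result.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>   // std::codecvt_base::result
#include <string>

// Stand-in for the library's UniChar type (assumed to be a 16-bit code unit).
using UniChar = char16_t;
using result = std::codecvt_base::result;

// Hypothetical stand-in for a pointer-based toUnicode. Each ISO-8859-1 byte
// maps directly onto the Unicode code point with the same value.
result latin1ToUnicode(const char* from, const char* from_end, const char*& from_next,
                       UniChar* to, UniChar* to_limit, UniChar*& to_next)
{
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        if (to_next == to_limit)
            return std::codecvt_base::partial;   // output buffer is full
        *to_next++ = static_cast<UniChar>(static_cast<unsigned char>(*from_next++));
    }
    return std::codecvt_base::ok;
}

// Converts a whole string through a deliberately small buffer, resuming at
// from_next after every partial result.
std::u16string convertInChunks(const std::string& input, std::size_t chunk = 4)
{
    if (chunk == 0)
        chunk = 1;                               // a zero-sized buffer could never make progress
    std::u16string output;
    std::u16string buffer(chunk, u'\0');
    const char* from = input.data();
    const char* from_end = from + input.size();
    result status = std::codecvt_base::partial;
    while (status == std::codecvt_base::partial) {
        const char* from_next = from;
        UniChar* to_next = &buffer[0];
        status = latin1ToUnicode(from, from_end, from_next,
                                 &buffer[0], &buffer[0] + buffer.size(), to_next);
        output.append(&buffer[0], to_next);      // keep what was converted so far
        from = from_next;                        // resume after the last converted byte
    }
    return output;
}
```

With the default chunk size of 4, an 11-byte input is converted in three passes, and the pointers left by each pass mark exactly where the next one starts.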


Overload 1
public:
virtual result toUnicode( const char* from, const char* from_end, const char *& from_next, IText& to )

from
Beginning of the given foreign code set text.
from_end
End of the given foreign code set text.
from_next
Pointer pointing one beyond the last foreign code set character successfully converted.
to
The output Unicode text.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 2
public:
virtual result toUnicode(const IString& from, IText& to)

from
The given non-Unicode text.
to
The output Unicode text.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 3
public:
virtual result toUnicode( const char* from, const char* from_end, const char *& from_next, UniChar* to, UniChar* to_limit, UniChar *& to_next )

from
Beginning of the given foreign code set text.
from_end
End of the given foreign code set text.
from_next
Pointer pointing one beyond the last foreign code set character successfully converted.
to
Beginning of the output conversion buffer.
to_limit
End of the output conversion buffer.
to_next
Pointer pointing one beyond the last UniChar character successfully converted.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 4
public:
virtual result toUnicode( const IString& from, UniChar* to, UniChar* to_limit, UniChar *& to_next )

from
The given non-Unicode text.
to
Beginning of the output conversion buffer.
to_limit
End of the output conversion buffer.
to_next
Pointer that points one beyond the last UniChar character successfully converted.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Creating Transcoder Objects

Use the functions in this group to create Transcoder objects from the given values, for example, from the given character set name or from the current host character set and the kSupersetMapping proximity.


[view class]
createTranscoder
Creates a transcoder based on the current host character set and the kSupersetMapping proximity.

Creates a Transcoder object based on the given name of the foreign character set (non-Unicode), and mapping proximity. Clients can also create a default Transcoder object using the current host character set by calling the createTranscoder API with no parameter.


Overload 1
public:
static ITranscoder* createTranscoder()

Return
a Transcoder object.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 2
Creates a Transcoder object based on the given character set name.
public:
static ITranscoder* createTranscoder( const IText& charSet, EMappingProximity proximity = kSupersetMapping )

charSet
The name of the given foreign character set.
proximity
The mapping proximity.

Return
a Transcoder object.

Exception

kNoAdequateTranscoder
if the given foreign character set name and mapping proximity are not supported.
kTranscoderNotInstalled
if the code page or conversion table for the given foreign character set is supported but not installed on the current host.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
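
The two failure cases listed for createTranscoder can be illustrated with a stand-in factory. Everything in this sketch is hypothetical: TranscoderError, TranscoderException, Transcoder, and the supported and installed name sets are illustrative only, the library's real exception class is not shown in this reference, and the mapping proximity argument is left out for brevity.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical error codes named after the exceptions listed above.
enum class TranscoderError { kNoAdequateTranscoder, kTranscoderNotInstalled };

// Hypothetical exception type carrying one of those codes.
struct TranscoderException : std::runtime_error {
    TranscoderError code;
    TranscoderException(TranscoderError c, const std::string& what)
        : std::runtime_error(what), code(c) {}
};

// Minimal stand-in for a transcoder object.
struct Transcoder {
    std::string encoding;
};

// Stand-in factory: 'supported' holds the names the platform knows about,
// 'installed' holds those whose conversion tables are actually present.
std::unique_ptr<Transcoder> createTranscoder(const std::string& charSet,
                                             const std::set<std::string>& supported,
                                             const std::set<std::string>& installed)
{
    if (supported.count(charSet) == 0)
        throw TranscoderException(TranscoderError::kNoAdequateTranscoder,
                                  "unsupported character set: " + charSet);
    if (installed.count(charSet) == 0)
        throw TranscoderException(TranscoderError::kTranscoderNotInstalled,
                                  "conversion table not installed: " + charSet);
    return std::unique_ptr<Transcoder>(new Transcoder{charSet});
}

// Returns the encoding name on success, or the name of the error code.
std::string tryCreate(const std::string& charSet,
                      const std::set<std::string>& supported,
                      const std::set<std::string>& installed)
{
    try {
        return createTranscoder(charSet, supported, installed)->encoding;
    } catch (const TranscoderException& e) {
        return e.code == TranscoderError::kNoAdequateTranscoder
                   ? "kNoAdequateTranscoder"
                   : "kTranscoderNotInstalled";
    }
}
```

A caller that distinguishes the two cases can fall back to another character set when none is adequate, and report a configuration problem when the table is merely missing.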


Getting and Setting the Current Substitute Character

Use these functions to get the substitute character to be used for a character that cannot be directly mapped or to set the substitute character to be used.


[view class]
charSubstitute
Gets the current char-based substitute character.
public:
virtual char charSubstitute()
Gets the current character-based substitute character.

Return
the current char-based substitute character.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
setCharSubstitute
Sets the char-based substitute character with the given character.
public:
virtual void setCharSubstitute(char substitute)
Sets the character-based substitute character with the given character.

substitute
The given char-based substitute character.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
uniCharSubstitute
public:
virtual UniChar uniCharSubstitute()
Gets the current Unicode substitute character for this transcoder.

Return
the current Unicode substitute character.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Getting and Setting the Unmapped Character Handling Behavior

Use the functions in this group to get and set the character handling behavior for unmapped characters.


[view class]
setUnmappedBehavior
public:
virtual void setUnmappedBehavior( EUnmappedBehavior unmappedBehavior )
Sets the current unmapped character handling behavior for this transcoder.

unmappedBehavior
The given unmapped character handling behavior. Default unmapped behavior uses kUseSub.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
unmappedBehavior
public:
virtual EUnmappedBehavior unmappedBehavior()
Gets the current unmapped character handling behavior for this transcoder.

Return
the current unmapped character handling behavior. This is kUseSub unless it has been changed with setUnmappedBehavior().

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Managing the Transcoder State

Use the functions in this group to explicitly reset the transcoding state to be in host character set and to flush the output conversion buffer.


[view class]
flush
public:
virtual result flush( const char* to, const char* to_limit, char *& to_next )
Flushes the output conversion buffer.

Flushes the output conversion buffer so that the state of the transcoder is brought in sync with the "KanjiOut" state, i.e., the ASCII state. This is needed only when clients use the low-level pointer-based APIs and want to make sure that the output conversion buffer is in the ASCII state. It returns codecvt_base::partial if the buffer passed in with "to" and "to_limit" is invalid, for instance when the current state is not the host character set state and (to_limit - to) < 3. The length of the "KanjiIn" escape sequence is 3.

to
beginning of the output conversion buffer.
to_limit
end of the output conversion buffer.
to_next
pointer pointing one beyond the last character successfully processed in the output conversion buffer.

Return
the status of the flush.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
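
The flush behavior described above can be sketched with a stand-in for a stateful encoder. StatefulEncoderSketch and enterKanjiMode are hypothetical names, the output parameters are writable char pointers here so that the sketch can store bytes, and the ESC ( B sequence is the ISO-2022-JP return-to-ASCII escape, used on the assumption that the transcoder emits something equivalent.

```cpp
#include <cassert>
#include <locale>   // std::codecvt_base::result

using result = std::codecvt_base::result;

// Hypothetical stand-in for a stateful encoder such as one for ISO-2022-JP.
class StatefulEncoderSketch {
public:
    // Pretend an earlier conversion left the encoder in double-byte mode.
    void enterKanjiMode() { inKanji_ = true; }
    bool inHostState() const { return !inKanji_; }

    // Writes the return-to-ASCII escape sequence if one is pending.
    result flush(char* to, char* to_limit, char*& to_next)
    {
        to_next = to;
        if (!inKanji_)
            return std::codecvt_base::ok;        // already in the ASCII state
        if (to_limit - to < 3)
            return std::codecvt_base::partial;   // no room for the 3-byte escape
        *to_next++ = '\x1B';                     // ESC
        *to_next++ = '(';
        *to_next++ = 'B';                        // ESC ( B selects ASCII
        inKanji_ = false;
        return std::codecvt_base::ok;
    }

    // Drops any pending shift state without emitting output.
    void resetState() { inKanji_ = false; }

private:
    bool inKanji_ = false;
};
```

In this sketch a partial result leaves the state untouched, so the caller can retry with a larger buffer.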


[view class]
resetState
public:
virtual void resetState()
Explicitly resets the transcoding state to be in host character set.

Explicitly resets the transcoding state to be in host character set. A deriving class containing a conversion state must clear its internal transcoding state when converting a new string text. This function does nothing in the ITranscoder base class.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Query Functions

Use the functions in this group to query the transcoder for information such as the character encoding it handles, the character set for the locale, the maximum number of bytes a Unicode character or character belonging to the other character set could generate, and the actual amount of storage in bytes required for the Unicode or other character.


[view class]
byteBufferSize
Gets the actual amount of storage required in bytes.

Gets the actual amount of storage required in bytes. This information can be used to prepare the storage required for the output foreign text string that is to be converted from a given Unicode string.


Overload 1
public:
virtual length_type byteBufferSize( const UniChar* from, const UniChar* from_end ) const = 0

from
beginning of the given Unicode string.
from_end
end of the given Unicode string.

Return
the actual amount of storage required to hold the text converted from the given Unicode string.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 2
public:
virtual length_type byteBufferSize( const IText& uniText ) const = 0

uniText
the given Unicode string.

Return
the actual amount of storage required to hold the text converted from the given Unicode string.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
characterEncoding
public:
virtual IText characterEncoding() const
Gets the character encoding which this transcoder handles.

Gets the character encoding which this transcoder handles. For example, if the character encoding that this transcoder handles is UTF-8, then the transcoder is used to convert UTF-8 to and from Unicode.

Return
the character encoding which this transcoder handles.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
characterSet
public:
static const IText characterSet(const ILocaleKey& key)
Gets the character set for the given locale.

Gets the character set for the given locale. This character set can be used in ITranscoder::createTranscoder to create a transcoder object.

key
the key for the given locale.

Return
the character set for the given locale.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
maximumBytesPerUniChar
public:
virtual length_type maximumBytesPerUniChar() const = 0
Gets the maximum number of bytes generated by a UniChar character.

Gets the maximum number of bytes generated by a UniChar character. This information can be used to prepare the storage required for strings converted from Unicode.

Return
the maximum number of bytes that can be generated by a single UniChar character.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes

Subclassing
This information depends on the code set conversion support by different platforms. For example, on Win32 this is determined by querying CPINFO. For UTF8, this is set to 4.
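
The difference between an exact size query such as byteBufferSize and a worst-case bound based on maximumBytesPerUniChar can be sketched for UTF-8. The functions utf8ByteCount and worstCaseBytes are hypothetical stand-ins, UniChar is assumed to be a 16-bit code unit, and counting an unpaired surrogate as 3 bytes is a simplification made only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the library's UniChar type (assumed to be a 16-bit code unit).
using UniChar = char16_t;

// Exact number of UTF-8 bytes needed for a range of UTF-16 code units.
std::size_t utf8ByteCount(const UniChar* from, const UniChar* from_end)
{
    std::size_t bytes = 0;
    while (from != from_end) {
        const UniChar c = *from++;
        if (c < 0x80) {
            bytes += 1;                          // ASCII
        } else if (c < 0x800) {
            bytes += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && from != from_end &&
                   *from >= 0xDC00 && *from <= 0xDFFF) {
            bytes += 4;                          // surrogate pair: one 4-byte sequence
            ++from;                              // consume the low surrogate
        } else {
            bytes += 3;                          // rest of the BMP (unpaired surrogates too, in this sketch)
        }
    }
    return bytes;
}

// Worst-case bound from a per-UniChar maximum, as a caller might compute it.
std::size_t worstCaseBytes(std::size_t uniCharCount, std::size_t maximumBytesPerUniChar)
{
    return uniCharCount * maximumBytesPerUniChar;
}
```

The bound is cheap to compute but can be several times larger than the exact size, which is why both kinds of query are useful when preparing an output buffer.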

[view class]
maximumUniCharsPerByte
public:
virtual length_type maximumUniCharsPerByte() const = 0
Gets the maximum number of UniChars generated by a char-based character.

Gets the maximum number of UniChars generated by a char-based character. This information can be used to prepare the storage required for strings converted from char-based foreign code set.

Return
the maximum number of UniChars that can be generated by a single char-based character.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
uniCharBufferSize
Gets the actual amount of storage required in UniChars.

Gets the actual amount of storage required in UniChars. This information can be used to prepare the storage required for the output Unicode text string that is to be converted from a given foreign code set string.


Overload 1
public:
virtual length_type uniCharBufferSize( const char* from, const char* from_end ) const = 0

from
beginning of the given foreign code set string.
from_end
end of the given foreign code set string.

Return
the actual amount of storage required to hold the Unicode text to be converted from the given foreign code set string.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Overload 2
public:
virtual length_type uniCharBufferSize( const IString& text ) const = 0

text
the given foreign code set string.

Return
the actual amount of storage required to hold the Unicode text to be converted from the given foreign code set string.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


Setting the Character Encoding for the Transcoder

Use the function in this group to allow subclass providers to set the character encoding for the Transcoder.


[view class]
setCharacterEncoding
protected:
virtual void setCharacterEncoding(const IText& encoding)
Allows subclass providers to set the character encoding for this Transcoder.

encoding
the given character encoding.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes

Subclassing
If a subclass overrides this method, be sure to call ITranscoder::setCharacterEncoding() in the overriding method.
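
The subclassing rule above can be sketched with stand-in classes. TranscoderBaseSketch and LoggingTranscoderSketch are hypothetical, and std::string stands in for IText. The point is only that the derived version forwards to the base version so that the stored encoding stays in sync.

```cpp
#include <cassert>
#include <string>

// Stand-in for the base class: it owns the stored encoding name.
class TranscoderBaseSketch {
public:
    virtual ~TranscoderBaseSketch() {}
    std::string characterEncoding() const { return encoding_; }

protected:
    virtual void setCharacterEncoding(const std::string& encoding)
    {
        encoding_ = encoding;
    }

private:
    std::string encoding_;
};

// Hypothetical subclass that adds bookkeeping around the setter.
class LoggingTranscoderSketch : public TranscoderBaseSketch {
public:
    explicit LoggingTranscoderSketch(const std::string& encoding)
    {
        setCharacterEncoding(encoding);
    }
    int changeCount() const { return changes_; }

protected:
    void setCharacterEncoding(const std::string& encoding) override
    {
        ++changes_;                                            // subclass-specific work
        TranscoderBaseSketch::setCharacterEncoding(encoding);  // keep the base in sync
    }

private:
    int changes_ = 0;
};
```

If the forwarding call were omitted, characterEncoding() would keep returning the old (here, empty) name.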

Transcoder Subclass Conversion

Use the functions in this group to convert text from another character encoding to Unicode or from Unicode to another character encoding.


[view class]
doFromUnicode
protected:
virtual result doFromUnicode( const UniChar* from, const UniChar* from_end, const UniChar *& from_next, char* to, char* to_limit, char *& to_next ) = 0
Converts from a Unicode string to a foreign code set string.

from
beginning of the given Unicode text.
from_end
end of the given Unicode text.
from_next
pointer pointing one beyond the last UniChar character successfully converted.
to
beginning of the output conversion buffer.
to_limit
end of the output conversion buffer.
to_next
pointer pointing one beyond the last foreign code set character successfully converted.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes

Subclassing
This is the only function that implements the real conversion engine from Unicode strings to char-based strings. This is to be called by both high-level and pointer-based APIs.

[view class]
doToUnicode
protected:
virtual result doToUnicode( const char* from, const char* from_end, const char *& from_next, UniChar* to, UniChar* to_limit, UniChar *& to_next ) = 0
Converts from a foreign code set string to a Unicode string.

Converts from a foreign code set string to a Unicode string. This protected pure virtual function must be implemented by derived classes, which convert a text string from char* to UniChar*. Characters are translated in the range [from, from_end), placing the results in sequential positions starting at "to." It converts no more than (from_end - from) characters, and stores no more than (to_limit - to) characters. If it encounters a character it cannot convert, the "exception character" is handled according to the "Unmapped Behavior" for this Transcoder. For instance, the conversion stops if it encounters a character it cannot convert and the unmapped behavior is set to kStop. It always leaves the from_next and to_next pointers pointing one beyond the last character successfully converted.

from
beginning of the given foreign code set text.
from_end
end of the given foreign code set text.
from_next
pointer pointing one beyond the last foreign code set character successfully converted.
to
beginning of the output conversion buffer.
to_limit
end of the output conversion buffer.
to_next
pointer pointing one beyond the last UniChar character successfully converted.

Return
the status of the conversion.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes

Subclassing
This is the only function that implements the real conversion engine from char-based strings to Unicode strings. This is to be called by both high-level and pointer-based APIs.
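
The relationship between the public conversion functions and the two protected engines can be sketched as follows. TranscoderSketch and AsciiTranscoderSketch are hypothetical stand-ins, std::string and std::u16string replace IString and IText, and the sketch returns codecvt_base::error on the first non-ASCII character instead of consulting an unmapped behavior.

```cpp
#include <cassert>
#include <locale>   // std::codecvt_base::result
#include <string>

using UniChar = char16_t;   // assumed 16-bit stand-in for UniChar
using result = std::codecvt_base::result;

class TranscoderSketch {
public:
    virtual ~TranscoderSketch() {}

    // High-level form: sizes a buffer, then delegates to the engine.
    result toUnicode(const std::string& from, std::u16string& to)
    {
        std::u16string buffer(from.size(), u'\0');   // assumes at most one UniChar per byte
        const char* from_next = from.data();
        UniChar* to_next = &buffer[0];
        const result status = doToUnicode(from.data(), from.data() + from.size(), from_next,
                                          &buffer[0], &buffer[0] + buffer.size(), to_next);
        to.assign(&buffer[0], to_next);
        return status;
    }

    result fromUnicode(const std::u16string& from, std::string& to)
    {
        std::string buffer(from.size(), '\0');       // assumes at most one byte per UniChar
        const UniChar* from_next = from.data();
        char* to_next = &buffer[0];
        const result status = doFromUnicode(from.data(), from.data() + from.size(), from_next,
                                            &buffer[0], &buffer[0] + buffer.size(), to_next);
        to.assign(&buffer[0], to_next);
        return status;
    }

protected:
    virtual result doToUnicode(const char* from, const char* from_end, const char*& from_next,
                               UniChar* to, UniChar* to_limit, UniChar*& to_next) = 0;
    virtual result doFromUnicode(const UniChar* from, const UniChar* from_end,
                                 const UniChar*& from_next,
                                 char* to, char* to_limit, char*& to_next) = 0;
};

// Minimal subclass: only the two engines are written, for 7-bit ASCII.
class AsciiTranscoderSketch : public TranscoderSketch {
protected:
    result doToUnicode(const char* from, const char* from_end, const char*& from_next,
                       UniChar* to, UniChar* to_limit, UniChar*& to_next) override
    {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_next == to_limit) return std::codecvt_base::partial;
            const unsigned char byte = static_cast<unsigned char>(*from_next);
            if (byte > 0x7F) return std::codecvt_base::error;   // not ASCII
            *to_next++ = byte;
            ++from_next;
        }
        return std::codecvt_base::ok;
    }

    result doFromUnicode(const UniChar* from, const UniChar* from_end,
                         const UniChar*& from_next,
                         char* to, char* to_limit, char*& to_next) override
    {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_next == to_limit) return std::codecvt_base::partial;
            if (*from_next > 0x7F) return std::codecvt_base::error;  // not ASCII
            *to_next++ = static_cast<char>(*from_next++);
        }
        return std::codecvt_base::ok;
    }
};
```

Because every public entry point funnels into the same two functions, a new encoding only needs those two overrides plus its own query functions.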


ITranscoder - Enumerations


[view class]
EMappingProximity
enum EMappingProximity { kExactMapping=0, 
                         kSupersetMapping, 
                         kCloseMapping }
The mapping proximity between character sets.

Useful constants specifying how close a transcoder being created is to the character set specified.

kExactMapping - Creates a Transcoder that exactly matches the CharSet specified.
kSupersetMapping - Creates a Transcoder that is a superset of the CharSet specified.
kCloseMapping - Creates a Transcoder that is close to the CharSet specified.
 

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
EUnmappedBehavior
enum EUnmappedBehavior { kUseSub, 
                         kStop, 
                         kOmit }
Useful constants specifying how to handle unmapped characters.

These constants specify how to handle characters that cannot be converted or mapped, namely "exception characters." Exception characters are those characters whose mappings into or out of Unicode are not one-to-one.

kUseSub - Uses the substitution character when an exception character is encountered.
kStop - Stops the conversion when an exception character is encountered.
kOmit - Omits exception characters during conversion.
 

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes
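
The three unmapped behaviors can be compared side by side with a small stand-in. The function toAsciiSketch is hypothetical, US-ASCII is chosen only to keep the mapping trivial, and the '?' substitute is an assumption rather than the library's documented default.

```cpp
#include <cassert>
#include <string>

// Mirrors the enumeration documented above.
enum EUnmappedBehavior { kUseSub, kStop, kOmit };

// Hypothetical converter from 16-bit Unicode text to US-ASCII.
std::string toAsciiSketch(const std::u16string& text,
                          EUnmappedBehavior behavior,
                          char substitute = '?')   // assumed substitute character
{
    std::string out;
    for (char16_t c : text) {
        if (c < 0x80) {
            out += static_cast<char>(c);   // directly mappable
            continue;
        }
        if (behavior == kStop)
            break;                         // stop at the first exception character
        if (behavior == kUseSub)
            out += substitute;             // replace it with the substitute
        // kOmit: drop the character and carry on
    }
    return out;
}
```

For the input "na\u00EFve", kUseSub yields "na?ve", kStop yields "na", and kOmit yields "nave".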


ITranscoder - Type Definitions


[view class]
result
typedef codecvt_base::result result
Type used for indicating different character code conversion results.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


[view class]
length_type
typedef size_t length_type

Type used for specifying character counts.

Supported Platforms

Windows OS/2 AIX
Yes Yes Yes


ITranscoder - Inherited Member Functions and Data

Inherited Public Functions

Inherited Public Data

Inherited Protected Functions

Inherited Protected Data