ITranscoder provides character encoding conversion to and from Unicode. It is a platform-independent abstract class that defines the high-level bidirectional conversion protocols. It also contains static Transcoder object factory functions, exception character handling functions, query functions, and two protected pure virtual functions that derived classes such as IWin32Transcoder, IISO8859_1Transcoder, and UTF8Transcoder must implement. The ITranscoder::createTranscoder creation function takes the name of a supported character set, as listed below:
The Win32 platform supports the following names:
GB-2312
ISO-8859-1
ISO-8859-2
ISO-8859-7
ISO-8859-8
ISO-8859-9
KSC-5601
MSCP-10000
MSCP-1250
MSCP-1251
MSCP-1252
MSCP-1253
MSCP-1254
MSCP-1255
MSCP-1256
MSCP-437
MSCP-850
MSCP-936
Shift-JIS
US-ASCII
UTF-8
The OS/2 platform supports the following names:
CNS-11643.1986
EUC
GB-2312
IBM-437
IBM-850
IBM-950
ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
KSC-5601
Shift-JIS
US-ASCII
UTF-8
The Transcoder Framework provides two kinds of interfaces: a high-level, easy-to-use API and a pointer-based API. The high-level API converts between a simple text string and a Unicode string and takes only two string parameters, an IString and an IText. The pointer-based API provides low-level conversion and is flexible enough to let the caller recover when an error occurs during transcoding.
A subclass of ITranscoder must override the two protected pure virtual functions, doToUnicode and doFromUnicode, and the other subclass-specific query functions.
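The subclassing pattern described above can be sketched as follows. This is only an illustration under stated assumptions: TranscoderSketch and Latin1Sketch are hypothetical stand-ins (not the real ITranscoder or IISO8859_1Transcoder), UniChar is assumed to be a 16-bit code unit, and the public functions are assumed to simply forward to the protected hooks.

```cpp
#include <cassert>
#include <locale>   // std::codecvt_base::result

typedef char16_t UniChar;  // assumption: UniChar is a 16-bit Unicode code unit

// Hypothetical stand-in for ITranscoder, reduced to the two conversion hooks.
class TranscoderSketch {
public:
    typedef std::codecvt_base::result result;
    virtual ~TranscoderSketch() {}

    // Public entry points forward to the protected hooks (an assumption of this sketch).
    result toUnicode(const char* from, const char* from_end, const char*& from_next,
                     UniChar* to, UniChar* to_limit, UniChar*& to_next) {
        return doToUnicode(from, from_end, from_next, to, to_limit, to_next);
    }
    result fromUnicode(const UniChar* from, const UniChar* from_end, const UniChar*& from_next,
                       char* to, char* to_limit, char*& to_next) {
        return doFromUnicode(from, from_end, from_next, to, to_limit, to_next);
    }

protected:
    virtual result doToUnicode(const char* from, const char* from_end, const char*& from_next,
                               UniChar* to, UniChar* to_limit, UniChar*& to_next) = 0;
    virtual result doFromUnicode(const UniChar* from, const UniChar* from_end,
                                 const UniChar*& from_next,
                                 char* to, char* to_limit, char*& to_next) = 0;
};

// Hypothetical ISO-8859-1 subclass: every byte maps to the code point of the same value.
class Latin1Sketch : public TranscoderSketch {
protected:
    result doToUnicode(const char* from, const char* from_end, const char*& from_next,
                       UniChar* to, UniChar* to_limit, UniChar*& to_next) override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_next == to_limit) return std::codecvt_base::partial;  // output is full
            *to_next++ = static_cast<UniChar>(static_cast<unsigned char>(*from_next++));
        }
        return std::codecvt_base::ok;
    }
    result doFromUnicode(const UniChar* from, const UniChar* from_end,
                         const UniChar*& from_next,
                         char* to, char* to_limit, char*& to_next) override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_next == to_limit) return std::codecvt_base::partial;  // output is full
            if (*from_next > 0xFF) return std::codecvt_base::error;      // no Latin-1 equivalent
            *to_next++ = static_cast<char>(static_cast<unsigned char>(*from_next++));
        }
        return std::codecvt_base::ok;
    }
};
```

In this sketch the derived class only supplies the two hooks; callers go through the public functions of the base class.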
Constructors & Destructor
Use the constructors and destructor in this group to create and destroy objects of class ITranscoder.
public:
virtual ~ITranscoder()
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Default constructor for ITranscoder.
protected:
ITranscoder(const ITranscoder& source)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
protected:
ITranscoder()
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Assignment Operator
Use this operator to replace the current ITranscoder object with the given one.
protected:
ITranscoder& operator =(const ITranscoder& right)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Converting To and From Unicode Using Pointer-Based Functions
Use the low-level conversion functions in this group to convert a Unicode text string to another encoding or a text string in another encoding to Unicode.
Converts from a Unicode string to a foreign code set string.
public:
virtual result fromUnicode( const UniChar* from, const UniChar* from_end, const UniChar *& from_next, char* to, char* to_limit, char *& to_next )
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual result fromUnicode( const UniChar* from, const UniChar* from_end, const UniChar *& from_next, IString& to )
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual result fromUnicode(const IText& from, IString& to)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual result fromUnicode( const IText& from, char* to, char* to_limit, char *& to_next )
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Converts from a foreign code set string to a Unicode string. Characters in the range [from, from_end) are translated, and the results are placed in sequential positions starting at "to". No more than (from_end - from) characters are converted, and no more than (to_limit - to) characters are stored. If the function encounters a character it cannot convert, that "exception character" is handled according to the "Unmapped Behavior" set for this Transcoder. For instance, if the unmapped behavior is kStop, the conversion stops at the first character that cannot be converted. The from_next and to_next pointers are always left pointing one beyond the last character successfully converted.
public:
virtual result toUnicode( const char* from, const char* from_end, const char *& from_next, IText& to )
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual result toUnicode(const IString& from, IText& to)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual result toUnicode( const char* from, const char* from_end, const char *& from_next, UniChar* to, UniChar* to_limit, UniChar *& to_next )
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual result toUnicode( const IString& from, UniChar* to, UniChar* to_limit, UniChar *& to_next )
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
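The from_next and to_next contract described above lets a caller convert through a fixed-size buffer and resume where the previous call stopped. The following is a minimal sketch, not the library's implementation: latin1ToUnicode is a hypothetical free function with the same parameter shape as the pointer-based toUnicode overload, UniChar is assumed to be a 16-bit code unit, and a full output buffer is assumed to be reported as the partial result of codecvt_base.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>   // std::codecvt_base::result
#include <string>

typedef char16_t UniChar;                  // assumption: 16-bit Unicode code unit
typedef std::codecvt_base::result result;  // mirrors the documented result typedef

// Hypothetical stand-in for a pointer-based toUnicode call (ISO-8859-1 input).
result latin1ToUnicode(const char* from, const char* from_end, const char*& from_next,
                       UniChar* to, UniChar* to_limit, UniChar*& to_next) {
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        if (to_next == to_limit) return std::codecvt_base::partial;  // output buffer is full
        *to_next++ = static_cast<UniChar>(static_cast<unsigned char>(*from_next++));
    }
    return std::codecvt_base::ok;
}

// Converts the whole input through a small buffer, resuming at from_next
// after every partial result.
std::u16string convertInChunks(const std::string& input, std::size_t bufferSize) {
    std::u16string output;
    if (bufferSize == 0) return output;  // no progress would be possible
    std::u16string buffer(bufferSize, u'\0');
    const char* from = input.data();
    const char* from_end = from + input.size();
    result r = std::codecvt_base::partial;
    while (r == std::codecvt_base::partial) {
        const char* from_next = from;
        UniChar* to = &buffer[0];
        UniChar* to_next = to;
        r = latin1ToUnicode(from, from_end, from_next, to, to + bufferSize, to_next);
        output.append(to, to_next);  // keep what was converted in this round
        from = from_next;            // resume after the last converted character
    }
    return output;
}
```

A caller that sizes the output buffer up front (for example with the query functions of this class) would normally need only one call; the loop matters when the buffer is deliberately small.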
Creating Transcoder Objects
Use the functions in this group to create Transcoder objects from the given values, for example, from the given character set name or from the current host character set and the kSupersetMapping proximity.
Creates a Transcoder object based on the given name of the foreign (non-Unicode) character set and the mapping proximity. Clients can also create a default Transcoder object that uses the current host character set by calling createTranscoder with no parameters.
public:
static ITranscoder* createTranscoder()
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static ITranscoder* createTranscoder( const IText& charSet, EMappingProximity proximity = kSupersetMapping )
| kNoAdequateTranscoder | if the given foreign character set name and the specified mapping proximity are not supported. |
| kTranscoderNotInstalled | if the code page or conversion table for the given foreign character set is supported but not installed on the current host. |
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Getting and Setting the Current Substitute Character
Use these functions to get the substitute character to be used for a character that cannot be directly mapped or to set the substitute character to be used.
public:
virtual char charSubstitute()
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual void setCharSubstitute(char substitute)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual UniChar uniCharSubstitute()
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Getting and Setting the Unmapped Character Handling Behavior
Use the functions in this group to get and set the character handling behavior for unmapped characters.
public:
virtual void setUnmappedBehavior( EUnmappedBehavior unmappedBehavior )
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual EUnmappedBehavior unmappedBehavior()
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Managing the Transcoder State
Use the functions in this group to explicitly reset the transcoding state to the host character set and to flush the output conversion buffer.
public:
virtual result flush( const char* to, const char* to_limit, char *& to_next )
Flushes the output conversion buffer so that the state of the transcoder is brought into sync with the state of "KanjiOut," i.e., the ASCII state.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
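Why a flush is needed can be illustrated with a toy stateful encoder. The sketch below is not the ITranscoder implementation: StatefulEncoderSketch and its members are hypothetical, it writes into a std::string instead of using the pointer-based flush signature, and it assumes ISO-2022-JP style escape sequences (ESC $ B to enter the two-byte Kanji state, ESC ( B to return to ASCII).

```cpp
#include <cassert>
#include <string>

// Hypothetical toy encoder with two shift states: ASCII and Kanji.
class StatefulEncoderSketch {
public:
    StatefulEncoderSketch() : inKanji_(false) {}

    // Writes one ASCII character, leaving the Kanji state first if needed.
    void putAscii(char c, std::string& out) {
        leaveKanji(out);
        out.push_back(c);
    }

    // Writes one two-byte character, entering the Kanji state first if needed.
    void putKanji(char b1, char b2, std::string& out) {
        if (!inKanji_) {
            out += "\x1B$B";  // "KanjiIn" escape (assumed ISO-2022-JP style)
            inKanji_ = true;
        }
        out.push_back(b1);
        out.push_back(b2);
    }

    // Emits whatever is needed so the output ends in the ASCII state.
    void flush(std::string& out) { leaveKanji(out); }

    // Forgets the shift state without writing anything.
    void resetState() { inKanji_ = false; }

private:
    void leaveKanji(std::string& out) {
        if (inKanji_) {
            out += "\x1B(B";  // "KanjiOut" escape: back to ASCII
            inKanji_ = false;
        }
    }
    bool inKanji_;
};
```

Without the final flush, output that ends on a two-byte character would leave a reader of the byte stream in the Kanji state; a second flush in the ASCII state writes nothing.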
public:
virtual void resetState()
Explicitly resets the transcoding state to the host character set.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Query Functions
Use the functions in this group to query the transcoder for information such as the character encoding it handles, the character set for the locale, the maximum number of bytes a Unicode character or character belonging to the other character set could generate, and the actual amount of storage in bytes required for the Unicode or other character.
Gets the actual amount of storage required in bytes. This information can be used to prepare the storage required for the output foreign text string that is to be converted from a given Unicode string.
public:
virtual length_type byteBufferSize( const UniChar* from, const UniChar* from_end ) const = 0
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual length_type byteBufferSize( const IText& uniText ) const = 0
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual IText characterEncoding() const
Gets the character encoding which this transcoder handles. For example, if the character encoding that this transcoder handles is UTF-8, then the transcoder is used to convert UTF-8 to and from Unicode.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
static const IText characterSet(const ILocaleKey& key)
Gets the character set for the given locale. This character set can be used in ITranscoder::createTranscoder to create a transcoder object.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual length_type maximumBytesPerUniChar() const = 0
Gets the maximum number of bytes generated by a UniChar character. This information can be used to prepare the storage required for strings converted from Unicode.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
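The difference between the exact size reported by byteBufferSize and the worst-case bound derived from maximumBytesPerUniChar can be sketched for a UTF-8 target. The two functions below are hypothetical free-function stand-ins, not the library's code; they assume that UniChar is a 16-bit code unit, so a surrogate pair occupies two UniChars and produces four UTF-8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef char16_t UniChar;         // assumption: 16-bit Unicode code unit
typedef std::size_t length_type;  // mirrors the documented length_type typedef

// Worst case for UTF-8 output: a single 16-bit unit needs at most 3 bytes
// (a surrogate pair is 2 units for 4 bytes, i.e. 2 bytes per unit).
length_type maximumBytesPerUniCharSketch() { return 3; }

// Exact number of UTF-8 bytes needed for [from, from_end).
length_type byteBufferSizeSketch(const UniChar* from, const UniChar* from_end) {
    length_type bytes = 0;
    while (from != from_end) {
        UniChar c = *from++;
        if (c < 0x80) {
            bytes += 1;
        } else if (c < 0x800) {
            bytes += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && from != from_end &&
                   *from >= 0xDC00 && *from <= 0xDFFF) {
            ++from;      // consume the low surrogate of the pair
            bytes += 4;
        } else {
            bytes += 3;
        }
    }
    return bytes;
}
```

The worst-case bound (length times maximumBytesPerUniCharSketch) needs no scan of the text but may over-allocate; the exact count needs one pass over the input before converting.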
public:
virtual length_type maximumUniCharsPerByte() const = 0
Gets the maximum number of UniChars generated by a char-based character. This information can be used to prepare the storage required for strings converted from a char-based foreign code set.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Gets the actual amount of storage required in UniChars. This information can be used to prepare the storage required for the output Unicode text string that is to be converted from a given foreign code set string.
public:
virtual length_type uniCharBufferSize( const char* from, const char* from_end ) const = 0
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
public:
virtual length_type uniCharBufferSize( const IString& text ) const = 0
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Setting the Character Encoding for the Transcoder
Use the function in this group to allow subclass providers to set the character encoding for the Transcoder.
protected:
virtual void setCharacterEncoding(const IText& encoding)
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
Transcoder Subclass Conversion
Use the functions in this group to convert text from another character encoding to Unicode or from Unicode to another character encoding.
protected:
virtual result doFromUnicode( const UniChar* from, const UniChar* from_end, const UniChar *& from_next, char* to, char* to_limit, char *& to_next ) = 0
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
protected:
virtual result doToUnicode( const char* from, const char* from_end, const char *& from_next, UniChar* to, UniChar* to_limit, UniChar *& to_next ) = 0
Converts from a foreign code set string to a Unicode string. This protected pure virtual function must be implemented by derived classes, which convert the text string from char* to UniChar*. Characters in the range [from, from_end) are translated, and the results are placed in sequential positions starting at "to". No more than (from_end - from) characters are converted, and no more than (to_limit - to) characters are stored. If the function encounters a character it cannot convert, that "exception character" is handled according to the "Unmapped Behavior" set for this Transcoder. For instance, if the unmapped behavior is kStop, the conversion stops at the first character that cannot be converted. The from_next and to_next pointers are always left pointing one beyond the last character successfully converted.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
enum EMappingProximity { kExactMapping=0,
kSupersetMapping,
kCloseMapping }
Useful constants specifying how closely the transcoder being created must match the specified character set.
kExactMapping - Create a Transcoder that exactly matches the specified charSet.
kSupersetMapping - Create a Transcoder that is a superset of the specified charSet.
kCloseMapping - Create a Transcoder that is close to the specified charSet.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
enum EUnmappedBehavior { kUseSub,
kStop,
kOmit }
These constants specify how to handle characters that cannot be converted or mapped, namely "exception characters." Exception characters are those characters whose mappings into or out of Unicode are not one-to-one.
kUseSub - Use the substitution character when an exception character is encountered.
kStop - Stop the conversion when an exception character is encountered.
kOmit - Omit exception characters during conversion.
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
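The three behaviors can be compared side by side with a small sketch. It is illustrative only: toAsciiSketch is a hypothetical helper (not part of ITranscoder) that converts to US-ASCII, treats every UniChar at or above 0x80 as an exception character, and uses '?' as an assumed substitute character. UniChar is again assumed to be a 16-bit code unit.

```cpp
#include <cassert>
#include <string>

typedef char16_t UniChar;  // assumption: 16-bit Unicode code unit

// Same constant names as the documented enum, redeclared so the sketch is self-contained.
enum EUnmappedBehavior { kUseSub, kStop, kOmit };

// Hypothetical helper: converts Unicode text to US-ASCII and applies the
// chosen behavior to every character that has no ASCII equivalent.
std::string toAsciiSketch(const std::u16string& from, EUnmappedBehavior behavior,
                          char substitute = '?') {  // '?' is an assumed default
    std::string to;
    for (UniChar c : from) {
        if (c < 0x80) {
            to.push_back(static_cast<char>(c));  // directly mappable
            continue;
        }
        if (behavior == kStop) break;                        // stop at the first exception character
        if (behavior == kUseSub) to.push_back(substitute);   // replace it
        // kOmit: drop the character and keep going
    }
    return to;
}
```

With the input "caf" followed by U+00E9 and "!", kUseSub keeps the text length by writing the substitute, kStop returns only the text before the exception character, and kOmit drops the character and continues.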
typedef codecvt_base::result result
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |
typedef size_t length_type
Type used for specifying character counts.
Notes:
| Windows | OS/2 | AIX |
| Yes | Yes | Yes |