Text Storage

IText is the basic mechanism for storing and manipulating Unicode text strings throughout the Open Class libraries and frameworks. IText encapsulates Unicode characters and any associated styling information, and fully supports mixed style runs. IText keeps the styles with the characters, so you can pass text strings between objects and applications without loss of styling information.

IText is the primary string format supported by the Open Class International Framework. It is suitable for strings of any length, from a few characters to document-length strings. IText was also designed so that you can use it for unstyled strings without incurring the overhead in object size or performance associated with the styling mechanism.

Many IText functions take a range that specifies the subset of characters to operate on. The range is defined by an offset and a character count, where the offset of the first character in the object is 0. To specify an insertion position, give the offset of the character that will immediately follow the inserted text.
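The offset-and-count convention can be illustrated with the C++ standard library alone. The sketch below does not use the IText API: TextRange, extract, and insertAt are hypothetical names, and std::u16string stands in for the text object. Treating an offset equal to the string length as "append" is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a text range: an offset plus a character count.
struct TextRange {
    std::size_t offset;  // 0-based offset of the first character in the range
    std::size_t count;   // number of characters in the range
};

// Returns the characters covered by `range`.
std::u16string extract(const std::u16string& text, TextRange range) {
    assert(range.offset <= text.size());
    assert(range.count <= text.size() - range.offset);
    return text.substr(range.offset, range.count);
}

// Inserts `newText` immediately before the character currently at `offset`.
// Assumption: an offset equal to the length appends to the end.
std::u16string insertAt(std::u16string text, std::size_t offset,
                        const std::u16string& newText) {
    assert(offset <= text.size());
    text.insert(offset, newText);
    return text;
}
```

With the string "Hello world", the range {6, 5} selects "world", and inserting "big " at offset 6 places the new text in front of the "w".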

This figure shows the IText interface and related classes:

Storage Mechanism

IText manages its own storage. The characters and styles are stored in separate objects that the IText mechanism creates, deletes, and shares transparently. IText itself is very small and has very fast copy performance. This allows you to:

The underlying storage object can be shared by multiple IText instances. It is reference counted and uses copy-on-write semantics. For example, when an IText object is copied, the actual storage is not duplicated; only the reference count is incremented. Note that even while they share storage, two IText objects behave as two distinct objects. They stop sharing storage when one is modified. This is guaranteed to be true even in multithreaded situations.
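The general technique of reference-counted, copy-on-write storage can be sketched as follows. This is not the IText implementation: CowText and its members are hypothetical, std::shared_ptr supplies the reference count, and the sketch is single-threaded only, so it does not reproduce the multithreading guarantee described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write text object; not the IText implementation.
class CowText {
public:
    explicit CowText(std::u16string chars)
        : storage_(std::make_shared<std::u16string>(std::move(chars))) {}

    // The implicitly generated copy operations copy only the shared_ptr,
    // so a copy shares storage and bumps the reference count.

    const std::u16string& chars() const { return *storage_; }

    bool sharesStorageWith(const CowText& other) const {
        return storage_ == other.storage_;
    }

    // Mutating operation: take a private copy first if the storage is shared.
    void append(const std::u16string& more) {
        detach();
        storage_->append(more);
    }

private:
    void detach() {
        assert(storage_ != nullptr);
        if (storage_.use_count() > 1) {
            // Another object still refers to this storage: copy before writing.
            storage_ = std::make_shared<std::u16string>(*storage_);
        }
    }

    std::shared_ptr<std::u16string> storage_;
};
```

Copying a CowText is cheap because only the pointer is copied. The character data is duplicated only on the first modification of a shared object, after which the two objects no longer share storage.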

The framework manages this mechanism for you. However, you should be aware of it when using classes such as IFastTextIterator, which do not consider the underlying storage mechanism.

The storage mechanism handles both short and long strings efficiently. The storage allocation strategy changes dynamically as appropriate for the size of the string. For short strings, the characters are stored in a single, contiguous, heap-allocated array, resized only when necessary. Longer strings are broken up into non-contiguous storage blocks, or chunks, as illustrated in this figure:

The IText function storage_chunk provides access to these chunks of text. You specify a character offset, and the function returns a pointer to the chunk of storage containing that offset. If the string is stored in a single contiguous block, the function returns a pointer to that block.
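The idea of looking up a chunk by character offset can be sketched with a simple model. The code below is an illustration only: ChunkedText, ChunkView, and chunkContaining are hypothetical names, the actual signature and return type of storage_chunk are not shown here, and the real chunk layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one chunk; not an IText type.
struct ChunkView {
    const char16_t* data;     // first character of the chunk
    std::size_t startOffset;  // offset of that character within the whole string
    std::size_t length;       // number of characters in the chunk
};

// Hypothetical chunked storage: the string is the concatenation of its chunks.
class ChunkedText {
public:
    explicit ChunkedText(std::vector<std::u16string> chunks)
        : chunks_(std::move(chunks)) {}

    std::size_t length() const {
        std::size_t total = 0;
        for (const auto& chunk : chunks_) total += chunk.size();
        return total;
    }

    // Returns the chunk that holds the character at `offset`.
    ChunkView chunkContaining(std::size_t offset) const {
        assert(offset < length());
        std::size_t start = 0;
        for (const auto& chunk : chunks_) {
            if (offset < start + chunk.size()) {
                return ChunkView{chunk.data(), start, chunk.size()};
            }
            start += chunk.size();
        }
        return ChunkView{nullptr, 0, 0};  // not reached when the precondition holds
    }

    // Reads one character by locating its chunk first.
    char16_t charAt(std::size_t offset) const {
        ChunkView view = chunkContaining(offset);
        return view.data[offset - view.startOffset];
    }

private:
    std::vector<std::u16string> chunks_;
};
```

When the text is held in a single chunk, every valid offset maps to that one chunk, which mirrors the contiguous case described above.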