Note: This page is only relevant for C/C++. In Java, all strings are encoded in UTF-16, except for conversion from bytes to strings (via InputStreamReader or similar) and from strings to bytes (OutputStreamWriter etc.).

While most of ICU works with UTF-16 strings and uses data structures optimized for UTF-16, there are APIs that facilitate working with UTF-8, or are optimized for UTF-8, or work with Unicode code points (21-bit integer values) regardless of string encoding. Some data structures are designed to work equally well with UTF-16 and UTF-8.

For UTF-8 strings, ICU normally uses (const) char * pointers and int32_t lengths, with semantics parallel to UTF-16 handling. (Input length=-1 means NUL-terminated, output is NUL-terminated if there is space, and output overflow is handled with preflighting; for details, see the parent Strings page.) Some newer APIs take an icu::StringPiece argument and write to an icu::ByteSink or to a string class object like std::string.

The simplest way to use UTF-8 strings in UTF-16 APIs is via the C++ icu::UnicodeString methods fromUTF8(const StringPiece &utf8) and toUTF8String(StringClass &result).