An Overview of Localization Technology
Source: Author: this site


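UTF-16 encodes each code point in the range 0000 to FFFF (the Basic Multilingual Plane, or BMP) as a single 16-bit unsigned integer. A code point in a supplementary plane is represented by two 16-bit unsigned integers, and such a pair of code units is called a surrogate pair. Surrogate code units take values between D800 and DFFF, a range in which no characters are assigned (D800 to DBFF for the first unit of a pair, DC00 to DFFF for the second). This lets UTF-16 software easily tell a single code unit from a surrogate pair. The Unicode Standard(8) provides more detail on surrogates.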
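
The mapping described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production encoder; the function name `to_utf16_units` is hypothetical and chosen only for this example.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units (as integers) for one code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate values are not assigned to characters")
    if cp <= 0xFFFF:
        # BMP: a single 16-bit code unit holds the code point itself.
        return [cp]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # first unit, D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)   # second unit, DC00..DFFF
    return [high, low]


if __name__ == "__main__":
    for cp in (0x0041, 0x4E2D, 0x1F600):
        units = to_utf16_units(cp)
        print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```

Because every unit of a pair falls in D800 to DFFF, a decoder can recognize a surrogate by inspecting one code unit alone, which is the property the paragraph above relies on.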

UTF-16 is a good choice for storing general Unicode strings, because it is optimized for characters in the BMP, which account for about 99 percent of the characters in Unicode text. For BMP text it needs about half the storage required by UTF-32, since each character takes one 16-bit code unit instead of one 32-bit code unit.
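
As a rough check of the storage claim, the following Python sketch compares encoded sizes using the standard `str.encode` method. The helper name `encoded_sizes` and the sample string are assumptions made for illustration; the `-le` codec variants are used so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes `text` occupies in each encoding form."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}


if __name__ == "__main__":
    # Ten BMP characters: eight ASCII characters plus two CJK ideographs.
    print(encoded_sizes("Unicode 文本"))
    # One supplementary character: UTF-16 needs a surrogate pair here,
    # so it no longer saves space over UTF-32.
    print(encoded_sizes("\U0001F600"))
```

For text made only of BMP characters, the UTF-16 size is exactly half the UTF-32 size; the saving disappears for supplementary characters.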

UTF-8

To meet the requirements of legacy byte-oriented, ASCII-based systems, UTF-8 is defined as a variable-width encoding form that preserves ASCII compatibility. It uses one to four 8-bit code units to represent a Unicode character, depending on the code point value. Code points between 0000 and 007F are encoded in a single byte, which makes any ASCII string a valid UTF-8 string. Beyond the ASCII range, the non-ideographic characters between 0080 and 07FF are encoded with two bytes. The rest of the BMP, from 0800 to FFFF, which includes the Indic scripts and the CJK ideographs, is encoded with three bytes. Supplementary characters beyond the BMP require four bytes. The Unicode Standard(9) provides more detail on UTF-8.
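
The four ranges above map onto four byte layouts. The Python sketch below encodes one code point by hand so the range boundaries are visible; the function name `utf8_encode` is hypothetical, and in practice the built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (1 to 4 bytes)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp <= 0x7F:
        # 0000..007F: one byte, identical to ASCII.
        return bytes([cp])
    if cp <= 0x7FF:
        # 0080..07FF: two bytes, 110xxxxx 10xxxxxx.
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 0800..FFFF: three bytes, 1110xxxx 10xxxxxx 10xxxxxx.
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 10000..10FFFF: four bytes, 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    # Latin A, e-acute, Devanagari KA, a CJK ideograph, an emoji.
    for cp in (0x0041, 0x00E9, 0x0915, 0x4E2D, 0x1F600):
        encoded = utf8_encode(cp)
        assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
        print(f"U+{cp:04X}: {len(encoded)} byte(s), {encoded.hex(' ')}")
```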

UTF-8 is typically the preferred encoding form for the Internet. Its ASCII compatibility helps a great deal when migrating from old systems. UTF-8 also has the advantage of being byte-serialized, so it has no byte-order issues, and it is friendly to the APIs of C and other programming languages. For example, traditional string sorting based on byte-wise comparison still works with UTF-8: it orders strings by code point value, although not by language-specific collation rules.
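
The byte-wise comparison property can be illustrated with a short Python sketch. The sample list `words` is an assumption made for this example; Python compares `str` values by code point, which gives a reference order to check against.

```python
# Five one-character strings: "z", e-acute, a CJK ideograph,
# fullwidth tilde (U+FF5E, near the end of the BMP), and an emoji (U+1F600).
words = ["\U0001F600", "\uff5e", "\u4e2d", "\u00e9", "z"]

# Reference order: Python compares str values code point by code point.
by_code_point = sorted(words)

# Byte-wise comparison of the UTF-8 encodings, as a C strcmp would do.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Byte-wise comparison of big-endian UTF-16, for contrast.
by_utf16_bytes = sorted(words, key=lambda s: s.encode("utf-16-be"))

if __name__ == "__main__":
    print(by_utf8_bytes == by_code_point)   # UTF-8 keeps code point order
    print(by_utf16_bytes == by_code_point)  # UTF-16 does not for this sample
```

In this sample the UTF-16 order differs because the emoji is stored as a surrogate pair starting with D83D, which sorts before FF5E even though its code point is larger. Neither order is linguistic collation; locale-aware sorting needs a dedicated collation library.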

In short, UTF-8 is the most widely adopted encoding form of Unicode.