ESL Documentation

Character Sets & Code Pages

An obvious concern in developing multilingual applications is the variety of characters used in the world's written languages. Software applications must display, accept, and correctly interpret text containing many different written symbols. Each of these symbols, or glyphs, is represented internally in a computer by a binary value known as a code point. Code points are gathered together into character sets, and different character sets are defined to meet the needs of different languages.

Windows uses sets of characters known as code pages, also called OEM character sets. By using a different code page, you can change the set of characters you can access. Code pages are referred to by number, such as 437 (standard US), 850 (Multinational), 865 (Nordic), and 863 (French-Canadian). These code pages assign different characters to code points in the extended ASCII range (128 to 255), but are the same in the standard set from 0 to 127.
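As a rough illustration of the point above, the following Python sketch decodes every byte value under three of the code pages mentioned and compares the results. The codec names (`cp437`, `cp850`, `cp865`) are Python's names for these code pages, and the helper `decode_range` is a hypothetical name introduced only for this example; none of this is part of ESL.

```python
# Code pages named in the text, under Python's codec names
# (an assumption of this sketch, not ESL syntax).
CODE_PAGES = ["cp437", "cp850", "cp865"]


def decode_range(codec, start, stop):
    """Decode every byte value in [start, stop) using the given code page."""
    return bytes(range(start, stop)).decode(codec)


# Standard set: code points 0 to 127.
standard = {cp: decode_range(cp, 0, 128) for cp in CODE_PAGES}
# Extended set: code points 128 to 255.
extended = {cp: decode_range(cp, 128, 256) for cp in CODE_PAGES}

# True when every code page decodes the standard set identically.
same_standard = len(set(standard.values())) == 1
# True when no two code pages decode the extended set identically.
distinct_extended = len(set(extended.values())) == len(CODE_PAGES)

print("standard set identical:", same_standard)
print("extended sets all differ:", distinct_extended)
```

Comparing whole decoded strings keeps the check short; a per-code-point comparison would show exactly which positions differ between two code pages.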

Using different code pages creates problems with the display and interpretation of data. Software that assumes one character set may misinterpret anything stored by software using a different character set. For example, in code page 850, the characters "Á", "Â", and "À" are represented as code points 181, 182, and 183. However, in code page 437, these code points are line drawing characters. Data created in code page 850 and displayed in code page 437 may be difficult to read or even meaningless.
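The mismatch described above can be reproduced with a short Python sketch. It takes the three code points from the example and decodes the same bytes under both code pages; the codec names `cp850` and `cp437` are Python's, and the variable names are illustrative only.

```python
# The three code points from the example, stored as raw bytes.
raw = bytes([181, 182, 183])

# The same bytes read under two different code pages.
as_850 = raw.decode("cp850")  # accented capital letters
as_437 = raw.decode("cp437")  # line drawing characters

print("code page 850:", as_850)
print("code page 437:", as_437)

# Writing text in one code page and reading it in the other
# changes the characters without changing a single byte.
round_trip = "ÁÂÀ".encode("cp850").decode("cp437")
print("written as 850, read as 437:", round_trip)
```

The bytes themselves never change; only the table used to interpret them does, which is why this kind of corruption is easy to miss until the data is displayed.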

One of the most common character sets, ASCII, is a Single-Byte Character Set (SBCS). One byte (eight bits) is used to store each character. As each bit can have a value of one or zero, there are a total of 256 code points (2^8). These code points are often referred to by integer values from 0 to 255, based on the associated binary number. For example, 65 is the character "A", and 97 is the character "a". The standard ASCII set is the first 128 characters, 0 to 127. The extended ASCII set is the last 128 characters, 128 to 255. The standard set consists of those characters used frequently in the US: the English alphabet and punctuation.
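The arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch using Python built-ins (`ord`, `chr`, `format`); the variable names are illustrative and not part of ESL.

```python
# Eight bits, each either 0 or 1, give 2^8 distinct values.
total_code_points = 2 ** 8

# Code point to character and back, using the examples from the text.
code_of_upper_a = ord("A")
char_for_97 = chr(97)

# The binary number behind code point 65, padded to eight bits.
bits_for_65 = format(65, "08b")

# The two halves of a single-byte character set.
standard_set = range(0, 128)    # 0 to 127
extended_set = range(128, 256)  # 128 to 255

print(total_code_points, code_of_upper_a, char_for_97, bits_for_65)
print(len(standard_set), len(extended_set))
```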