ESL Documentation

Character Sets & Code Pages

One significant and obvious concern in developing multilingual applications is the many different characters used in the world's written languages. Software applications must display, accept, and correctly interpret languages containing many different written symbols. Each of these symbols, or glyphs, is represented inside the computer by a binary value known as a code point. Code points are gathered together into character sets, and different character sets are defined to meet the needs of different languages.

Windows groups characters into sets known as code pages, which are also called OEM character sets. Switching to a different code page changes the set of characters you can access. Code pages are referred to by number, such as 437 (standard US), 850 (Multinational), 865 (Nordic), and 863 (French-Canadian). These code pages assign different characters to code points in the extended ASCII range (128 to 255), but are identical in the standard range from 0 to 127.
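
The split between the shared standard range and the differing extended range can be checked directly. The following is a minimal sketch in Python, not part of ESL; it assumes Python's built-in codecs "cp437", "cp850", and "cp865" as stand-ins for the code pages named above, and the helper name decode_range is hypothetical.

```python
# Code pages named in the text, using Python's built-in codec names.
CODE_PAGES = ["cp437", "cp850", "cp865"]


def decode_range(codec, start, stop):
    """Decode every byte value in [start, stop) with the given code page."""
    # errors="replace" is a precaution in case a code page leaves a value unassigned.
    return bytes(range(start, stop)).decode(codec, errors="replace")


# Decode the standard range (0 to 127) and the extended range (128 to 255).
standard = {cp: decode_range(cp, 0, 128) for cp in CODE_PAGES}
extended = {cp: decode_range(cp, 128, 256) for cp in CODE_PAGES}

# True when every code page produced exactly the same characters.
standard_identical = len(set(standard.values())) == 1
extended_identical = len(set(extended.values())) == 1

print("standard range identical:", standard_identical)
print("extended range identical:", extended_identical)
```

Under these assumptions, the first check should report that the standard range is shared, and the second that the extended range is not.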

Using different code pages creates problems with the display and interpretation of data. Software that assumes the binary representation of one character set may misinterpret anything stored by software using a different character set. For example, in code page 850, the characters "Á", "Â", and "À" are represented as code points 181, 182, and 183. However, in code page 437, these code points are line drawing characters. Data created in code page 850 and displayed in code page 437 may be difficult to read or even meaningless.
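
The mismatch described above can be reproduced with a short Python sketch. It is illustrative only and not ESL code; it assumes Python's built-in "cp850" and "cp437" codecs, and the variable names are hypothetical.

```python
# The three code points from the example, stored as raw bytes.
data = bytes([181, 182, 183])

# Interpreting the same bytes under two different code pages.
as_850 = data.decode("cp850")  # accented capital letters
as_437 = data.decode("cp437")  # box (line) drawing characters

print("code page 850:", as_850)
print("code page 437:", as_437)

# Writing the accented letters back out in code page 850 gives the original bytes.
round_trip = as_850.encode("cp850")
print("round trip matches:", round_trip == data)
```

The bytes themselves never change; only the code page used to interpret them does, which is why data must be read with the same code page that was used to write it.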

One of the most common character sets, ASCII, is a Single-Byte Character Set (SBCS). One byte (eight bits) is used to store each character. As each bit can have a value of one or zero, there are a total of 256 code points (2^8). These code points are often referred to by integer values from 0 to 255, based on the associated binary number. For example, 65 is the character "A", and 97 is the character "a". The standard ASCII set is the first 128 characters, 0 to 127. The extended ASCII set is the last 128 characters, 128 to 255. The standard set consists of those characters used frequently in the US: the English alphabet and punctuation.
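
The arithmetic in this paragraph can be worked through in a few lines of Python. This is a sketch for illustration only; the constant names and the is_standard_ascii helper are hypothetical.

```python
BITS_PER_BYTE = 8

# Each bit is one or zero, so one byte has 2 to the power 8 distinct values.
code_point_count = 2 ** BITS_PER_BYTE

# The standard set is the first 128 values; the extended set is the last 128.
standard_range = range(0, 128)
extended_range = range(128, 256)


def is_standard_ascii(value):
    """Return True if the integer code point lies in the standard ASCII set."""
    return value in standard_range


# ord() gives the integer code point of a character; chr() goes the other way.
upper_a = ord("A")
lower_a = ord("a")

print(code_point_count)
print(upper_a, lower_a, chr(65), chr(97))
print(is_standard_ascii(upper_a), is_standard_ascii(181))
```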
