Earlier computers were used primarily for numerical calculations. Modern digital computers, however, must represent and process non-numeric information such as names, addresses, descriptions and punctuation. Such information is called alphanumeric data and is represented inside a computer as sequences of binary digits (0 and 1). The mapping between characters (letters, digits, punctuation and control symbols) and binary patterns is provided by alphanumeric codes, also called character codes. These codes make it possible to interface input/output devices such as keyboards, displays and printers with the internal binary representation used by a computer.
Many coding techniques have been developed to represent characters as sequences of bits. Early well-known codes include Morse code and the Hollerith punched-card code. Today the most widely used schemes are ASCII and EBCDIC, and a more recent universal scheme is Unicode. Each code has its historical context, advantages and limitations.
The Morse code was invented in 1837 by Samuel F. B. Morse for telegraphy. It represents letters, numerals and some punctuation by standardized sequences of short and long signals (commonly called dots and dashes). A dash is treated as the time duration of three dots; letters are separated by short gaps and words by longer gaps.
For example, the letter A is represented by dot-dash and the digit 5 by five dots in succession. Because Morse characters have variable lengths, the code is not directly suitable for automated fixed-length digital circuits and is largely superseded in electronic systems by fixed-length binary codes. Morse code is still used in specialised telegraph and radio applications and retains limited use in aviation and amateur radio.
The Baudot code was developed by the French engineer Émile Baudot around 1870. It is a fixed-length 5-unit code (five bits) in which every symbol has equal duration. Baudot allowed efficient teleprinter transmission of letters, numerals and some control functions by using character-shift mechanisms (for example, separate letter and figure states).
Herman Hollerith developed punched-card tabulating equipment in the late 19th century; the punched cards and their coding are commonly referred to as Hollerith cards. A Hollerith card used 12 possible punch rows (often numbered 12, 11, 0, 1, 2, ..., 9) and combinations of punches in those rows encoded characters and numeric fields. Hollerith encoding uses combinations of punches in 12 rows; this can be mapped to a 12-bit binary pattern for modern representation, but it was not originally a binary code.
When Hollerith data were printed or transferred to character devices, various mapping conventions were used. One such convention maps the 12 bits into two 6-bit groups (each sometimes represented inside an ASCII or other character with added control bits) so the 12-bit Hollerith pattern could be carried through systems expecting smaller character widths. With the disappearance of punched cards, Hollerith encoding is now mostly of historical interest.
The American Standard Code for Information Interchange (ASCII, pronounced "as-kee") is a character encoding that originated from telegraphic codes and was standardised in the 1960s. The original standard defines a 7-bit code, allowing 128 distinct code points. These include 95 printable characters (26 uppercase letters A-Z, 26 lowercase letters a-z, 10 digits 0-9 and various punctuation and symbols) and 33 control characters (for device control and text formatting).
The 7-bit ASCII code assigns to each character a pattern X6 X5 X4 X3 X2 X1 X0 where each X is 0 or 1. For human readability the bits are often grouped when shown; for example the letter D is represented as 1000100 and the letter A has X6 X5 X4 = 100 and X3 X2 X1 X0 = 0001, giving 1000001 in 7-bit form. The digit 9 has X6 X5 X4 = 011 and X3 X2 X1 X0 = 1001, giving 0111001.
An 8-bit form of ASCII (often called ASCII-8 or extended ASCII) became common because most computers organise data in 8-bit bytes. In many systems the eighth bit was used as a parity bit for simple error detection on communication lines or set to 0 when parity was unused. When the eighth bit is used only as parity, the 7-bit ASCII pattern still determines the character value and the eighth bit is used for error checking.
Example 1: With an ASCII-7 keyboard, each keystroke produces the ASCII equivalent of the designated character. Suppose that you type PRINT X. What is the output of an ASCII-7 keyboard?
Solution:
The ASCII-7 equivalent of P = 1010000
The ASCII-7 equivalent of R = 1010010
The ASCII-7 equivalent of I = 1001001
The ASCII-7 equivalent of N = 1001110
The ASCII-7 equivalent of T = 1010100
The ASCII-7 equivalent of space = 0100000
The ASCII-7 equivalent of X = 1011000
Concatenating these bit patterns yields the output binary string 10100001010010100100101001110101010001000001011000. The hexadecimal byte values (ASCII-8 with MSB = 0) are 50 52 49 4E 54 20 58.
Example 2: A computer sends a message to another computer using an odd-parity bit. Here is the message in ASCII-8 code. Assuming MSB is parity and remaining 7 bits are ASCII.
10110001
10110101
10100101
10100101
10101110
Solution
Translating the 8-bit numbers to ASCII characters gives the sequence Q U E E N.
Therefore the received message decodes to QUEEN.
The Extended Binary Coded Decimal Interchange Code (EBCDIC, pronounced "ee-bee-see-deck") is an 8-bit character encoding developed by IBM. Numerals 0-9 in EBCDIC are related to the 8421 BCD representation preceded by a prefix pattern in the high bits; the full 8-bit space allows up to 256 distinct code points.
EBCDIC does not use a simple sequential binary order for letters and symbols as ASCII does. Instead characters are arranged in blocks in the 8-bit space. When reading tables it is common to separate the eight bits into two groups of four (high nibble and low nibble) and represent each nibble in hexadecimal, so each EBCDIC code can be expressed as two hexadecimal digits. For example, in a common EBCDIC table the letter A is represented by the 8-bit pattern 11000001 and B by 11000010. Other characters such as '=' and '′havetheirownEBCDICcodes(forexample′=′→01111110inthetableshown,′' → 01011011).
EBCDIC remains in use on many IBM mainframe and midrange systems; ASCII-based encodings dominate personal computers and most modern systems, but conversion utilities are commonly used when data pass between these platforms.
Both ASCII and EBCDIC (and many vendor-specific extensions) suffer two important limitations:
To overcome these limitations the Unicode standard was developed by the Unicode Consortium in cooperation with the International Organization for Standardization (ISO). Unicode defines a universal set of character code points and assigns each a unique number called a code point. Early Unicode was designed around a 16-bit space (allowing 65,536 code points), but the standard has since been extended to cover over a million code points to accommodate all modern and historic scripts, musical notation, emoji and many technical symbols.
To represent Unicode code points in storage and transmission, a number of encodings exist:
Unicode has a number of practical uses:
Major software vendors and standards bodies have adopted Unicode; it enables reliable multilingual text interchange and simplifies internationalisation of applications.
Concluding remark: Choosing a character encoding depends on the application context-legacy systems, storage efficiency, backwards compatibility with ASCII, or the need to support many languages. Understanding ASCII, EBCDIC and Unicode and how they are encoded (7-bit, 8-bit, variable-length) is essential for reliable text processing and data interchange.
| 1. What is alphanumeric code? | ![]() |
| 2. How is an alphanumeric code different from a numeric code? | ![]() |
| 3. Are alphanumeric codes used in everyday life? | ![]() |
| 4. How are alphanumeric codes used in computer programming? | ![]() |
| 5. Can alphanumeric codes be encrypted for security purposes? | ![]() |
![]() | Explore Courses for Electrical Engineering (EE) exam |
![]() | Get EduRev Notes directly in your Google search |