Character set and ASCII Table
Character Set:
A character set is a defined collection of characters, symbols, and control codes used to represent textual information in a computer or communication system. Strictly speaking, the collection itself is the character repertoire and the mapping of each character to a numerical value (stored in binary) is the character encoding, although the terms are often used interchangeably. This mapping is what lets computers store, transmit, and display text.
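As a minimal sketch of this mapping, assuming Python 3 and its built-in `ord` and `chr` functions (the helper name `code_points` is hypothetical, chosen only for illustration):

```python
def code_points(text):
    """Return the numeric code assigned to each character in text."""
    return [ord(ch) for ch in text]


codes = code_points("Hi!")
print(codes)                            # [72, 105, 33]
print([format(c, "b") for c in codes])  # ['1001000', '1101001', '100001']
print("".join(chr(c) for c in codes))   # Hi!
```

The last line reverses the mapping, turning each number back into its character.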
ASCII Table:
The ASCII (American Standard Code for Information Interchange) table is one of the best-known character sets and encoding standards. First published in 1963, ASCII provided a standardized way to represent characters in computers and communication equipment. The ASCII table assigns numerical values (integers) to a set of 128 characters: 33 control codes and 95 printable characters, which cover letters, digits, punctuation marks, special symbols, and the space. Each character is represented by a 7-bit binary code (0 to 127).
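To make the 7-bit representation concrete, here is a small sketch, assuming Python 3. The function name `to_7bit` is hypothetical and not part of any library:

```python
def to_7bit(ch):
    """Return the 7-bit binary string for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")


for ch in ("A", "a", "0", " "):
    print(repr(ch), ord(ch), to_7bit(ch))
# 'A' 65 1000001
# 'a' 97 1100001
# '0' 48 0110000
# ' ' 32 0100000
```

Every value fits in seven binary digits because the largest code, 127, is 1111111 in binary.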
Here's an illustration of the ASCII table in detail, organized into several categories:
- Control Characters (0-31 and 127):
  - These characters are non-printable and are used for control purposes, such as carriage return, line feed, and escape.
  - Examples: NUL (Null), CR (Carriage Return), LF (Line Feed), HT (Horizontal Tab, often written TAB), ESC (Escape), DEL (Delete).
- Printable Characters (32-126):
  - This range holds the printable characters: the space, uppercase and lowercase letters, digits, punctuation marks, and some special symbols. Code 127 (DEL) is a control character, not a printable one.
  - Examples: 'A' to 'Z', 'a' to 'z', '0' to '9', '!', '@', '#', '$', '%', etc.
- Extended ASCII (128-255):
  - These codes are not part of standard ASCII, which defines only 0-127. "Extended ASCII" is an informal name for the various 8-bit encodings that keep ASCII in the lower half and assign their own characters to 128-255.
  - The assignments vary by implementation and locale. Common examples are ISO 8859-1 (Latin-1), Windows-1252, and IBM code page 437.
  - Examples: accented letters, currency symbols, box-drawing characters, etc.
- Control Characters (128-159):
  - In some 8-bit encodings, such as ISO 8859-1, codes 128-159 are set aside for additional control codes rather than printable characters.
  - Codes 160-255 in ISO 8859-1 are printable characters from Unicode's Latin-1 Supplement block. Examples: © (Copyright), ® (Registered Trademark), and others.
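The categories above can be sketched as a small classifier, assuming Python 3. The function name `classify` and its return labels are hypothetical, chosen for illustration. The second half uses the standard `latin-1` and `cp437` codecs to show how one byte above 127 means different things in different 8-bit encodings.

```python
import unicodedata


def classify(code):
    """Name the range that a byte value (0-255) falls into."""
    if not 0 <= code <= 255:
        raise ValueError("expected a value from 0 to 255")
    if code < 32 or code == 127:
        return "control"
    if code < 127:
        return "printable"
    return "extended (not ASCII)"


for code in (10, 65, 127, 233):
    print(code, classify(code))
# 10 control
# 65 printable
# 127 control
# 233 extended (not ASCII)

# The same byte value decodes to different characters under different code pages.
raw = bytes([233])
print(unicodedata.name(raw.decode("latin-1")))  # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.name(raw.decode("cp437")))    # GREEK CAPITAL LETTER THETA
```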
Here's a simplified representation of the ASCII table:
 ----------------------------------------------------------------
|   0- 31 | Control characters (non-printable)                   |
|----------------------------------------------------------------|
|  32- 63 | Space, punctuation, digits 0-9                       |
|----------------------------------------------------------------|
|  64- 95 | @, uppercase letters A-Z, punctuation                |
|----------------------------------------------------------------|
|  96-127 | `, lowercase letters a-z, punctuation, DEL (127)     |
|----------------------------------------------------------------|
| 128-255 | Extended ASCII (not ASCII; varies by code page)      |
|----------------------------------------------------------------|
Keep in mind that ASCII is a 7-bit character encoding and has limited support for characters from languages other than English. For broader language support, Unicode-based encodings such as UTF-8 (Unicode Transformation Format, 8-bit) are commonly used, as they can represent a much wider range of characters from various languages and scripts. UTF-8 is backward-compatible with ASCII: every ASCII character is encoded as the same single byte in UTF-8, so any valid ASCII text is also valid UTF-8. This makes it a popular choice for modern text encoding.
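The backward compatibility can be checked with a short sketch, assuming Python 3 and its built-in `str.encode`. The helper name `same_bytes_in_ascii_and_utf8` is hypothetical:

```python
def same_bytes_in_ascii_and_utf8(text):
    """Return True if text encodes to identical bytes in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character that ASCII has no code for.
        return False


print(same_bytes_in_ascii_and_utf8("Hello"))  # True
print(same_bytes_in_ascii_and_utf8("café"))   # False

# Characters outside ASCII take more than one byte in UTF-8.
print("é".encode("utf-8"))       # b'\xc3\xa9'
print(len("€".encode("utf-8")))  # 3
```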
Here's the full ASCII table (codes 0 to 127), covering the control characters and the printable characters:
ASCII Table
Dec | Char | Dec | Char | Dec | Char | Dec | Char |
0 | NUL | 32 | (space) | 64 | @ | 96 | ` |
1 | SOH | 33 | ! | 65 | A | 97 | a |
2 | STX | 34 | " | 66 | B | 98 | b |
3 | ETX | 35 | # | 67 | C | 99 | c |
4 | EOT | 36 | $ | 68 | D | 100 | d |
5 | ENQ | 37 | % | 69 | E | 101 | e |
6 | ACK | 38 | & | 70 | F | 102 | f |
7 | BEL | 39 | ' | 71 | G | 103 | g |
8 | BS | 40 | ( | 72 | H | 104 | h |
9 | TAB | 41 | ) | 73 | I | 105 | i |
10 | LF | 42 | * | 74 | J | 106 | j |
11 | VT | 43 | + | 75 | K | 107 | k |
12 | FF | 44 | , | 76 | L | 108 | l |
13 | CR | 45 | - | 77 | M | 109 | m |
14 | SO | 46 | . | 78 | N | 110 | n |
15 | SI | 47 | / | 79 | O | 111 | o |
16 | DLE | 48 | 0 | 80 | P | 112 | p |
17 | DC1 | 49 | 1 | 81 | Q | 113 | q |
18 | DC2 | 50 | 2 | 82 | R | 114 | r |
19 | DC3 | 51 | 3 | 83 | S | 115 | s |
20 | DC4 | 52 | 4 | 84 | T | 116 | t |
21 | NAK | 53 | 5 | 85 | U | 117 | u |
22 | SYN | 54 | 6 | 86 | V | 118 | v |
23 | ETB | 55 | 7 | 87 | W | 119 | w |
24 | CAN | 56 | 8 | 88 | X | 120 | x |
25 | EM | 57 | 9 | 89 | Y | 121 | y |
26 | SUB | 58 | : | 90 | Z | 122 | z |
27 | ESC | 59 | ; | 91 | [ | 123 | { |
28 | FS | 60 | < | 92 | \ | 124 | | |
29 | GS | 61 | = | 93 | ] | 125 | } |
30 | RS | 62 | > | 94 | ^ | 126 | ~ |
31 | US | 63 | ? | 95 | _ | 127 | DEL |
In this table, "Dec" is the decimal value of the character and "Char" is the character itself. Codes 0-31 and 127 (DEL) are non-printable control characters used for control and formatting purposes. Codes 32-126 are the printable characters: the space, letters, digits, punctuation, and symbols. Codes 128-255 are not part of ASCII and are not shown, because their meaning depends on the specific 8-bit encoding in use.
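A table with this four-column layout can also be generated rather than typed by hand. The following is a sketch assuming Python 3. The names `CONTROL_NAMES`, `label`, and `ascii_rows` are hypothetical, and the abbreviation list simply follows the table above (including TAB for code 9).

```python
# Abbreviations for codes 0-31, in order.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "TAB", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def label(code):
    """Return a printable label for an ASCII code (0-127)."""
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 32:
        return "SP"   # the space character
    if code == 127:
        return "DEL"
    return chr(code)


def ascii_rows():
    """Build 32 rows, each listing codes r, r+32, r+64, and r+96."""
    rows = []
    for r in range(32):
        cells = [f"{c:3d} | {label(c):<3}" for c in (r, r + 32, r + 64, r + 96)]
        rows.append(" | ".join(cells))
    return rows


print("Dec | Char | Dec | Char | Dec | Char | Dec | Char")
for row in ascii_rows():
    print(row)
```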