Monday, January 5, 2026

History of IT: ASCII and Why It's Still Used

I outlined in a previous post how we got to an 8-bit byte. In that post, I mentioned the wonderful 7-bit ASCII. With ASCII being a 7-bit encoding and bytes having 8 bits, have you ever wondered why it has stuck around? Let's dig into that.



In the late 1950s and early 1960s, incompatible character encodings were a problem: IBM had EBCDIC, and other vendors had proprietary 6- or 7-bit schemes. That made data exchange between systems difficult to impossible, because one system could not read another's data without knowing its encoding or converting it first.


In 1963, ASCII (American Standard Code for Information Interchange) was standardized. It was developed by the American Standards Association (which later became ANSI, the American National Standards Institute) and consisted of a 7-bit character encoding with 128 possible values. Given early hardware limitations, it fit into most hardware and left an 8th bit unused, which could be used for parity.
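
Here's a minimal Python sketch of that parity idea (assuming even parity with the parity bit in the otherwise unused high position; real systems varied on both counts):

    # A character's 7-bit ASCII code plus an even-parity bit in the unused 8th bit.
    def with_even_parity(ch: str) -> str:
        code = ord(ch)                # e.g. 'A' -> 65
        bits = format(code, "07b")    # the seven data bits: '1000001'
        parity = bits.count("1") % 2  # even parity: make the total count of 1s even
        return str(parity) + bits

    print(with_even_parity("A"))      # 01000001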


ASCII itself has 33 control characters (0-31, and 127) and 95 printable characters covering uppercase letters, lowercase letters, numerals, and punctuation. Due to some interpretive discrepancies, newlines can be represented differently (even on modern systems): Linux uses the line feed control character, classic Mac OS used the carriage return (modern macOS uses the line feed, like Linux), and Windows uses both together. Many applications understand all three possibilities, so you may never notice a difference.
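
Here's a small Python illustration of those three conventions (Python's splitlines() happens to recognize all of them, which is one way applications paper over the difference):

    # The same three lines of text with Unix (LF), classic Mac (CR),
    # and Windows (CRLF) line endings.
    unix_text    = "one\ntwo\nthree"
    mac_text     = "one\rtwo\rthree"
    windows_text = "one\r\ntwo\r\nthree"

    # splitlines() recognizes all three conventions, so each call
    # prints ['one', 'two', 'three'].
    for text in (unix_text, mac_text, windows_text):
        print(text.splitlines())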


In the late 1960s and 1970s, ASCII became popular for its simplicity. As the dominant encoding for UNIX, ARPANET (RFC 20), and early networking protocols, it began to take over. IBM kept EBCDIC, but ASCII dominated inter-system communication, and that portability is exactly what helped it win.


In the 1980s and 1990s, we got into "extended ASCII." Vendors repurposed the unused 8th bit to extend the set to 256 possible characters. The problem that popped up, again, was the lack of a uniform standard (ISO 8859-1, Windows-1252, Mac Roman). Now we had a portability problem again.
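
You can still see that problem today with Python's built-in codecs: the same byte means three different things depending on which "extended ASCII" you assume.

    # One byte, three "extended ASCII" interpretations.
    b = bytes([0x80])

    print(repr(b.decode("latin-1")))   # ISO 8859-1: '\x80', an unprintable C1 control
    print(b.decode("cp1252"))          # Windows-1252: the euro sign
    print(b.decode("mac_roman"))       # Mac Roman: 'Ä'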


In 1991-1993, a wild Unicode appears! Unicode attempts to do two things. The first is to provide a global character representation, because certain languages just can't be written out in ASCII. The second is to build on what already works: rather than replacing ASCII, Unicode includes it as its first 128 code points. Adoption was slow, but it happened.


In the 1990s, the UTF-8 encoding was designed. It is backwards compatible with ASCII, keeping that good old portability (it was later standardized in RFC 3629). ASCII characters remain 1 byte, while higher characters expand to multiple bytes.
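
A quick Python check shows both halves of that claim:

    # ASCII text produces identical bytes under ASCII and UTF-8...
    print("ASCII".encode("utf-8"))                             # b'ASCII'
    print("ASCII".encode("ascii") == "ASCII".encode("utf-8"))  # True

    # ...while characters beyond ASCII expand to multiple bytes.
    print("é".encode("utf-8"))   # b'\xc3\xa9' (two bytes)
    print("€".encode("utf-8"))   # b'\xe2\x82\xac' (three bytes)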


So with UTF-8 around, why is ASCII still used? Protocols like HTTP (headers), SMTP commands, FTP commands, and even DNS labels (for the most part) assume ASCII. Some programming languages depend on it as well for source code keywords, identifiers, and syntax tokens; even when Unicode is supported, a parser may still assume ASCII for its control structures.
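
As a small illustration, an HTTP/1.1 request line and its headers encode cleanly as plain ASCII bytes on the wire (the host name here is just a placeholder):

    # An HTTP/1.1 request is ASCII on the wire; encoding with the "ascii"
    # codec would raise UnicodeEncodeError if anything non-ASCII slipped in.
    request = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    wire_bytes = request.encode("ascii")

    print(wire_bytes)
    print(wire_bytes.isascii())   # True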


To answer the question of why ASCII is still used: it was simple, it was portable, plenty of things rely on it, and the encodings that incorporated it also succeeded. It shows that it's not the "best" design that wins, but the one that accomplishes what it set out to do in a way that works for as many people as possible. ASCII hit all the key points it needed to, and it still lives on.

