Code-Breaking: Encodings

Unicode

While ASCII is still commonly used, it is very limited in the number of characters it can represent. To cover non-English languages (including Chinese) and symbols, a new standard, Unicode, was developed. Unicode now defines well over 137,000 characters. Although each character's number (its code point) is standardized, there are several encoding forms for turning code points into bytes, such as UTF-8, UTF-7 and UTF-16, so the same text can produce different byte sequences.
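The difference between a code point and its encoded bytes is easy to see in Python, which ships codecs for the encodings named above. The sample string is an arbitrary choice: "é" (U+00E9) and "中" (U+4E2D). The sketch uses "utf-16-be" rather than plain "utf-16" because Python's "utf-16" codec prepends a byte-order mark.

```python
# One Unicode string, several byte encodings.
text = "é中"  # code points U+00E9 and U+4E2D (chosen only as an example)

print([f"U+{ord(ch):04X}" for ch in text])  # ['U+00E9', 'U+4E2D']

encoded = {enc: text.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-7")}
for enc, data in encoded.items():
    print(f"{enc:<10} {data!r}")
    # Decoding with the same encoding restores the original string.
    assert data.decode(enc) == text
```

UTF-8 produces five bytes (c3 a9 e4 b8 ad), UTF-16-BE four bytes (00 e9 4e 2d), and UTF-7 an ASCII-only byte string, all for the same two characters.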

Base64

Base64 is a family of closely related encodings that represent binary data as ASCII strings. The standard alphabet has 64 symbols (A-Z, a-z, 0-9, + and /), so each Base64 character carries 6 bits of data, and four Base64 characters therefore represent three bytes. Base64 can often be recognized by its padding: one or two = characters at the end. The padding appears when the length of the encoded data is not a multiple of three bytes.
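To make the three-bytes-to-four-characters mapping and the padding rule concrete, here is a small hand-written encoder for the standard alphabet. The function name `b64encode_manual` is only illustrative; in practice Python's built-in `base64.b64encode` does the job, and the sketch prints both side by side for comparison.

```python
import base64

# Standard Base64 alphabet: 64 symbols, so each symbol carries 6 bits.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_manual(data: bytes) -> str:
    """Illustrative Base64 encoder: every 3 input bytes become 4 output symbols."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes (24 bits) into one integer, zero-filling a short final chunk.
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        symbols = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 3 bytes -> 4 real symbols, 2 bytes -> 3, 1 byte -> 2; pad the rest with '='.
        used = len(chunk) + 1
        out.append("".join(symbols[:used]) + "=" * (4 - used))
    return "".join(out)


for sample in (b"Man", b"Ma", b"M"):
    print(sample, b64encode_manual(sample), base64.b64encode(sample).decode("ascii"))
```

The three samples encode to "TWFu", "TWE=" and "TQ==": a final chunk of two bytes produces one =, and a final chunk of one byte produces two.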

Baudot codes

Émile Baudot invented the Baudot code in the 1870s, long before ASCII. Each character in Baudot is represented by 5 bits and sent over a communication channel such as a telegraph wire or a radio signal. The baud, a unit of transmission speed (symbols per second), is named after Baudot.
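Five bits allow only 2^5 = 32 distinct codes, which is why Baudot-style codes use shift codes to switch between a letters set and a figures set. When a puzzle hands you a string of bits, a useful first check is whether it splits cleanly into 5-bit groups rather than 8-bit bytes. The helper below, `split_into_codes`, is a hypothetical utility for that step only; mapping the resulting values to characters still requires a code table such as ITA2, which is not reproduced here.

```python
def split_into_codes(bits: str, width: int = 5) -> list[int]:
    """Hypothetical helper: split a string of '0'/'1' characters into `width`-bit codes.

    Whitespace is ignored. A ValueError is raised if the bit count is not a
    multiple of `width`, which hints that the guessed width is wrong.
    """
    bits = "".join(bits.split())
    if set(bits) - {"0", "1"}:
        raise ValueError("input must contain only 0 and 1")
    if len(bits) % width:
        raise ValueError(f"{len(bits)} bits is not a multiple of {width}")
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


print(2 ** 5)  # 32 possible 5-bit codes
print(split_into_codes("00011 10000 11111"))  # [3, 16, 31]
```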

Hashes

A hash function maps data of arbitrary size to a fixed-size value, the "hash". Hashes are also referred to as hash values, hash codes or digests. They are commonly used in cryptography to verify the integrity of data and to protect passwords and other sensitive data. Some of the most common cryptographic hash functions are MD5, SHA-1 and SHA-2; MD5 and SHA-1 are no longer considered secure.
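Because the output size is fixed, the length of a hex digest is often the first clue when identifying an unknown hash. The sketch below uses Python's `hashlib` to show that digest length does not depend on input length, plus a hypothetical helper, `candidate_hashes`, that lists which of a few common algorithms match a given hex length. Length is only a hint: other algorithms share these sizes.

```python
import hashlib
import string

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")

# Digest size is a property of the algorithm, not of the input.
for name in ALGORITHMS:
    for message in (b"", b"a", b"x" * 100_000):
        digest = hashlib.new(name, message).hexdigest()
        print(f"{name:<7} input={len(message):>6} bytes  hex length={len(digest)}")

# Hex-digest length of each algorithm: 2 hex characters per byte.
HEX_LENGTHS = {name: hashlib.new(name).digest_size * 2 for name in ALGORITHMS}


def candidate_hashes(hex_digest: str) -> list[str]:
    """Hypothetical helper: algorithms whose hex-digest length matches the input."""
    hex_digest = hex_digest.strip()
    if not hex_digest or any(c not in string.hexdigits for c in hex_digest):
        return []
    return [name for name, length in HEX_LENGTHS.items() if len(hex_digest) == length]


print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
print(candidate_hashes("d41d8cd98f00b204e9800998ecf8427e"))  # ['md5']
```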

Burrows–Wheeler transform

The Burrows–Wheeler transform (BWT) rearranges the characters of a string so that identical characters tend to end up in runs. It is reversible as long as you also store the position of the original string among its sorted rotations (or mark the end of the string with a unique character). These runs of repeated characters are what make the transform useful for compression: a later stage such as move-to-front or run-length coding can exploit them, as in bzip2. The BWT thus improves the efficiency of text compression at the cost of only a little extra computation.
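A naive version of the transform, which sorts every rotation of the input, is short enough to write out. `bwt` returns the last column of the sorted rotations together with the row index of the original string, the extra piece of information needed to invert it, and `inverse_bwt` rebuilds the sorted table column by column. Both names are illustrative, and this quadratic sketch is for understanding only; real compressors use far more efficient sorting.

```python
def bwt(text: str) -> tuple[str, int]:
    """Naive BWT: last column of the sorted rotations, plus the original's row index."""
    if not text:
        return "", 0
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    last_column = "".join(rot[-1] for rot in rotations)
    return last_column, rotations.index(text)


def inverse_bwt(last_column: str, index: int) -> str:
    """Naive inverse: repeatedly prepend the last column and re-sort the rows."""
    if not last_column:
        return ""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    # After n rounds the table holds all sorted rotations; row `index` is the original.
    return table[index]


encoded, idx = bwt("banana")
print(encoded, idx)               # nnbaaa 3
print(inverse_bwt(encoded, idx))  # banana
```

Note how the three a's and two n's of "banana" come out grouped together in "nnbaaa".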

QR codes and barcodes

Many different standards exist for barcodes. Depending on the standard, a barcode may encode only digits, a limited set of letters and digits, or the whole range of ASCII characters (for example, EAN-13 encodes only digits, while Code 128 covers all 128 ASCII characters). QR codes are a form of two-dimensional barcode that has become popular for storing web addresses (URLs) and appear in advertising, in magazines, on signs, etc. There are smartphone apps for scanning QR codes and barcodes, and they can also be decoded by online services.