Unicode is the worldwide standard for encoding and representing text in most of the world's writing systems, maintained by the Unicode Consortium. Unicode currently defines more than 137,000 characters, covering modern and historical scripts as well as symbols and emoji. The character repertoire of the Unicode Standard is synchronized with ISO/IEC 10646, and the two are code-for-code identical.
- Unicode text can be stored using several different character encodings. The Unicode Standard defines UTF-8, UTF-16 and UTF-32 (also known as UCS-4). UTF-7 and the obsolete UCS-2 also exist, but they are not among the standard's encoding forms.
- The Unicode Consortium is responsible for maintaining and publishing the Unicode standard.
- The first 256 Unicode code points match the ISO-8859-1 (Latin-1) character set, and the first 128 match the ASCII character set.
- Wikipedia has further info about Unicode and the various Unicode encodings.
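As a rough illustration of the points above, the following Python sketch encodes one character in several encodings and checks the ISO-8859-1 and ASCII overlap. The sample character and variable names are chosen purely for illustration.

```python
# Illustrative sample: U+00E9 lies inside the first 256 code points.
text = "é"

# The same character yields different byte sequences per encoding.
utf8 = text.encode("utf-8")        # two bytes
utf16 = text.encode("utf-16-be")   # one 16-bit unit, big-endian
utf32 = text.encode("utf-32-be")   # one 32-bit unit, big-endian

# Each ISO-8859-1 byte value decodes to the code point with the same number.
latin1_matches = all(
    bytes([i]).decode("iso-8859-1") == chr(i) for i in range(256)
)

# ASCII characters keep their single-byte value when encoded as UTF-8.
ascii_matches = all(
    chr(i).encode("ascii") == chr(i).encode("utf-8") == bytes([i])
    for i in range(128)
)

print(utf8.hex(), utf16.hex(), utf32.hex())
print(latin1_matches, ascii_matches)
```

Note that the overlap with ISO-8859-1 concerns code point numbers only: in UTF-8, code points 128 to 255 take two bytes, so Latin-1 bytes above 7F are not valid UTF-8 on their own.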
Decode or encode Unicode UTF-8 format
This tool converts between Unicode text and hexadecimal bytes using the UTF-8 encoding. UTF-8 is the most common Unicode encoding and is used by the majority of applications and websites.
Note that you can type in either the hex data or plaintext area.
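The same conversion can be approximated in a few lines of Python. This is a minimal sketch, not the tool's actual implementation; the function names `text_to_hex` and `hex_to_text` are hypothetical.

```python
def text_to_hex(text: str) -> str:
    """Encode text as UTF-8 and return space-separated lowercase hex bytes."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


def hex_to_text(hex_data: str) -> str:
    """Parse hex bytes (whitespace is ignored) and decode them as UTF-8."""
    return bytes.fromhex(hex_data).decode("utf-8")


print(text_to_hex("héllo"))
print(hex_to_text("68 c3 a9 6c 6c 6f"))
```

`hex_to_text` raises `UnicodeDecodeError` if the bytes are not valid UTF-8, and `ValueError` if the input is not valid hex.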
Unicode-encoded text appears regularly in CTFs and logic puzzles. It can sometimes be recognized by a byte order mark (BOM) at the beginning of the data. UTF-8 data can start with EF BB BF, although the Unicode Standard neither requires nor recommends this. UTF-16 data can start with FE FF (big-endian) or FF FE (little-endian) to indicate the byte order. UTF-32 data can start with 00 00 FE FF (big-endian) or FF FE 00 00 (little-endian).
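A BOM check like the one described can be sketched in Python using the constants from the standard `codecs` module. The helper name `detect_bom` and its return shape are assumptions made for this example.

```python
import codecs

# UTF-32 BOMs are listed first: the UTF-32 little-endian BOM (FF FE 00 00)
# begins with the UTF-16 little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


encoding, skip = detect_bom(bytes.fromhex("ef bb bf 68 69"))
if encoding:
    print(encoding, bytes.fromhex("ef bb bf 68 69")[skip:].decode(encoding))
```

The result is only a hint. Data without a BOM may still be Unicode, and FF FE 00 00 could also be UTF-16 little-endian text that begins with a NUL character.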
Visual tricks can be played with Unicode, such as upside-down text effects.
f0 9f 99 88 f0 9f 99 89 f0 9f 99 89
The bytes above represent three monkey emoji, 🙈🙉🙉, encoded in UTF-8.
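To check the example by hand, the bytes can be decoded in Python and the resulting code points listed. This is a small sketch with illustrative variable names.

```python
hex_data = "f0 9f 99 88 f0 9f 99 89 f0 9f 99 89"

# Each emoji here occupies four bytes in UTF-8.
decoded = bytes.fromhex(hex_data).decode("utf-8")

# List the code point behind each decoded character.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]

print(decoded)
print(code_points)
```

The twelve bytes decode to three characters: U+1F648 followed by U+1F649 twice.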