Frequency Analysis

A free letter frequency analysis tool, useful for breaking classical ciphers and cryptograms, detecting the language of a text, and similar tasks. Paste your text and click the Analyze button. More instructions follow below.

How To Use It

This frequency analysis tool can analyze unigrams (single letters), bigrams (two-letter groups, also called digraphs), trigrams (three-letter groups, also called trigraphs), or longer n-grams.

Unigram analysis

  • Set N-gram size to 1.
  • If you are analyzing polyalphabetic substitution ciphers (for example Vigenère), you can use different step sizes (representing different key lengths) and offsets.
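
One way to picture step size and offset for unigrams: when the step size equals a guessed key length, each offset selects the letters that would have been enciphered with the same key letter. The Python sketch below only illustrates that idea and is not the tool's own code. The helper name `column_counts`, the sample ciphertext, and the choice to uppercase the text and drop non-letters are all assumptions made for this example.

```python
from collections import Counter

def column_counts(text, step, offset):
    """Count the letters at positions offset, offset + step, ... (hypothetical helper)."""
    # Assumed preprocessing: ignore case and keep letters only.
    letters = [c for c in text.upper() if c.isalpha()]
    return Counter(letters[offset::step])

# Guessing a key length of 3 gives three separate counts, one per offset.
ciphertext = "LXFOPVEFRNHR"
for offset in range(3):
    print(offset, column_counts(ciphertext, step=3, offset=offset))
```

If the guessed key length is right, each of these counts should look like a simple shifted alphabet rather than a flat distribution.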

Polygram analysis (bigram, trigram or higher)

  • Set N-gram size to the number of letters per group (2 for bigrams, 3 for trigrams, etc.).
  • For digraph ciphers (Playfair, Bifid, Four-square, etc.), the step size should be 2 and offset 0.
  • For the Trifid cipher, the step size should be 3 and offset 0.

Even for single-letter monoalphabetic substitution ciphers, a polygram analysis can be useful for spotting common trigrams (such as "the"). Set the step size to 1 for this.
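
As a rough illustration of a trigram count with a step size of 1, the Python sketch below slides a three-letter window over the text one letter at a time and reports the most frequent groups. The function name `top_ngrams` is hypothetical, and the preprocessing (uppercase, letters only) is an assumption rather than a description of how the tool works internally.

```python
from collections import Counter

def top_ngrams(text, n=3, k=5):
    """Return the k most frequent n-grams, moving one letter at a time (step size 1)."""
    # Assumed preprocessing: ignore case and keep letters only.
    letters = "".join(c for c in text.upper() if c.isalpha())
    grams = [letters[i:i + n] for i in range(len(letters) - n + 1)]
    return Counter(grams).most_common(k)

# Plaintext is used here only to keep the example readable.
print(top_ngrams("the cat and the hat", n=3, k=1))  # [('THE', 2)]
```

Because spaces are removed in this sketch, some trigrams span word boundaries. In a monoalphabetic ciphertext the most frequent trigram is a reasonable first guess for "the".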

Options

Preserve Casing

This treats uppercase and lowercase letters as different characters. It should only be enabled for ciphers where case matters, for instance the ROT47 cipher.

Keep Spaces & Non-Letters

This keeps spaces and any other characters that are not letters in the analysis instead of discarding them.

Remove Accents

This removes accents from letters. For example, it makes the words Vigenère and Vigenere equal. This option is rarely used.
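
The three options above can be pictured as a small preprocessing step applied before counting. The Python sketch below is one possible reading of them, not the tool's actual implementation. The function name `preprocess` and its parameter names are made up for this example, and folding to uppercase when casing is not preserved is an assumption.

```python
import unicodedata

def preprocess(text, preserve_casing=False, keep_non_letters=False, remove_accents=False):
    """Apply the three text options before counting (hypothetical helper)."""
    if remove_accents:
        # Decompose accented letters (NFD), then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not preserve_casing:
        # Assumption: case is folded to uppercase when it is not preserved.
        text = text.upper()
    if not keep_non_letters:
        text = "".join(c for c in text if c.isalpha())
    return text

print(preprocess("Vigenère cipher!", remove_accents=True))  # VIGENERECIPHER
```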

Step Size

This determines the number of positions between the starts of consecutive codes. For example, let's take bigrams from this text:

ABCDEFGHIJKLMNOPQR

With a step size of 1 we would get:

AB BC CD DE EF FG GH, etc

With a step size of 2 we would get:

AB CD EF GH IJ KL MN, etc

Offset

This determines the position at which the first code starts. For example, let's take bigrams with a step size of 2 from this text:

ABCDEFGHIJKLMNOPQR

With an offset of 0 we would get:

AB CD EF GH IJ KL MN, etc

With an offset of 1 we would get:

BC DE FG HI JK LM NO, etc
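
The step size and offset examples above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `extract_codes` is hypothetical, and dropping a trailing group that is shorter than the N-gram size is an assumption about how incomplete codes are handled.

```python
def extract_codes(text, n, step=1, offset=0):
    """List the n-letter codes starting at offset, offset + step, offset + 2*step, ..."""
    last_start = len(text) - n  # a code must fit entirely inside the text
    return [text[i:i + n] for i in range(offset, last_start + 1, step)]

text = "ABCDEFGHIJKLMNOPQR"
print(extract_codes(text, n=2, step=1)[:7])            # ['AB', 'BC', 'CD', 'DE', 'EF', 'FG', 'GH']
print(extract_codes(text, n=2, step=2)[:7])            # ['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MN']
print(extract_codes(text, n=2, step=2, offset=1)[:7])  # ['BC', 'DE', 'FG', 'HI', 'JK', 'LM', 'NO']
```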

Index of Coincidence

The Index of Coincidence is a statistical measure that can help identify the cipher type and the language used. Texts written in a natural language (English or any other) usually have an index of coincidence that is characteristic of that language. If the letters are consistently replaced by other letters, as in a monoalphabetic substitution cipher, the index of coincidence stays the same. The same is true for transposition ciphers, which only reorder the letters.
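
The Python sketch below shows why substitution and transposition leave the value unchanged: the index depends only on how often each letter occurs, not on which letter it is or where it appears. It assumes the usual definition of the non-normalized index (the chance that two letters drawn from the text without replacement are the same), which may differ in detail from the tool's calculation. The helper names and the sample sentence are chosen only for this example.

```python
from collections import Counter

def index_of_coincidence(text):
    """Non-normalized IC: chance that two letters drawn without replacement are equal."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

def caesar(text, shift):
    """Simple monoalphabetic substitution, used only for this demonstration."""
    return "".join(
        chr((ord(c) - 65 + shift) % 26 + 65) if "A" <= c <= "Z" else c
        for c in text.upper()
    )

plain = "DEFEND THE EAST WALL OF THE CASTLE"
substituted = caesar(plain, 3)   # letters replaced, counts unchanged
transposed = plain[::-1]         # reversing is a trivial transposition

# All three printed values are identical.
print(index_of_coincidence(plain))
print(index_of_coincidence(substituted))
print(index_of_coincidence(transposed))
```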

A non-normalized Index of Coincidence is used so that the tool is useful for any language. To get the normalized index of coincidence, multiply the value by the number of letters in the alphabet of the language (26 for English, 27 for Spanish, etc.).
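
Written out, and assuming the usual definition of the index of coincidence (the exact formula used by the tool is not stated here), with n_i the count of the i-th letter, N the total number of letters, and c the number of letters in the alphabet:

```latex
\mathrm{IC} = \frac{\sum_{i=1}^{c} n_i \,(n_i - 1)}{N \,(N - 1)},
\qquad
\mathrm{IC}_{\text{normalized}} = c \cdot \mathrm{IC}
```

For typical English text the non-normalized value is roughly 0.067, so the normalized value is about 26 × 0.067 ≈ 1.73. Uniformly random letters give about 1/26 ≈ 0.038, which is a normalized value close to 1.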