Character Frequency Analyzer

Paste text to see letter frequency, word frequency, bigrams, and character statistics. Useful for cryptanalysis, writing analysis, and linguistics.

How to Use the Character Frequency Analyzer

Paste any text into the editor on the left. Four analysis tabs update instantly: Characters shows every character and its count, Letters compares your text against a language frequency baseline, Words lists the most common vocabulary with stopword filtering, and N-grams shows character sequence frequencies for cryptanalysis. Use the Export button to download results as JSON, CSV, or a plain text report.

About This Tool

A text frequency analysis tool for writers, linguists, and cryptanalysis students. It analyzes character frequency; letter frequency, compared against English, German, French, Spanish, and Icelandic baselines; word frequency, with stopword filtering and vocabulary richness metrics (Type-Token Ratio, hapax legomena); and character-level n-grams (bigrams, trigrams, quadgrams), shown with a visual heatmap. It also computes the Index of Coincidence for cipher identification and the chi-squared distance for language detection. All analysis runs in the browser as plain JavaScript with no external libraries. Pair it with the Word Counter for basic counting or the Reading Time Estimator for readability analysis.

Quick Reference Table

Metric	Description
Index of Coincidence	Probability that two letters picked at random from the text are the same; ~0.067 for English, ~0.038 for uniformly random letters
Chi-Squared (χ²)	Distance from expected language baseline — lower is closer match
Type-Token Ratio	Unique words / total words — higher means richer vocabulary
Hapax Legomena	Words appearing exactly once — indicates vocabulary diversity
Bigram	Two consecutive characters — th, he, in are top English bigrams
Trigram	Three consecutive characters — the, and, ing are top English trigrams
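
As an illustration of the Index of Coincidence row above, here is a minimal TypeScript sketch. It is not the tool's own source: the function name `indexOfCoincidence` and the choice to count only the letters a to z (case-folded) are assumptions made for this example.

```typescript
// Index of Coincidence: the probability that two letters drawn at random
// (without replacement) from the text are the same letter.
// Assumption: only ASCII letters a-z are counted, after lowercasing.
function indexOfCoincidence(text: string): number {
  const counts = new Map<string, number>();
  let total = 0;

  for (const ch of text.toLowerCase()) {
    if (ch >= "a" && ch <= "z") {
      counts.set(ch, (counts.get(ch) ?? 0) + 1);
      total++;
    }
  }

  // With fewer than two letters there are no pairs to compare.
  if (total < 2) return 0;

  // Sum n * (n - 1) over every letter count n, then divide by N * (N - 1).
  let matchingPairs = 0;
  counts.forEach((n) => {
    matchingPairs += n * (n - 1);
  });
  return matchingPairs / (total * (total - 1));
}
```

A text made of one repeated letter scores 1, and a text with no repeated letters scores 0. Short samples give noisy values, so the English and random reference figures are only meaningful for reasonably long texts.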

Frequently Asked Questions

What is a bigram?

A bigram is any sequence of two consecutive characters. In the text ‘hello’, the bigrams are ‘he’, ‘el’, ‘ll’, ‘lo’. Character bigrams are used in cryptanalysis, language detection, spell checking, and language models. The most common English bigrams are ‘th’, ‘he’, ‘in’, ‘er’, and ‘an’.
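
The sliding window described above can be sketched in a few lines of TypeScript. The function name `charNgrams` is hypothetical, and this version keeps every character (including spaces and punctuation); the tool itself may filter or normalize text differently.

```typescript
// Count character n-grams with a sliding window of width n.
// n = 2 gives bigrams, n = 3 trigrams, n = 4 quadgrams.
function charNgrams(text: string, n: number): Map<string, number> {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }

  // Array.from splits by code point, so characters outside the
  // Basic Multilingual Plane are not cut in half.
  const chars = Array.from(text);
  const counts = new Map<string, number>();

  for (let i = 0; i + n <= chars.length; i++) {
    const gram = chars.slice(i, i + n).join("");
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}
```

Calling `charNgrams("hello", 2)` yields the four bigrams listed above, each with a count of 1. A text of length L produces L - n + 1 n-grams in total, or none if the text is shorter than n.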

What does Type-Token Ratio measure?

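Type-Token Ratio (TTR) is the ratio of unique words (types) to total words (tokens). A text with TTR 1.0 has no repeated words. Academic and literary texts typically have a higher TTR than news writing or casual speech. TTR also falls as a text gets longer, because common words keep repeating, so it is best compared across texts of similar length.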
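
A rough TypeScript sketch of the calculation follows, including the hapax legomena count from the reference table. The names `vocabularyStats` and `VocabularyStats` are hypothetical, and the tokenizer is a deliberate simplification (lowercased runs of ASCII letters, digits, and apostrophes); the tool's own word splitting and stopword handling may differ.

```typescript
interface VocabularyStats {
  tokens: number;   // total words
  types: number;    // unique words
  ttr: number;      // types / tokens, or 0 for empty input
  hapax: string[];  // words that appear exactly once
}

// Assumption: a "word" is a run of ASCII letters, digits, or apostrophes,
// compared case-insensitively.
function vocabularyStats(text: string): VocabularyStats {
  const words: string[] = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];

  const counts = new Map<string, number>();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  const hapax: string[] = [];
  counts.forEach((n, word) => {
    if (n === 1) hapax.push(word);
  });

  const tokens = words.length;
  const types = counts.size;
  return { tokens, types, ttr: tokens === 0 ? 0 : types / tokens, hapax };
}
```

For the input "the cat and the dog" this gives 5 tokens, 4 types, a TTR of 0.8, and three hapax legomena (cat, and, dog).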

Why does case-insensitive mode combine upper and lower?

In frequency analysis, ‘E’ and ‘e’ are the same letter: they carry the same linguistic information. Case-insensitive mode merges them into a single count, so a letter's frequency is not split between its upper- and lowercase forms.
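
The difference between the two modes can be seen in a small TypeScript sketch. The function `letterCounts` and its `caseInsensitive` parameter are hypothetical names for this example, and only ASCII letters are counted.

```typescript
// Count letters, optionally folding case so 'E' and 'e' share one entry.
// Assumption: only ASCII letters A-Z / a-z are treated as letters.
function letterCounts(text: string, caseInsensitive: boolean = true): Map<string, number> {
  const source = caseInsensitive ? text.toLowerCase() : text;
  const counts = new Map<string, number>();

  for (const ch of source) {
    const isLetter = (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z");
    if (isLetter) {
      counts.set(ch, (counts.get(ch) ?? 0) + 1);
    }
  }
  return counts;
}
```

For the word "Eevee", case-insensitive mode reports e = 4 and v = 1, while case-sensitive mode splits the same letter into E = 1 and e = 3.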

How does the chi-squared language detection work?

Chi-squared distance measures how far the observed letter frequencies deviate from the expected frequencies for a given language. Lower scores indicate a closer match. Running this against multiple language baselines and ranking the results provides quick language identification.
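
The ranking step can be sketched in TypeScript as below. This is an illustration under stated assumptions, not the tool's implementation: the names `Baseline`, `chiSquared`, and `rankLanguages` are hypothetical, baselines are passed in as relative frequencies that sum to 1, and characters missing from a baseline are ignored.

```typescript
// A baseline maps each letter to its expected relative frequency (0 to 1).
type Baseline = Record<string, number>;

// Chi-squared statistic: sum over letters of (observed - expected)^2 / expected,
// where expected = baseline frequency * number of counted letters.
function chiSquared(text: string, baseline: Baseline): number {
  const counts: Record<string, number> = {};
  let total = 0;

  for (const ch of text.toLowerCase()) {
    if (Object.prototype.hasOwnProperty.call(baseline, ch)) {
      counts[ch] = (counts[ch] ?? 0) + 1;
      total++;
    }
  }
  if (total === 0) return Infinity; // nothing to compare

  let chi2 = 0;
  for (const letter of Object.keys(baseline)) {
    const expected = baseline[letter] * total;
    if (expected === 0) continue; // skip letters the baseline never expects
    const observed = counts[letter] ?? 0;
    chi2 += (observed - expected) ** 2 / expected;
  }
  return chi2;
}

// Score the text against every baseline and sort by closest match first.
function rankLanguages(
  text: string,
  baselines: Record<string, Baseline>
): { language: string; score: number }[] {
  return Object.keys(baselines)
    .map((language) => ({ language, score: chiSquared(text, baselines[language]) }))
    .sort((a, b) => a.score - b.score);
}
```

With two made-up baselines over a two-letter alphabet, `{ langA: { a: 0.75, b: 0.25 }, langB: { a: 0.25, b: 0.75 } }`, the text "aaab" scores 0 against langA and about 5.33 against langB, so langA ranks first. Real use would substitute measured letter frequencies for each language.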

Is my text sent to a server?

No. All frequency analysis runs entirely in your browser using JavaScript. Your text never leaves your device.

Created by The Toobits Team · Engineering & Editorial

Toobits is built, tested, and maintained by a small independent engineering team. Every tool is written in TypeScript, runs entirely in the browser, and is reviewed against its source formulas before publication.

Editorial policy · Updated April 2026
