Character Frequency Analyzer
Free online character frequency analyzer. Paste any text to see letter frequency, word frequency, bigrams, and character statistics. Useful for cryptanalysis, writing analysis, and linguistics. No signup.
How to Use the Character Frequency Analyzer
Paste any text into the editor on the left. Four analysis tabs update instantly: Characters shows every character and its count, Letters compares your text against a language frequency baseline, Words lists the most common vocabulary with stopword filtering, and N-grams shows character sequence frequencies for cryptanalysis. Use the Export button to download results as JSON, CSV, or a plain text report.
About This Tool
A comprehensive text frequency analysis tool for writers, linguists, and cryptanalysis students. It analyzes character frequency; letter frequency compared against English, German, French, Spanish, and Icelandic baselines; word frequency with stopword filtering and vocabulary richness metrics (Type-Token Ratio, hapax legomena); and character-level n-grams (bigrams, trigrams, quadgrams) with a visual heatmap. It also computes the Index of Coincidence for cipher identification and chi-squared distance for language detection. All analysis runs in pure JavaScript with zero external libraries. Pair it with the Word Counter for basic counting or the Reading Time Estimator for readability analysis.
Quick Reference Table
| Metric | Description |
|---|---|
| Index of Coincidence | Probability that two randomly chosen letters are the same; ~0.067 for English, ~0.038 for uniformly random letters |
| Chi-Squared (χ²) | Distance from expected language baseline — lower is closer match |
| Type-Token Ratio | Unique words / total words — higher means richer vocabulary |
| Hapax Legomena | Words appearing exactly once — indicates vocabulary diversity |
| Bigram | Two consecutive characters — th, he, in are top English bigrams |
| Trigram | Three consecutive characters — the, and, ing are top English trigrams |
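The Index of Coincidence in the table can be computed directly from letter counts. The following is a minimal sketch, not the tool's actual source: the function name `indexOfCoincidence` and the decision to keep only the letters a to z after lowercasing are assumptions made for illustration.

```javascript
// Index of Coincidence: the probability that two letters drawn at random
// (without replacement) from the text are the same letter.
// Hypothetical helper; the A-Z filter is an assumption of this sketch.
function indexOfCoincidence(text) {
  const letters = text.toLowerCase().replace(/[^a-z]/g, "");
  const n = letters.length;
  if (n < 2) return 0; // not defined for fewer than two letters; 0 by convention here

  // Count occurrences of each letter.
  const counts = new Map();
  for (const ch of letters) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }

  // Sum c * (c - 1) over all letters: the number of ordered matching pairs.
  let matchingPairs = 0;
  for (const c of counts.values()) {
    matchingPairs += c * (c - 1);
  }

  // Divide by the number of ordered pairs of positions.
  return matchingPairs / (n * (n - 1));
}
```

On a long English plaintext this value tends toward roughly 0.067, while text whose letters are close to uniformly distributed (for example, the output of a strong polyalphabetic cipher) drifts toward roughly 0.038. Short inputs give noisy values.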
Frequently Asked Questions
What is a bigram?
A bigram is any sequence of two consecutive characters. In the text ‘hello’, the bigrams are ‘he’, ‘el’, ‘ll’, ‘lo’. Character bigrams are used in cryptanalysis, language detection, spell checking, and language models. The most common English bigrams are ‘th’, ‘he’, ‘in’, ‘er’, and ‘an’.
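The sliding-window idea behind character n-grams can be sketched in a few lines. This is an illustrative sketch only: `countNgrams` is a hypothetical name, and whether the tool itself counts n-grams across spaces and punctuation is not stated here, so this version simply counts every window in the input as given.

```javascript
// Count character n-grams with a sliding window of width n.
// Hypothetical helper; it does no filtering or case folding of its own.
function countNgrams(text, n = 2) {
  const counts = new Map();
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  for (let i = 0; i + n <= chars.length; i++) {
    const gram = chars.slice(i, i + n).join("");
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

const bigrams = countNgrams("hello");
console.log([...bigrams.keys()]); // [ 'he', 'el', 'll', 'lo' ]
```

Passing `3` or `4` as the second argument yields trigrams or quadgrams from the same function.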
What does Type-Token Ratio measure?
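Type-Token Ratio (TTR) is the ratio of unique words (types) to total words (tokens). A text with a TTR of 1.0 has no repeated words. For example, ‘the cat and the hat’ has 4 types and 5 tokens, so its TTR is 0.8. Because TTR falls as a text gets longer, compare only texts of similar length. At comparable lengths, academic and literary texts typically have a higher TTR than news writing or casual speech.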
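A rough sketch of how TTR and hapax legomena could be computed together is shown below. The function name `vocabularyStats` and the tokenizer (lowercased runs of letters and apostrophes) are assumptions for illustration; the tool's own tokenization and stopword handling may differ.

```javascript
// Compute Type-Token Ratio and hapax legomena for a text.
// Hypothetical helper; the tokenizer below is an assumption of this sketch.
function vocabularyStats(text) {
  // Tokens: runs of Unicode letters or apostrophes, lowercased.
  const tokens = text.toLowerCase().match(/[\p{L}']+/gu) || [];

  // Count how often each distinct word (type) occurs.
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }

  // Hapax legomena: words that occur exactly once.
  const hapax = [...counts].filter(([, c]) => c === 1).map(([word]) => word);

  return {
    tokens: tokens.length,
    types: counts.size,
    ttr: tokens.length === 0 ? 0 : counts.size / tokens.length,
    hapax,
  };
}

const stats = vocabularyStats("the cat and the hat");
console.log(stats.ttr); // 0.8
```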
Why does case-insensitive mode combine uppercase and lowercase letters?
In frequency analysis, ‘E’ and ‘e’ are the same letter and carry the same linguistic information. Case-insensitive mode merges them into a single count, so each letter's frequency reflects all of its occurrences instead of being split across two entries.
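The effect of merging case can be seen in a small sketch. `letterCounts` and its `caseInsensitive` flag are hypothetical names chosen for this example, not the tool's API.

```javascript
// Count letters, optionally folding case before counting.
// Hypothetical helper; non-letter characters are skipped.
function letterCounts(text, caseInsensitive = true) {
  const source = caseInsensitive ? text.toLowerCase() : text;
  const counts = new Map();
  for (const ch of source) {
    if (!/\p{L}/u.test(ch)) continue; // ignore spaces, digits, punctuation
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  return counts;
}

console.log(letterCounts("Eel e").get("e"));        // 3
console.log(letterCounts("Eel e", false).get("e")); // 2
```

With case folding off, the same text reports ‘E’ and ‘e’ as separate entries, which dilutes each letter's share when comparing against a language baseline.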
How does the chi-squared language detection work?
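Chi-squared distance measures how far the observed letter frequencies deviate from the expected frequencies for a given language. In the standard formulation, the squared difference between the observed and expected count of each letter is divided by the expected count, and the results are summed. Lower scores indicate a closer match. Running this against multiple language baselines and ranking the results provides quick language identification.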
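The ranking approach can be sketched as follows, using the standard chi-squared statistic (sum of (observed − expected)² / expected over all letters). The names `chiSquared` and `rankLanguages` are hypothetical, and the two baselines are invented three-letter toy distributions used only to keep the example checkable by hand; they are not real language data and not the tool's baselines.

```javascript
// Chi-squared statistic between observed letter counts and a baseline.
// observedCounts: Map of letter -> count
// expectedFreqs:  object of letter -> probability (should sum to about 1)
function chiSquared(observedCounts, expectedFreqs) {
  // Only letters present in the baseline contribute to the total.
  let total = 0;
  for (const letter of Object.keys(expectedFreqs)) {
    total += observedCounts.get(letter) || 0;
  }

  let chi2 = 0;
  for (const [letter, p] of Object.entries(expectedFreqs)) {
    const expected = p * total;
    if (expected === 0) continue; // avoid division by zero
    const observed = observedCounts.get(letter) || 0;
    chi2 += (observed - expected) ** 2 / expected;
  }
  return chi2;
}

// Score the text against every baseline and sort by best (lowest) score.
function rankLanguages(text, baselines) {
  const counts = new Map();
  for (const ch of text.toLowerCase()) {
    if (/\p{L}/u.test(ch)) counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  return Object.entries(baselines)
    .map(([language, freqs]) => ({ language, score: chiSquared(counts, freqs) }))
    .sort((a, b) => a.score - b.score);
}

// Toy baselines over a three-letter alphabet, invented for illustration only.
const toyBaselines = {
  langA: { a: 0.6, b: 0.3, c: 0.1 },
  langB: { a: 0.1, b: 0.3, c: 0.6 },
};

const ranking = rankLanguages("aaaaaabbbc", toyBaselines);
console.log(ranking[0].language); // langA
```

In practice each baseline would hold a full alphabet of measured letter frequencies, and short inputs produce unreliable rankings because a few letters can dominate the score.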