LZW Compression Calculator

Calculator Form

Input Text
Initial Symbol Dictionary
Bit Estimation Method
Bits per Original Character
Fixed Code Width
Maximum Dictionary Size

Example Data Table

This sample uses a classic repeated text string. The numbers below illustrate the kind of output the calculator reports after encoding.

Sample Text	Symbol Mode	Bits per Character	Code Width	Max Dictionary	Typical Output Codes	What to Observe
TOBEORNOTTOBEORTOBEORNOT	ASCII 256	8	12	4096	84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263	Repeated patterns become dictionary entries (codes 256 and above), so the 24 input characters are emitted as only 16 codes.
ABABABAABABA	Unique symbols from input	8	12	512	0, 1, 2, 4, 4, 3	Short alphabets often show dictionary growth very clearly in the trace table.

Formula Used

LZW compression works by replacing repeated symbol sequences with dictionary codes. The algorithm starts with an initial symbol dictionary, emits a code for the longest known phrase, then adds a new phrase formed by the current phrase plus the next symbol.
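The encoding loop described above can be sketched in Python as follows. This is a minimal illustration, not the calculator's own code: the function name lzw_encode is hypothetical, the starting dictionary is assumed to hold one entry per character code 0 to 255, and the dictionary is assumed to simply stop growing once it reaches the maximum size.

```python
def lzw_encode(text, max_dict_size=4096):
    """Return the list of LZW codes for text.

    Assumptions (not taken from the calculator): every character has a
    code point below 256, and a full dictionary is frozen, not reset.
    """
    dictionary = {chr(i): i for i in range(256)}  # initial symbol dictionary
    next_code = 256
    phrase = ""
    codes = []
    for symbol in text:
        candidate = phrase + symbol
        if candidate in dictionary:
            # Keep extending until the phrase is no longer known.
            phrase = candidate
        else:
            # Emit the code for the longest known phrase...
            codes.append(dictionary[phrase])
            # ...then add phrase + next symbol as a new entry.
            if next_code < max_dict_size:
                dictionary[candidate] = next_code
                next_code += 1
            phrase = symbol
    if phrase:
        codes.append(dictionary[phrase])  # flush the final phrase
    return codes


print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))
# [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
```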

The calculator reports size estimates with these formulas:

Original Bits = Number of input units × Bits per original character
Compressed Bits = Sum of code widths for all emitted codes
Compression Ratio = Original Bits ÷ Compressed Bits
Space Saving (%) = ((Original Bits − Compressed Bits) ÷ Original Bits) × 100
Average Bits per Unit = Compressed Bits ÷ Number of input units
Dictionary Saturation (%) = (Final Dictionary Size ÷ Maximum Dictionary Size) × 100
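
As a worked example, the formulas above can be applied with a small Python helper. The function name size_report and its parameter names are hypothetical. The sample call assumes 12 input characters at 8 bits each, 6 emitted codes at a fixed 12-bit width, and a final dictionary of 7 entries against a limit of 512.

```python
def size_report(num_units, bits_per_char, code_widths, final_dict_size, max_dict_size):
    """Apply the size formulas; assumes at least one input unit and one code."""
    original_bits = num_units * bits_per_char
    compressed_bits = sum(code_widths)  # one width per emitted code
    return {
        "original_bits": original_bits,
        "compressed_bits": compressed_bits,
        "compression_ratio": original_bits / compressed_bits,
        "space_saving_pct": (original_bits - compressed_bits) / original_bits * 100,
        "avg_bits_per_unit": compressed_bits / num_units,
        "dict_saturation_pct": final_dict_size / max_dict_size * 100,
    }


report = size_report(
    num_units=12,
    bits_per_char=8,
    code_widths=[12] * 6,
    final_dict_size=7,
    max_dict_size=512,
)
print(report["original_bits"])     # 96
print(report["compressed_bits"])   # 72
print(report["space_saving_pct"])  # 25.0
print(report["avg_bits_per_unit"]) # 6.0
```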

When Fixed code width is selected, every emitted code uses the chosen width. When Dynamic code width is selected, the estimator increases code width as the dictionary grows, starting from the minimum width required by the initial dictionary.
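
One way to estimate the two width modes is sketched below. The rule used for dynamic width is an assumption: each code is charged the number of bits needed to address every entry the encoder's dictionary holds at the moment the code is emitted. The calculator, and real formats, may switch widths at slightly different points. The name code_widths and its parameters are hypothetical.

```python
def code_widths(num_codes, initial_dict_size, max_dict_size, fixed_width=None):
    """Return the estimated bit width of each emitted code.

    Assumed dynamic rule: width = bits needed for the largest code in the
    dictionary at emission time. The dictionary gains one entry per
    emitted code until it reaches max_dict_size.
    """
    widths = []
    dict_size = initial_dict_size
    for _ in range(num_codes):
        if fixed_width is not None:
            widths.append(fixed_width)
        else:
            # Largest code currently assigned is dict_size - 1.
            widths.append(max(1, (dict_size - 1).bit_length()))
        if dict_size < max_dict_size:
            dict_size += 1
    return widths


# Two-symbol starting dictionary, six emitted codes.
print(code_widths(6, initial_dict_size=2, max_dict_size=512))
# [1, 2, 2, 3, 3, 3]

# Same codes measured at a fixed 12-bit width.
print(code_widths(6, initial_dict_size=2, max_dict_size=512, fixed_width=12))
# [12, 12, 12, 12, 12, 12]
```

Under this rule, a 256-entry starting dictionary charges 8 bits for the first code and 9 bits for the codes that follow until the dictionary passes 512 entries.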

How to Use This Calculator

Enter the text string you want to analyze.
Choose whether the starting dictionary uses ASCII symbols or unique symbols from the input.
Set the original bits per character, such as 8 for standard text storage.
Select a fixed or dynamic code width estimation method.
Enter the fixed code width if you want every output code measured with the same width.
Set the maximum dictionary size limit for the LZW run.
Press Calculate Compression to show results above the form.
Review the summary cards, output codes, and encoding trace.
Use the CSV or PDF buttons to export the summary and trace tables.

FAQs

1. What does this calculator measure?

It estimates LZW encoding output, compressed size, savings, compression ratio, dictionary growth, and a step-by-step trace of emitted codes and added phrases.

2. Why can compression sometimes become worse?

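Very short or random text may not repeat enough patterns. In those cases almost every character is emitted as its own code, and each code is usually wider than the original character, so the encoded size can exceed the original. For example, 8 distinct characters stored at 8 bits each take 64 bits, while 8 codes at a fixed 12-bit width take 96 bits.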

3. What is the difference between the two symbol modes?

ASCII mode starts with a full 256-symbol dictionary, one entry for each character code from 0 to 255. Unique symbols mode builds the starting dictionary only from the symbols found in the entered text, so new phrase codes start at much lower numbers.
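
The two starting dictionaries can be sketched as follows. The function name initial_dictionary and the mode strings "ascii" and "unique" are hypothetical, and sorting the unique symbols is an assumption; the calculator might instead number them in order of first appearance.

```python
def initial_dictionary(text, mode):
    """Build the starting symbol-to-code mapping for a given mode."""
    if mode == "ascii":
        # One entry per character code 0..255.
        return {chr(i): i for i in range(256)}
    if mode == "unique":
        # Only the symbols that occur in the input (sorted: an assumption).
        return {symbol: code for code, symbol in enumerate(sorted(set(text)))}
    raise ValueError(f"unknown mode: {mode!r}")


print(len(initial_dictionary("ABABABAABABA", "ascii")))  # 256
print(initial_dictionary("ABABABAABABA", "unique"))      # {'A': 0, 'B': 1}
```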

4. What does dynamic code width mean here?

Dynamic width increases the estimated bits per emitted code as the dictionary grows. Practical LZW implementations, such as GIF and Unix compress, use variable-width codes, so this mode usually gives a more realistic size estimate than a single fixed width.

5. Why is decode verification included?

It checks whether the produced code stream can decode back to the exact original input under the same selected settings and dictionary rules.
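
A decode check of this kind might look like the sketch below. It rebuilds the dictionary from the code stream alone and compares the result with the original text. The names lzw_decode and verify_round_trip are hypothetical, and the decoder assumes the encoder froze the dictionary at the maximum size; a different full-dictionary policy would need a matching change here.

```python
def lzw_decode(codes, initial_symbols, max_dict_size=4096):
    """Decode LZW codes given the ordered initial symbols (code = index)."""
    table = {code: symbol for code, symbol in enumerate(initial_symbols)}
    next_code = len(table)
    if not codes:
        return ""
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code and next_code < max_dict_size:
            # Code refers to the entry being built right now:
            # it must be the previous phrase plus its own first symbol.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid code: {code}")
        output.append(entry)
        if next_code < max_dict_size:
            table[next_code] = previous + entry[0]
            next_code += 1
        previous = entry
    return "".join(output)


def verify_round_trip(original, codes, initial_symbols, max_dict_size=4096):
    """Return True when the codes decode back to the exact original text."""
    return lzw_decode(codes, initial_symbols, max_dict_size) == original


codes = [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
ascii_symbols = [chr(i) for i in range(256)]
print(verify_round_trip("TOBEORNOTTOBEORTOBEORNOT", codes, ascii_symbols))  # True
```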

6. Can I use this with spaces and line breaks?

Yes. Spaces, tabs, and line breaks are accepted. The trace table displays them with readable markers like ␠ and \n.
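
A display helper of this kind could be as small as the sketch below. The name show_whitespace is hypothetical, and the \t marker for tabs is an assumption; only ␠ and \n are named above.

```python
# Mapping from invisible characters to readable markers.
MARKERS = {" ": "␠", "\n": "\\n", "\t": "\\t"}


def show_whitespace(phrase):
    """Replace spaces, line breaks, and tabs with visible markers."""
    return "".join(MARKERS.get(ch, ch) for ch in phrase)


print(show_whitespace("to be\nor"))  # to␠be\nor
```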

7. What does dictionary saturation tell me?

It shows how close the final dictionary came to the selected maximum size. Saturation at or near 100% means the dictionary filled up, so the size limit shaped how the remaining input was encoded.

8. Is this suitable for teaching and auditing?

Yes. The trace lists each phrase, next symbol, output code, width estimate, and dictionary update, which makes manual checking much easier.
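
For manual checking, a trace with those columns can be reproduced with a short Python sketch. The function name lzw_trace is hypothetical, and several choices are assumptions: it uses a unique-symbol starting dictionary in sorted order, the same dynamic width rule as a bits-needed-for-current-dictionary estimate, and no dictionary size limit.

```python
def lzw_trace(text):
    """Return rows of (phrase, next symbol, output code, width, update)."""
    dictionary = {s: i for i, s in enumerate(sorted(set(text)))}
    rows = []
    phrase = ""
    for symbol in text:
        if phrase + symbol in dictionary:
            phrase += symbol
            continue
        # Width estimate: bits needed for the largest code assigned so far.
        width = max(1, (len(dictionary) - 1).bit_length())
        new_code = len(dictionary)
        rows.append((phrase, symbol, dictionary[phrase], width,
                     f"{new_code} = {phrase + symbol}"))
        dictionary[phrase + symbol] = new_code
        phrase = symbol
    if phrase:
        # Final flush: a code is emitted but no entry is added.
        width = max(1, (len(dictionary) - 1).bit_length())
        rows.append((phrase, "", dictionary[phrase], width, ""))
    return rows


for row in lzw_trace("ABABABAABABA"):
    print(row)
# ('A', 'B', 0, 1, '2 = AB')
# ('B', 'A', 1, 2, '3 = BA')
# ('AB', 'A', 2, 2, '4 = ABA')
# ('ABA', 'A', 4, 3, '5 = ABAA')
# ('ABA', 'B', 4, 3, '6 = ABAB')
# ('BA', '', 3, 3, '')
```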