Example Data Table
| Category | Display Order | Assigned Index | Binary Code | Use Case |
|---|---|---|---|---|
| cat | 1 | 0 | 000 | Pet class encoding |
| dog | 2 | 1 | 001 | Pet class encoding |
| fox | 3 | 2 | 010 | Wildlife label encoding |
| horse | 4 | 3 | 011 | Animal taxonomy task |
| owl | 5 | 4 | 100 | Night species label |
Formula Used
Assigned Index = Category Position Offset + Index Origin
Bits = max(1, ceil(log2(Max Index + 1)))
Binary Code = Left Pad(Binary(Assigned Index), Bits, 0)
Capacity = 2^Bits
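Binary encoding assigns each category a decimal index, then converts that index into a fixed-length binary string. For n categories with zero-based indexing this needs ceil(log2(n)) columns (at least one), where one-hot encoding needs n. The five categories in the example table fit in 3 bits instead of 5 one-hot columns, and the inputs remain machine-readable.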
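The formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual implementation; the function names `bit_length_for` and `binary_encode` are hypothetical, and the sketch assumes a non-empty list of unique labels in their original order.

```python
import math

def bit_length_for(max_index: int) -> int:
    """Bits = max(1, ceil(log2(Max Index + 1)))."""
    return max(1, math.ceil(math.log2(max_index + 1)))

def binary_encode(categories, index_origin=0):
    """Map each label to (assigned index, zero-padded binary code)."""
    if not categories:
        raise ValueError("at least one category is required")
    # Assigned Index = zero-based position + Index Origin
    indices = {label: pos + index_origin for pos, label in enumerate(categories)}
    bits = bit_length_for(max(indices.values()))
    # Binary Code = binary form of the index, left-padded with zeros to `bits`
    return {label: (idx, format(idx, f"0{bits}b")) for label, idx in indices.items()}

mapping = binary_encode(["cat", "dog", "fox", "horse", "owl"])
for label, (idx, code) in mapping.items():
    print(label, idx, code)  # e.g. "owl 4 100"

capacity = 2 ** bit_length_for(4)  # Capacity = 2^Bits = 8
```

With the five labels from the example table, the sketch reproduces the Assigned Index and Binary Code columns, and the 3-bit width leaves room for 8 distinct codes.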
How To Use This Calculator
- Enter category labels, one per line or separated by commas.
- Choose whether to preserve the original order or sort labels alphabetically.
- Set index origin to zero-based or one-based encoding.
- Select automatic or custom bit length.
- Add values you want to encode in batch form.
- Optionally enter binary strings to decode back into categories.
- Choose separator style, duplicate handling, and unknown category behavior.
- Press Submit to display the results above the form.
- Use the CSV and PDF buttons to export the generated tables.
Frequently Asked Questions
1. What does binary encoding do in machine learning?
Binary encoding converts each category into a compact binary code. It usually reduces feature width compared with one-hot encoding while still preserving a structured numeric representation for models.
2. Why use binary encoding instead of one-hot encoding?
Binary encoding can use fewer columns when a feature has many categories. That lowers memory usage and may speed up training, especially for wide datasets.
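To make the column savings concrete, the following sketch compares the two widths for a few category counts. The helper names are hypothetical, and it relies on the fact that Python's `int.bit_length()` equals ceil(log2(x + 1)) for non-negative integers.

```python
def one_hot_width(n_categories: int) -> int:
    """One-hot encoding uses one column per category."""
    return n_categories

def binary_width(n_categories: int, index_origin: int = 0) -> int:
    """Columns needed to hold the largest assigned index in binary."""
    if n_categories < 1:
        raise ValueError("need at least one category")
    max_index = n_categories - 1 + index_origin
    return max(1, max_index.bit_length())

for n in (5, 100, 1000):
    print(n, one_hot_width(n), binary_width(n))
# 5 categories:    5 one-hot columns vs 3 binary columns
# 100 categories:  100 vs 7
# 1000 categories: 1000 vs 10
```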
3. How is the bit length calculated?
The calculator finds the maximum assigned index, then computes the smallest number of bits that can represent it. That value is max(1, ceil(log2(max index + 1))), so even a single category with index 0 gets one bit.
4. What happens when a new category appears later?
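A new, unseen category needs a fallback rule. You can raise an error, return zeros, or skip the value, depending on your downstream data policy. Note that with zero-based indexing an all-zero code is also the code of the first category, so the zeros fallback cannot be told apart from it.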
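The three fallback rules might look like this in code. This is a sketch under assumptions: the function name `encode_value` and the `on_unknown` option values are hypothetical and are not the calculator's own setting names.

```python
def encode_value(value, mapping, bits, on_unknown="error"):
    """Encode one value, applying a fallback rule to unseen categories."""
    if value in mapping:
        return format(mapping[value], f"0{bits}b")
    if on_unknown == "error":
        raise KeyError(f"unknown category: {value!r}")
    if on_unknown == "zeros":
        return "0" * bits  # collides with index 0 under zero-based indexing
    if on_unknown == "skip":
        return None  # caller drops the row or value
    raise ValueError(f"unsupported on_unknown rule: {on_unknown!r}")

mapping = {"cat": 0, "dog": 1, "fox": 2, "horse": 3, "owl": 4}
known = encode_value("fox", mapping, bits=3)
fallback = encode_value("wolf", mapping, bits=3, on_unknown="zeros")
skipped = encode_value("wolf", mapping, bits=3, on_unknown="skip")
```

Here `fallback` is `"000"`, the same string that encodes `cat`, which is why a one-based index origin or a reserved code can be a safer choice when unknowns must stay distinguishable.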
5. Does category order affect the binary codes?
Yes. Category order controls the assigned decimal index, so changing the order changes the binary output. Keep the same mapping during training and inference.
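A short sketch shows how the same labels get different codes under the two ordering options. The function `assign_codes` and its `sort_labels` flag are hypothetical names used only for illustration.

```python
def assign_codes(labels, sort_labels=False, index_origin=0):
    """Assign fixed-length binary codes in original or alphabetical order."""
    ordered = sorted(labels) if sort_labels else list(labels)
    max_index = len(ordered) - 1 + index_origin
    bits = max(1, max_index.bit_length())
    return {
        label: format(pos + index_origin, f"0{bits}b")
        for pos, label in enumerate(ordered)
    }

labels = ["owl", "cat", "fox"]
preserved = assign_codes(labels)                       # owl=00, cat=01, fox=10
alphabetical = assign_codes(labels, sort_labels=True)  # cat=00, fox=01, owl=10
```

Because `owl` maps to `00` in one mapping and `10` in the other, a model trained with one ordering would misread inputs encoded with the other.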
6. Can I decode the binary values back to labels?
Yes. This calculator accepts binary strings and converts them back to decimal indices, then checks whether a mapped category exists for that index.
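Decoding reverses the two steps: parse the binary string as an integer, then look the index up. The sketch below assumes a hypothetical `decode` helper that returns `None` when no category is mapped to the index.

```python
def decode(code, index_to_label):
    """Convert a binary string to its index and look up the category."""
    if not code or set(code) - {"0", "1"}:
        raise ValueError(f"not a binary string: {code!r}")
    index = int(code, 2)
    return index_to_label.get(index)  # None if no category uses this index

index_to_label = {0: "cat", 1: "dog", 2: "fox", 3: "horse", 4: "owl"}
print(decode("100", index_to_label))  # owl
print(decode("111", index_to_label))  # None: index 7 is within capacity but unmapped
```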
7. Is binary encoding always better for every model?
No. Performance depends on the model and dataset. Tree models, linear models, and neural networks can react differently, so compare encoders during validation.
8. What should I export after testing mappings?
Export the mapping table and encoded results. Those records help you reproduce the same category-to-code relationship across preprocessing, validation, and production scoring.
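If you keep the mapping in your own pipeline, a round trip through CSV is one way to preserve it. The sketch below uses Python's standard `csv` module; the column names are assumptions and may not match the layout of the calculator's exported file.

```python
import csv
import io

def mapping_to_csv(mapping, bits):
    """Serialize a label-to-index mapping as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "assigned_index", "binary_code"])
    for label, index in mapping.items():
        writer.writerow([label, index, format(index, f"0{bits}b")])
    return buffer.getvalue()

def csv_to_mapping(text):
    """Rebuild the label-to-index mapping from the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["category"]: int(row["assigned_index"]) for row in reader}

saved = mapping_to_csv({"cat": 0, "dog": 1, "fox": 2}, bits=2)
restored = csv_to_mapping(saved)
```

Loading `restored` at inference time gives the same category-to-index relationship that was used during training.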