Description
Summary
When toonifying nested JSON structures, certain properties (such as arrays of objects or arrays of strings) are missing from the output. The token count does go down, but part of that reduction comes from the dropped data, so the toonified result is incomplete.
Steps to Reproduce
- Use the following JSON input. The data includes a nested array of objects under `offset` and an array of strings under `hierarchy`.

    {
      "categorization": [
        {
          "id": "01.04.04.01.",
          "label": "Aspetti generali",
          "hierarchy": [
            "Prodotti",
            "Organizzazione altro e Sito Internet",
            "Aspetti generali",
            "Aspetti generali"
          ],
          "score": 900,
          "winner": true,
          "namespace": "$namespace",
          "frequency": 0,
          "offset": [
            { "start": 511, "end": 520 },
            { "start": 524, "end": 527 },
            { "start": 528, "end": 543 }
          ]
        }
      ]
    }

- Run toonify on this JSON.
- Inspect the toonified output.
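The inspection step can be partly automated. The following is a minimal sketch, not part of toonify: `collect_keys` and `missing_keys` are hypothetical helper names, and the check is only a heuristic that looks for each key name as a substring of whatever text the encoder produced. It makes no assumption about toonify's API; you pass in the encoded string yourself.

```python
def collect_keys(value, found=None):
    """Recursively gather every object key that appears anywhere in the data."""
    if found is None:
        found = set()
    if isinstance(value, dict):
        for key, child in value.items():
            found.add(key)
            collect_keys(child, found)
    elif isinstance(value, list):
        for item in value:
            collect_keys(item, found)
    return found


def missing_keys(data, encoded_text):
    """Return the keys of `data` whose names never appear in `encoded_text`.

    Heuristic only: a short key such as "id" can match by accident inside
    another word, so an empty result does not prove that nothing was lost.
    """
    return sorted(key for key in collect_keys(data) if key not in encoded_text)
```

For the input above, calling `missing_keys(data, toonified_text)` on an output that dropped both properties would be expected to list `hierarchy` and `offset`, along with the inner `start` and `end` keys.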
Observed Behavior
- The resulting toonified JSON omits both the `offset` (array of objects) and `hierarchy` (array of strings) properties.
- Token count is indeed reduced, but this reduction comes from the loss of meaningful structure and data.
- See attached screenshot for reference:

Expected Behavior
Toonified output should preserve all properties (including nested arrays and objects), ensuring structural and semantic fidelity while still optimizing token usage.
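One way to state this expectation as a test is a round-trip check: encode, decode, and compare against the input. The sketch below is an assumption-laden illustration rather than toonify code. `round_trip_diff` is a hypothetical helper, and the `encode` and `decode` arguments are placeholders for whichever encoder and decoder functions the library actually exposes.

```python
_MISSING = object()  # sentinel for "key absent after the round trip"


def _diff(original, restored, path):
    """Yield the paths where `restored` no longer matches `original`."""
    if isinstance(original, dict) and isinstance(restored, dict):
        for key, value in original.items():
            yield from _diff(value, restored.get(key, _MISSING), f"{path}.{key}")
    elif isinstance(original, list) and isinstance(restored, list):
        if len(original) != len(restored):
            yield path
            return
        for index, (before, after) in enumerate(zip(original, restored)):
            yield from _diff(before, after, f"{path}[{index}]")
    elif original != restored:
        yield path


def round_trip_diff(data, encode, decode):
    """Encode then decode `data`; return paths that changed or vanished.

    `encode` and `decode` are caller-supplied callables (assumed, not
    taken from toonify). Keys added by the round trip are not reported.
    """
    return list(_diff(data, decode(encode(data)), "$"))
```

An empty list means every original property survived; for the case reported here, the expected entries would be paths such as `$.categorization[0].hierarchy` and `$.categorization[0].offset`.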
Environment
- Package version: toonify-1.4.0
- IDE: PyCharm (Notebook mode)
- Experiment Context: Quick local test to measure token impact
Additional Context
I understand that this type of JSON doesn't fit neatly into traditional tabular data structures. However, supporting more complex, nested formats would significantly improve toonify's robustness, especially for real-world datasets used with LLMs where hierarchical or relational structures are common.
Suggested Improvement
Consider extending toonify's serialization logic to:
- Preserve nested arrays and object properties.
- Optionally flatten or represent them symbolically (e.g., `offset[start:end]` shorthand) without dropping information.
- Provide a fallback or warning when certain structures can't be safely toonified.
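As an illustration of the last point, a warning could be raised before encoding whenever an array of objects carries list or object values in its fields. This is only a sketch of the idea under my own assumptions: `nested_fields` and `warn_on_nested` are hypothetical names, and nothing here reflects how toonify is actually implemented.

```python
import warnings


def nested_fields(rows):
    """Return field names in a list of objects whose values are lists or dicts."""
    names = []
    for row in rows:
        if not isinstance(row, dict):
            continue
        for key, value in row.items():
            if isinstance(value, (list, dict)) and key not in names:
                names.append(key)
    return names


def warn_on_nested(data, path="$"):
    """Walk `data`, warn about arrays of objects with nested fields,
    and return the flagged (path, field names) pairs."""
    flagged = []
    if isinstance(data, dict):
        for key, value in data.items():
            flagged += warn_on_nested(value, f"{path}.{key}")
    elif isinstance(data, list):
        nested = nested_fields(data)
        if nested:
            warnings.warn(
                f"{path}: fields {nested} are nested and may not fit a tabular layout"
            )
            flagged.append((path, nested))
        for index, item in enumerate(data):
            flagged += warn_on_nested(item, f"{path}[{index}]")
    return flagged
```

On the JSON from the reproduction steps, this would flag `$.categorization` with the fields `hierarchy` and `offset`, which are the two properties reported as missing.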