Using CSV or TDV format for representing multi-level nested data structures

siara · July 23, 2015

Please have a look at my proposal to use CSV as a lightweight data-interchange format, as an alternative to JSON and XML.

Multi-level nested CSV TDV.pdf

(or)

http://siara.cc/csv_ml

[Screenshot of reference implementation]

The demo application (Java) can be downloaded from: http://siara.cc/csv_ml/csv_ml_swing_demo-1.0.0.jar

The proposed format is expected to:

save storage space (about 50% compared to JSON and 60-70% compared to XML)
increase data transfer speeds
be faster to parse than XML and JSON
allow full schema definition and validation
make schema definition simple, lightweight and in-line compared to DTD or XML Schema
allow database binding
be used in EAI (Enterprise Application Integration) for import and export of data
be simpler to parse, making the data usable even on low-memory devices
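To illustrate where the storage saving in the first point would come from (a rough sketch, not a benchmark of the proposed format): JSON repeats every field name in every record, whereas CSV states the column names once in a header row. The records below are made-up sample data.

```python
import csv
import io
import json

# Made-up sample records; any tabular data with repeated keys behaves similarly.
records = [
    {"name": f"student{i}", "age": 20 + i % 5, "course": "physics"}
    for i in range(100)
]

def to_json(rows):
    # Compact JSON (no spaces after separators); keys are repeated in every record.
    return json.dumps(rows, separators=(",", ":"))

def to_csv(rows):
    # CSV: column names are written once, in the header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

json_size = len(to_json(records))
csv_size = len(to_csv(records))
print(f"JSON: {json_size} chars, CSV: {csv_size} chars, "
      f"CSV/JSON = {csv_size / json_size:.0%}")
```

The exact ratio depends on key lengths and value sizes, so this only shows the mechanism; it is not a measurement of the csv_ml format itself.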

The given demos convert between CSV, XML and JSON (CSV to XML DOM, CSV to JSON, XML to CSV).
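The nesting idea behind a CSV-to-JSON conversion can be sketched in a few lines. This uses a simplified, hypothetical layout and is not the actual csv_ml syntax: each data line starts with a node name, leading spaces give the nesting depth, and a separate schema (the hypothetical `SCHEMA` dict) lists the column names once per node type. `parse_nested_csv` is an illustrative name, not part of the csv_ml API.

```python
import csv
import json

# Hypothetical schema: column names are declared once per node type,
# instead of being repeated in every record as JSON keys would be.
SCHEMA = {
    "student": ["name", "age"],
    "education": ["course_name", "year_passed"],
}

# Hypothetical data: leading spaces give the nesting depth.
DATA = """\
student,abc,24
 education,bs,2010
 education,ms,2012
student,pqr,25
 education,bs,2009
"""

def parse_nested_csv(text, schema):
    root = {}
    stack = [(-1, root)]  # (indent depth, container) from the root down to the current node
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        depth = len(line) - len(line.lstrip(" "))
        node_type, *values = next(csv.reader([line[depth:]]))
        record = dict(zip(schema[node_type], values))
        # Climb back up until the top of the stack is shallower than this line.
        while stack[-1][0] >= depth:
            stack.pop()
        # Attach the record to its parent, grouped into a list per node type.
        stack[-1][1].setdefault(node_type, []).append(record)
        stack.append((depth, record))
    return root

print(json.dumps(parse_nested_csv(DATA, SCHEMA), indent=2))
```

All values stay strings here; a schema that also declared data types would let the parser convert and validate them.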

Github home page: https://github.com/siara-cc/csv_ml

Sensei · July 23, 2015

You could save even more storage space if you introduced banks of allowed values for some fields.

e.g. assign chemistry=1, physics=2, mathematics=3, etc.

and then use 1,2,3 instead of full-text versions.
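This suggestion amounts to dictionary encoding: each distinct value is stored once and the data carries small integer codes instead. A minimal sketch, assuming codes are assigned in order of first appearance (in the post they are assigned by hand); `encode_column` and `decode_column` are hypothetical helpers.

```python
def encode_column(values):
    """Replace each value with an integer code; return (codes, dictionary)."""
    dictionary = {}  # value -> code, assigned in order of first appearance
    codes = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary) + 1  # start at 1, as in chemistry=1
        codes.append(dictionary[v])
    return codes, dictionary

def decode_column(codes, dictionary):
    """Invert the dictionary and map codes back to the original values."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[c] for c in codes]

subjects = ["chemistry", "physics", "mathematics", "physics", "chemistry"]
codes, dictionary = encode_column(subjects)
print(codes)       # [1, 2, 3, 2, 1]
print(dictionary)  # {'chemistry': 1, 'physics': 2, 'mathematics': 3}
```

The dictionary itself has to be stored or agreed on in advance, so the saving grows with the number of rows that reuse the same values.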

BTW, XML is highly extendable. One can write a loader/saver that ignores unknown tags and attributes (from older/newer versions of the software), and there is a good chance it'll work.
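A minimal sketch of such a tolerant loader, using Python's standard `xml.etree.ElementTree`: it reads only the elements and attributes it knows about and silently skips everything else. The `school`/`student`/`name`/`age` names are made up for illustration.

```python
import xml.etree.ElementTree as ET

KNOWN_ATTRIBUTES = {"name", "age"}  # hypothetical fields this version understands

def load_students(xml_text):
    root = ET.fromstring(xml_text)
    students = []
    for elem in root.findall("student"):  # other child tags are never visited
        # Keep only the attributes this version knows; newer ones are dropped.
        students.append({k: v for k, v in elem.attrib.items() if k in KNOWN_ATTRIBUTES})
    return students

# A document written by a hypothetical newer version, with an extra
# attribute ("email") and an extra element ("teacher").
doc = """<school>
  <student name="abc" age="24" email="abc@example.com"/>
  <teacher name="xyz"/>
  <student name="pqr" age="25"/>
</school>"""

print(load_students(doc))
```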

Your solution won't. It's very limited and not extendable.

Edited July 23, 2015 by Sensei

siara · July 24, 2015

Hi @Sensei,

Thank you. I think your first idea is about having a dictionary for some fields and using index positions to refer to it. It is an excellent suggestion and I want to add it. Where the possible values are fixed, such as in an LOV (List of Values), it will save even more space.

While I agree with your second point, this is the way I look at it: XML is too open to suit my purpose. I am not trying to replace XML in every role it is used for, only for representing relational data, where the schema is more or less fixed and huge amounts of data have to be stored and transferred, as in three-tier architectures and RDBMSs.
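For the relational case described here, a nested record maps onto parent and child tables linked by a foreign key. A sketch using Python's built-in `sqlite3` module with an in-memory database; the table and column names are hypothetical, and this is not the database binding of the csv_ml implementation.

```python
import sqlite3

# Hypothetical nested data, shaped like the output of a nested CSV parser.
students = [
    {"name": "abc", "age": 24, "education": [("bs", 2010), ("ms", 2012)]},
    {"name": "pqr", "age": 25, "education": [("bs", 2009)]},
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE education (
        student_id INTEGER REFERENCES student(id),
        course_name TEXT,
        year_passed INTEGER
    );
""")

for s in students:
    # Insert the parent row first; its generated id becomes the children's foreign key.
    cur = conn.execute("INSERT INTO student (name, age) VALUES (?, ?)", (s["name"], s["age"]))
    conn.executemany(
        "INSERT INTO education (student_id, course_name, year_passed) VALUES (?, ?, ?)",
        [(cur.lastrowid, course, year) for course, year in s["education"]],
    )
conn.commit()

# Join the tables back together to recover the nesting.
rows = conn.execute("""
    SELECT student.name, education.course_name, education.year_passed
    FROM student JOIN education ON education.student_id = student.id
    ORDER BY student.id, education.year_passed
""").fetchall()
print(rows)
```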

If this is adopted for new designs, it could greatly reduce the amount of redundant data being transferred and processed. I think this would have a positive impact on the amount of energy used (in terms of watts).

But I don't understand in what way my solution is not extendable. Can you give an example?

Again, thanks for the valuable feedback.

Regards

Arun
