Using CSV or TDV format for representing multi-level nested data structures


Posted

Please have a look at my proposal to use CSV as a lightweight data-interchange format, as an alternative to JSON and XML.

 

Multi-level nested CSV TDV.pdf

(or)

http://siara.cc/csv_ml

 

Screenshot of reference implementation:

post-112853-0-98477300-1437640171_thumb.png

 

The demo application (Java) can be downloaded from: http://siara.cc/csv_ml/csv_ml_swing_demo-1.0.0.jar

 

The proposed format is expected to:

  • save storage space (about 50% compared to JSON and 60-70% compared to XML)
  • increase data transfer speeds
  • be faster to parse compared to XML and JSON
  • allow full schema definition and validation
  • make schema definition simple, lightweight and in-line compared to DTD or XML Schema
  • allow database binding
  • be used in EAI (Enterprise Application Integration) for import and export of data
  • be simpler to parse, allowing data to be available even on low-memory devices
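To give a rough feel for the storage claim in the list above, here is a minimal sketch that serialises the same records as compact JSON and as plain flat CSV and compares the byte counts. The sample records are made up for illustration, and the CSV here is ordinary single-level CSV from Python's standard library, not the csv_ml nested syntax, so the numbers only hint at where the saving comes from (field names are written once instead of per record).

```python
import csv
import io
import json

# Hypothetical sample records, invented for this sketch.
records = [
    {"name": "Alice", "subject": "chemistry", "score": 91},
    {"name": "Bob", "subject": "physics", "score": 78},
    {"name": "Carol", "subject": "mathematics", "score": 85},
]


def to_json_bytes(rows):
    # Compact JSON: no spaces after separators, so the comparison is fair.
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")


def to_csv_bytes(rows):
    # Plain CSV with a single header row; field names appear only once.
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


json_size = len(to_json_bytes(records))
csv_size = len(to_csv_bytes(records))
print(f"JSON: {json_size} bytes, CSV: {csv_size} bytes")
```

The gap depends heavily on how long the field names are relative to the values, so real data should be measured before relying on any particular percentage.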

The given demos convert between CSV, XML and JSON (CSV to XML DOM, CSV to JSON, XML to CSV).

 

Github home page: https://github.com/siara-cc/csv_ml

Posted (edited)

You could save even more storage space if you introduced lookup tables of allowed values for some fields.

E.g. assign chemistry=1, physics=2, mathematics=3, etc.

and then use 1, 2, 3 instead of the full-text versions.
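A minimal sketch of that idea, assuming the code table is agreed on by both sides in advance. The names `SUBJECT_CODES`, `encode_subjects` and `decode_subjects` are made up for this example and are not part of csv_ml.

```python
# Hypothetical code table for one field; both writer and reader must share it.
SUBJECT_CODES = {"chemistry": 1, "physics": 2, "mathematics": 3}
SUBJECT_NAMES = {code: name for name, code in SUBJECT_CODES.items()}


def encode_subjects(values):
    # Replace each full-text value with its short numeric code.
    return [SUBJECT_CODES[value] for value in values]


def decode_subjects(codes):
    # Restore the full-text values from the numeric codes.
    return [SUBJECT_NAMES[code] for code in codes]


column = ["physics", "chemistry", "physics", "mathematics"]
encoded = encode_subjects(column)
print(encoded)
print(decode_subjects(encoded))
```

A value that is missing from the table raises `KeyError` here, which is one way to notice that the list of allowed values has changed.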

 

BTW, XML is highly extensible. One can write a loader/saver that ignores unknown tags and attributes (from an older or newer version of the software), and there is a high chance it will still work.

Your solution won't. It's very limited and not extensible.
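To illustrate what is meant by a loader that ignores unknown tags and attributes, here is a small sketch using Python's standard `xml.etree.ElementTree`. The `student` document, the `KNOWN_FIELDS` set and the `load_student` function are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET

# Fields this (older) version of the software knows about; hypothetical.
KNOWN_FIELDS = {"name", "subject"}


def load_student(xml_text):
    # Keep only the child elements we recognise; anything else is skipped,
    # and attributes on the root element are never read at all.
    root = ET.fromstring(xml_text)
    return {
        child.tag: child.text
        for child in root
        if child.tag in KNOWN_FIELDS
    }


# A document written by a newer version: extra attribute and extra tag.
newer_doc = (
    '<student version="2">'
    "<name>Alice</name>"
    "<subject>physics</subject>"
    "<nickname>Al</nickname>"
    "</student>"
)

student = load_student(newer_doc)
print(student)
```

Because every value is labelled by its tag, the old loader keeps working when new tags appear; a purely positional format needs some other convention to get the same effect.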

Edited by Sensei
Posted

Hi @Sensei,

 

Thank you. I think your first idea is about having a dictionary for some fields and using index positions to refer to its entries. It is an excellent suggestion and I want to add it. Where the possible values are fixed, such as in a LOV (List of Values), it will save even more space.

 

While I agree with your second point, this is the way I look at it: XML is too open to suit my purpose. I am not trying to replace XML for all the purposes it is used for, but only for representing relational data, where the schema is more or less fixed and a huge amount of data has to be stored and transferred, as in a three-tier architecture with an RDBMS.

 

If this is adopted for new designs, it could greatly reduce the amount of redundant data being transferred and processed. I think this would also have a positive impact on the amount of energy being used (in terms of watts).

 

But I don't understand in what way my solution is not extendable. Can you give an example?

 

Again thanks for the valuable feedback.

 

Regards

Arun
