We all know about CSV files, and they can be easily converted to JSON for use in JS, but what about ingesting the results of a report, or a table that’s multi-dimensional and intended for human, rather than machine, parsing?
This is a first draft and is incomplete.
I had this problem: the lightly cooked data was in a spreadsheet. It looked like a table, but it was actually around a dozen small tables all stacked, with headings between the tables.
Having the author of this table re-do it so it was like a normalized table was out of the question – normalizing would require producing numerous spreadsheets.
Even worse, it was a 4-dimensional dataset. (More on this later.)
This was a living document – I had gotten my second draft of this – so I needed a way to ingest this data without disturbing the original.
The first time around, I reshaped the data and pasted it into a CSV to JSON tool that produced the objects I wanted.
The second time around, they added more data!
Merging this new data was just going to be a big hassle, and prone to error. I needed a technique that would work for a spreadsheet, and enable me to import the data with minimal detail work.
Furthermore, this technique needed to be easily performed multiple times, because I could be getting more changes, sent as a spreadsheet.
I settled on a technique that created a machine-usable copy of the spreadsheet, right on the spreadsheet. The data would be converted to JSON that looked exactly like the origin data, and then further processed to produce the database the application needed.
This workflow kept my hands out of the data, so I was less likely to damage it.
Tables as Hierarchies
All data in Javascript is objects, and these are simple hierarchies that let you store n-dimensional databases.
Tables are 2-dimensional databases. There’s rows and columns. Each cell is a value. To find a value, you need to know the two dimensions.
The data I had was 4-dimensional. To find a value, you needed to know all four dimensions.
The trick was to keep the data in its basic tabular format, retain the columns, but change the row heading into a “path” that would hold the remaining dimensions.
In my case, there were 4-dimensions, total. So the column headings were one dimension. The remaining 3 dimensions were concatenated into a path, and used as the row heading.
This produced a simple table that contained all the data.
All this work was done on the spreadsheet, using the area to the right of the original data.
This could be copy-pasted into the tool, and then edited to remove blank lines.
One click of the button, and it was converted into A CSV file in JSON format.
This kept the data in the same shape as the original tables.
This data, in turn, was read by JS functions and turned into a hierarchy.
Two Step Conversion from a Table to a Hierarchy
I did the conversion in two steps.
First, convert the table into a key-value store. The path is the key, and the value is the value. It’s a 1-dimensional database.
This first conversion simply takes the columns and turns the headings into part of the path. So, while this new database is 1-dimensional, the key is a string that contains all 4 dimensions that define the location of the data.
Second, the key-value store is turned into a hierarchy by converting the paths, which look like file paths, into a hierarchies of objects.