How to convert a JSON file to an SQLite database

Asked 2020-12-29 14:21 by 滥情空心 · 3 answers · 1368 views

If I have some sample data, how do I put it into SQLite (preferably fully automated)?

{"uri":"/","user_agent":"example1"}
{"uri":"/foobar","user

3 Answers
  • 2020-12-29 15:03

    sqlitebiter appears to provide a Python solution:

    A CLI tool to convert CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/TSV/Google-Sheets to a SQLite database file. http://sqlitebiter.rtfd.io/

    docs: http://sqlitebiter.readthedocs.io/en/latest/

    project: https://github.com/thombashi/sqlitebiter

    • last update: approximately 3 months ago
    • last issue closed: approximately 1 month ago; none open
    • status noted as of 2018-03-14
  • 2020-12-29 15:09

    A way to do this without an intermediate CSV file or a third-party tool is to use SQLite's JSON1 extension combined with the readfile extension that is provided in the sqlite3 CLI tool.

    If the input file is a well-formed JSON file, e.g. the example given as an array:

    [
    {"uri":"/","user_agent":"example1"},
    {"uri":"/foobar","user_agent":"example1"},
    {"uri":"/","user_agent":"example2"},
    {"uri":"/foobar","user_agent":"example3"}
    ]
    

    Then this can be read into the corresponding my_table table as follows. Open the SQLite database file my_db.db using the sqlite3 CLI:

    sqlite3 my_db.db
    

    then create my_table using:

    CREATE TABLE my_table(uri TEXT, user_agent TEXT);
    

    Finally, the JSON data in my_data.json can be inserted into the table with the CLI command:

    INSERT INTO my_table SELECT 
      json_extract(value, '$.uri'), 
      json_extract(value, '$.user_agent')
    FROM json_each(readfile('my_data.json'));
    

    As well as being a more direct solution overall, this has the advantage of handling JSON null values more consistently than CSV, which would otherwise import them as empty strings.

    If the initial JSON file consists of newline-separated JSON objects, it can first be converted with jq using:

    jq -s <my_data_raw.json >my_data.json
    

    It's likely there is a way to do this directly in SQLite using JSON1, but I didn't pursue that given that I was already using jq to massage the data prior to import to SQLite.
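
    Along the same lines, the whole import can also be fully scripted without jq. Here is a minimal stdlib-only Python sketch; the file, table, and column names mirror the examples above, and the function name is my own:

    ```python
    import json
    import sqlite3

    def import_ndjson(json_path, db_path, table="my_table"):
        """Load newline-delimited JSON objects into an SQLite table.

        Assumes each line is one object with the same two keys as the
        sample data in the question.
        """
        conn = sqlite3.connect(db_path)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table}(uri TEXT, user_agent TEXT)")
        with open(json_path) as fh:
            rows = (json.loads(line) for line in fh if line.strip())
            # Named placeholders accept the parsed dicts directly.
            conn.executemany(f"INSERT INTO {table} VALUES (:uri, :user_agent)", rows)
        conn.commit()
        conn.close()
    ```

    Because the rows are fed to executemany as a generator, the file is never held in memory all at once.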

  • 2020-12-29 15:14

    I found the easiest way to do this is by using jq and CSV as an intermediary format.

    Edit: As pointed out (thanks @Leo), the original question did show newline-delimited JSON objects, each of which on its own conforms to RFC 4627, but not all of them together in that format. jq can handle a single JSON array of objects in much the same way by preprocessing the file with jq '.[]' <input.json >preprocessed.json. If you happen to be dealing with JSON text sequences (RFC 7464), jq luckily has your back too with the --seq parameter.

    Edit 2: Both newline-separated JSON and JSON text sequences have one important advantage: they reduce the memory requirement to O(1), meaning your total memory requirement depends only on your longest line of input. Putting the entire input in a single array instead requires one of the following: a parser that can handle late errors (e.g. a syntax error after the first 100k elements), which to my knowledge is generally not the case; parsing the entire file twice (first validating the syntax, then parsing while discarding previous elements, as jq --stream does), which to my knowledge is also rare; or parsing the whole input at once and returning the result in one step (think of receiving a Python dict containing the entirety of your, say, 50 GB of input data plus overhead), which is usually memory-backed and hence raises your memory footprint by roughly your total data size.
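
    The O(1) point is easy to see in code. A hypothetical line-by-line reader (stdlib Python, my naming) only ever holds one parsed object at a time:

    ```python
    import json

    def stream_rows(fh):
        """Yield one parsed object per newline-delimited JSON line.

        Peak memory is bounded by the longest line, not the file size,
        because each object is handed off before the next line is read.
        """
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
    ```

    Feeding the generator straight into a consumer (a CSV writer, an executemany call) keeps the whole pipeline streaming.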

    Edit 3: If you hit any obstacles, try using keys_unsorted instead of keys. I haven't tested that myself (I kind of assume my columns were already sorted), however @Kyle Barron reports that this was needed.

    Getting the CSV

    First write your data to a file. I will assume data.json here.

    Then construct the header using jq:

    % head -1 data.json | jq -r 'keys | @csv'
    "uri","user_agent"
    

    The head -1 is there because we only want the first line. jq's -r flag makes the output a plain string instead of a JSON string wrapping the CSV. We then call the built-in function keys to get the keys of the input as an array, and pass that to the @csv formatter, which outputs a single string with the headers in quoted CSV format.

    We then need to construct the data.

    % jq -r '[.[]] | @csv' < data.json
    "/","example1"
    "/foobar","example1"
    "/","example2"
    "/foobar","example3"
    

    We now take each input object and deconstruct the associative array (map) using .[], then put the results back into a simple array […]. This basically converts each object into an array of its values. Sent to the @csv formatter, we again get some CSV.

    Putting it all together we get a single one-liner in the form of:

    % (head -1 data.json | jq -r 'keys | @csv' && jq -r '[.[]] | @csv' < data.json) > data.csv
    

    If you need to convert the data on the fly, i.e. without a file, try this:

    % cat data.json | (read -r first && jq -r '(keys | @csv),( [.[]] | @csv)' <<<"${first}" && jq -r '[.[]] | @csv')
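
    If jq isn't available, the same header-plus-rows conversion can be sketched with Python's stdlib csv and json modules (the function name and the newline-delimited input format are assumptions based on the question's data):

    ```python
    import csv
    import json

    def ndjson_to_csv(src, dst):
        """Convert newline-delimited JSON objects to CSV with a header row.

        The header comes from the keys of the first object, sorted to
        match jq's `keys` builtin.
        """
        first = json.loads(src.readline())
        writer = csv.DictWriter(dst, fieldnames=sorted(first))
        writer.writeheader()
        writer.writerow(first)
        for line in src:
            if line.strip():
                writer.writerow(json.loads(line))
    ```

    One difference from jq's @csv: the csv module only quotes fields when necessary, whereas @csv quotes every string, but SQLite's CSV import accepts both.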
    

    Loading it into SQLite

    Open an SQLite database:

    sqlite3 somedb.sqlite
    

    Now in the interactive shell do the following (assuming you wrote the CSV to data.csv and want it in a table called my_table):

    .mode csv
    .import data.csv my_table
    

    Now close the shell and open it again for a clean environment. You can now easily SELECT from the database and do whatever you want to.
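
    The two dot-commands can also be scripted, which helps if the import is part of a larger pipeline. A minimal stdlib Python sketch, assuming as above that data.csv has a header row and that the table does not exist yet:

    ```python
    import csv
    import sqlite3

    def import_csv(csv_path, db_path, table="my_table"):
        """Create `table` from the CSV header and load every data row,
        mimicking `.mode csv` + `.import` on a not-yet-existing table."""
        with open(csv_path, newline="") as fh:
            reader = csv.reader(fh)
            header = next(reader)
            cols = ", ".join(f'"{name}" TEXT' for name in header)
            marks = ", ".join("?" for _ in header)
            conn = sqlite3.connect(db_path)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
            conn.commit()
            conn.close()
    ```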

    Putting it all together

    (An asciinema recording of the whole process was embedded here.)
