Automatically Load SQL table by reading data from text file

北海茫月 2021-02-06 19:22

I am trying to write a Python script that loads the tables I created in Python using SQL and automatically populates them with data coming from a text file.

2 Answers
  • 2021-02-06 20:12

    If you can use the standard sqlite3 command-line utility, you can do this much more easily:

    sqlite3 -init mydata.sql mydatabase.db ""
    

    Simply call this line from your Python script and you're done.

    This reads any text file that contains valid SQL statements and creates mydatabase.db if it doesn't already exist. More importantly, it supports statements that span multiple lines, and it correctly ignores SQL comments in both --comment syntax and C/C++-style /*comment*/ syntax.
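    A minimal sketch of calling that line from Python with the standard-library subprocess module. It assumes the sqlite3 command-line tool is installed and on your PATH; mydata.sql and mydatabase.db are the same placeholder names as above, and the helper function names are hypothetical.

```python
import shutil
import subprocess


def sqlite_init_command(sql_path, db_path):
    # Argument list equivalent to: sqlite3 -init mydata.sql mydatabase.db ""
    # The trailing "" is an empty SQL command, so sqlite3 exits after
    # processing the -init file instead of opening an interactive prompt.
    return ["sqlite3", "-init", sql_path, db_path, ""]


def load_with_sqlite_cli(sql_path, db_path):
    if shutil.which("sqlite3") is None:
        raise FileNotFoundError("the sqlite3 command-line tool is not on PATH")
    # Passing a list (no shell) avoids quoting problems with file names;
    # check=True raises CalledProcessError if sqlite3 exits with a non-zero status.
    subprocess.run(sqlite_init_command(sql_path, db_path), check=True)
```

    Usage would then be load_with_sqlite_cli("mydata.sql", "mydatabase.db").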

    Typically your mydata.sql content should look like this:

    BEGIN TRANSACTION;
    CREATE TABLE IF NOT EXISTS table1 (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(32)
    );
    INSERT INTO table1 (name) VALUES
    ('John'),
    ('Jack'),
    ('Jill');
    -- more statements ...
    COMMIT;
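
    If you'd rather not depend on the command-line tool, Python's built-in sqlite3 module can run the same kind of file through Connection.executescript(), which accepts a multi-statement script. A minimal sketch; load_sql_file is a hypothetical helper name:

```python
import sqlite3


def load_sql_file(sql_path, db_path):
    # Read the whole script; executescript() runs every statement in it,
    # and SQLite's parser skips both -- and /* */ comments.
    with open(sql_path, encoding="utf-8") as f:
        script = f.read()
    conn = sqlite3.connect(db_path)  # creates db_path if it does not exist
    try:
        # The script's own BEGIN TRANSACTION / COMMIT control the transaction
        conn.executescript(script)
    finally:
        conn.close()
```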
    
  • 2021-02-06 20:16

    You haven't explained what format the data is in, what your table structure is, or how you want to map one onto the other, which makes this difficult to answer. So I'll make up my own example and answer that; hopefully it will help:

    infile.txt:

    CommonName,Species,Location,Color
    Black-headed spider monkey,Ateles fusciceps,Ecuador,black
    Central American squirrel monkey,Saimiri oerstedii,Costa Rica,orange
    Vervet,Chlorocebus pygerythrus,South Africa,white
    

    script.py:

    import csv
    import sqlite3
    
    db = sqlite3.connect('outfile.db')
    cursor = db.cursor()
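    # Species is the primary key so that MonkeyLocations can reference it
    cursor.execute('CREATE TABLE Monkeys (Name TEXT, Color TEXT, Species TEXT PRIMARY KEY)')
    cursor.execute('''CREATE TABLE MonkeyLocations (Species TEXT, Location TEXT,
                      FOREIGN KEY(Species) REFERENCES Monkeys(Species))''')
    # newline='' is what the csv module expects for files it reads
    with open('infile.txt', newline='') as f:
        # DictReader takes its keys from the header line of infile.txt
        for row in csv.DictReader(f):
            # Named placeholders map CSV columns onto the SQL columns,
            # even where the names or the order differ
            cursor.execute('''INSERT INTO Monkeys (Name, Color, Species)
                              VALUES (:CommonName, :Color, :Species)''', row)
            cursor.execute('''INSERT INTO MonkeyLocations (Species, Location)
                              VALUES (:Species, :Location)''', row)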
    db.commit()
    db.close()
    

    Of course, if your real data is in a format other than CSV, you'll need different code to parse the input file.
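    For instance, if the file turned out to be tab-separated (an assumption; the question doesn't say), a sketch where only the reader changes, while the rows keep the same keys and can go to the same INSERT statements. read_rows is a hypothetical helper name:

```python
import csv
import io


def read_rows(f, delimiter=","):
    # DictReader takes the column names from the header line; only the
    # delimiter differs between comma-, tab- and pipe-separated files.
    return list(csv.DictReader(f, delimiter=delimiter))


# Hypothetical tab-separated version of infile.txt
tsv = io.StringIO(
    "CommonName\tSpecies\tLocation\tColor\n"
    "Vervet\tChlorocebus pygerythrus\tSouth Africa\twhite\n"
)
rows = read_rows(tsv, delimiter="\t")
```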

    I've also made things slightly more complex than your real data may require: the CSV columns don't have quite the same names as the SQL columns.

    In other ways, your data might be more complex. For example, if your schema has foreign keys that reference an auto-incremented row ID instead of a text field, you'll need to get the rowid after the first insert (in Python's sqlite3 module, cursor.lastrowid gives you that).
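    A minimal sketch of that case, using a hypothetical variant of the tables above in which each location row stores the monkey's integer Id instead of its species name. In Python's sqlite3 module, cursor.lastrowid holds the rowid of the row that was just inserted:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
db.execute("CREATE TABLE Monkeys (Id INTEGER PRIMARY KEY, Name TEXT, Species TEXT)")
db.execute("""CREATE TABLE MonkeyLocations (
                  MonkeyId INTEGER REFERENCES Monkeys(Id), Location TEXT)""")

# Rows shaped like the csv.DictReader output above (subset of columns)
rows = [
    {"CommonName": "Vervet", "Species": "Chlorocebus pygerythrus",
     "Location": "South Africa"},
    {"CommonName": "Central American squirrel monkey", "Species": "Saimiri oerstedii",
     "Location": "Costa Rica"},
]
for row in rows:
    cur = db.execute("INSERT INTO Monkeys (Name, Species) VALUES (:CommonName, :Species)", row)
    # Id is an INTEGER PRIMARY KEY, i.e. an alias for the rowid,
    # so lastrowid is exactly the value the child row must reference
    db.execute("INSERT INTO MonkeyLocations (MonkeyId, Location) VALUES (?, ?)",
               (cur.lastrowid, row["Location"]))
db.commit()
```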

    But this should be enough to give you the idea.


    Now that you've shown more details: you were on the right track (although it's wasteful to call readlines instead of just iterating over fd directly, and you should close your db and file, ideally with a with statement, …), but you've got a simple mistake near the end that prevents you from getting any further:

    insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
    c.execute(insert)
    

    You've put the % formatting expression inside the string literal, instead of applying the % operator to the string. I think what you were trying to do is:

    insert = """insert into LN values (%s, %s, %s);""" % (currentRow[4], currentRow[5], currentRow[6])
    c.execute(insert)
    

    However, you shouldn't do that. Instead, do this:

    insert = """insert into LN values (?, ?, ?);"""
    c.execute(insert, (currentRow[4], currentRow[5], currentRow[6]))
    

    What's the difference?

    Well, the first one just inserts the values into the statement as Python strings. That means you have to take care of converting to the proper format, quoting, escaping, etc. yourself, instead of letting the database engine decide how to deal with each value. Besides being a source of frustrating bugs when you try to save a boolean value or forget to quote a string, this also leaves you open to SQL injection attacks unless you're very careful.
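    A small illustration of the quoting problem, using a hypothetical people table in an in-memory database: a value containing an apostrophe breaks the string-formatted statement, while the ? placeholder stores it unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

name = "O'Brien"  # hypothetical value containing a quote

# String formatting pastes the raw text into the SQL, so the apostrophe
# ends the string literal early and the statement no longer parses.
try:
    conn.execute("INSERT INTO people VALUES ('%s')" % name)
    formatting_failed = False
except sqlite3.OperationalError:
    formatting_failed = True

# A ? placeholder sends the value separately from the SQL text,
# so no manual quoting or escaping is needed.
conn.execute("INSERT INTO people VALUES (?)", (name,))
stored = conn.execute("SELECT name FROM people").fetchall()
```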

    There are other problems besides that one. For example, most databases try to cache repeated statements, and it's trivial to tell that 3000 instances of insert into LN values (?, ?, ?) are all the same statement, but much harder to tell that insert into LN values (5, 1.0, 200) and insert into LN values (1, 5.0, 5000) are the same statement.
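    Putting these suggestions together, here is a sketch of how the loading loop might look. The question's full code isn't shown here, so the comma delimiter, the file and database paths, and the LN column names (a, b, c) are assumptions; only the use of fields 4 to 6 comes from the snippet above. executemany() runs one parameterized statement for every row:

```python
import csv
import sqlite3
from contextlib import closing


def load_ln(txt_path, db_path):
    # closing() guarantees conn.close(); "with conn" alone only commits or rolls back
    with closing(sqlite3.connect(db_path)) as conn:
        # Hypothetical schema: three columns filled from fields 4..6 of each line
        conn.execute("CREATE TABLE IF NOT EXISTS LN (a, b, c)")
        # "with conn" commits if the block succeeds and rolls back on an exception
        with open(txt_path, newline="") as fd, conn:
            # Iterate over the file directly instead of calling readlines()
            rows = ((r[4], r[5], r[6]) for r in csv.reader(fd))
            # One parameterized statement, executed once per row
            conn.executemany("INSERT INTO LN VALUES (?, ?, ?)", rows)
```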
