How to import CSV file data into a PostgreSQL table?

再見小時候 2020-11-22 02:14

How can I write a stored procedure that imports data from a CSV file and populates the table?

19 Answers
  • 2020-11-22 02:55

    Most other solutions here require that you create the table in advance/manually. This may not be practical in some cases (e.g., if you have a lot of columns in the destination table), so the approach below may come in handy.

    Given the path and column count of your CSV file, you can use the following function to load the data into a staging table that is then renamed to target_table:

    The top row is assumed to have the column names.

    create or replace function data.load_csv_file
    (
        target_table text,
        csv_path text,
        col_count integer
    )
    
    returns void as $$
    
    declare
    
    iter integer; -- dummy integer to iterate columns with
    col text; -- variable to keep the column name at each iteration
    col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet
    
    begin
        -- create an empty staging table (a regular table, so it survives the final rename)
        create table temp_table ();
    
        -- add just enough text columns to hold the csv data
        for iter in 1..col_count
        loop
            execute format('alter table temp_table add column col_%s text;', iter);
        end loop;
    
        -- copy the data from csv file
        execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);
    
        iter := 1;
        col_first := (select col_1 from temp_table limit 1);
    
        -- update the column names based on the first row which has the column names
        for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
        loop
            -- %I quotes the header value, so column names with spaces or mixed case survive
            execute format('alter table temp_table rename column col_%s to %I', iter, col);
            iter := iter + 1;
        end loop;
    
        -- delete the header row, which was loaded as data
        execute format('delete from temp_table where %I = %L', col_first, col_first);
    
        -- change the temp table name to the name given as parameter, if not blank
        if length(target_table) > 0 then
            execute format('alter table temp_table rename to %I', target_table);
        end if;
    
    end;
    
    $$ language plpgsql;
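
    A minimal usage sketch, assuming the function was created in the data schema as defined above. The table name and file path are placeholders, and the file must be readable by the PostgreSQL server process, since COPY ... FROM runs on the server:

    -- hypothetical call: stage a 5-column CSV into a new table named products
    select data.load_csv_file('products', '/tmp/products.csv', 5);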
    
  • 2020-11-22 03:00

    IMHO, the most convenient way is to follow "Import CSV data into postgresql, the comfortable way ;-)", using csvsql from csvkit, which is a Python package installable via pip.

  • 2020-11-22 03:00

    You can create a bash file such as import.sh (this assumes your CSV is tab-delimited):

    #!/usr/bin/env bash
    
    # connection settings and the file to import
    USER="test"
    DB="postgres"
    TABLE_NAME="user"
    CSV_DIR="$(pwd)/csv"
    FILE_NAME="user.txt"
    
    # \copy runs client-side, so the file only needs to be readable by this script
    psql -d "$DB" -U "$USER" -c "\copy $TABLE_NAME from '$CSV_DIR/$FILE_NAME' DELIMITER E'\t' csv" 2>&1 | tee /dev/tty
    
    

    And then run this script.

  • 2020-11-22 03:01

    If you don't have permission to use COPY (which works on the db server), you can use \copy instead (which works in the db client). Using the same example as Bozhidar Batsov:

    Create your table:

    CREATE TABLE zip_codes 
    (ZIP char(5), LATITUDE double precision, LONGITUDE double precision, 
    CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);
    

    Copy data from your CSV file to the table:

    \copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV
    

    You can also specify the columns to read:

    \copy zip_codes(ZIP,CITY,STATE) FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV
    

    See the documentation for COPY:

    Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

    and note:

    For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.
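
    For comparison, the server-side form of the same import is a plain COPY statement. This is only a sketch: the path is the same placeholder as above, it is resolved on the database server rather than on the client, and running it typically requires superuser rights or membership in the pg_read_server_files role:

    -- server-side import: the file path must exist on the database server
    COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV;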

  • 2020-11-22 03:01

    You could also use pgAdmin, which offers a GUI to do the import. That's shown in this SO thread. The advantage of using pgAdmin is that it also works for remote databases.

    Much like the previous solutions, though, you need the table to exist in the database already. Everyone has their own approach, but what I usually do is open the CSV in Excel, copy the headers, paste them transposed (Paste Special with Transpose) onto a different worksheet, put the corresponding data type in the next column, and then copy and paste that into a text editor together with the appropriate SQL table creation query, like so (a sketch of the subsequent import follows the table definition):

    CREATE TABLE my_table (
        /*paste data from Excel here for example ... */
        col_1 bigint,
        col_2 bigint,
        /* ... */
        col_n bigint 
    );
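
    Once the table exists, a client-side import might look like the sketch below; the file name is a placeholder, and HEADER assumes the CSV still contains the header row that was copied into Excel:

    -- run inside psql; \copy reads the file on the client machine
    \copy my_table FROM 'my_data.csv' DELIMITER ',' CSV HEADER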
    
  • 2020-11-22 03:01

    You can also use pgfutter, or, even better, pgcsv.

    pgfutter is quite buggy; I'd recommend pgcsv.

    Here's how to do it with pgcsv:

    sudo pip install pgcsv
    pgcsv --db 'postgresql://localhost/postgres?user=postgres&password=...' my_table my_file.csv
    