How can I write a stored procedure that imports data from a CSV file and populates the table?
Most other solutions here require that you create the table in advance/manually. This may not be practical in some cases (e.g., if you have a lot of columns in the destination table). So, the approach below may come handy.
Providing the path and column count of your csv file, you can use the following function to load your table to a temp table that will be named as target_table
:
The top row is assumed to have the column names.
create or replace function data.load_csv_file
(
target_table text,
csv_path text,
col_count integer
)
returns void as $$
declare
iter integer; -- dummy integer to iterate columns with
col text; -- variable to keep the column name at each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
create table temp_table ();
-- add just enough number of columns
for iter in 1..col_count
loop
execute format('alter table temp_table add column col_%s text;', iter);
end loop;
-- copy the data from csv file
execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);
iter := 1;
col_first := (select col_1 from temp_table limit 1);
-- update the column names based on the first row which has the column names
for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
loop
execute format('alter table temp_table rename column col_%s to %s', iter, col);
iter := iter + 1;
end loop;
-- delete the columns row
execute format('delete from temp_table where %s = %L', col_first, col_first);
-- change the temp table name to the name given as parameter, if not blank
if length(target_table) > 0 then
execute format('alter table temp_table rename to %I', target_table);
end if;
end;
$$ language plpgsql;
IMHO, the most convenient way is to follow "Import CSV data into postgresql, the comfortable way ;-)", using csvsql from csvkit, which is a python package installable via pip.
You can create a bash file as import.sh (that your CSV format is a tab delimiter)
#!/usr/bin/env bash
USER="test"
DB="postgres"
TBALE_NAME="user"
CSV_DIR="$(pwd)/csv"
FILE_NAME="user.txt"
echo $(psql -d $DB -U $USER -c "\copy $TBALE_NAME from '$CSV_DIR/$FILE_NAME' DELIMITER E'\t' csv" 2>&1 |tee /dev/tty)
And then run this script.
If you don't have permission to use COPY
(which work on the db server), you can use \copy
instead (which works in the db client). Using the same example as Bozhidar Batsov:
Create your table:
CREATE TABLE zip_codes
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision,
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);
Copy data from your CSV file to the table:
\copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV
You can also specify the columns to read:
\copy zip_codes(ZIP,CITY,STATE) FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV
See the documentation for COPY:
Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.
and note:
For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.
You could also use pgAdmin, which offers a GUI to do the import. That's shown in this SO thread. The advantage of using pgAdmin is that it also works for remote databases.
Much like the previous solutions though, you would need to have your table on the database already. Each person has his own solution but what I usually do is open the CSV in Excel, copy the headers, paste special with transposition on a different worksheet, place the corresponding data type on the next column then just copy and paste that to a text editor together with the appropriate SQL table creation query like so:
CREATE TABLE my_table (
/*paste data from Excel here for example ... */
col_1 bigint,
col_2 bigint,
/* ... */
col_n bigint
)
You can also use pgfutter, or, even better, pgcsv.
pgfutter is quite buggy, I'd recomment pgcsv.
Here's how to do it with pgcsv:
sudo pip install pgcsv
pgcsv --db 'postgresql://localhost/postgres?user=postgres&password=...' my_table my_file.csv