I have a CSV file with 10 columns. After creating a PostgreSQL table with 4 columns, I want to copy some of 10 columns into the table.
the columns of my CSV table ar
You can provide the columns your want to fill with the COPY
command. Like so:
\copy your_table (x2,x5,x7,x10) FROM '/path/to/your-file.csv' DELIMITER ',' CSV;
Here's the doc for the COPY
command.
You could take James Brown's suggestion further and do, all in one line:
$ awk -F ',' '{print $2","$5","$7","$10}' file | psql -d db -c "\copy MyTable from STDIN csv header"
To load data from spreadsheet (Excel or OpenOffice Calc) into postgreSQL:
Save the spreadsheet page as a CSV file. Prefered method is to open the spreadsheet on OpenOffice Calc and do the saving. On “Export to text file” window choose Character Set as Unicode (UTF8), Field Delimiter: “,” and Text Delimiter “ “ “. Message will be displayed saying only active sheet is saved. Note: This file has to be saved on a folder but not on desktop and have to save in UTF8 format (postgreSQL by dafault is step up for UTF8 encoding). If saved on desktop, postgreSQL will give “access denied” message and won't upload.
In PostgreSQL, create an empty table with same number of column as the spreadsheet.
Note: On each column, column-name has to be same, data type has to be same. Also, keep in mind the length of data where character varying with enough field.
Then on postgreSQL, on SQL window, put the code:
copy "ABC"."def" from E'C:\\tmp\\blabla.csv' delimiters ',' CSV HEADER;
NOTE: Here C:\\tmp is the folder where CSV-file “blabla” is saved. “ABC”.”def” is the table created on postgreSQL where "ABC" is schema and"def" is the actual table. Then do “execute query” by pressing the green button on top. “CSV HEADER” is needed when CSV table has heading at the start of every column.
If everythig is ok, no error message will be displayed and table data from CSV file will be loaded into the postgreSQL table. But if there is an error message do as following:
If error message is saying that the data is too long for a specific column, then increase the column size. This happens mostly on character and character varying column. Then run the “execute query” command again.
If error message is saying that the data type doesn't match to a particular column, then change the data type on postgreSQL table-column to match the one in CSV table.
In your case, after creating CSV file, delete the unwanted columns and match the columns in postgre table.
Just arrived here on a pursuit for a solution to only load a subset of columns but apparently it's not possible. So, use awk (or cut
) to extract the wanted columns to a new file new_file
:
$ awk '{print $2, $5, $7, $10}' file > new_file
and load the new_file
. You could pipe the output straight to psql
:
$ cut -d \ -f 2,5,7,10 file |
psql -h host -U user -c "COPY table(col1,col2,col3,col4) FROM STDIN DELIMITER ' '" database
Notice COPY
, not \COPY
.
Update:
As it was pointed out in the comments, neither of the above examples can handle quoted delimiters in the data. The same goes for newlines, too, as awk or cut
are not CSV aware. Quoted delimiters can be handled with GNU awk, though.
This is a three-column file:
$ cat file
1,"2,3",4
Using GNU awk's FPAT variable we can change the order of the fields (or get a subset of them) even when the quoted fields have field separators in them:
$ gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")";OFS=","}{print $2,$1,$3}' file
"2,3",1,4
Explained:
$ gawk '
BEGIN { # instead of field separator FS
FPAT="([^,]*)|(\"[^\"]+\")" # ... we define field pattern FPAT
OFS="," # output field separator OFS
}
{
print $2,$1,$3 # change field order
# print $2 # or get a subset of fields
}' file
Notice that FPAT
is GNU awk only. For other awks it's just a regular variable.
Create a temporary table with all the columns in the input file
create temporary table t (x1 integer, ... , x10 text)
Copy from the file into it:
copy t (x1, ... , x10)
from '/path/to/my_file'
with (format csv)
Now insert into the definitive table from the temp:
insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10
from t
And drop it:
drop table t
Use the file_fdw extension. As superuser:
create extension file_fdw;
create server my_csv foreign data wrapper file_fdw;
create foreign table my_csv (
x1 integer,
x2 text,
x3 text
) server my_csv
options (filename '/tmp/my_csv.csv', format 'csv' )
;
Grant select permission on the table to the user who will read it:
grant select on table my_csv to the_read_user;
Then whenever necessary read directly from the csv file as if it were a table:
insert into my_table (x2)
select x2
from my_csv
where x1 = 2
As other answers have pointed out, it's been possible to specify columns to copy into the PG table. However, without the option to reference column names in the CSV, this had little utility apart from loading into a table where columns had a different order.
Fortunately, as of Postgres 9.3, it's possible to copy columns not only from a file or from standard input, but also from a shell command using PROGRAM:
PROGRAM
A command to execute. In COPY FROM, the input is read from standard output of the command, and in COPY TO, the output is written to the standard input of the command.
Note that the command is invoked by the shell, so if you need to pass any arguments to shell command that come from an untrusted source, you must be careful to strip or escape any special characters that might have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.
This was the missing piece that we needed for such an eagerly awaited functionality. For example, we could use this option in combination with cut
(in a UNIX-based system) to select certain columns by order:
COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'cut -d "," -f 2,5,7,10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)
However, cut
has several limitations when manipulating CSV's: it can't adequately manipulate strings with commas (or other delimeters) inside them and doesn't allow to select columns by name.
There are several other open source command-line tools that are better at manipulating CSV files, such as csvkit or miller. Here's an example using miller
to select columns by name:
COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'mlr --csv lf cut -f x2,x5,x7,x10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)