Insert SQL statements via command line without reopening connection to remote database

后端 未结 2 1372
轮回少年
轮回少年 2021-02-05 18:16

I have a large amount of data files to process and to be stored in the remote database. Each line of a data file represents a row in the database, but must be formatted before i

相关标签:
2条回答
  • 2021-02-05 19:11

    Answer to your actual question

    Yes. You can use a named pipe instead of creating a file. Consider the following demo.

    Create a schema x in my database event for testing:

    -- DROP SCHEMA x CASCADE;
    CREATE SCHEMA x;
    CREATE TABLE x.x (id int, a text);
    

    Create a named pipe (fifo) from the shell like this:

    postgres@db:~$ mkfifo --mode=0666 /tmp/myPipe
    

    Either 1) call the SQL command COPY using a named pipe on the server:

    postgres@db:~$ psql event -p5433 -c "COPY x.x FROM '/tmp/myPipe'"
    

    This will acquire an exclusive lock on the table x.x in the database. The connection stays open until the fifo gets data. Be careful not to leave this open for too long! You can call this after you have filled the pipe to minimize blocking time. You can chose the sequence of events. The command executes as soon as two processes bind to the pipe. The first waits for the second.

    Or 2) you can execute SQL from the pipe on the client:

    postgres@db:~$ psql event -p5433 -f /tmp/myPipe
    

    This is better suited for your case. Also, no table locks until SQL is executed in one piece.

    Bash will appear blocked. It is waiting for input to the pipe. To do it all from one bash instance, you can send the waiting process to the background instead. Like this:

    postgres@db:~$ psql event -p5433 -f /tmp/myPipe 2>&1 &
    

    Either way, from the same bash or a different instance, you can fill the pipe now.
    Demo with three rows for variant 1):

    postgres@db:~$ echo '1  foo' >> /tmp/myPipe; echo '2    bar' >> /tmp/myPipe; echo '3    baz' >> /tmp/myPipe;
    

    (Take care to use tabs as delimiters or instruct COPY to accept a different delimiter using WITH DELIMITER 'delimiter_character')
    That will trigger the pending psql with the COPY command to execute and return:

    COPY 3
    

    Demo for for variant 2):

    postgres@db:~$ (echo -n "INSERT INTO x.x VALUES (1,'foo')" >> /tmp/myPipe; echo -n ",(2,'bar')" >> /tmp/myPipe; echo ",(3,'baz')" >> /tmp/myPipe;)
    
    INSERT 0 3
    

    Delete the named pipe after you are done:

    postgres@db:~$ rm /tmp/myPipe
    

    Check success:

    event=# select * from x.x;
     id |         a
    ----+-------------------
      1 | foo
      2 | bar
      3 | baz
    

    Useful links for the code above

    Reading compressed files with postgres using named pipes
    Introduction to Named Pipes
    Best practice to run bash script in background


    Advice you may or may not not need

    For bulk INSERT you have better solutions than a separate INSERT per row. Use this syntax variant:

    INSERT INTO mytable (col1, col2, col3) VALUES
     (1, 'foo', 'bar')
    ,(2, 'goo', 'gar')
    ,(3, 'hoo', 'har')
    ...
    ;
    

    Write your statements to a file and do one mass INSERT like this:

    psql -h remote_server -U username -d database -p 5432 -f my_insert_file.sql
    

    (5432 or whatever port the db-cluster is listening on)
    my_insert_file.sql can hold multiple SQL statements. In fact, it's common practise to restore / deploy whole databases like that. Consult the manual about the -f parameter, or in bash: man psql.

    Or, if you can transfer the (compressed) file to the server, you can use COPY to insert the (decompressed) data even faster.

    You can also do some or all of the processing inside PostgreSQL. For that you can COPY TO (or INSERT INTO) a temporary table and use plain SQL statements to prepare and finally INSERT / UPDATE your tables. I do that a lot. Be aware that temporary tables live and die with the session.

    You could use a GUI like pgAdmin for comfortable handling. A session in an SQL Editor window remains open until you close the window. (Therefore, temporary tables live until you close the window.)

    0 讨论(0)
  • 2021-02-05 19:16

    I know I'm late to the party, but why couldn't you combine all your INSERT statements into a single string, with a semicolon marking the end of each statement? (Warning! Pseudocode ahead...)

    Instead of:

    for each line
      sql_statement="INSERT whatever YOU want"
      echo $sql_statement | psql ...
    done
    

    Use:

    sql_statements=""
    for each line
      sql_statement="INSERT whatever YOU want;"
      sql_statements="$sql_statements $sql_statement"
    done
    echo $sql_statements | psql ...
    

    That way you don't have to create anything on your filesystem, do a bunch of redirection, run any tasks in the background, remember to delete anything on your filesystem afterwards, or even remind yourself what a named pipe is.

    0 讨论(0)
提交回复
热议问题