Greenplum to file using PSQL

Posted by 我们两清 on 2019-12-25 07:12:04

Question


I'm trying to export data from Greenplum to a pipe-delimited text file on the client using psql and \copy. In the output, a single backslash is converted to a double backslash and a tab is converted to \t. For example, N\A is converted to N\\A.

So how do I get just N\A instead of N\\A, and plain spaces instead of \t?

Note: I'm allowed to use only \copy. Because my file is huge, I run out of disk space when I use sed or Perl for find and replace.


Answer 1:


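The backslash is doubled because it is the default escape character in text format. Assuming your data doesn't contain any "^" characters, you could use that as the escape character instead, which leaves backslashes untouched.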

copy tpcds.call_center to stdout with delimiter '|' escape '^';

More on copy can be found here: https://www.postgresql.org/docs/8.2/static/sql-copy.html
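
Since the question is restricted to \copy, the same options can be sketched as a psql meta-command. This is a minimal sketch, assuming the psql client in use accepts the escape option on \copy in text mode; the output file name call_center.txt is only an illustration.

```sql
-- Run inside psql. \copy writes the file on the client machine,
-- must fit on a single line, and takes no trailing semicolon.
-- Assumption: this psql build accepts ESCAPE on \copy in text mode.
\copy tpcds.call_center to 'call_center.txt' with delimiter '|' escape '^'
```

If the client rejects the option, running the copy ... to stdout statement above through psql and redirecting its output to a file should produce the same client-side result.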

COPY will be relatively slow and puts a burden on the master, because all of the data flows through it. If you use gpfdist instead, you can leverage the parallelism in the cluster and bypass the master. That makes gpfdist ideal for unloading large amounts of data.

First, start the gpfdist process:

[gpadmin@gpdbsne ~]$ gpfdist -p 8888 > gpfdist_8888.log 2>&1 < gpfdist_8888.log &
[1] 2255

Now, you can create the external table.

[gpadmin@gpdbsne ~]$ psql 
SET
Timing is on.
psql (8.2.15)
Type "help" for help.

gpadmin=# create writable external table tpcds.et_call_center 
(like tpcds.call_center) 
location ('gpfdist://gpdbsne:8888/call_center.txt') 
format 'text' (delimiter '|' escape '^');
NOTICE:  Table doesn't have 'distributed by' clause, defaulting to distribution columns from LIKE table
CREATE EXTERNAL TABLE
Time: 18.681 ms

Now, insert the data:

gpadmin=# insert into tpcds.et_call_center select * from tpcds.call_center;                                                                             
INSERT 0 6
Time: 72.653 ms
gpadmin=# \q

Verify:

[gpadmin@gpdbsne ~]$ wc -l call_center.txt 
6 call_center.txt

In my example, I used the hostname "gpdbsne", which is accessible to all segments in this cluster. Greenplum typically uses a private network for communication between segments, so this host will need to be connected to that private network.

Since the writable external table is written to with SQL, you can apply whatever transformation logic you want in the SQL, for example changing tabs to spaces. This eliminates the need for awk or sed to post-process the files. COPY can use SQL too but, as noted, it is slower than using writable external tables.
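
As a rough illustration of transforming the data in SQL, the sketch below replaces tab characters with spaces while unloading. The narrower external table et_call_center_names and its file name are hypothetical, and the column names and types assume the standard TPC-DS call_center schema.

```sql
-- Hypothetical external table holding only the columns to export.
-- Assumption: cc_call_center_sk and cc_name follow the TPC-DS schema.
create writable external table tpcds.et_call_center_names
(cc_call_center_sk integer, cc_name varchar(50))
location ('gpfdist://gpdbsne:8888/call_center_names.txt')
format 'text' (delimiter '|' escape '^');

-- chr(9) is the tab character; replace() swaps each tab for a space
-- before the row is written out through gpfdist.
insert into tpcds.et_call_center_names
select cc_call_center_sk,
       replace(cc_name, chr(9), ' ')
from tpcds.call_center;
```

The same replace() expression could be used in the select list of a copy (select ...) to stdout statement if gpfdist is not an option.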



Source: https://stackoverflow.com/questions/38209612/greenplum-to-file-using-psql
