Question
I have set up a redshift cluster in a private subnet. I can successfully connect to my redshift cluster and do basic SQL queries through DBeaver.
I need to load some files from S3 into Redshift as well, so I set up an S3 gateway endpoint in my private subnet and updated the route table for the private subnet to add the required route as follows:
Destination                                Target                  Status  Propagated
192.168.0.0/16                             local                   active  No
pl-7ba54012 (com.amazonaws.us-east-2.s3:   vpce-04eed78f4db84ae49  active  No
  52.219.80.0/20, 3.5.128.0/21,
  52.219.96.0/20, 52.92.76.0/22)
0.0.0.0/0                                  nat-0a73ba7659e887232   active  No
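As a sanity check that the prefix list in that route actually covers the S3 endpoints traffic would hit, you can test an address against the listed CIDR blocks. This is a minimal sketch using Python's ipaddress module; the CIDRs are copied from the route table above:

```python
import ipaddress

# CIDR blocks listed for prefix list pl-7ba54012 (com.amazonaws.us-east-2.s3)
s3_cidrs = [
    "52.219.80.0/20",
    "3.5.128.0/21",
    "52.219.96.0/20",
    "52.92.76.0/22",
]

def routed_via_endpoint(ip: str) -> bool:
    """Return True if `ip` falls inside one of the prefix-list CIDRs,
    i.e. traffic to it would take the S3 gateway endpoint route."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in s3_cidrs)

print(routed_via_endpoint("52.219.80.5"))  # inside 52.219.80.0/20 -> True
print(routed_via_endpoint("8.8.8.8"))      # not an S3 range -> False
```

If an S3-bound address is not covered by the prefix list, traffic falls through to the 0.0.0.0/0 route and goes out via the NAT gateway instead.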
However, I cannot run the following COPY query against my S3 bucket:
copy venue
from 's3://*****/tickit/venue_pipe.txt'
iam_role 'arn:aws:iam::******:role/global-dev-rdt-role-S3ReadonlyAccess'
region 'us-east-2';
There are no restrictive policies on my bucket or on the public and private subnet security groups, and I can already run SQL queries against my Redshift cluster in the private subnet.
Update: the security group for the Redshift cluster allows all connections to port 5439:
Type      Protocol  Port Range  Source                Description
Redshift  TCP       5439        0.0.0.0/0
Redshift  TCP       5439        ::/0
SSH       TCP       22          sg-0f933e18d6c1967b8
Answer 1:
To reproduce your situation, I did the following:
- Created a new VPC with a Public Subnet and a Private Subnet (no NAT Gateway)
- Launched a 1-node Amazon Redshift cluster in the private subnet, with Enhanced VPC Routing = No and Publicly accessible = No
- Launched an Amazon EC2 Linux instance in the public subnet
- Ran sudo yum install postgresql on the EC2 instance
- Established a connection to the Redshift cluster via psql on the EC2 instance (psql -h xx.yy.ap-southeast-2.redshift.amazonaws.com -p 5439 -U username)
- Created a table (create table foo(id integer);)
- Loaded the table (copy foo from 's3://my-bucket/bar.txt' iam_role 'xxx';)
This worked successfully, with a message of:
INFO: Load into table 'foo' completed, 4 record(s) loaded successfully.
Therefore, a VPC Endpoint/NAT Gateway is not required to perform a COPY command from Redshift. The Redshift cluster has its own special way to connect to S3, seemingly via a Redshift 'backend'.
If the data is being loaded from Amazon S3 in the same Region, then the traffic would stay wholly within the AWS network. If the data was coming from a different region, it would still be encrypted because communication with Amazon S3 would be via HTTPS.
Second test: Using Enhanced VPC Routing
To mirror your situation, I launched a different Redshift cluster with Enhanced VPC routing enabled.
When I ran the COPY command, it predictably hung, because I had not configured a means for the Redshift cluster to access Amazon S3 via the VPC.
I then created a VPC Endpoint for Amazon S3 and connected it to the private subnet with a "Full Access" policy.
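For reference, creating such a gateway endpoint and attaching it to the private subnet's route table can also be scripted. Below is a minimal sketch of the request parameters; the VPC ID and route table ID are placeholders for whatever your environment uses, and the actual boto3 call is shown in a comment since it requires AWS credentials:

```python
import json

# Parameters for an S3 gateway endpoint; vpc-... and rtb-... are
# placeholders -- substitute the IDs from your own VPC.
endpoint_params = {
    "VpcEndpointType": "Gateway",
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": "com.amazonaws.us-east-2.s3",
    "RouteTableIds": ["rtb-0123456789abcdef0"],
    # "Full Access" endpoint policy, matching the test described above
    "PolicyDocument": json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Principal": "*",
                       "Action": "*", "Resource": "*"}],
    }),
}

# With credentials configured, the call would be:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-2")
#   ec2.create_vpc_endpoint(**endpoint_params)
print(endpoint_params["ServiceName"])
```

Attaching the endpoint to the route table is what adds the pl-... prefix-list route automatically, which is the route shown in the question.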
Then, when I re-ran the COPY command, it successfully loaded data from Amazon S3.
Bottom line: It worked for me. You might want to compare your configuration with the above steps that I took.
Source: https://stackoverflow.com/questions/59552702/can-not-copy-data-from-s3-to-redshift-cluster-in-a-private-subnet