Question
I have set up a redshift cluster in a private subnet. I can successfully connect to my redshift cluster and do basic SQL queries through DBeaver.
I need to load some files from S3 into Redshift as well, so I set up an S3 gateway endpoint in my private subnet and updated the route table for the private subnet to add the required route as follows:
Destination                                Target                  Status  Propagated
192.168.0.0/16                             local                   active  No
pl-7ba54012 (com.amazonaws.us-east-2.s3:   vpce-04eed78f4db84ae49  active  No
  52.219.80.0/20, 3.5.128.0/21,
  52.219.96.0/20, 52.92.76.0/22)
0.0.0.0/0                                  nat-0a73ba7659e887232   active  No
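As a sanity check that the prefix list in that route actually covers the S3 endpoints traffic would hit, you can test an address against the listed CIDR blocks. This is a minimal sketch using Python's ipaddress module; the CIDRs are copied from the route table above:

```python
import ipaddress

# CIDR blocks listed for prefix list pl-7ba54012 (com.amazonaws.us-east-2.s3)
s3_cidrs = [
    "52.219.80.0/20",
    "3.5.128.0/21",
    "52.219.96.0/20",
    "52.92.76.0/22",
]

def routed_via_endpoint(ip: str) -> bool:
    """Return True if `ip` falls inside one of the prefix-list CIDRs,
    i.e. traffic to it would take the S3 gateway endpoint route."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in s3_cidrs)

print(routed_via_endpoint("52.219.80.5"))  # inside 52.219.80.0/20 -> True
print(routed_via_endpoint("8.8.8.8"))      # not an S3 range -> False
```

If an S3-bound address is not covered by the prefix list, traffic falls through to the 0.0.0.0/0 route and goes out via the NAT gateway instead.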
However, I cannot run the following COPY query against my S3 bucket:
copy venue
from 's3://*****/tickit/venue_pipe.txt'
iam_role 'arn:aws:iam::******:role/global-dev-rdt-role-S3ReadonlyAccess'
region 'us-east-2';
There are no restrictive policies on my bucket or on the public and private subnet security groups, and I can already run SQL queries against my Redshift cluster in the private subnet.
Update: the security group for the Redshift cluster allows all connections to port 5439:
Type      Protocol  Port Range  Source                Description
Redshift  TCP       5439        0.0.0.0/0
Redshift  TCP       5439        ::/0
SSH       TCP       22          sg-0f933e18d6c1967b8
Answer 1:
To reproduce your situation, I did the following:
- Created a new VPC with a Public Subnet and a Private Subnet (no NAT Gateway)
- Launched a 1-node Amazon Redshift cluster in the private subnet, with Enhanced VPC Routing = No and Publicly accessible = No
- Launched an Amazon EC2 Linux instance in the public subnet
- Ran sudo yum install postgresql on the EC2 instance
- Established a connection to the Redshift cluster via psql on the EC2 instance (psql -h xx.yy.ap-southeast-2.redshift.amazonaws.com -p 5439 -U username)
- Created a table (create table foo(id integer);)
- Loaded the table (copy foo from 's3://my-bucket/bar.txt' iam_role 'xxx';)
This worked successfully, with a message of:
INFO: Load into table 'foo' completed, 4 record(s) loaded successfully.
Therefore, a VPC Endpoint/NAT Gateway is not required to perform a COPY command from Redshift. The Redshift cluster has its own special way to connect to S3, seemingly via a Redshift 'backend'.
If the data is being loaded from Amazon S3 in the same Region, then the traffic would stay wholly within the AWS network. If the data was coming from a different region, it would still be encrypted because communication with Amazon S3 would be via HTTPS.
Second test: Using Enhanced VPC Routing
To mirror your situation, I launched a different Redshift cluster with Enhanced VPC routing enabled.
When I ran the COPY command, it predictably hung, because I had not configured a means for the Redshift cluster to access Amazon S3 via the VPC.
I then created a VPC Endpoint for Amazon S3 and connected it to the private subnet with a "Full Access" policy.
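For reference, creating such a gateway endpoint and attaching it to the private subnet's route table can also be scripted. Below is a minimal sketch of the request parameters; the VPC ID and route table ID are placeholders for whatever your environment uses, and the actual boto3 call is shown in a comment since it requires AWS credentials:

```python
import json

# Parameters for an S3 gateway endpoint; vpc-... and rtb-... are
# placeholders -- substitute the IDs from your own VPC.
endpoint_params = {
    "VpcEndpointType": "Gateway",
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": "com.amazonaws.us-east-2.s3",
    "RouteTableIds": ["rtb-0123456789abcdef0"],
    # "Full Access" endpoint policy, matching the test described above
    "PolicyDocument": json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Principal": "*",
                       "Action": "*", "Resource": "*"}],
    }),
}

# With credentials configured, the call would be:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-2")
#   ec2.create_vpc_endpoint(**endpoint_params)
print(endpoint_params["ServiceName"])
```

Attaching the endpoint to the route table is what adds the pl-... prefix-list route automatically, which is the route shown in the question.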
Then, when I re-ran the COPY command, it successfully loaded data from Amazon S3.
Bottom line: It worked for me. You might want to compare your configuration with the above steps that I took.
Source: https://stackoverflow.com/questions/59552702/can-not-copy-data-from-s3-to-redshift-cluster-in-a-private-subnet