amazon-redshift-spectrum

AWS Redshift - Failed to incorporate external table into local catalog

Submitted by 萝らか妹 on 2021-01-27 17:10:27
Question: Having a problem with one of our external tables in Redshift. We have over 300 tables in AWS Glue which have been added to our Redshift cluster as an external schema called events. Most of the tables in events can be queried fine, but when querying one of the tables, called item_loaded, we get the following error:

select * from events.item_loaded limit 1;

ERROR: XX000: Failed to incorporate external table "events"."item_loaded" into local catalog. LOCATION: localize_external_table, /home/ec2
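
This error often points at something in the Glue table definition that the local catalog cannot represent, such as an unsupported column type. A minimal sketch for inspecting how Redshift sees the failing table, assuming the schema and table names from the question (SVV_EXTERNAL_COLUMNS is a standard Redshift system view):

-- List the external column definitions as Redshift reads them from Glue.
select columnnum, columnname, external_type, part_key
from svv_external_columns
where schemaname = 'events'
  and tablename = 'item_loaded'
order by columnnum;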

Partitioning Data in SQL On-Demand with Blob Storage as Data Source

Submitted by 独自空忆成欢 on 2020-12-15 07:16:07
Question: In Amazon Redshift there is a way to create a partition key when using your S3 bucket as a data source (link). I am attempting to do something similar in Azure Synapse using the SQL On-Demand service. Currently I have a storage account that is partitioned such that it follows this scheme:

Sales (folder)
  2020-10-01 (folder)
    File 1
    File 2
  2020-10-02 (folder)
    File 3
    File 4

To create a view and pull in all 4 files I ran the command: CREATE VIEW testview3 AS SELECT * FROM OPENROWSET (
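
In Synapse serverless (On-Demand) SQL, the folder name can be recovered per row with the filepath() function on the OPENROWSET alias. A minimal sketch, assuming CSV files and an illustrative storage URL; filepath(1) returns whatever the first * wildcard in the BULK path matched:

SELECT *
FROM OPENROWSET(
    BULK 'https://myaccount.dfs.core.windows.net/mycontainer/Sales/*/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS r
-- Filter on the folder (partition) name matched by the first wildcard.
WHERE r.filepath(1) = '2020-10-01';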

Quote escaped quotes in Redshift external tables

Submitted by 旧巷老猫 on 2020-07-21 06:14:12
Question: I'm trying to create an external table in Redshift from a CSV that has quote-escaped quotes in it, as documented in RFC 4180: If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc". I get no errors, but the final table has a null value where my string should be. Is there a way to tell Redshift to understand this CSV format when creating an external table? I do not want
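
One hedged route is the Hive OpenCSVSerde, which Redshift Spectrum accepts in external table DDL; setting escapeChar to the double quote should make the doubled-quote convention parse as RFC 4180 intends. A sketch with illustrative table, column, and bucket names:

-- Columns sized arbitrarily for the sketch; OpenCSVSerde reads all
-- fields as strings, so string-typed columns are the safe choice.
create external table spectrum.quoted_csv (
  col_a varchar(64),
  col_b varchar(64),
  col_c varchar(64)
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  'separatorChar' = ',',
  'quoteChar' = '"',
  'escapeChar' = '"'
)
stored as textfile
location 's3://my-bucket/quoted-csv/';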

S3 Query Exception (Fetch)

Submitted by 本小妞迷上赌 on 2019-12-24 07:18:16
Question: I have uploaded data from Redshift to S3 in Parquet format and created the data catalog in Glue. I have been able to query the table from Athena, but when I create the external schema on Redshift and try to query the table I get the error below:

ERROR: S3 Query Exception (Fetch)
DETAIL:
-----------------------------------------------
error: S3 Query Exception (Fetch)
code: 15001
context: Task failed due to an internal error. File 'https://s3-eu-west-1.amazonaws.com/bucket/folder
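
The message shown to the client is often generic; the per-node detail lands in the Spectrum log view. A minimal sketch for retrieving it, assuming the failing statement was the last one run in this session (SVL_S3LOG and pg_last_query_id() are standard Redshift facilities):

-- Pull the detailed Spectrum error messages for the last query.
select query, segment, slice, message
from svl_s3log
where query = pg_last_query_id()
order by query, segment, slice;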

Using Redshift Spectrum to read the data in external table in AWS Redshift

Submitted by 浪子不回头ぞ on 2019-12-11 09:04:58
Question: I did the following in an AWS Redshift cluster to read the Parquet file from S3:

create external schema s3_external_schema
from data catalog database 'dev'
iam_role 'arn:aws:iam::<MyuniqueId>:role/<MyUniqueRole>'
create external database if not exists;

then

CREATE external table s3_external_schema.SUPPLIER_PARQ_1 (
  S_SuppKey BIGINT,
  S_Name varchar(64),
  S_Address varchar(64),
  S_NationKey int,
  S_Phone varchar(18),
  S_AcctBal decimal(13, 2),
  S_Comment varchar(105))
partitioned by (Supplier bigint,
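
A partitioned Spectrum table returns no rows until its partitions are registered in the catalog, which commonly trips people up at this step. A sketch of registering one partition, assuming an illustrative S3 layout keyed on the supplier partition column:

-- Repeat (or script) one ADD PARTITION per partition value.
alter table s3_external_schema.SUPPLIER_PARQ_1
add if not exists partition (supplier = 1)
location 's3://my-bucket/supplier_parq/supplier=1/';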

How to generate a 12-digit unique number in Redshift?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-11 05:42:46
Question: I have 3 columns in a table, i.e. email_id, rid, final_id. Rules for rid and final_id: if the email_id has a corresponding rid, use rid as the final_id; if the email_id does not have a corresponding rid (i.e. rid is null), generate a unique 12-digit number and insert it into the final_id field. How to generate a 12-digit unique number in Redshift? Answer 1: From Creating a UUID function in Redshift: By default there is no UUID function in AWS Redshift. However with the Python User-Defined Function you
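
A minimal sketch of that UDF route, assuming a random zero-padded 12-digit string is acceptable (the function name is illustrative, and true uniqueness would still need to be enforced at insert time):

create or replace function f_random_12_digits()
returns varchar(12)
volatile
as $$
    import random
    # Random 12-digit string, left-padded with zeros; collisions are
    # unlikely but possible, so keep a uniqueness check on final_id.
    return str(random.randint(0, 999999999999)).zfill(12)
$$ language plpythonu;

It could then be used as coalesce(rid, f_random_12_digits()) when populating final_id.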

Redshift Spectrum: Automatically partition tables by date/folder

Submitted by 半城伤御伤魂 on 2019-11-30 19:59:38
We currently generate a daily CSV export that we upload to an S3 bucket, into the following structure:

<report-name>
  |-- reportDate-<date-stamp>
      |-- part0.csv.gz
      |-- part1.csv.gz

We want to be able to run reports partitioned by daily export. According to this page, you can partition data in Redshift Spectrum by a key which is based on the source S3 folder where your Spectrum table sources its data. However, from the example, it looks like you need an ALTER statement for each partition:

alter table spectrum.sales_part
add partition(saledate='2008-01-01')
location 's3://bucket/tickit/spectrum
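
Two hedged ways around hand-writing one ALTER per day: script the statement as part of the daily upload, or name the folders Hive-style (key=value) so a Glue crawler can register partitions automatically. A sketch of the scripted statement, with illustrative table and bucket names matching the layout above:

-- Safe to re-run daily: IF NOT EXISTS makes the ALTER idempotent.
alter table spectrum.daily_report
add if not exists partition (reportdate = '2020-10-01')
location 's3://my-bucket/report-name/reportDate-2020-10-01/';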

AWS Glue: How to handle nested JSON with varying schemas

Submitted by 百般思念 on 2019-11-30 00:45:01
Objective: We're hoping to use the AWS Glue Data Catalog to create a single table for JSON data residing in an S3 bucket, which we would then query and parse via Redshift Spectrum.

Background: The JSON data is from DynamoDB Streams and is deeply nested. The first level of JSON has a consistent set of elements: Keys, NewImage, OldImage, SequenceNumber, ApproximateCreationDateTime, SizeBytes, and EventName. The only variation is that some records do not have a NewImage and some don't have an OldImage. Below this first level, though, the schema varies widely. Ideally, we would like to use Glue to
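
One hedged starting point while the inner schemas are still unknown: declare only the consistent top-level scalar fields and let a JSON SerDe ignore everything else, then model the varying NewImage/OldImage shapes separately once they are understood. A sketch assuming the OpenX JSON SerDe and illustrative table and bucket names (the OpenX serde simply skips JSON keys that are not declared as columns):

create external table spectrum.ddb_stream_events (
  SequenceNumber varchar(64),
  ApproximateCreationDateTime bigint,
  SizeBytes bigint,
  EventName varchar(16)
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ('ignore.malformed.json' = 'true')
location 's3://my-bucket/dynamodb-stream-exports/';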