amazon-redshift

Best practice for reading data from Kafka to AWS Redshift

Submitted by 时光总嘲笑我的痴心妄想 on 2021-01-27 05:32:11
Question: What is the best practice for moving data from a Kafka cluster to a Redshift table? We have continuous data arriving on Kafka and I want to write it to tables in Redshift (it doesn't have to be in real time). Should I use a Lambda function? Should I write a Redshift connector (consumer) that will run on a dedicated EC2 instance? (The downside is that I would need to handle redundancy.) Is there some AWS pipeline service for that?
Answer 1: Kafka Connect is commonly used for streaming data from Kafka to (and from) other systems…
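A pattern the answer points toward is having Kafka Connect (or another consumer) stage batches in S3 and then loading them into Redshift with COPY. Below is a minimal sketch of that load step; the staging table name, bucket prefix, IAM role, and JSON format are all assumptions, not anything from the original answer.

```sql
-- Minimal sketch: load JSON batches that a Kafka Connect S3 sink (or a Lambda
-- consumer) has written under a topic prefix. Table, path, and role are assumptions.
COPY events_staging
FROM 's3://my-bucket/topics/events/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS JSON 'auto';
```

The COPY itself could be triggered on a schedule or from an S3 event, which avoids running a dedicated, always-on consumer on EC2.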

Redshift: Possibility to specify suffix for paths when doing PARTITIONED UNLOAD to S3?

Submitted by 烂漫一生 on 2021-01-05 04:52:07
Question: Is there any way to provide a suffix for paths when doing a partitioned UNLOAD to S3? For example, I want to use the output of several queries for batch jobs, where query outputs are partitioned by date. Currently I have a structure in S3 like:
s3://bucket/path/queryA/key=1/*.parquet
s3://bucket/path/queryA/key=2/*.parquet
s3://bucket/path/queryB/key=1/*.parquet
s3://bucket/path/queryB/key=2/*.parquet
But ideally, I would like to have:
s3://bucket/path/key=1/queryA/*.parquet
s3://bucket…
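For context, this is roughly what the partitioned UNLOAD producing the current layout looks like: the partition folders are appended directly under the TO prefix, which is why the per-query prefix ends up above key=… rather than below it. The query, column, and role names below are placeholders.

```sql
-- Sketch of the current behaviour: PARTITION BY appends key=<value>/ folders
-- directly under the TO prefix. Query, table, and role names are placeholders.
UNLOAD ('SELECT key, col1, col2 FROM query_a_source')
TO 's3://bucket/path/queryA/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
PARTITION BY (key)
FORMAT AS PARQUET;
```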

How to copy csv data file to Amazon RedShift?

Submitted by 亡梦爱人 on 2020-12-29 05:19:45
Question: I'm trying to migrate some MySQL tables to Amazon Redshift, but ran into some problems. The steps are simple:
1. Dump the MySQL table to a CSV file
2. Upload the CSV file to S3
3. Copy the data file to Redshift
The error occurs in step 3. The SQL command is:
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' csv;
The error info: An error occurred when executing the SQL command: copy TABLE_A from 's3://ciphor/TABLE_A.csv'…
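The error text is cut off above, so it is not clear what actually failed. As a hedged starting point, the sketch below reuses the question's table and bucket and adds a few options that commonly matter for CSV loads, plus a look at STL_LOAD_ERRORS; whether IGNOREHEADER or REGION is needed here is an assumption.

```sql
-- Same COPY as in the question, with a few commonly useful options added.
-- Whether each option is actually required depends on the file and bucket.
COPY TABLE_A
FROM 's3://ciphor/TABLE_A.csv'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx'
CSV
DELIMITER ','
IGNOREHEADER 1     -- skip a header row if the MySQL dump wrote one (assumption)
REGION 'us-east-1' -- must match the bucket's region (assumption)
MAXERROR 10;       -- let the load continue so bad rows show up below

-- Row-level failures are recorded here with the offending column and reason.
SELECT * FROM stl_load_errors ORDER BY starttime DESC LIMIT 10;
```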

Create a Query to check if any Column in a table is Null

Submitted by 半城伤御伤魂 on 2020-12-27 06:25:43
Question: I have zero experience with SQL but am trying to learn how to validate tables. I want to see whether any of the columns in a table are null. Currently I have been using a script that just counts the number of non-null values, and I run it for each column. Is there a better script I can use to check all the columns in a table?
select count(id) from schema.table where id is not null
If there are 100 records I would expect all columns to come back with 100, but if one column is null…
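One way to avoid running a separate query per column is to count the NULLs for every column in a single pass, since COUNT(column) skips NULLs. A minimal sketch, with placeholder column names standing in for the real table's columns:

```sql
-- Single pass over the table: COUNT(col) ignores NULLs, so the difference
-- from COUNT(*) is the number of NULLs in that column. Column names are placeholders.
SELECT
    COUNT(*)                 AS total_rows,
    COUNT(*) - COUNT(id)     AS null_id,
    COUNT(*) - COUNT(col_a)  AS null_col_a,
    COUNT(*) - COUNT(col_b)  AS null_col_b
FROM schema.table;
```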

Partitioning Data in SQL On-Demand with Blob Storage as Data Source

Submitted by 独自空忆成欢 on 2020-12-15 07:16:07
Question: In Amazon Redshift there is a way to create a partition key when using your S3 bucket as a data source (link). I am attempting to do something similar in Azure Synapse using the SQL On-Demand (serverless) service. Currently I have a storage account that is partitioned following this scheme:
Sales (folder)
  2020-10-01 (folder)
    File 1
    File 2
  2020-10-02 (folder)
    File 3
    File 4
To create a view and pull in all 4 files I ran the command:
CREATE VIEW testview3 AS SELECT * FROM OPENROWSET (…
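In Synapse serverless SQL, the usual way to get Redshift-style partition behaviour is a wildcard in the BULK path plus the filepath() function, which exposes the folder name as a column the view can filter on. The sketch below assumes Parquet files and uses placeholder storage account and container names.

```sql
-- Sketch of a partitioned view over the Sales/<date>/ folders.
-- Storage account, container, and Parquet format are assumptions.
CREATE VIEW testview3 AS
SELECT
    r.*,
    r.filepath(1) AS sales_date   -- folder matched by the first * in the path
FROM OPENROWSET(
        BULK 'https://myaccount.dfs.core.windows.net/mycontainer/Sales/*/*.parquet',
        FORMAT = 'PARQUET'
     ) AS r;

-- Filtering on sales_date then only reads the matching folders:
-- SELECT * FROM testview3 WHERE sales_date = '2020-10-01';
```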

Clear cache on AWS Redshift

Submitted by 大城市里の小女人 on 2020-12-04 18:15:28
Question: I am doing testing against AWS Redshift, and to replicate real-world scenarios I need my test queries not to be cached, so they don't give a false picture of performance. Is there any way for me to clear the Redshift cache between query runs?
Answer 1: I believe you can disable the cache for the testing sessions by setting enable_result_cache_for_session to off. From the documentation: If enable_result_cache_for_session is off, Amazon Redshift ignores the results cache and executes all queries when they are submitted.
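Applying that in practice is just a session-level SET issued before the benchmark queries, for example:

```sql
-- Turn the result cache off for this session only, run the tests, then restore it.
SET enable_result_cache_for_session TO off;

-- ... run the benchmark queries here ...

SET enable_result_cache_for_session TO on;
```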
