amazon-redshift

Best practice for reading data from Kafka to AWS Redshift

Submitted by 时光总嘲笑我的痴心妄想 on 2021-01-27 05:32:11
Question: What is the best practice for moving data from a Kafka cluster to a Redshift table? We have continuous data arriving on Kafka and I want to write it to tables in Redshift (it doesn't have to be in real time). Should I use a Lambda function? Should I write a Redshift connector (consumer) that will run on a dedicated EC2 instance? (The downside is that I would need to handle redundancy.) Is there some AWS pipeline service for that?
Answer 1: Kafka Connect is commonly used for streaming data from Kafka to (and from) other systems…
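A pattern the answer points toward is having Kafka Connect (or another consumer) stage batches in S3 and then loading them into Redshift with COPY. Below is a minimal sketch of that load step; the staging table name, bucket prefix, IAM role, and JSON format are all assumptions, not anything from the original answer.

```sql
-- Minimal sketch: load JSON batches that a Kafka Connect S3 sink (or a Lambda
-- consumer) has written under a topic prefix. Table, path, and role are assumptions.
COPY events_staging
FROM 's3://my-bucket/topics/events/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS JSON 'auto';
```

The COPY itself could be triggered on a schedule or from an S3 event, which avoids running a dedicated, always-on consumer on EC2.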

Redshift: Possibility to specify suffix for paths when doing PARTITIONED UNLOAD to S3?

Submitted by 烂漫一生 on 2021-01-05 04:52:07
Question: Is there any way to provide a suffix for paths when doing a partitioned UNLOAD to S3? For example, I want to use the output of several queries for batch jobs, where query outputs are partitioned by date. Currently I have a structure in S3 like:
s3://bucket/path/queryA/key=1/*.parquet
s3://bucket/path/queryA/key=2/*.parquet
s3://bucket/path/queryB/key=1/*.parquet
s3://bucket/path/queryB/key=2/*.parquet
But ideally, I would like to have:
s3://bucket/path/key=1/queryA/*.parquet
s3://bucket…
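For context, this is roughly what the partitioned UNLOAD producing the current layout looks like: the partition folders are appended directly under the TO prefix, which is why the per-query prefix ends up above key=… rather than below it. The query, column, and role names below are placeholders.

```sql
-- Sketch of the current behaviour: PARTITION BY appends key=<value>/ folders
-- directly under the TO prefix. Query, table, and role names are placeholders.
UNLOAD ('SELECT key, col1, col2 FROM query_a_source')
TO 's3://bucket/path/queryA/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
PARTITION BY (key)
FORMAT AS PARQUET;
```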

How to copy csv data file to Amazon RedShift?

Submitted by 亡梦爱人 on 2020-12-29 05:19:45
Question: I'm trying to migrate some MySQL tables to Amazon Redshift, but ran into some problems. The steps are simple:
1. Dump the MySQL table to a CSV file
2. Upload the CSV file to S3
3. Copy the data file to Redshift
The error occurs in step 3. The SQL command is:
copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' csv;
The error info: An error occurred when executing the SQL command: copy TABLE_A from 's3://ciphor/TABLE_A.csv'…
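The error text is cut off above, so it is not clear what actually failed. As a hedged starting point, the sketch below reuses the question's table and bucket and adds a few options that commonly matter for CSV loads, plus a look at STL_LOAD_ERRORS; whether IGNOREHEADER or REGION is needed here is an assumption.

```sql
-- Same COPY as in the question, with a few commonly useful options added.
-- Whether each option is actually required depends on the file and bucket.
COPY TABLE_A
FROM 's3://ciphor/TABLE_A.csv'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx'
CSV
DELIMITER ','
IGNOREHEADER 1     -- skip a header row if the MySQL dump wrote one (assumption)
REGION 'us-east-1' -- must match the bucket's region (assumption)
MAXERROR 10;       -- let the load continue so bad rows show up below

-- Row-level failures are recorded here with the offending column and reason.
SELECT * FROM stl_load_errors ORDER BY starttime DESC LIMIT 10;
```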

Create a Query to check if any Column in a table is Null

Submitted by 半城伤御伤魂 on 2020-12-27 06:25:43
Question: I have zero experience with SQL but am trying to learn how to validate tables. I want to see whether any of the columns in a table are null. Currently I have been using a script that just counts the number of non-null values, and I run it for each column. Is there a better script I can use to check all the columns in a table?
select count(id) from schema.table where id is not null
If there are 100 records I would expect all columns to come back with 100, but if one column is null…
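One way to avoid running a separate query per column is to count the NULLs for every column in a single pass, since COUNT(column) skips NULLs. A minimal sketch, with placeholder column names standing in for the real table's columns:

```sql
-- Single pass over the table: COUNT(col) ignores NULLs, so the difference
-- from COUNT(*) is the number of NULLs in that column. Column names are placeholders.
SELECT
    COUNT(*)                 AS total_rows,
    COUNT(*) - COUNT(id)     AS null_id,
    COUNT(*) - COUNT(col_a)  AS null_col_a,
    COUNT(*) - COUNT(col_b)  AS null_col_b
FROM schema.table;
```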

Partitioning Data in SQL On-Demand with Blob Storage as Data Source

Submitted by 独自空忆成欢 on 2020-12-15 07:16:07
Question: In Amazon Redshift there is a way to create a partition key when using your S3 bucket as a data source (link). I am attempting to do something similar in Azure Synapse using the SQL On-Demand (serverless) service. Currently I have a storage account that is partitioned following this scheme:
Sales (folder)
  2020-10-01 (folder)
    File 1
    File 2
  2020-10-02 (folder)
    File 3
    File 4
To create a view and pull in all 4 files I ran the command:
CREATE VIEW testview3 AS SELECT * FROM OPENROWSET (…
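In Synapse serverless SQL, the usual way to get Redshift-style partition behaviour is a wildcard in the BULK path plus the filepath() function, which exposes the folder name as a column the view can filter on. The sketch below assumes Parquet files and uses placeholder storage account and container names.

```sql
-- Sketch of a partitioned view over the Sales/<date>/ folders.
-- Storage account, container, and Parquet format are assumptions.
CREATE VIEW testview3 AS
SELECT
    r.*,
    r.filepath(1) AS sales_date   -- folder matched by the first * in the path
FROM OPENROWSET(
        BULK 'https://myaccount.dfs.core.windows.net/mycontainer/Sales/*/*.parquet',
        FORMAT = 'PARQUET'
     ) AS r;

-- Filtering on sales_date then only reads the matching folders:
-- SELECT * FROM testview3 WHERE sales_date = '2020-10-01';
```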

Clear cache on AWS Redshift

Submitted by 大城市里の小女人 on 2020-12-04 18:15:28
Question: I am doing testing against AWS Redshift, and to replicate real-world scenarios I need my test queries not to be cached, so they don't give a false picture of performance. Is there any way for me to clear the Redshift cache between query runs?
Answer 1: I believe you can disable the cache for the testing sessions by setting enable_result_cache_for_session to off. From the documentation: If enable_result_cache_for_session is off, Amazon Redshift ignores the results cache and executes all queries when they are submitted.
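Applying that in practice is just a session-level SET issued before the benchmark queries, for example:

```sql
-- Turn the result cache off for this session only, run the tests, then restore it.
SET enable_result_cache_for_session TO off;

-- ... run the benchmark queries here ...

SET enable_result_cache_for_session TO on;
```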
