Question
I have uploaded data from Redshift to S3 in Parquet format and created the data catalog in Glue. I can query the table from Athena, but when I create the external schema in Redshift and query the table, I get the error below:
ERROR: S3 Query Exception (Fetch)
DETAIL:
-----------------------------------------------
error: S3 Query Exception (Fetch)
code: 15001
context: Task failed due to an internal error. File 'https://s3-eu-west-1.amazonaws.com/bucket/folder/partition_key/filename.parquet_1 has an incompatible Parquet schema for column 's3://bucket/folder
query: 560922
location: dory_util.cpp:717
process: query1_118_560922 [pid=32409]
-----------------------------------------------
The queries work well in Athena.
Answer 1:
It kind of tells you what's wrong: the schema of the table/partition and the contents of the file differ too much. The easiest way to fix that is to run a Glue crawler over the data location with the "update each partition definition from table" option checked.
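For crawlers managed through the API rather than the console, that checkbox corresponds, as far as I know, to setting `AddOrUpdateBehavior` to `InheritFromTable` in the crawler's `Configuration` JSON. Below is a minimal sketch that only builds that JSON string; the function name and the crawler name `my-crawler` are hypothetical, and the boto3 call is left as a comment so the snippet runs without AWS credentials.

```python
import json


def partition_inherit_config() -> str:
    """Build crawler Configuration JSON asking partitions to inherit table metadata."""
    config = {
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return json.dumps(config)


if __name__ == "__main__":
    print(partition_inherit_config())
    # Assumed usage with boto3 (the crawler name is hypothetical):
    # import boto3
    # boto3.client("glue").update_crawler(
    #     Name="my-crawler",
    #     Configuration=partition_inherit_config(),
    # )
```

After updating the configuration, the crawler would still need to be re-run over the data location for the partition definitions to change.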
Answer 2:
I've run into this before as well. Athena does not seem to check file schemas as strictly as Redshift does.
Every Parquet file carries its own schema definition. If the schema definition in a file does not match the table definition, or differs from that of one or more of the other files, Redshift queries will fail, while Athena queries may still succeed if the affected columns are not in the query.
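To find which files disagree with the table, one option is to compare each file's schema against the table definition column by column. The sketch below is only an illustration of that comparison: it assumes the schemas have already been collected as plain `column -> type name` dictionaries (for example from `pyarrow.parquet.read_schema` for the files and from the Glue table for the catalog), and the function name, file names, and type names are made up.

```python
from typing import Dict, List, Optional, Tuple

Schema = Dict[str, str]  # column name -> type name


def find_schema_mismatches(
    table_schema: Schema, file_schemas: Dict[str, Schema]
) -> List[Tuple[str, str, str, Optional[str]]]:
    """Return (file, column, table_type, file_type) for every differing column.

    file_type is None when the column is absent from the file.
    """
    mismatches = []
    for path in sorted(file_schemas):
        schema = file_schemas[path]
        for column, expected in table_schema.items():
            actual = schema.get(column)
            if actual != expected:
                mismatches.append((path, column, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Hypothetical table definition and per-file schemas.
    table = {"id": "int64", "amount": "double"}
    files = {
        "part-0.parquet": {"id": "int64", "amount": "double"},
        "part-1.parquet": {"id": "string", "amount": "double"},
    }
    for row in find_schema_mismatches(table, files):
        print(row)
```

Files reported here are the likely candidates to rewrite with a consistent schema or to move out of the table's location.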
Source: https://stackoverflow.com/questions/51133599/s3-query-exception-fetch