问题
I have a csv file in the format
IDATE_TIMESTAMP,OPEN,HIGH,LOW,CLOSE,VOLUME
1535535060,94.36,94.36,94.36,94.36,1
1535535120,94.36,94.36,93.8,93.8,1
1535535180,93.8,93.8,93.8,93.8,0
1535535240,93.8,93.8,93.74,93.74,1
1535535300,93.74,93.74,93.74,93.74,0
1535535360,93.74,93.74,93.74,93.74,0
1535535420,93.74,93.74,93.74,93.74,0
1535535480,93.74,93.74,93.74,93.74,0
1535535540,93.74,93.74,93.74,93.74,0
.
.
.
.
I have to and from timestamp which will filter out the data from the file and return the output. I am using python + boto3 for s3 select.
fromTs = "1535535480"
toTs = "1535535480"
query = """SELECT * FROM s3object s WHERE s."IDATE_TIMESTAMP" >= "%s" AND s."IDATE_TIMESTAMP" <= "%s" """%(fromTs, toTs)
request = client.select_object_content(
Bucket=bucket,
Key=filename,
ExpressionType="SQL",
Expression=query,
InputSerialization={"CSV":{"FileHeaderInfo":"Use", "FieldDelimiter":",", "RecordDelimiter":"\n"}},
OutputSerialization={"CSV":{}},
)
botocore.exceptions.ClientError: An error occurred (MissingHeaders) when calling the SelectObjectContent operation: Some headers in the query are missing from the file. Please check the file and try again.
This is error i am getting
回答1:
I know this is a bit late and might not be the solution to your issue, but I was having a similar one.
My issue turned out to be that I was attempting to perform an S3 Select on an object with UTF-8-BOM encoding, rather than just UTF-8. It turns out that the 3 byte BOM header was being interpreted as part of the first field in the CSV object, essentially corrupting the first column name.
As a result, rather than "IDATE_TIMESTAMP", the first column would be seen by the S3 Select call as "xxxIDATE_TIMESTAMP", causing an error when your expected column is "missing".
来源:https://stackoverflow.com/questions/57476465/how-to-fix-missingheaders-error-while-appending-where-clause-with-s3-select