aws-glue-data-catalog

Dynamic frame resolve choice specs, date cast

ⅰ亾dé卋堺 submitted on 2020-08-26 13:42:21
Question: I am writing Glue code that uses the DynamicFrame API's resolveChoice with specs. I am trying to cast the source columns when the dynamic frame is created from the catalog. Casting via resolveChoice specs works, but when casting a date I get null values, so I want to understand how to pass the source date format in the cast. self.df_TR01=self.df_TR01.resolveChoice(specs=[('col1', 'cast:string'), ('col2_date', 'cast:date')]).toDF() But in col2_date i
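
A minimal sketch of one possible workaround, assuming the nulls come from source strings that are not in yyyy-MM-dd form. The question does not show the actual format, so the dd/MM/yyyy pattern and the sample value below are hypothetical, as are the helper names `parse_with_format` and `add_parsed_date`. The idea is to keep the column as a string in resolveChoice and parse it with an explicit format on the Spark DataFrame that `toDF()` returns, using `pyspark.sql.functions.to_date`.

```python
from datetime import date, datetime
from typing import Optional


def parse_with_format(value: str, fmt: str) -> Optional[date]:
    """Return the parsed date, or None when the string does not match fmt."""
    try:
        return datetime.strptime(value, fmt).date()
    except (TypeError, ValueError):
        return None


def add_parsed_date(df, column: str, spark_fmt: str):
    """Replace a string column on a Spark DataFrame with a parsed date column.

    spark_fmt uses Spark's pattern letters (for example "dd/MM/yyyy").
    pyspark is imported lazily so the rest of this sketch runs without it.
    """
    from pyspark.sql.functions import col, to_date

    return df.withColumn(column, to_date(col(column), spark_fmt))


# Plain-Python illustration of the mismatch (hypothetical sample value).
sample = "26/08/2020"
print(parse_with_format(sample, "%Y-%m-%d"))  # format does not match the value
print(parse_with_format(sample, "%d/%m/%Y"))  # format matches the value
```

In the Glue job this might mean leaving col2_date as 'cast:string' in the resolveChoice specs and then calling `add_parsed_date(df, 'col2_date', 'dd/MM/yyyy')` on the result of `toDF()`, with the pattern adjusted to whatever the source data really uses.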

AWS Glue automatic job creation

試著忘記壹切 submitted on 2020-03-03 10:12:10
Question: I have a PySpark script that I can run in AWS Glue, but every time I create the job from the UI and copy my code into it. Is there any way to create the job automatically from my file in an S3 bucket? (I have all the libraries and the Glue context that will be used at run time.) Answer 1: Another alternative is to use AWS CloudFormation. You can define all the AWS resources you want to create (not only Glue jobs) in a template file and then update the stack whenever you need to, from the AWS Console or using the CLI.
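
As a rough illustration of the CloudFormation approach in the answer, the sketch below builds a template containing a single `AWS::Glue::Job` resource that points at a script in S3. The function name `glue_job_template`, the logical ID `GlueJob`, the job name, the role ARN, and the S3 path are all placeholders chosen for this example, not values from the question.

```python
import json


def glue_job_template(job_name: str, role_arn: str, script_s3_path: str) -> dict:
    """Build a CloudFormation template with one Glue ETL job resource."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # "GlueJob" is an arbitrary logical ID chosen for this sketch.
            "GlueJob": {
                "Type": "AWS::Glue::Job",
                "Properties": {
                    "Name": job_name,
                    "Role": role_arn,
                    "Command": {
                        # "glueetl" is the command name for Spark ETL jobs.
                        "Name": "glueetl",
                        "ScriptLocation": script_s3_path,
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values; replace with a real role and script location.
    template = glue_job_template(
        "example-job",
        "arn:aws:iam::123456789012:role/ExampleGlueRole",
        "s3://example-bucket/scripts/job.py",
    )
    print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and used to create or update a stack from the console or the CLI, so that changing the job becomes a matter of editing the template or uploading a new script to the same S3 location.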

AWS Glue Customized Crawler

帅比萌擦擦* submitted on 2020-01-16 12:02:59
Question: I've created an AWS Glue crawler to gather information on my Redshift database. Is there a way I can customize this crawler to update the "comment" field in Glue with a field that all my tables have? This field would be the comment or description field that all Redshift tables have. Any help would be appreciated. Thanks. Source: https://stackoverflow.com/questions/59200724/aws-glue-customized-crawler

How to solve this HIVE_PARTITION_SCHEMA_MISMATCH?

不羁的心 submitted on 2020-01-06 07:01:45
Question: I have partitioned data in CSV files on S3: s3://bucket/dataset/p=1/*.csv (partition #1) ... s3://bucket/dataset/p=100/*.csv (partition #100) I run a classifier over s3://bucket/dataset/ and the result looks very promising, as it detects 150 columns (c1,...,c150) and assigns various data types. However, loading the resulting table in Athena and querying it ( select * from dataset limit 10 ) yields the error message: HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table

Issue with AWS Glue Data Catalog as Metastore for Spark SQL on EMR

自古美人都是妖i submitted on 2019-12-06 05:48:30
Question: I have an AWS EMR cluster (v5.11.1) with Spark (v2.2.1) and am trying to use the AWS Glue Data Catalog as its metastore. I followed the steps in the official AWS documentation (reference link below), but I am seeing some discrepancies when accessing the Glue Catalog databases and tables. Both the EMR cluster and AWS Glue are in the same account, and appropriate IAM permissions have been provided. AWS Documentation: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark

Issue with AWS Glue Data Catalog as Metastore for Spark SQL on EMR

僤鯓⒐⒋嵵緔 submitted on 2019-12-04 11:16:23
I have an AWS EMR cluster (v5.11.1) with Spark (v2.2.1) and am trying to use the AWS Glue Data Catalog as its metastore. I followed the steps in the official AWS documentation (reference link below), but I am seeing some discrepancies when accessing the Glue Catalog databases and tables. Both the EMR cluster and AWS Glue are in the same account, and appropriate IAM permissions have been provided. AWS Documentation: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html Observations: - Using spark-shell (from the EMR master node): Works. Able to access Glue DB