After reading the Amazon docs, my understanding is that the only way to run/test a Glue script is to deploy it to a dev endpoint and debug remotely if necessary. At the same time, I
If you are looking to run this in Docker, here are the links:
Docker Hub: https://hub.docker.com/r/svajiraya/glue-dev-1.0
Git repo for the Dockerfile: https://github.com/svajiraya/aws-glue-libs/blob/glue-1.0/Dockerfile
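A minimal sketch of how you might use that image; the mount paths are illustrative and I'm assuming the container can be started with a plain shell, so check the repo's README for the exact invocation:

    # Pull the community-built Glue 1.0 image
    docker pull svajiraya/glue-dev-1.0
    # Open an interactive shell with your project and AWS credentials mounted (paths are placeholders)
    docker run -it -v "$PWD":/work -v ~/.aws:/root/.aws:ro svajiraya/glue-dev-1.0 bash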
You can keep the Glue and PySpark code in separate files and unit-test the PySpark code locally. For zipping dependency files, we wrote a shell script that zips the files, uploads them to an S3 location, and then applies a CloudFormation template to deploy the Glue job. For detecting dependencies, we created a (glue job)_dependency.txt file.
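A rough sketch of what such a script could look like; the bucket name, template file, and parameter names below are placeholders of my own, not the author's actual script:

    #!/usr/bin/env bash
    set -euo pipefail

    JOB_NAME="$1"                              # e.g. my_glue_job
    BUCKET="s3://my-glue-artifacts-bucket"     # placeholder bucket

    # Zip every dependency listed in <job>_dependency.txt
    zip -@ deps.zip < "${JOB_NAME}_dependency.txt"

    # Upload the main script and the dependency archive
    aws s3 cp "${JOB_NAME}.py" "${BUCKET}/${JOB_NAME}.py"
    aws s3 cp deps.zip "${BUCKET}/${JOB_NAME}_deps.zip"

    # Deploy/update the Glue job via a CloudFormation template
    aws cloudformation deploy \
      --stack-name "${JOB_NAME}-stack" \
      --template-file glue_job_template.yaml \
      --parameter-overrides ScriptLocation="${BUCKET}/${JOB_NAME}.py" \
                            ExtraPyFiles="${BUCKET}/${JOB_NAME}_deps.zip"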
Eventually, as of Aug 28, 2019, Amazon released the binaries, so you can download them and develop, compile, debug, and single-step Glue ETL scripts and complex Spark applications in Scala and Python locally.
Check out this link: https://aws.amazon.com/about-aws/whats-new/2019/08/aws-glue-releases-binaries-of-glue-etl-libraries-for-glue-jobs/
You can do this as follows:
1. Install PySpark:
>> pip install pyspark==2.4.3
2. Get the prebuilt AWS Glue 1.0 jar with the Python dependencies: Download_Prebuild_Glue_Jar
3. Copy the awsglue folder and the jar file from GitHub into your PyCharm project.
4. Copy the Python code from my git repository.
5. Run the following on your console, making sure to enter your own path:
>> python com/mypackage/pack/glue-spark-pycharm-example.py
From my own blog
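Once the setup above is in place, a minimal smoke test along these lines should run straight from PyCharm, assuming pyspark and the awsglue sources are on your path; the tiny DataFrame is just illustrative, standing in for whatever your real job reads:

    # Minimal local smoke test for the Glue libraries
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    sc = SparkContext.getOrCreate()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session

    # Build a small DataFrame locally instead of reading from the Glue Data Catalog
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])
    df.show()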
Not that I know of, and if you have a lot of remote assets, it will be tricky. Using Windows, I normally run a development endpoint and a local Zeppelin notebook while I am authoring my job, and I shut it down each day.
You could use the job editor > script editor to edit, save, and run the job. I'm not sure of the cost difference.
There is now an official Docker image from AWS, so you can execute Glue locally: https://aws.amazon.com/blogs/big-data/building-an-aws-glue-etl-pipeline-locally-without-an-aws-account/
There's a nice step-by-step guide on that page as well.
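As a quick sketch, it looks roughly like the commands below; the image tag and mount paths are assumptions on my part, and the blog post above documents the exact invocations for Jupyter and pytest:

    # Pull the official Glue 1.0 image published by AWS
    docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01
    # Open an interactive shell with your project and AWS credentials mounted (paths are placeholders)
    docker run -it -v "$PWD":/work -v ~/.aws:/root/.aws:ro \
      amazon/aws-glue-libs:glue_libs_1.0.0_image_01 bash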