After reading Amazon docs, my understanding is that the only way to run/test a Glue script is to deploy it to a dev endpoint and debug remotely if necessary. At the same time, i
Adding to CedricB,
For development and testing purposes, it's not necessary to upload the code to S3. You can set up a Zeppelin notebook locally and establish an SSH connection to the dev endpoint, which gives you access to the data catalog, crawlers, etc., as well as the S3 bucket where your data resides.
After all the testing is completed, you can bundle your code and upload it to an S3 bucket. Then create a job pointing to the ETL script in the S3 bucket, so that the job can be run and scheduled as well. Once all the development and testing is completed, make sure to delete the dev endpoint, as you are charged even for idle time.
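As a rough sketch of those last steps (upload the script, create the job, delete the dev endpoint), something like the following could work with boto3. The function name, bucket, key, role, and endpoint name are all made up for illustration, and the clients are passed in so the sketch can be exercised here with stand-ins instead of real AWS calls. In real use you would pass boto3.client("s3") and boto3.client("glue").

```python
from unittest.mock import MagicMock


def publish_job(s3, glue, script_path, bucket, key, job_name, role, endpoint_name):
    """Upload the ETL script, register it as a Glue job, then drop the dev endpoint.

    `s3` and `glue` are expected to behave like boto3 S3 and Glue clients.
    All other arguments are illustrative values supplied by the caller.
    """
    script_location = f"s3://{bucket}/{key}"

    # 1. Upload the tested script to S3.
    s3.upload_file(script_path, bucket, key)

    # 2. Create a Glue job that points at the uploaded script.
    glue.create_job(
        Name=job_name,
        Role=role,
        Command={"Name": "glueetl", "ScriptLocation": script_location},
    )

    # 3. Delete the dev endpoint so it stops accruing charges while idle.
    glue.delete_dev_endpoint(EndpointName=endpoint_name)

    return script_location


# Demonstration with stand-in clients (no AWS account needed).
fake_s3 = MagicMock()
fake_glue = MagicMock()
location = publish_job(
    fake_s3,
    fake_glue,
    script_path="etl_job.py",          # hypothetical local file
    bucket="my-glue-scripts",          # hypothetical bucket
    key="jobs/etl_job.py",
    job_name="my-etl-job",             # hypothetical job name
    role="MyGlueServiceRole",          # hypothetical IAM role
    endpoint_name="my-dev-endpoint",   # hypothetical dev endpoint
)
```

Scheduling is left out here; that would typically be a separate trigger on the job.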
Regards
I think the key here is to define what kind of testing you want to do locally. If you are doing unit testing (i.e. testing just one PySpark script independent of the AWS services supporting that script) then sure, you can do that locally. Use a mocking tool such as pytest-mock, pytest's monkeypatch fixture, or unittest.mock to mock the AWS and Spark services external to your script while you test the logic that you have written in your PySpark script.
For module testing, you could use a notebook environment like AWS EMR Notebooks, Zeppelin or Jupyter. Here you would be able to run your Spark code against test data sources while still mocking the AWS services.
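To make the unit-testing idea concrete, here is a minimal sketch using unittest.mock from the standard library. It assumes you have structured your script so that the transformation logic is a plain function and the reads and writes go through objects you can swap out. The names normalize_record, run_job, source and sink are hypothetical, not part of Glue or Spark.

```python
from unittest.mock import MagicMock


def normalize_record(record):
    """Pure transformation logic: trim the name, trim and lower-case the email."""
    return {
        "name": record["name"].strip(),
        "email": record["email"].strip().lower(),
    }


def run_job(source, sink):
    """Hypothetical job body: read rows, skip blank names, transform, write.

    `source` and `sink` stand in for whatever wraps the Glue/Spark reads
    and writes in the real script.
    """
    rows = [normalize_record(r) for r in source.read() if r["name"].strip()]
    sink.write(rows)
    return len(rows)


def test_run_job_skips_blank_names():
    # Mock the external services so only our own logic is exercised.
    source = MagicMock()
    source.read.return_value = [
        {"name": " Ada ", "email": "ADA@Example.com "},
        {"name": "   ", "email": "nobody@example.com"},
    ]
    sink = MagicMock()

    written = run_job(source, sink)

    assert written == 1
    sink.write.assert_called_once_with(
        [{"name": "Ada", "email": "ada@example.com"}]
    )


test_run_job_skips_blank_names()
```

The same test runs under pytest unchanged. The point is only that nothing in it needs a Glue endpoint or a Spark cluster.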
For integration testing (i.e. testing your code integrated with the services it depends on, but not a production system) you could launch a test instance of your system from your CI/CD pipeline and then have compute resources (like pytest scripts or AWS Lambda) automate the workflow implemented by your script.
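For the integration-testing case, one possible shape for the automation is a small helper that starts a Glue job run in the test environment and polls until it finishes. This is only a sketch: run_and_wait and the job name are invented for illustration, and the Glue client is injected so it can be shown here with a stand-in. In a CI/CD pipeline you would pass boto3.client("glue") instead.

```python
import time
from unittest.mock import MagicMock

# Job run states after which polling should stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def run_and_wait(glue, job_name, poll_seconds=30, sleep=time.sleep):
    """Start a Glue job run and block until it reaches a terminal state.

    `glue` is expected to behave like a boto3 Glue client. `sleep` is
    injectable so tests do not have to wait in real time.
    """
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        response = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


# Demonstration with a stand-in client that reports RUNNING, then SUCCEEDED.
fake_glue = MagicMock()
fake_glue.start_job_run.return_value = {"JobRunId": "jr_123"}
fake_glue.get_job_run.side_effect = [
    {"JobRun": {"JobRunState": "RUNNING"}},
    {"JobRun": {"JobRunState": "SUCCEEDED"}},
]

final_state = run_and_wait(fake_glue, "my-test-job", sleep=lambda seconds: None)
```

A pytest script in the pipeline could then assert on the returned state and go on to check the job's output in the test bucket.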
I spoke to an AWS sales engineer and they said no, you can only test Glue code by running a Glue transform (in the cloud). He mentioned that they were testing out something called Outpost to allow on-prem operations, but that it wasn't publicly available yet. So this seems like a solid "no", which is a shame because it otherwise seems pretty nice. But without unit tests, it's a no-go for me.