I'm trying to send data from the workers of a PySpark RDD to an SQS queue, using boto3 to talk to AWS. I need to send data directly from the partitions, rather than colle
This happens because your boto3 is bundled inside a zip file:
"./rebuilt.zip/boto3"
During initialisation, boto3 downloads a number of files and saves them inside its distribution folder. Because your boto3 lives inside a zip package, those files cannot be saved there.
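If you want to confirm that this is what is happening, a quick check along these lines may help. It is only a sketch: the helper names `zip_archive_in_path` and `module_zip_archive` are made up for illustration, and it relies on nothing beyond the standard library.

```python
import importlib.util
import os
import zipfile


def zip_archive_in_path(path):
    """Return the zip archive that `path` points into, or None if there is none."""
    current = os.path.abspath(path)
    while True:
        # A path such as ./rebuilt.zip/boto3/__init__.py has a real file
        # (the archive) as one of its parent components.
        if os.path.isfile(current) and zipfile.is_zipfile(current):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def module_zip_archive(module_name):
    """Return the zip archive a module would be imported from, or None.

    None is also returned when the module cannot be found at all, so check
    separately that the import itself works.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin:
        return None
    return zip_archive_in_path(spec.origin)
```

Calling `module_zip_archive("boto3")` on the driver, and again inside a function run on the partitions, should show whether either side is picking boto3 up from an archive like `rebuilt.zip`.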
The solution is to install boto3 in your Spark environment rather than distributing it inside a zip. Be careful here: depending on how your app is implemented, you may need boto3 on both the master node and the worker nodes. The safe bet is to install it on both.
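Once boto3 is installed on the workers, sending straight from the partitions could look roughly like the sketch below. This is an illustration under assumptions, not your actual code: `send_partition`, `batches` and the queue URL are hypothetical, and credentials and region are assumed to come from the worker's environment. The boto3 calls used are `boto3.client("sqs")` and `send_message_batch`, which accepts at most 10 entries per call.

```python
def batches(items, size=10):
    """Yield lists of at most `size` items (SQS allows up to 10 per batch call)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_partition(records, queue_url, client=None):
    """Send every record of one partition to SQS and return how many succeeded."""
    if client is None:
        # Import and create the client on the worker itself, so nothing
        # boto3-related has to be serialised and shipped from the driver.
        import boto3
        client = boto3.client("sqs")
    sent = 0
    for batch in batches(records):
        # Ids only need to be unique within a single batch request.
        entries = [
            {"Id": str(i), "MessageBody": str(body)}
            for i, body in enumerate(batch)
        ]
        response = client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(response.get("Successful", []))
    return sent


# Hypothetical usage on an existing RDD named `rdd`:
# rdd.foreachPartition(lambda part: send_partition(part, QUEUE_URL))
```

The `client` parameter is there only so the function can be exercised with a stand-in object; on the cluster you would leave it out.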
If you are using EMR, you can use a bootstrap action to do this.
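If you create the cluster from Python, the bootstrap action can be described as one entry in the `BootstrapActions` list of the EMR `run_job_flow` call. The sketch below only builds that entry. The helper name, the action name and the S3 path are placeholders, and the script itself is assumed to be something you upload that runs a pip install of boto3 on each node.

```python
def boto3_bootstrap_action(script_s3_path):
    """Build one BootstrapActions entry for EMR's run_job_flow call."""
    return {
        "Name": "Install boto3",  # free-form label, chosen for illustration
        "ScriptBootstrapAction": {
            # S3 location of a shell script you provide; the script is
            # assumed to install boto3 with pip on the node it runs on.
            "Path": script_s3_path,
            "Args": [],
        },
    }


# Placeholder bucket and key, not a real location.
action = boto3_bootstrap_action("s3://my-bucket/bootstrap/install_boto3.sh")

# Hypothetical usage when creating the cluster:
# boto3.client("emr").run_job_flow(..., BootstrapActions=[action])
```

Bootstrap actions run on every node of the cluster, which covers the "install on both" advice above.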