Deploying pyspark CommonCrawl repo to EMR

谎友^ 2021-01-27 09:55

I'm trying to extract WET files from the public CommonCrawl data hosted on S3 from my EMR cluster. To do this, CommonCrawl has a cc-pyspark repo where they provide examples and …
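
For context on what "extracting" a WET file involves: a `.warc.wet.gz` file is a WARC file whose records are typically compressed as separate, concatenated gzip members, and the extracted plain text sits in records with `WARC-Type: conversion`. Below is a minimal, stdlib-only sketch of reading those records, assuming the file has already been fetched from S3 (or opened as a binary stream). It is not the cc-pyspark code path, and the names `iter_wet_records`, `wet_text_records` and `make_record` are hypothetical helpers for illustration. It also ignores WARC header continuation lines.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_wet_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) for each WARC record in an *uncompressed* stream.

    Hypothetical helper: a simplified WARC reader, not the cc-pyspark API.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line.strip() == b"":
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            h = stream.readline()
            if h in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = h.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers.get("Content-Length", "0")))
        yield headers, payload


def wet_text_records(raw: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (target URI, text) for every 'conversion' record in a gzipped WET stream."""
    # GzipFile transparently reads concatenated (multi-member) gzip data.
    with gzip.GzipFile(fileobj=raw) as f:
        for headers, payload in iter_wet_records(f):
            if headers.get("WARC-Type") == "conversion":
                yield (headers.get("WARC-Target-URI", ""),
                       payload.decode("utf-8", errors="replace"))


def make_record(warc_type: str, uri: str, body: bytes) -> bytes:
    """Build one gzip-compressed WARC record (hypothetical helper for local testing)."""
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {warc_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return gzip.compress(head + body + b"\r\n\r\n")


if __name__ == "__main__":
    # Tiny synthetic WET file: one metadata record plus one text record.
    data = make_record("warcinfo", "", b"software: demo") + make_record(
        "conversion", "http://example.com/", "hello world".encode("utf-8"))
    for uri, text in wet_text_records(io.BytesIO(data)):
        print(uri, text)
```

For a real file, `with open("path/to/file.warc.wet.gz", "rb") as raw: ...` would replace the `io.BytesIO` stream. In a Spark job the same generator could, in principle, be applied per input file inside a `flatMap`, so each executor streams one WET file at a time instead of loading it whole.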
