Use Spark to list all files in a Hadoop HDFS directory?

执念已碎 2021-02-13 13:19

I want to loop through all text files in a Hadoop directory and count all the occurrences of the word "error". Is there a way to do the equivalent of hadoop fs -ls /users/ubuntu/ to list all the files in the directory with the Apache Spark Scala API, and then iterate over them?

2 Answers
  •  忘了有多久
    2021-02-13 14:03

    You can pass a wildcard (glob) in the path given to sc.textFile:

    // sc.textFile accepts a glob: every matching file is read into a single RDD of lines.
    // Split each line into words and count those that are exactly "error".
    val errorCount = sc.textFile("hdfs://some-directory/*")
                       .flatMap(_.split(" ")).filter(_ == "error").count()
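
    If you also need the individual file names (for example, to report a count per file), here is a minimal sketch using the Hadoop FileSystem API from the Spark shell; the directory path is a placeholder and the files are assumed to be plain text:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // List the entries under the directory via HDFS, keeping only regular files.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val files = fs.listStatus(new Path("hdfs://some-directory/"))
                  .filter(_.isFile)
                  .map(_.getPath.toString)

    // Count the occurrences of "error" in each file separately.
    val errorCountsPerFile = files.map { file =>
      val n = sc.textFile(file).flatMap(_.split(" ")).filter(_ == "error").count()
      (file, n)
    }

    Note that the listing happens on the driver and each file triggers its own Spark job, so the single wildcard read above is simpler when you only need the overall total.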
    
