Spark Scala list folders in directory

北恋 2020-12-05 09:41

I want to list all folders within an HDFS directory using Scala/Spark. In Hadoop I can do this with the command: hadoop fs -ls hdfs://sandbox.hortonworks.com/demo/

9 Answers
  • 2020-12-05 10:21
    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HDFSProgram extends App {
      // Connect to the HDFS namenode and list everything under /user/hive/
      val uri = new URI("hdfs://HOSTNAME:PORT")
      val fs = FileSystem.get(uri, new Configuration())
      val filePath = new Path("/user/hive/")
      val status = fs.listStatus(filePath)
      status.map(sts => sts.getPath).foreach(println)
    }
    

    This is sample code to get the list of HDFS files and folders present under /user/hive/.
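    Since the question asks for folders only, a minimal variation of the above (a sketch, assuming the same placeholder namenode address, a placeholder path, and a Hadoop 2+ client where FileStatus has isDirectory) filters the listing down to directories:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object ListHdfsDirs extends App {
      // HOSTNAME:PORT and /user/hive/ are placeholders; substitute your namenode and path.
      val fs = FileSystem.get(new URI("hdfs://HOSTNAME:PORT"), new Configuration())
      // Keep only the entries that are directories, i.e. the folders.
      fs.listStatus(new Path("/user/hive/"))
        .filter(_.isDirectory)
        .map(_.getPath)
        .foreach(println)
    }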

  • 2020-12-05 10:22
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("Demo").getOrCreate()
    val path = new Path("enter your directory path")
    // Get the FileSystem for this path from Spark's Hadoop configuration
    val fs: FileSystem = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val it = fs.listLocatedStatus(path)
    

    This creates a RemoteIterator it over org.apache.hadoop.fs.LocatedFileStatus, with one entry per file or subdirectory of the path.
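    A sketch of draining that RemoteIterator into a Scala collection and keeping only the directories (the /demo path is a placeholder; RemoteIterator exposes hasNext/next rather than the Scala Iterator API):

    import org.apache.hadoop.fs.{LocatedFileStatus, Path}
    import org.apache.spark.sql.SparkSession
    import scala.collection.mutable.ArrayBuffer

    val spark = SparkSession.builder().appName("Demo").getOrCreate()
    val path = new Path("/demo")  // placeholder directory
    val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // RemoteIterator is not a Scala Iterator, so drain it with hasNext/next.
    val it = fs.listLocatedStatus(path)
    val entries = ArrayBuffer[LocatedFileStatus]()
    while (it.hasNext) entries += it.next()

    // Keep only subdirectories.
    entries.filter(_.isDirectory).map(_.getPath).foreach(println)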

  • 2020-12-05 10:32

    We are using Hadoop 1.4, which doesn't have the listFiles method, so we use listStatus to get directories. listStatus has no recursive option, but a recursive lookup is easy to manage yourself (see the sketch after the snippet below).

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // List the immediate children of YOUR_HDFS_PATH and print their paths
    val fs = FileSystem.get(new Configuration())
    val status = fs.listStatus(new Path(YOUR_HDFS_PATH))
    status.foreach(x => println(x.getPath))
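    If you need the whole tree, a small recursive helper over listStatus works. This is a sketch with a placeholder /demo root path; it assumes FileStatus.isDirectory from Hadoop 2+, so on Hadoop 1.x substitute isDir():

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Recursively collect every directory under `root` (error handling omitted).
    def listDirsRecursively(fs: FileSystem, root: Path): Seq[Path] = {
      val dirs = fs.listStatus(root).filter(_.isDirectory).map(_.getPath).toSeq
      dirs ++ dirs.flatMap(d => listDirsRecursively(fs, d))
    }

    val fs = FileSystem.get(new Configuration())
    listDirsRecursively(fs, new Path("/demo")).foreach(println)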
    