How to delete files from HDFS?

滥情空心 2021-02-18 13:11

I just downloaded the Hortonworks sandbox VM, which comes with Hadoop 2.7.1. I added some files using the following command:

hadoop fs -put /hw1/* /hw1
6 answers
  • 2021-02-18 13:24

    What you are seeing comes from the way HDFS works. In HDFS (as in many other file systems), physically deleting files is not the fastest operation. HDFS is a distributed file system that usually keeps at least 3 replicas of a file on different servers, so each replica of the deleted file (which may consist of many blocks on different hard drives) must be removed in the background after you request the deletion.

    The official Hadoop documentation says:

    The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
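
To see this delay for yourself, you could compare the free space reported before and after the delete. This is only a rough sketch: the `run` helper and the `DRY_RUN` switch are made up for illustration, and `/hw1` is simply the directory from the question. With `DRY_RUN=1` (the default here) it just prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: compare HDFS free space before and after a delete.
# Assumption: /hw1 is the directory from the question; adjust as needed.
# DRY_RUN=1 (default) only prints each command instead of running it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run hdfs dfs -df -h /        # capacity, used and available space
run hdfs dfs -rm -R /hw1     # delete the directory (may go to trash first)
run hdfs dfs -df -h /        # free space may not have grown yet
```

Running it with `DRY_RUN=0` on the sandbox and repeating the last command a few minutes later should show the available space catching up.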

  • 2021-02-18 13:27

    What works for me:

    hadoop fs -rm -R <your Directory>

    (The older -rmr form is deprecated in favor of -rm with the recursive flag.)
    
  • 2021-02-18 13:38

    Durga Viswanath Gadiraju is right: it is a question of time. Maybe my PC is slow, and it is also running a VM, but after about 10 minutes the files were physically deleted when I followed the steps I described in the question. Note: set the fs.trash.interval parameter to 1 (the value is in minutes). Otherwise, with the default setting, files won't be deleted in less than 6 hours.
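
Since fs.trash.interval is given in minutes, a small conversion makes the "6 hours" figure easier to relate to the setting. The `trash_interval_hours` helper below is hypothetical, and the value 360 is only an assumed example matching the 6 hours mentioned above. The last part is skipped unless the `hdfs` CLI is on the PATH.

```shell
#!/usr/bin/env bash
# fs.trash.interval is expressed in minutes; this hypothetical helper
# converts such a value to whole hours for easier reading.
trash_interval_hours() {
  echo $(( $1 / 60 ))
}

trash_interval_hours 360   # 360 minutes -> prints 6

# Only touch the cluster if the hdfs CLI is actually on PATH.
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.trash.interval   # show the configured value
  hdfs dfs -expunge                         # ask HDFS to clean up the trash
fi
```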

  • 2021-02-18 13:44

    If you also need to skip the trash, the following command works for me:

    hdfs dfs -rm -R -skipTrash /path/to/HDFS/file
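
Because a delete with -skipTrash cannot be recovered from the trash, it may be worth wrapping the command in a small guard. This is just a sketch: `hdfs_purge` and `DRY_RUN` are hypothetical names, not part of Hadoop, and with `DRY_RUN=1` (the default here) the command is only printed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around: hdfs dfs -rm -R -skipTrash <path>
# Refuses an empty path or "/" since -skipTrash deletes are not recoverable.
# DRY_RUN=1 (default) prints the command instead of running it.

hdfs_purge() {
  local path="$1"
  if [ -z "$path" ] || [ "$path" = "/" ]; then
    echo "refusing to delete '$path'" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hdfs dfs -rm -R -skipTrash $path"
  else
    hdfs dfs -rm -R -skipTrash "$path"
  fi
}

hdfs_purge /path/to/HDFS/file
```

Call it with `DRY_RUN=0 hdfs_purge <path>` once the printed command looks right.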
    
  • 2021-02-18 13:46

    You can use

    hdfs dfs -rm -R /path/to/HDFS/file
    

    since hadoop dfs has been deprecated.

  • 2021-02-18 13:47

    Try hadoop fs -rm -R URI

    The -R option deletes the directory and any content under it recursively.
