Question
For a long time, I have observed that the Hadoop framework sets a checkpoint on the trash Current directory irrespective of the configured time interval, whereas it permanently deletes the file/directory within the specified deletion interval after the automatic checkpoint is created. Here is what I have tested:
vi core-site.xml
<property>
<name>fs.trash.interval</name>
<value>5</value>
</property>
hdfs dfs -put LICENSE.txt /
hdfs dfs -rm /LICENSE.txt
fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 5 minutes, Emptier interval = 0 minutes. Moved: 'hdfs://hacluster/LICENSE.txt' to trash at: hdfs://hacluster/user/hduser/.Trash/Current
hdfs dfs -ls -R /user/hduser
/user/hduser/.Trash/Current
/user/hduser/.Trash/Current/LICENSE.txt
After some time:
/user/hduser/.Trash/160229140000
/user/hduser/.Trash/160229140000/LICENSE.txt
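The checkpoint directory name reads as a timestamp: Hadoop's default trash policy names checkpoints with the pattern yyMMddHHmmss, so the name itself records when the checkpoint was taken. A small helper to decode such names, as a sketch only; `decode_checkpoint` is a hypothetical name, and the 20xx century is an assumption:

```shell
# Hypothetical helper: turn a trash checkpoint name (yyMMddHHmmss)
# into a readable timestamp. Assumes the year is 20yy.
# Names that do not match the 12-digit pattern are printed unchanged.
decode_checkpoint() {
  printf '%s\n' "$1" |
    sed -E 's/^([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})$/20\1-\2-\3 \4:\5:\6/'
}

decode_checkpoint 160229140000   # prints 2016-02-29 14:00:00
```

For the listing above, the checkpoint was therefore taken at 14:00:00.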
I have created a sample bash script to track at what point Hadoop turns the "Current" directory into a checkpoint and whether, after the checkpoint, it deletes the contents within the specified trash interval:
Trash Interval: 5 minutes
- Automatic checkpoint - Approx 30 seconds
- Permanent deletion - Approx 5 minutes
Trash Interval: 10 minutes
- Automatic checkpoint - Approx 90 seconds
- Permanent deletion - Approx 10 minutes
Trash Interval: 15 minutes
- Automatic checkpoint - Approx 630 seconds
- Permanent deletion - Approx 15 minutes
Trash Interval: 20 minutes
- Automatic checkpoint - Approx 1080 seconds
- Permanent deletion - Approx 20 minutes
Trash Interval: 20 minutes (manual checkpoint via expunge)
hdfs dfs -expunge
- Manual checkpoint - at once
- Permanent deletion - Approx 20 minutes
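For reference, the timings above came from polling the trash directory. The exact script is not reproduced here; the following is a minimal sketch of that kind of tracker. The function names (`classify_listing`, `track`), the variables `TRASH_DIR`, `POLL_SECONDS` and `LS_CMD`, and the 10-second polling period are all assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical tracker: reports how many seconds after start the trash
# changes state (Current -> timestamped checkpoint -> empty).
TRASH_DIR="${TRASH_DIR:-/user/hduser/.Trash}"
POLL_SECONDS="${POLL_SECONDS:-10}"      # assumed polling period
LS_CMD="${LS_CMD:-hdfs dfs -ls}"        # override to dry-run without a cluster

# Reads `hdfs dfs -ls <trash>` output on stdin and prints one of:
#   current     the Current directory is present
#   checkpoint  only timestamped checkpoint directories are present
#   empty       neither is present (or the listing failed)
classify_listing() {
  awk '
    /\/\.Trash\/Current$/           { current = 1 }
    /\/\.Trash\/[0-9]+(-[0-9]+)?$/  { checkpoint = 1 }
    END {
      if (current)         print "current"
      else if (checkpoint) print "checkpoint"
      else                 print "empty"
    }
  '
}

# Polls until the trash is empty, printing elapsed seconds at each change.
track() {
  start=$(date +%s)
  prev=""
  while :; do
    # $LS_CMD is left unquoted on purpose so "hdfs dfs -ls" splits into words.
    state=$($LS_CMD "$TRASH_DIR" 2>/dev/null | classify_listing)
    if [ "$state" != "$prev" ]; then
      echo "$(( $(date +%s) - start ))s: $state"
      prev=$state
    fi
    if [ "$state" = "empty" ]; then
      break
    fi
    sleep "$POLL_SECONDS"
  done
}
```

Calling `track` right after `hdfs dfs -rm /LICENSE.txt` prints one line per state change. The resolution is limited by `POLL_SECONDS`, which is why the figures above are approximate.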
Can anyone help me understand when Hadoop creates this checkpoint? What mechanism is used to create the trash checkpoint? If it depends on resource availability, note that my test environment had no other load during this test.
Source: https://stackoverflow.com/questions/35698854/when-does-hadoop-framework-creates-a-checkpoint-expunge-to-its-current-direc