Question
I want to delete the partition folders that are older than N days.
The command below lists only the folder dated exactly 50 days ago. I want the list of all folders dated before that:
hadoop fs -ls /data/publish/DMPD/VMCP/staging/tvmcpr_usr_prof/chgdt=`date --date '50 days ago' +\%Y-\%m-\%d`
Answer 1:
You can try Solr's HdfsFindTool:
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool -find /data/publish/DMPD/VMCP/staging/tvmcpr_usr_prof -mtime +50 | xargs hdfs dfs -rm -r -skipTrash
Answer 2:
It can be done with a bash script:
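# Current time in seconds since the epoch
today=$(date +%s)
hdfs dfs -ls /data/publish/DMPD/VMCP/staging/tvmcpr_usr_prof/ | grep "^d" | while read -r line; do
    # Column 6 of the listing is the modification date, column 8 the path
    dir_date=$(echo "${line}" | awk '{print $6}')
    difference=$(( (today - $(date -d "${dir_date}" +%s)) / (24*60*60) ))
    filePath=$(echo "${line}" | awk '{print $8}')
    # -lt 50 lists folders modified within the last 50 days;
    # use -gt 50 to select folders older than 50 days instead
    if [ "${difference}" -lt 50 ]; then
        echo "${filePath}"
    fi
done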
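Both answers above go by the folder's modification time. Since the partition name in the question already carries a date (chgdt=YYYY-MM-DD), another option is to compare that date against the cutoff directly. The following is only a minimal sketch under that assumption: select_old_partitions is a hypothetical helper name, not part of Hadoop or of either answer, and it expects one path per line on stdin.

```shell
# Hypothetical helper (not from the original answers): reads HDFS paths on
# stdin and prints those whose trailing chgdt=YYYY-MM-DD is strictly
# earlier than the cutoff date passed as $1 (also YYYY-MM-DD).
select_old_partitions() {
    # Dropping the hyphens turns an ISO date into a comparable integer.
    cutoff_num=$(printf '%s' "$1" | tr -d '-')
    while read -r path; do
        case "$path" in
            *chgdt=[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) continue ;;   # skip blank lines and unrelated paths
        esac
        part_num=$(printf '%s' "${path##*chgdt=}" | tr -d '-')
        if [ "$part_num" -lt "$cutoff_num" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Assumed usage (dry run, only prints the matching folders):
#   cutoff=$(date --date '50 days ago' +%Y-%m-%d)
#   hdfs dfs -ls /data/publish/DMPD/VMCP/staging/tvmcpr_usr_prof/ \
#     | awk '{print $8}' | select_old_partitions "$cutoff"
```

Once the printed list looks right, it could be piped into xargs hdfs dfs -rm -r -skipTrash as in Answer 1. Note that this variant ignores modification times entirely, so it only makes sense if the chgdt value is the date you want to age out by.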
Source: https://stackoverflow.com/questions/43889792/delete-partitions-folders-in-hdfs-older-than-n-days