Question
I am new to HDFS and have run into an HDFS problem.
We have an HDFS file system, with the namenode on one server (named 0002) and datanodes on two other servers (named 0004 and 0005 respectively). The original data comes from a Flume application whose sink is HDFS. Flume writes the original data (txt files) to the datanodes on servers 0004 and 0005.
So the original data is replicated twice and stored on two servers. The system worked well for some time, until one day there was a power outage. When the servers were restarted, the datanode servers (0004 and 0005) came back up before the namenode server (0002). As a result, the original data is still stored on the 0004 and 0005 servers, but the metadata on the namenode (0002) is lost and the block information has become corrupt. The question is how to fix the corrupt blocks without losing the original data.
For example, when we check on the namenode:
hadoop fsck /wimp/contract-snapshot/year=2020/month=6/day=10/snapshottime=1055/contract-snapshot.1591779548475.csv -files -blocks -locations
We find the file name on the datanode, but the block is reported as corrupt. The corresponding block name is:
blk_1090579409_16840906
When we go to the datanode server (e.g. 0004), we can locate this block file with:
find ./ -name "*blk_1090579409*"
We have found the file corresponding to the csv file under the HDFS virtual path "/wimp/contract-snapshot/year=2020/month=6/day=10/snapshottime=1055/contract-snapshot.1591779548475.csv". The file is stored under the folder "./subdir0/subdir235/", and when we open it, it is in the correct format. The corresponding .meta file appears to be binary and cannot be read directly.
./subdir0/subdir235/blk_1090579409
The question is: given that we have found the original file (blk_1090579409), how can we restore the corrupt HDFS system using these intact original files, without losing them?
Answer 1:
After some research, I found a solution which may not be efficient but works. If someone comes up with a better solution, please let me know.
The whole idea is to copy all the block files off the datanode, arrange these files by year/day/hour/minute into different folders, and then upload these folders back onto HDFS.
I have two datanodes (0004 and 0005) where the data is stored. The total data size is on the order of 10+ terabytes. The folder structure on the datanodes is the same as in the question (the original post showed it once listed on Linux and once on Windows).
The replication factor is set to 2, which means (if nothing has gone wrong) that each datanode holds one and only one copy of each original file. Therefore, we only need to scan the folders/files on one datanode (server 0004, about 5+ terabytes). Based on the modification date and the timestep inside each file, the files are copied into new folders on a backup server/drive. Luckily, timestep information is available in the original files, e.g. 2020-03-02T09:25. I round the time to the nearest five minutes, and the parent folder is one per day, so the newly created folder structure is one folder per day (e.g. ~/20190102/), containing one subfolder per five-minute timestep.
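For illustration, here is a minimal sketch of the five-minute rounding step, assuming the 2020-03-02T09:25 timestamp format quoted above; the exact folder names (YYYYMMDD/HHMM) are my own convention for this sketch, not necessarily the one used in practice:

from datetime import datetime, timedelta

def round_to_5min(t):
    # Round a timestamp to the nearest five-minute boundary,
    # e.g. 09:23 -> 09:25, 09:22 -> 09:20.
    t = t.replace(second=0, microsecond=0)
    delta = (t.minute + 2) // 5 * 5 - t.minute
    return t + timedelta(minutes=delta)

def target_folder(timestep):
    # Map a timestep such as "2020-03-02T09:25" to "<day folder>/<five-minute folder>".
    # The YYYYMMDD/HHMM naming is illustrative only.
    t = round_to_5min(datetime.strptime(timestep, "%Y-%m-%dT%H:%M"))
    return f"{t:%Y%m%d}/{t:%H%M}"

print(target_folder("2020-03-02T09:23"))   # -> 20200302/0925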
The code that scans the datanode and copies the files into the new five-minute folders is written in PySpark, and it takes about two days to run the whole operation (I leave the code running in the evenings).
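Roughly, the scan-and-copy driver could look like the following PySpark sketch. The datanode storage directory, the backup root, and the way the timestep is read out of each block file are assumptions, and it reuses the target_folder helper from the sketch above:

import os, shutil
from pyspark.sql import SparkSession

DATANODE_DIR = "/data/hadoop/dfs/data/current"   # assumed datanode storage dir, not from the post
BACKUP_ROOT  = "/backup/recovered"               # assumed backup drive mount, not from the post

def first_timestep(path):
    # Assumes the timestep is the first comma-separated field of the first line,
    # e.g. "2020-03-02T09:25"; adjust to the real file layout.
    with open(path, "r", errors="ignore") as f:
        return f.readline().split(",")[0].strip()

def copy_block(path):
    # target_folder() is the helper from the previous sketch.
    folder = os.path.join(BACKUP_ROOT, target_folder(first_timestep(path)))
    os.makedirs(folder, exist_ok=True)
    shutil.copy(path, folder)        # the rename-to-HDFS-convention step is omitted here
    return path

spark = SparkSession.builder.appName("recover-blocks").getOrCreate()

# Collect the data block files, skipping the binary .meta checksum files.
blocks = [os.path.join(root, name)
          for root, _, names in os.walk(DATANODE_DIR)
          for name in names
          if name.startswith("blk_") and not name.endswith(".meta")]

# Fan the copy work out over the available cores/executors.
spark.sparkContext.parallelize(blocks, 64).map(copy_block).count()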
Then I can update the folders on HDFS for each day. On HDFS, the folder structure follows the ~/year=YYYY/month=M/day=D/snapshottime=HHMM/ layout shown in the fsck example in the question.
The created folders have the same structure as on HDFS, and the naming convention is also the same (in the copy step, I rename each copied file to match the convention on HDFS).
In the final step, I write Java code to perform the operations on HDFS. After some testing, I am able to update the data for each day on HDFS. That is, the code deletes, for example, the data under the folder ~/year=2019/month=1/day=2/ on HDFS, and then uploads all the folders/files under the newly created folder ~/20190102/ to ~/year=2019/month=1/day=2/ on HDFS. I do this operation for each day. After that, the corrupt blocks disappear, and the right files are uploaded to the correct paths on HDFS.
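My actual code for this step was Java against the HDFS API; as a rough equivalent sketch (the local backup root, the date range, and the exact HDFS root are assumptions or taken from the question), the same per-day delete-then-upload cycle can be driven through the hdfs CLI:

import glob
import subprocess
from datetime import date, timedelta

LOCAL_ROOT = "/backup/recovered"            # assumed backup root, not from the post
HDFS_ROOT  = "/wimp/contract-snapshot"      # taken from the fsck example in the question

def refresh_day(d):
    # Replace one day partition on HDFS with the recovered files for that day.
    hdfs_day  = f"{HDFS_ROOT}/year={d.year}/month={d.month}/day={d.day}"
    local_day = f"{LOCAL_ROOT}/{d:%Y%m%d}"
    # Drop the folder holding the corrupt blocks (skip the trash so the space is freed).
    subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", hdfs_day], check=False)
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_day], check=True)
    # Upload the recovered folders/files for that day.
    contents = glob.glob(f"{local_day}/*")
    if contents:
        subprocess.run(["hdfs", "dfs", "-put", *contents, hdfs_day], check=True)

# Example date range only; run once per affected day.
d, end = date(2019, 1, 1), date(2020, 6, 10)
while d <= end:
    refresh_day(d)
    d += timedelta(days=1)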
According to my research, it is also possible to identify the corrupt blocks by using the fsimage file saved before the power outage, but this risks further corrupting blocks on HDFS after the outage. Therefore, I decided to use the approach described above: delete the corrupt blocks while keeping the original files, and upload them back to HDFS.
If anyone has a better or more efficient solution, please share!
Source: https://stackoverflow.com/questions/65631178/fix-corrupt-hdfs-files-without-losing-data-files-in-the-datanode-still-exist