Hadoop file write


Referring to Tom White's book Hadoop: The Definitive Guide ... My question (assuming replication factor 3 and data being written to datanodes D1, D2, D3): if I understand correctly, when a datanode in the write pipeline fails, the block on the remaining datanodes is given a new identity. Who gives the block this new identity, and why is it needed?

1 Answer

To answer your question, I would like to highlight one point first: both read and write operations are initiated by the client (the HDFS client).

(The original answer included a diagram of the HDFS read/write data flow; it is not reproduced here.)

Throughout the process, the client reads from or writes to the datanodes directly, never through the NameNode. The NameNode only supplies the list of datanodes to contact for the read or write operation.
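As a minimal sketch of that division of labor (assuming a reachable HDFS configured via the usual core-site.xml/hdfs-site.xml, and a hypothetical file path), the client can ask the NameNode which datanodes hold each block of a file. Only this metadata comes from the NameNode; the bytes themselves flow to/from the listed datanodes directly:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/user/demo/file.txt");   // hypothetical path
        FileStatus status = fs.getFileStatus(path);

        // Metadata query answered by the NameNode: which datanodes hold each block.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            // Each entry lists the datanodes a client would contact directly.
            System.out.println("offset " + block.getOffset()
                    + " -> datanodes " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```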

    Coming back to your query,

    "any packets in the ack queue are added to the front of the data queue so that datanodes that are downstream from the failed node will not miss any packets"

Right after that sentence, the book continues:

    The current block on the good datanodes is given a new identity, which is communicated to the namenode, so that the partial block on the failed datanode will be deleted if the failed datanode recovers later on. The failed datanode is removed from the pipeline, and a new pipeline is constructed from the two good datanodes.
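To make the two queues concrete, here is a simplified model of the recovery step described above. This is a sketch of the behavior only; the real logic lives in the client's DFSOutputStream/DataStreamer and is considerably more involved:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the two queues the book describes.
class PipelineModel {
    static class Packet { /* one chunk of the block's data */ }

    private final Deque<Packet> dataQueue = new ArrayDeque<>(); // waiting to be sent
    private final Deque<Packet> ackQueue  = new ArrayDeque<>(); // sent, not yet acked by all datanodes

    void onDatanodeFailure() {
        // Unacked packets may never have reached the downstream datanodes,
        // so they are pushed back onto the FRONT of the data queue: they will
        // be re-sent through the rebuilt pipeline before any newer packets.
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast()); // preserves original order
        }
    }
}
```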

The above passage answers your query about the block getting a new identity.

1. Who gives this new identity? The book is not explicit here, but the client drives the recovery: the HDFS client asks the NameNode for a new generation stamp for the block, so the new identity (the same block ID with a bumped generation stamp) is allocated by the NameNode at the client's request and is then used for the rebuilt pipeline.
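For reference, the following is a paraphrase of the two internal NameNode RPCs involved in pipeline recovery, from Hadoop's org.apache.hadoop.hdfs.protocol.ClientProtocol (an internal interface whose exact signatures vary across Hadoop versions). It shows that the client asks, but the NameNode allocates the new identity:

```java
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.DatanodeID;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

// Paraphrased excerpt of ClientProtocol (internal API, not for application use).
public interface ClientProtocolExcerpt {
    // The client asks the NameNode for a new generation stamp for the block;
    // the returned LocatedBlock carries the block's new identity.
    LocatedBlock updateBlockForPipeline(ExtendedBlock block, String clientName)
            throws IOException;

    // The client then commits the rebuilt pipeline (the remaining good
    // datanodes) under the block's new identity.
    void updatePipeline(String clientName, ExtendedBlock oldBlock,
            ExtendedBlock newBlock, DatanodeID[] newNodes, String[] newStorageIDs)
            throws IOException;
}
```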

2. Why is it needed?

Since only partial data was written to the problematic datanode, that stale copy of the block has to be discarded completely; without a new identity, the failed datanode could come back later and report its partial replica as if it were valid. This is exactly what the passage quoted above says: once the block has a new identity, the partial block on the failed datanode will be deleted if that datanode recovers.
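As a tiny illustration of what the "new identity" amounts to (all values below are made up), the block keeps its ID but gets a bumped generation stamp, which is how a replica written under the old stamp is recognized as stale and deleted:

```java
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

public class NewIdentityDemo {
    public static void main(String[] args) {
        // Hypothetical values: same block pool and block ID, bumped generation stamp.
        ExtendedBlock before = new ExtendedBlock("BP-demo", 1073741825L, 1024L, 1001L);
        ExtendedBlock after  = new ExtendedBlock("BP-demo", 1073741825L, 1024L, 1002L);

        // The "new identity" keeps the block ID but advances the generation stamp,
        // so a replica reported with the old stamp is recognizably stale.
        System.out.println(before.getBlockId() == after.getBlockId());                // true
        System.out.println(before.getGenerationStamp() < after.getGenerationStamp()); // true
    }
}
```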
