Transfer file out from HDFS

傲寒 2021-02-01 21:34

I want to transfer files out of HDFS to the local filesystem of a different server, one that is not in the Hadoop cluster but is on the network.

I could have done:

5 Answers
  • 2021-02-01 21:41

    This is the simplest way to do it:

    ssh <YOUR_HADOOP_GATEWAY> "hdfs dfs -cat <src_in_HDFS> " > <local_dst>
    

    It works for binary files too.
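    If the file should end up on a third machine rather than the one you are running ssh from, the same cat trick can be chained through two ssh hops. A dry-run sketch that only builds and prints the pipeline (hostnames and paths below are placeholder assumptions, not from this thread):

```shell
# Placeholder hosts and paths (assumptions):
GATEWAY=hadoop-gw.example.com
TARGET=you@server.example.com
SRC=/user/you/data.bin
DST=/home/you/data.bin

# Build the pipeline that streams the file from HDFS via the gateway
# straight to the target server, never touching the gateway's disk.
CMD="ssh $GATEWAY \"hdfs dfs -cat $SRC\" | ssh $TARGET \"cat > $DST\""
echo "$CMD"
```

    Because the data is piped, nothing is written to the intermediate machine's disk, which also works for binary files.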

  • 2021-02-01 21:45

    So you probably have the output of your Hadoop program split across a number of part files:

    part-r-00000
    part-r-00001
    part-r-00002
    part-r-00003
    part-r-00004
    

    So let's copy one part at a time:

    for i in $(seq 0 4); do
        hadoop fs -copyToLocal output/part-r-0000$i ./
        scp ./part-r-0000$i you@somewhere:/home/you/
        rm ./part-r-0000$i
    done
    

    You may need to look up how to handle the password prompt for scp (or set up key-based authentication so it doesn't prompt at all).
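    If the parts are plain text and can simply be concatenated, `hadoop fs -getmerge <src> <localdst>` collapses the loop into a merge plus a single copy. A dry-run sketch that only prints the two commands (directory and host names are placeholder assumptions):

```shell
# Placeholder names (assumptions): 'output' dir in HDFS, remote user/host.
MERGE_CMD="hadoop fs -getmerge output ./merged.txt"
COPY_CMD="scp ./merged.txt you@somewhere:/home/you/"
echo "$MERGE_CMD"
echo "$COPY_CMD"
```

    getmerge concatenates every file under the HDFS directory into one local file, so you transfer a single file instead of five.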

  • 2021-02-01 21:56

    You could make use of the WebHDFS REST API to do that. Run a curl from the machine where you want to download the files.

    curl -i -L "http://namenode:50070/webhdfs/v1/path_of_the_file?op=OPEN" -o ~/destination
    

    Another approach is to use the DataNode API via wget:

    wget http://$datanode:50075/streamFile/path_of_the_file
    

    But the most convenient way, IMHO, is to use the NameNode web UI. Since this machine is part of the network, you can just point your web browser to NameNode_Machine:50070. From there, browse through HDFS, open the file you want to download, and click Download this file.
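    The WebHDFS URL in the first command is just the NameNode host and port, the /webhdfs/v1 prefix, and the absolute HDFS path. A small sketch assembling it (hostname and path are placeholder assumptions; 50070 is the default NameNode HTTP port on Hadoop 1.x/2.x):

```shell
NAMENODE=namenode.example.com   # placeholder NameNode host (assumption)
PORT=50070                      # default NameNode HTTP port on Hadoop 1.x/2.x
HDFS_PATH=/user/you/data.txt    # placeholder absolute HDFS path (assumption)

URL="http://${NAMENODE}:${PORT}/webhdfs/v1${HDFS_PATH}?op=OPEN"
echo "$URL"
# Then, on the non-cluster machine:
#   curl -i -L "$URL" -o ~/destination
```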

  • 2021-02-01 21:57

    I think the simplest solution would be a network mount or SSHFS, which makes the remote server's directory look like a local one.
    You can also mount an FTP server as a local directory: http://www.linuxnix.com/2011/03/mount-ftp-server-linux.html
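    For the SSHFS route, the steps look roughly like this. The sketch below only prints the mount command it would run; the mount point and remote host are placeholder assumptions, and sshfs/FUSE must be installed on the Hadoop-side machine:

```shell
MOUNTPOINT=/mnt/serverdata               # placeholder mount point (assumption)
REMOTE=you@server.example.com:/home/you  # placeholder remote directory (assumption)

MOUNT_CMD="sshfs $REMOTE $MOUNTPOINT"
echo "$MOUNT_CMD"
# After mounting, copy straight from HDFS into the mount:
#   hadoop fs -copyToLocal /path/in/hdfs "$MOUNTPOINT"/
# And unmount when done:
#   fusermount -u "$MOUNTPOINT"
```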

  • 2021-02-01 21:58

    I was trying to do this too (I was using Kerberos security). This helped me after a small tweak: https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#OPEN

    Running curl -L -i --negotiate "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN" directly didn't work for me; I'll explain why.

    This command will do two steps:

    1. locate the file you want to download and create a temporary link - returns 307 Temporary Redirect

    2. download the data from that link - returns HTTP 200 OK.

    The -L switch tells curl to follow the redirect and continue saving the file directly. If you add -v to the curl command, it logs verbosely, and you'll see the two steps described above in the output. BUT - with an older version of curl (which I cannot update), this doesn't work.

    SOLUTION FOR THIS (in Shell):

    LOCATION=`curl -i --negotiate -u : "${FILE_PATH_FOR_DOWNLOAD}?op=OPEN" | /usr/bin/perl -n -e '/^Location: (.*)$/ && print "$1\n"'`
    

    This gets the temporary link and saves it in the $LOCATION variable.

    RESULT=`curl -v -L --negotiate -u : "${LOCATION}" -o ${LOCAL_FILE_PATH_FOR_DOWNLOAD}`
    

    And this saves the data to your local file, given the -o <file-path> option.
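    The Location extraction in step 1 can be checked offline against a canned 307 response; the response text below is a fabricated example of the header shape, not real cluster output, and sed works here as well as the perl one-liner:

```shell
# Canned example of a WebHDFS 307 response (fabricated for illustration):
RESPONSE='HTTP/1.1 307 Temporary Redirect
Location: http://datanode.example.com:50075/webhdfs/v1/user/you/data.txt?op=OPEN
Content-Length: 0'

# Pull out the Location header value, exactly as step 1 does with perl.
LOCATION=$(printf '%s\n' "$RESPONSE" | sed -n 's/^Location: //p')
echo "$LOCATION"
```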

    I hope it helped.

    J.
