Is it possible to save files in Hadoop without saving them in local file system?

滥情空心 2021-01-15 17:30

Is it possible to save files in Hadoop without first saving them to the local file system? I would like to do something like what is shown below; however, I would like to save the file directly in

3 answers
  •  囚心锁ツ
    2021-01-15 17:46

    Here's how to download a file directly to HDFS with Pydoop:

    import os
    import requests
    import pydoop.hdfs as hdfs


    def dl_to_hdfs(url, hdfs_path):
        # Stream the response so the file is never held fully in memory
        # or written to the local file system.
        with requests.get(url, stream=True) as r:
            r.raise_for_status()  # fail early on HTTP errors
            with hdfs.open(hdfs_path, 'w') as f:
                for chunk in r.iter_content(chunk_size=1024):
                    f.write(chunk)


    URL = "https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tar.xz"
    dl_to_hdfs(URL, os.path.basename(URL))
    

    The snippet above works for any URL. If you already have the file as a Django UploadedFile, you can probably use its .chunks() method to iterate over the data instead.
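
    For the Django case, a minimal sketch might look like the following. It has not been tested against a real cluster. The helper name write_chunks, the FakeUpload stand-in class, and the "upload" form field name are all hypothetical and exist only for illustration; the demonstration writes to an in-memory buffer so it runs without Hadoop, and the commented part shows how the same helper would presumably be used with hdfs.open.

```python
import io


def write_chunks(chunks, dest):
    """Write an iterable of bytes chunks to a writable binary file object.

    Returns the total number of bytes written.
    """
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            dest.write(chunk)
            total += len(chunk)
    return total


class FakeUpload:
    """Hypothetical stand-in for a Django UploadedFile (illustration only)."""

    def __init__(self, data, chunk_size=4):
        self._data = data
        self._chunk_size = chunk_size

    def chunks(self):
        # Yield the payload in fixed-size pieces, like UploadedFile.chunks().
        for i in range(0, len(self._data), self._chunk_size):
            yield self._data[i:i + self._chunk_size]


# Local demonstration with an in-memory buffer instead of HDFS.
buf = io.BytesIO()
written = write_chunks(FakeUpload(b"hello hdfs").chunks(), buf)

# In a Django view, the same helper would be called roughly as follows
# (assumes pydoop is installed and "upload" is the form field name):
#
#     import pydoop.hdfs as hdfs
#     uploaded = request.FILES["upload"]
#     with hdfs.open(uploaded.name, 'w') as f:
#         write_chunks(uploaded.chunks(), f)
```

    Because the helper only needs an iterable of bytes and an object with a write method, the same function should also work with r.iter_content(...) from the requests example.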
