How to unzip files stored in HDFS using Java, without first copying to the local file system?

鱼传尺愫 2021-01-06 17:55

We are storing zip files, containing XML files, in HDFS. We need to be able to programmatically unzip the file and stream out the contained XML files, using Java. FileSystem

1 answer
  • 2021-01-06 18:07

    Hi, please find some sample code here:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Reads every file entry of a zip stored in HDFS into memory, keyed by entry name.
    // The archive is streamed straight from HDFS; nothing is copied to the local file system.
    public static Map<String, byte[]> loadZipFileData(FileSystem fileSystem, String hdfsFilePath) throws IOException {
        Map<String, byte[]> listOfFiles = new LinkedHashMap<>();
        byte[] buf = new byte[1024];
        // try-with-resources closes the zip stream (and the HDFS stream it wraps) even on failure
        try (ZipInputStream zipInputStream = readZipFileFromHDFS(fileSystem, new Path(hdfsFilePath))) {
            ZipEntry zipEntry;
            while ((zipEntry = zipInputStream.getNextEntry()) != null) {
                if (!zipEntry.isDirectory()) {
                    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                    int bytesRead;
                    // read() returns -1 at the end of the current entry
                    while ((bytesRead = zipInputStream.read(buf, 0, buf.length)) > -1) {
                        outputStream.write(buf, 0, bytesRead);
                    }
                    listOfFiles.put(zipEntry.getName(), outputStream.toByteArray());
                }
                zipInputStream.closeEntry();
            }
        }
        return listOfFiles;
    }

    // Opens the HDFS file and wraps the resulting stream in a ZipInputStream.
    protected static ZipInputStream readZipFileFromHDFS(FileSystem fileSystem, Path path) throws IOException {
        if (!fileSystem.exists(path)) {
            throw new IllegalArgumentException(path.getName() + " does not exist");
        }
        return new ZipInputStream(fileSystem.open(path));
    }
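
If the XML files are large, holding every entry in a byte[] map may not be what you want. Below is a minimal sketch of a streaming variant that hands each entry to a callback as it is read. It only depends on java.util.zip and takes a plain InputStream, so for HDFS you would pass in the stream returned by fileSystem.open(path). The names ZipEntryStreamer, EntryHandler and forEachEntry are made up for this illustration, and the in-memory archive in main is just a stand-in for the file in HDFS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryStreamer {

    /** Hypothetical callback: receives one entry's name and a stream positioned at its data. */
    @FunctionalInterface
    public interface EntryHandler {
        void handle(String entryName, InputStream entryData) throws IOException;
    }

    /**
     * Walks a zip archive read from any InputStream and passes each file entry to the handler.
     * The handler must not close entryData, because that would close the whole archive stream.
     */
    public static void forEachEntry(InputStream source, EntryHandler handler) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // Reads on 'zip' return -1 once the current entry is exhausted.
                    handler.handle(entry.getName(), zip);
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory to stand in for the zip file in HDFS.
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(archive)) {
            out.putNextEntry(new ZipEntry("a.xml"));
            out.write("<a/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.xml"));
            out.write("<b>text</b>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }

        forEachEntry(new ByteArrayInputStream(archive.toByteArray()), (name, data) -> {
            // Copy the entry in small chunks; a real handler could feed 'data' to an XML parser instead.
            ByteArrayOutputStream xml = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = data.read(buf, 0, buf.length)) > -1) {
                xml.write(buf, 0, n);
            }
            System.out.println(name + " -> " + new String(xml.toByteArray(), StandardCharsets.UTF_8));
        });
    }
}
```

The trade-off is that entries can only be visited once and in archive order, since ZipInputStream reads the archive sequentially and never seeks.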
    