Best Practices to Create and Download a huge ZIP (from several BLOBs) in a WebApp

Backend · open · 3 answers · 1238 views
孤城傲影 · asked 2020-11-27 05:10

I will need to perform a massive download of files from my Web Application.

It is obviously expected to be a long-running action (it'll be used once-per-year[-p

3 Answers
  • 2020-11-27 05:31

    For large content that won't fit in memory at once, stream the content from the database to the response.

    This kind of thing is actually pretty simple. You don't need AJAX or websockets, it's possible to stream large file downloads through a simple link that the user clicks on. And modern browsers have decent download managers with their own progress bars - why reinvent the wheel?

    If you're writing a servlet from scratch for this, get the database BLOB, obtain its input stream, and copy its content through to the HTTP response output stream. If you have the Apache Commons IO library you can use IOUtils.copy(); otherwise you can do this yourself.

    Creating a ZIP file on the fly can be done with a ZipOutputStream. Create one of these over the response output stream (from the servlet or whatever your framework gives you), then get each BLOB from the database, using putNextEntry() first and then streaming each BLOB as described before.
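    A minimal, self-contained sketch of the above (the class name is illustrative; BLOB input streams are simulated with in-memory byte arrays, the manual copy loop stands in for IOUtils.copy(), and in a real servlet `target` would be response.getOutputStream()):

    ```java
    import java.io.*;
    import java.util.*;
    import java.util.zip.*;

    public class ZipStreamSketch {

        // Manual stream copy; IOUtils.copy() from Commons IO does the same job.
        static void copy(InputStream in, OutputStream out) throws IOException {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }

        // Streams each "BLOB" straight into a ZIP written to the target stream
        // (e.g. the servlet response), so nothing is buffered on disk.
        static void writeZip(Map<String, InputStream> blobs, OutputStream target)
                throws IOException {
            try (ZipOutputStream zos = new ZipOutputStream(target)) {
                for (Map.Entry<String, InputStream> e : blobs.entrySet()) {
                    zos.putNextEntry(new ZipEntry(e.getKey()));
                    copy(e.getValue(), zos);
                    zos.closeEntry();
                }
            }
        }

        public static void main(String[] args) throws IOException {
            Map<String, InputStream> blobs = new LinkedHashMap<>();
            blobs.put("a.txt", new ByteArrayInputStream("hello".getBytes()));
            blobs.put("b.txt", new ByteArrayInputStream("world".getBytes()));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeZip(blobs, out);
            System.out.println("ZIP size: " + out.size() + " bytes");
        }
    }
    ```

    Closing the ZipOutputStream also closes the underlying stream, which is what you want at the end of a servlet's doGet anyway.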

    Potential Pitfalls/Issues:

    • Depending on the download size and network speed, the request might take a lot of time to complete. Firewalls, etc. can get in the way of this and terminate the request early.
    • Hopefully your users are on a decent corporate network when requesting these files. It would be far worse over remote/dodgy/mobile connections (if it drops out after downloading 1.9G of 2.0G, users have to start again).
    • It can put a bit of load on your server, especially when compressing huge ZIP files. It might be worth turning compression down/off when creating the ZipOutputStream if this is a problem.
    • ZIP files over 4 GB (the limit of the original ZIP format) need ZIP64 extensions. Java 7's java.util.zip supports ZIP64, so that version of Java will write the huge ZIP correctly, but will the clients have programs that support large ZIP files? I've definitely run into issues with these before, especially on old Solaris servers.
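    The compression trade-off in the third point can be sketched with java.util.zip alone (the class name and the zero-filled test data are illustrative; real BLOBs will compress differently):

    ```java
    import java.io.*;
    import java.util.zip.*;

    public class ZipLevelSketch {

        // Zips a single entry at the given deflater level
        // (Deflater.NO_COMPRESSION .. Deflater.BEST_COMPRESSION).
        static byte[] zip(byte[] data, int level) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                zos.setLevel(level);
                zos.putNextEntry(new ZipEntry("data.bin"));
                zos.write(data);
                zos.closeEntry();
            }
            return bos.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] data = new byte[100_000]; // highly compressible zeros
            byte[] fast = zip(data, Deflater.NO_COMPRESSION);
            byte[] best = zip(data, Deflater.BEST_COMPRESSION);
            System.out.println("no compression:   " + fast.length + " bytes");
            System.out.println("best compression: " + best.length + " bytes");
        }
    }
    ```

    NO_COMPRESSION costs almost no CPU but the archive stays as large as the input; pick the level based on whether your server or your network is the bottleneck.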
  • 2020-11-27 05:49

    Kick-off example of a totally dynamic ZIP file created by streaming each BLOB from the database directly to the client's File System.

    Tested with huge archives, with the following results:

    • Server disk space cost: 0 megabytes
    • Server RAM cost: ~ xx megabytes. Memory consumption isn't easy to measure reliably (or at least I don't know how to do it properly): running the same routine multiple times gave different, apparently random readings from Runtime.getRuntime().freeMemory() before, during and after the loop. However, memory consumption is lower than with byte[], and that's enough.


    FileStreamDto.java using InputStream instead of byte[]

    public class FileStreamDto implements Serializable {
        // @Getter/@Setter are Lombok annotations that generate the accessors
        @Getter @Setter private String filename;
        @Getter @Setter private InputStream inputStream; 
    }
    


    Java Servlet (or Struts2 Action)

    /* Read the amount of data to be streamed from Database to File System,
       summing the size of all Oracle's BLOB, PostgreSQL's ABYTE etc: 
       SELECT sum(length(my_blob_field)) FROM my_table WHERE my_conditions
    */          
    Long overallSize = getMyService().precalculateZipSize();
    
    // Tell the browser it's a ZIP
    response.setContentType("application/zip"); 
    // Tell the browser the filename, and that it needs to be downloaded instead of opened
    response.addHeader("Content-Disposition", "attachment; filename=\"myArchive.zip\"");        
    // Tell the browser the overall size, so it can show a realistic progress bar.
    // Note: the sum of the BLOB sizes only approximates the real ZIP size;
    // compression and ZIP entry headers change the actual number of bytes sent.
    response.setHeader("Content-Length", String.valueOf(overallSize));      
    
    ServletOutputStream sos = response.getOutputStream();       
    ZipOutputStream zos = new ZipOutputStream(sos);
    
    // Set-up a list of filenames to prevent duplicate entries
    HashSet<String> entries = new HashSet<String>();
    
    /* Read all the ID from the interested records in the database, 
       to query them later for the streams: 
       SELECT my_id FROM my_table WHERE my_conditions */           
    List<Long> allId = getMyService().loadAllId();
    
    for (Long currentId : allId){
        /* Load the record relative to the current ID:         
           SELECT my_filename, my_blob_field FROM my_table WHERE my_id = :currentId            
           Use resultset.getBinaryStream("my_blob_field") while mapping the BLOB column */
        FileStreamDto fileStream = getMyService().loadFileStream(currentId);
    
        // Create a zipEntry with a non-duplicate filename, and add it to the ZipOutputStream
        ZipEntry zipEntry = new ZipEntry(getUniqueFileName(entries,fileStream.getFilename()));
        zos.putNextEntry(zipEntry);
    
        // Use Apache Commons to transfer the InputStream from the DB to the OutputStream
        // on the File System; at this moment, your file is ALREADY being downloaded and growing
        IOUtils.copy(fileStream.getInputStream(), zos);
    
        zos.flush();
        zos.closeEntry();
    
        fileStream.getInputStream().close();                    
    }
    
    zos.close();
    sos.close();    
    


    Helper method for handling duplicate entries

    private String getUniqueFileName(HashSet<String> entries, String completeFileName){                         
        if (entries.contains(completeFileName)){                                                
            int extPos = completeFileName.lastIndexOf('.');
            String extension = extPos>0 ? completeFileName.substring(extPos) : "";          
            String partialFileName = extension.length()==0 ? completeFileName : completeFileName.substring(0,extPos);
            int x=1;
            while (entries.contains(completeFileName = partialFileName + "(" + x + ")" + extension))
                x++;
        } 
        entries.add(completeFileName);
        return completeFileName;
    }
    



    Thanks a lot @prunge for giving me the idea of the direct streaming.

  • 2020-11-27 05:57

    Maybe you want to try downloading multiple files concurrently. I found a related discussion here: Java multithreaded file downloading performance

    Hope this helps.
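    A rough sketch of the idea (all names are hypothetical; fetchRange() stands in for an HTTP Range request, here served from an in-memory source). Note that a ZIP generated on the fly usually can't honor Range requests, so this fits best when pulling many pre-existing files in parallel:

    ```java
    import java.util.*;
    import java.util.concurrent.*;

    public class ParallelChunkSketch {

        // Stand-in for an HTTP Range request ("Range: bytes=start-(end-1)").
        static byte[] fetchRange(byte[] source, int start, int end) {
            return Arrays.copyOfRange(source, start, end);
        }

        // Fetches fixed-size chunks concurrently, then reassembles them in order.
        static byte[] downloadInChunks(byte[] source, int chunkSize, int threads)
                throws InterruptedException, ExecutionException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            List<Future<byte[]>> parts = new ArrayList<>();
            for (int start = 0; start < source.length; start += chunkSize) {
                final int s = start, e = Math.min(start + chunkSize, source.length);
                parts.add(pool.submit(() -> fetchRange(source, s, e)));
            }
            byte[] result = new byte[source.length];
            int offset = 0;
            for (Future<byte[]> f : parts) { // Futures come back in submit order
                byte[] chunk = f.get();
                System.arraycopy(chunk, 0, result, offset, chunk.length);
                offset += chunk.length;
            }
            pool.shutdown();
            return result;
        }

        public static void main(String[] args) throws Exception {
            byte[] data = new byte[10_000];
            new Random(42).nextBytes(data);
            byte[] copy = downloadInChunks(data, 1024, 4);
            System.out.println("match: " + Arrays.equals(data, copy));
        }
    }
    ```

    Whether this is faster than a single stream depends on where the bottleneck is; over one saturated link, extra threads mostly just add coordination overhead.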
