Question
I am using a Java pipe to pass data (an output stream) from an unzip module (a JavaUncompress class) to a parsing module (a handler class). The file is large, and I want to unzip it and parse it on the fly instead of saving the unzipped file and then parsing it. However, this only works for small files. When I feed in a 1 GB file, it seems that only part of the file (say, 50,000 lines) makes it through the pipe from the output stream to the input stream of the parsing module.
I also tried using a String to hold the uncompressed data, and the same thing happened: the String contained only part of the unzipped file (it stopped at the same 50,000th line as the piped version). Any idea what is happening? Thank you very much.
Here is my pipeline code:
{
    PipedInputStream in = new PipedInputStream();
    final PipedOutputStream out = new PipedOutputStream(in);
    new Thread(new Runnable() {
        public void run() {
            JavaUncompress.putDataOnOutputStream(inFile, out);
        }
    }).start();
    doc = handler.processDataFromInputStream(in);
}
public static void putDataOnOutputStream(String inZipFileName, PipedOutputStream out) {
    try {
        FileInputStream fis = new FileInputStream(inZipFileName);
        ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " + entry);
            byte data[] = new byte[BUFFER];
            long len = entry.getSize();
            long blk = len / BUFFER;
            int rem = (int) (len - blk * BUFFER);
            System.out.println(len + " = " + blk + "*BUFFER + " + rem);
            for (long i = 0; i != blk; ++i) {
                if ((zis.read(data, 0, BUFFER)) != -1) {
                    out.write(data);
                }
            }
            byte dataRem[] = new byte[rem];
            if ((zis.read(dataRem, 0, rem)) != -1) {
                out.write(dataRem);
                out.flush();
                out.close();
            }
        }
        zis.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Answer 1:
PipedOutputStream.write() will block if the corresponding PipedInputStream gets more than 4096 (or whatever the pipe's buffer size is) bytes behind it. But why do this at all? Why not just unzip the file and process it in the same thread? There's no advantage to multi-threading here; it's just a pointless complication.
I've used pipes exactly once in 15 years in Java and I pretty quickly changed it to a queue.
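To illustrate the single-threaded approach, here is a minimal sketch, assuming the asker's handler.processDataFromInputStream can consume the entry data directly. ZipInputStream is itself an InputStream positioned at the current entry, so no pipe or second thread is needed:

try (ZipInputStream zis = new ZipInputStream(
        new BufferedInputStream(new FileInputStream(inFile)))) {
    ZipEntry entry;
    while ((entry = zis.getNextEntry()) != null) {
        System.out.println("Extracting: " + entry);
        // zis yields the uncompressed bytes of the current entry, so the
        // parser can read from it directly (it must not close the stream
        // between entries).
        doc = handler.processDataFromInputStream(zis);
    }
}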
Answer 2:
I concur with not using the JDK's piped streams; they are confusing and full of synchronization. Here is a faster implementation based on a BlockingQueue: with the aid of a small buffer it keeps the context-switching overhead to a minimum, and a blocking queue is a perfect fit for a single producer and a single consumer:
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.*;

public class QueueOutputStream extends OutputStream
{
  private static final int DEFAULT_BUFFER_SIZE = 1024;
  private static final byte[] END_SIGNAL = new byte[]{}; // sentinel marking end-of-stream

  private final BlockingQueue<byte[]> queue = new LinkedBlockingDeque<>();
  private final byte[] buffer;

  private boolean closed = false;
  private int count = 0;

  public QueueOutputStream()
  {
    this(DEFAULT_BUFFER_SIZE);
  }

  public QueueOutputStream(final int bufferSize)
  {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("Buffer size <= 0");
    }
    this.buffer = new byte[bufferSize];
  }

  // Copy the filled portion of the buffer onto the queue and reset it.
  private synchronized void flushBuffer()
  {
    if (count > 0) {
      final byte[] copy = new byte[count];
      System.arraycopy(buffer, 0, copy, 0, count);
      queue.offer(copy);
      count = 0;
    }
  }

  @Override
  public synchronized void write(final int b) throws IOException
  {
    if (closed) {
      throw new IllegalStateException("Stream is closed");
    }
    if (count >= buffer.length) {
      flushBuffer();
    }
    buffer[count++] = (byte) b;
  }

  @Override
  public synchronized void write(final byte[] b, final int off, final int len) throws IOException
  {
    super.write(b, off, len); // delegates to write(int) byte by byte
  }

  @Override
  public synchronized void close() throws IOException
  {
    flushBuffer();
    queue.offer(END_SIGNAL); // tell the consumer no more data is coming
    closed = true;
  }

  // Pump everything placed on the queue into the given stream on a worker thread.
  public Future<Void> asyncSendToOutputStream(final ExecutorService executor, final OutputStream outputStream)
  {
    return executor.submit(
        new Callable<Void>()
        {
          @Override
          public Void call() throws Exception
          {
            try {
              byte[] buffer = queue.take();
              while (buffer != END_SIGNAL) { // drain until the sentinel arrives
                outputStream.write(buffer);
                buffer = queue.take();
              }
              outputStream.flush();
            } catch (Exception e) {
              close();
              throw e;
            } finally {
              outputStream.close();
            }
            return null;
          }
        });
  }
}
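For illustration, a minimal usage sketch; the single-thread executor, the "out.txt" destination, and the sample payload are my assumptions, not part of the answer above (it also assumes java.io.FileOutputStream is imported):

public static void main(final String[] args) throws Exception
{
  final ExecutorService executor = Executors.newSingleThreadExecutor();
  final QueueOutputStream qos = new QueueOutputStream();

  // The pump thread drains the queue into the destination stream and
  // closes it after END_SIGNAL arrives.
  final Future<Void> pump =
      qos.asyncSendToOutputStream(executor, new FileOutputStream("out.txt"));

  qos.write("example data".getBytes("UTF-8"));
  qos.close();   // flushes the buffer and enqueues END_SIGNAL
  pump.get();    // wait until the consumer has written everything
  executor.shutdown();
}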
Source: https://stackoverflow.com/questions/11400604/java-pipedinputstream-pipedoutputstream-size-limitation