Launching wkhtmltopdf from Runtime.getRuntime().exec(): never terminates?

五迷三道 提交于 2019-12-19 06:17:07

问题


I'm launching wkhtmltopdf from within my Java app (part of a Tomcat server, running in debug mode within Eclipse Helios on Win7 64-bit): I'd like to wait for it to complete, then Do More Stuff.

String cmd[] = {"wkhtmltopdf", htmlPathIn, pdfPathOut};
Process proc = Runtime.getRuntime().exec( cmd, null );

proc.waitFor();

But waitFor() never returns. I can still see the process in the Windows Task Manager (with the command line I passed to exec(): looks fine). AND IT WORKS. wkhtmltopdf produces the PDF I'd expect, right where I'd expect it. I can open it, rename it, whatever, even while the process is still running (before I manually terminate it).

From the command line, everything is fine:

c:\wrk>wkhtmltopdf C:\Temp\foo.html c:\wrk\foo.pdf
Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done

The process exits just fine, and life goes on.

So what is it about runtime.exec() that's causing wkhtmltopdf to never terminate?

I could grab proc.getInputStream() and look for "Done", but that's... vile. I want something that is more general.

I've calling exec() with and without a working directory. I've tried with and without an empty "env" array. No joy.

Why is my process hanging, and what can I do to fix it?

PS: I've tried this with a couple other command line apps, and they both exhibit the same behavior.

Further exec woes.

I'm trying to read standard out & error, without success. From the command line, I know there's supposed to be something remarkably like my command line experience, but when I read the input stream returned by proc.getInputStream(), I immediately get an EOL (-1, I'm using inputStream.read()).

I checked the JavaDoc for Process, and found this

The parent process uses these streams to feed input to and get output from the subprocess. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the [b]subprocess to block, and even deadlock[/b].

Emphasis added. So I tried that. The first 'read()' on the Standard Out inputStream blocked until I killed the process...

WITH WKHTMLTOPDF

With the generic command line ap & no params so it should "dump usage and terminate", it sucks out the appropriate std::out, then terminates.

Interesting!

JVM version issue? I'm using 1.6.0_23. The latest is... v24. I just checked the change log and don't see anything promising, but I'll try updating anyway.


Okay. Don't let the Input Streams fill or they'll block. Check. .close() can also prevent this, but isn't terribly bright.

That works in general (including the generic command line apps I've tested).

In specific however, it falls down. It appears that wkhtmltopdf is using some terminal manipulation/cursor stuff to do an ASCII-graphic progress bar. I believe this is causing the inputStream to immediately return EOF rather than giving me the correct values.

Any ideas? Hardly a deal-breaker, but it would definitely be Nice To Have.


回答1:


A process has 3 streams: input, output and error. you can read both output and error stream at the same time using separate processes. see this question and its accepted answer and also this one for example.




回答2:


I had the same exact issue as you and I solved it. Here are my findings:

For some reason, the output from wkhtmltopdf goes to STDERR of the process and NOT STDOUT. I have verified this by calling wkhtmltopdf from Java as well as perl

So, for example in java, you would have to do:

//ProcessBuilder is the recommended way of creating processes since Java 1.5 
//Runtime.getRuntime().exec() is deprecated. Do not use. 
ProcessBuilder pb = new ProcessBuilder("wkhtmltopdf.exe", htmlFilePath, pdfFilePath);
Process process = pb.start();

BufferedReader errStreamReader = new BufferedReader(new  InputStreamReader(process.getErrorStream())); 
//not "process.getInputStream()" 
String line = errStreamReader.readLine(); 
while(line != null) 
{ 
    System.out.println(line); //or whatever else
    line = reader.readLine(); 
}

On a side note, if you spawn a process from java, you MUST read from the stdout and stderr streams (even if you do nothing with it) because otherwise the stream buffer will fill and the process will hang and never return.

To futureproof your code, just in case the devs of wkhtmltopdf decide to write to stdout, you can redirect stderr of the child process to stdout and read only one stream like this:

ProcessBuilder pb = new ProcessBuilder("wkhtmltopdf.exe", htmlFilePath, pdfFilePath); 
pb.redirectErrorStream(true); 
Process process = pb.start(); 
BufferedReader inStreamReader = new BufferedReader(new  InputStreamReader(process.getInputStream())); 

Actually, I do this in all the cases where I have to spawn an external process from java. That way I don't have to read two streams.

You should also read the streams of the spawned process in different threads if you dont want your main thread to block, since reading from streams is blocking.

Hope this helps.

UPDATE: I raised this issue in the project page and was replied that this is by design because wkhtmltopdf supports giving the actual pdf output in STDOUT. Please see the link for more details and java code.




回答3:


You should read from the streams in a different thread.




回答4:


    final Semaphore semaphore = new Semaphore(numOfThreads);
    final String whktmlExe = tmpwhktmlExePath;
    int doccount = 0;
    try{
        File fileObject = new File(inputDir);
        for(final File f : fileObject.listFiles()) {

            if(f.getAbsolutePath().endsWith(".html")) {
                doccount ++;
                if(doccount >500 ) {
                    LOG.info(" done with conversion of 1000 docs exiting ");
                    break;
                }
                System.out.println(" inside for before "+semaphore.availablePermits());
                semaphore.acquire();
                System.out.println(" inside for after "+semaphore.availablePermits() + " ---" +f.getName());
                new java.lang.Thread() {
                    public void run() {
                        try {
                            String F_ =  f.getName().replaceAll(".html", ".pdf") ;
                            ProcessBuilder pb = new ProcessBuilder(whktmlExe , f.getAbsolutePath(), outPutDir + F_ .replaceAll(" ", "_") );//"wkhtmltopdf.exe", htmlFilePath, pdfFilePath);
                            pb.redirectErrorStream(true);
                            Process process = pb.start();
                            BufferedReader errStreamReader = new BufferedReader(new  InputStreamReader(process.getInputStream()));  
                            String line = errStreamReader.readLine(); 
                            while(line != null) 
                            { 
                                System.err.println(line); //or whatever else
                                line = errStreamReader.readLine(); 
                            }

                            System.out.println("after completion for ");
                        } catch (Exception e) {
                            e.printStackTrace();
                        }finally {
                            System.out.println(" in finally releasing ");
                        semaphore.release();
                        }
                  }
                }.start();
            }
        }
    }catch (Exception ex) {
        LOG.error(" *** Error in pdf generation *** ", ex);
    }

    while (semaphore.availablePermits() < numOfThreads) {//till all threads finish 
        LOG.info( " Waiting for all threads to exit "+ semaphore.availablePermits() + " --- " +( numOfThreads - semaphore.availablePermits()));
        java.lang.Thread.sleep(10000);
    }


来源:https://stackoverflow.com/questions/5506275/launching-wkhtmltopdf-from-runtime-getruntime-exec-never-terminates

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!