wkhtmltopdf relative paths in HTML with redirected in/out streams won't work

Posted by 笑着哭i on 2019-12-07 03:18:25

Question


I am using wkhtmltopdf.exe (version 0.12.0 final) to generate PDF files from HTML files, and I drive it from .NET C#.

My problem is getting JavaScript, stylesheets and images to work when the HTML specifies only relative paths. Right now it works if I use absolute paths, but not with relative paths, which makes the whole HTML generation a bit too complicated. I have boiled what I do down to the following example:

string CMDPATH = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe";
string HTML = string.Format(
    "<div><img src=\"{0}\" /></div><div><img src=\"{1}\" /></div><div>{2}</div>",
    "./sohlogo.png",
    "./ACLASS.jpg",
    DateTime.Now.ToString());

WriteFile(HTML, "test.html");

Process p;
ProcessStartInfo psi = new ProcessStartInfo();

psi.FileName = CMDPATH;
psi.UseShellExecute = false;
psi.WorkingDirectory = AppDomain.CurrentDomain.BaseDirectory;
psi.CreateNoWindow = true;
psi.RedirectStandardInput = true;
psi.RedirectStandardOutput = true;
psi.RedirectStandardError = true;

psi.Arguments = "-q - -";

p = Process.Start(psi);

StreamWriter stdin = p.StandardInput;
stdin.AutoFlush = true;
stdin.Write(HTML);
stdin.Dispose();

MemoryStream pdfstream = new MemoryStream();
CopyStream(p.StandardOutput.BaseStream, pdfstream);
p.StandardOutput.Close();
pdfstream.Position = 0;

WriteFile(pdfstream, "test.pdf");

p.WaitForExit(10000);
int test = p.ExitCode;

p.Dispose();

I have tried relative paths like "./sohlogo.png" and simply "sohlogo.png". Both display correctly in the browser via the HTML file, but neither works in the PDF file. There is no data in the error stream.

The following command line works like a charm with the relative paths:

"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" test.html test.pdf
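Since the file-based command line resolves relative paths, one possible middle ground is to write the HTML to a file next to the assets and redirect only the output stream. The following is a minimal sketch under assumptions, not a verified fix: it assumes wkhtmltopdf accepts a file path as input together with "-" as output, and the names FileInputSketch, BuildArguments, Convert and assetDir are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class FileInputSketch
{
    // Quiet mode, HTML read from a file, PDF written to stdout ("-").
    public static string BuildArguments(string htmlPath)
    {
        return "-q \"" + htmlPath + "\" -";
    }

    // Hypothetical helper: assetDir is the folder that holds the images,
    // stylesheets and scripts referenced by relative paths in the HTML.
    public static byte[] Convert(string exePath, string html, string assetDir)
    {
        // The temporary HTML file sits next to the assets (assumption:
        // relative paths are then resolved as in the file-based command line).
        string htmlPath = Path.Combine(assetDir, Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(htmlPath, html);
        try
        {
            var psi = new ProcessStartInfo
            {
                FileName = exePath,
                Arguments = BuildArguments(htmlPath),
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                WorkingDirectory = assetDir
            };

            using (Process p = Process.Start(psi))
            using (var pdf = new MemoryStream())
            {
                // Drain stdout fully before waiting, so the child never
                // blocks on a full pipe.
                p.StandardOutput.BaseStream.CopyTo(pdf);
                p.WaitForExit();
                if (p.ExitCode != 0 && pdf.Length == 0)
                {
                    throw new InvalidOperationException(
                        "wkhtmltopdf exited with code " + p.ExitCode + " and produced no output");
                }
                return pdf.ToArray();
            }
        }
        finally
        {
            // Remove the temporary HTML file whether or not conversion succeeded.
            File.Delete(htmlPath);
        }
    }
}
```

The PDF still arrives as a byte array in memory; only the input side touches the disk.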

I could really use some input at this stage, so any help is much appreciated!

Just for reference, the WriteFile and CopyStream methods look like this:

public static void WriteFile(MemoryStream stream, string path)
{
    using (FileStream writer = new FileStream(path, FileMode.Create))
    {
        byte[] bytes = stream.ToArray();
        writer.Write(bytes, 0, bytes.Length);
        writer.Flush();
    }
}

public static void WriteFile(string text, string path)
{
    using (StreamWriter writer = new StreamWriter(path))
    {
        writer.WriteLine(text);
        writer.Flush();
    }
}

public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[32768];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, read);
    }
}
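As a side note, on .NET 4.0 and later the hand-written copy loop can be replaced by the built-in Stream.CopyTo, and the byte-array WriteFile overload by File.WriteAllBytes. A small sketch follows; StreamHelpers and ReadAll are hypothetical names used only for illustration.

```csharp
using System.IO;

public static class StreamHelpers
{
    // Reads the whole input stream into memory using the built-in CopyTo.
    public static byte[] ReadAll(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

With this helper, saving the PDF could be written as File.WriteAllBytes("test.pdf", StreamHelpers.ReadAll(p.StandardOutput.BaseStream)).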

EDIT: My Workaround for Neo Nguyen.

I could not get this to work with relative paths. So instead I wrote a method that prepends a root path to all paths. It solves my problem, so maybe it will solve yours:

/// <summary>
/// Prepends the basedir x in src="x" or href="x" to the input html text
/// </summary>
/// <param name="html">the initial html</param>
/// <param name="basedir">the basedir to prepend</param>
/// <returns>the new html</returns>
public static string MakeRelativePathsAbsolute(string html, string basedir)
{
    // Group 1: attribute name plus opening quote; group 2: the path; group 3: the closing quote
    string pathpattern = "(href=[\"']|src=[\"'])(.*?)([\"'])";

    // SM20140214: tested that both chrome and wkhtmltopdf.exe understands "C:\Dir\..\image.png" and "C:\Dir\.\image.png"
    //             Path.Combine("C:/
    html = Regex.Replace(html, pathpattern, new MatchEvaluator((match) =>
        {
            string path = match.Groups[2].Value;
            if (string.IsNullOrEmpty(path))
            {
                // Nothing to prepend; leave empty attributes untouched
                return match.Value;
            }

            // Rebuild the attribute instead of string-replacing inside it,
            // so a short path such as "s" cannot also hit the "src" attribute name
            return match.Groups[1].Value + UrlEncode(Path.Combine(basedir, path)) + match.Groups[3].Value;
        }));

    return html;
}

private static string UrlEncode(string url)
{
    url = url.Replace(" ", "%20").Replace("#", "%23");
    return url;
}

I tried different System.Uri.Escape*** methods such as System.Uri.EscapeDataString(), but they ended up doing too severe URL encoding for wkhtmltopdf to understand. For lack of time I just did the quick and dirty UrlEncode above.
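Uri.EscapeDataString also escapes reserved characters such as ':' and '/', which is likely why its output was unusable as a path. One alternative to hand-rolled escaping is to let System.Uri build a file: URL from the combined path. The sketch below assumes that wkhtmltopdf accepts file:/// URLs in src and href attributes, which has not been verified against version 0.12.0. PathSketch, LooksRelative and ToFileUrl are hypothetical names. The LooksRelative check is there because Path.Combine would otherwise glue basedir in front of values such as "http://..." links.

```csharp
using System;
using System.IO;

public static class PathSketch
{
    // True for values like "./sohlogo.png" or "sohlogo.png"; false for
    // anchors, protocol-relative URLs, and anything that already parses
    // as an absolute URI (http:, data:, file:, rooted paths).
    public static bool LooksRelative(string value)
    {
        Uri parsed;
        return !string.IsNullOrEmpty(value)
            && !value.StartsWith("#")
            && !value.StartsWith("//")
            && !Uri.TryCreate(value, UriKind.Absolute, out parsed);
    }

    // Combines basedir with a relative path and returns a file: URL.
    // System.Uri takes care of escaping, for example spaces become %20.
    public static string ToFileUrl(string basedir, string relativePath)
    {
        string full = Path.GetFullPath(Path.Combine(basedir, relativePath));
        return new Uri(full).AbsoluteUri;
    }
}
```

Inside the regex callback, a value would only be rewritten when LooksRelative returns true, and ToFileUrl would take the place of the Path.Combine plus UrlEncode pair.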


Answer 1:


Looking quickly, I think the trouble might be with

psi.WorkingDirectory = AppDomain.CurrentDomain.BaseDirectory;

I think that is the directory the relative paths are being resolved against. I'm assuming that

"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" test.html test.pdf

working means that your image referenced inside test.html as src="mlp.png" is at c:\Program Files\wkhtmltopdf\bin\mlp.png, right? I think that it works because your image file is in the same folder as wkhtmltopdf... so try setting the WorkingDirectory to that directory and see what happens.
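To try this suggestion, the only change to the question's setup is the WorkingDirectory value. The sketch below is a way to run that experiment, not a confirmed fix: whether wkhtmltopdf resolves relative paths against the working directory when the HTML arrives on stdin is this answer's assumption. WorkingDirectorySketch and CreateStartInfo are hypothetical names.

```csharp
using System.Diagnostics;
using System.IO;

public static class WorkingDirectorySketch
{
    // Same settings as in the question, except that the process starts
    // in the folder that contains wkhtmltopdf.exe.
    public static ProcessStartInfo CreateStartInfo(string exePath)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = "-q - -",
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            WorkingDirectory = Path.GetDirectoryName(exePath)
        };
    }
}
```

For the experiment to be meaningful, the images referenced in the HTML would have to be copied into that folder first.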



Source: https://stackoverflow.com/questions/21775572/wkhtmltopdf-relative-paths-in-html-with-redirected-in-out-streams-wont-work
