Save embedded pdf from website

Submitted by 对着背影说爱祢 on 2019-12-12 22:18:04

Question


I am writing a small C# application to manage the safety data sheets (for chemicals) from our suppliers.

At the moment I manually search for each chemical, save the pdf, and add a link to the pdf in my program. I still have many chemicals to go, so it would be better to automate the process.

For example: A chemical has the following part number: 271004

The link containing the pdf is here:

Link

I have been reading the page source but cannot find a link to the pdf.

My knowledge of HTML/JavaScript is too limited at the moment.

Is there any way to extract the pdf from the website?

Thanks in advance for any advice :)


Answer 1:


Look in the page for an iframe element with id "msdsPageFrame". The src attribute of that element contains the URL of your PDF. Download that URL.

If you have questions about how to download a URL or how to parse a page to search for an id, ask another question.
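
As a rough illustration of the lookup described in this answer, here is a minimal sketch that pulls the src attribute out of an iframe with a given id. The class name `MsdsFrameFinder` and the method `FindIframeSrc` are made up for this example, and it uses a regular expression only to stay dependency-free. That is fragile: it assumes quoted attribute values, and it will find nothing if the iframe is added by JavaScript after the page loads. A real HTML parser would be more robust.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class MsdsFrameFinder
{
    // Returns the src attribute of the <iframe> whose id equals frameId,
    // or null when no such iframe (or no src attribute) is found.
    public static string FindIframeSrc(string html, string frameId)
    {
        // Grab each opening <iframe ...> tag first, then inspect its attributes,
        // so the order of id and src inside the tag does not matter.
        foreach (Match tag in Regex.Matches(html, @"<iframe\b[^>]*>", RegexOptions.IgnoreCase))
        {
            Match id = Regex.Match(tag.Value,
                @"(?<![\w-])id\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
            if (!id.Success || id.Groups[1].Value != frameId)
                continue;

            Match src = Regex.Match(tag.Value,
                @"(?<![\w-])src\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
            if (src.Success)
                return WebUtility.HtmlDecode(src.Groups[1].Value); // turns &amp; back into &
        }
        return null;
    }
}
```

The page HTML could be fetched with `WebClient.DownloadString`. If the returned src is relative, `new Uri(new Uri(pageUrl), src)` resolves it against the address of the page, where `pageUrl` stands for whatever page address you loaded.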




Answer 2:


Now I am able to access the pdf file directly using a product code:

www.sigmaaldrich.com/MSDS/MSDS/DisplayMSDSPage.do?country=NL&language=EN-generic&productNumber=271004&brand=SIAL&PageToGoToURL=null

Using the following code I try to download the pdf:

    // Requires: using System; using System.ComponentModel; using System.Diagnostics; using System.Net;
    // The original snippet did not show this declaration; a class-level field is assumed here.
    private readonly WebClient webClient = new WebClient();

    private void Download()
    {
        // Subscribe to completion and progress events before starting the download.
        webClient.DownloadFileCompleted += Completed;
        webClient.DownloadProgressChanged += ProgressChanged;

        // Source URL and destination file for the download.
        webClient.DownloadFileAsync(
            new Uri("http://www.sigmaaldrich.com/MSDS/MSDS/DisplayMSDSPage.do?country=NL&language=EN-generic&productNumber=271004&brand=SIAL&PageToGoToURL=null"),
            @"C:\Users\test\Downloads\newfile.pdf");
    }

    private void ProgressChanged(object sender, DownloadProgressChangedEventArgs e)
    {
        Debug.WriteLine("DownloadProgressChangedEventHandler");
    }

    private void Completed(object sender, AsyncCompletedEventArgs e)
    {
        // This event also fires when the download fails, so log the error if there is one.
        Debug.WriteLine(e.Error == null ? "AsyncCompletedEventHandler" : "AsyncCompletedEventHandler: " + e.Error.Message);
    }

However, this does not work. The pdf is generated first, which takes a few seconds, but the AsyncCompletedEventHandler is triggered right away. I think this is why the pdf file is not downloaded.
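
One way to narrow this down is to check what actually landed on disk. PDF files normally start with the bytes `%PDF-`, so the hedged sketch below tests for that signature. The class `PdfCheck` and the method `LooksLikePdf` are hypothetical names for this example, and the check only diagnoses the symptom: it does not make the server finish generating the file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
static class PdfCheck
{
    // True when the file begins with the "%PDF-" signature.
    public static bool LooksLikePdf(string path)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        byte[] head = new byte[signature.Length];
        int total = 0;

        using (FileStream stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break;
                total += n;
            }
        }

        if (total < signature.Length) return false; // file too short to be a PDF

        for (int i = 0; i < signature.Length; i++)
            if (head[i] != signature[i]) return false;
        return true;
    }
}
```

Called from the completion handler after confirming that `e.Error` is null, a `false` result would suggest the server returned something other than the PDF itself, for example an HTML page, rather than the download being cut short.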




Answer 3:


For those using Mozilla, place the mouse pointer anywhere within the PDF area and press Ctrl+S. Doing so will download the PDF.



Source: https://stackoverflow.com/questions/26230485/save-embedded-pdf-from-website
