Evaluate javascript to plain text using C#, .NET 3.5

Submitted by 蓝咒 on 2019-12-23 01:54:51

Question


How can I evaluate JavaScript that calls document.write and get the result as plain text in C#? I'm trying to evaluate this:

<script type="text/javascript">
a=2;b=3;
document.write(a+"_"+b);
</script>

to this:

2_3

Answer 1:


From your comment, "it's a client side function on a downloaded HTML page", it sounds like you are doing some sort of screen scraping / crawling, where the HTML/JavaScript are not making a client request to your app?

If I understand correctly that this is what you are seeking, then you need an interpreter that can "speak" JavaScript. C# has no built-in JavaScript interpreter, so the next best thing is to fire up a component within your C# app which is capable of understanding/interpreting (and therefore evaluating) JavaScript.

I would recommend looking into the WebBrowser control and HtmlDocument.DomDocument. Load your downloaded HTML page into the WebBrowser control; it will run the script, and the resulting HTML will include the output of the JavaScript (since document.write inserts its output into the document as it is parsed).

If you create a simple forms app and drag a web browser control onto it, here is a sample I just wrote to test this theory out:

using System;
using System.Windows.Forms;
// Make sure to add COM reference to "Microsoft HTML Object Library" 

namespace TheAnswer
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            webBrowser1.Url = new Uri("about:blank");
        }


        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            MessageBox.Show("Loaded!");

            string testHtml = @"
                <html>
                    <head>
                        <script type=""text/javascript"">
                            var a=2;var b=3;
                            document.write(a+""_""+b);
                        </script>
                    </head>
                    <body>Hello there!</body>
                </html>";


            mshtml.IHTMLDocument2 htmlDoc = (mshtml.IHTMLDocument2)webBrowser1.Document.DomDocument; // IHTMLDocument2 has the write capability (IHTMLDocument3 does not)
            htmlDoc.close();
            htmlDoc.open("about:blank");

            object html = testHtml;
            htmlDoc.write(html);
            html = null;

        }

    }
}

From here, you can plug your "downloaded" HTML into the HTML document and execute it. You will likely run into many snags along the way if you are dealing with a multitude of different types of pages. If you are always scraping a similar type of page and are certain of some expected behaviors or JavaScript functions, then you may be able to achieve some results. It is really hard to say more considering the minimal amount of information you've provided regarding what your project is about.
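
The sample above writes the page into the control but never reads the evaluated result back. Here is a minimal sketch of that last step, assuming the same WebBrowser control and assuming the document body is already available once the write call returns (I have not verified this on every Internet Explorer version). RenderedTextReader and GetRenderedText are hypothetical names made up for this illustration:

```csharp
using System.Windows.Forms;

namespace TheAnswer
{
    // Hypothetical helper, not part of the original sample.
    public static class RenderedTextReader
    {
        // Returns the visible text of whatever the control currently holds,
        // or an empty string if no document or body exists yet.
        public static string GetRenderedText(WebBrowser browser)
        {
            HtmlDocument doc = browser.Document;
            if (doc == null || doc.Body == null)
            {
                return string.Empty;
            }

            // InnerText returns the body's text without markup; script
            // source in <head> is not part of the body.
            return doc.Body.InnerText ?? string.Empty;
        }
    }
}
```

Calling RenderedTextReader.GetRenderedText(webBrowser1) right after htmlDoc.write(html) in the handler above should return the document.write output together with any static body text, so you may still need to pick out the part you care about.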

I do hope this helps and is what you were trying to accomplish. Let me know!

EDIT: Wow I hadn't realized this question was 2 years old! anyway had some fun answering it!



Source: https://stackoverflow.com/questions/2530789/evaluate-javascript-to-plain-text-using-c-net-3-5
