Windows Batch / parse data from html web page

佛祖请我去吃肉 2021-01-07 06:28

Is it possible to parse data from an HTML web page using Windows batch?

Let's say I have a web page, www.domain.com/data/page/1, with the following page source HTML:

...
        
2 Answers
  •  隐瞒了意图╮
    2021-01-07 06:56

    It's better to parse structured markup as a hierarchical object rather than scrape it as flat text. That way you aren't dependent on the exact formatting of the data you're parsing (whether it's minified, its spacing has changed, or whatever).
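To illustrate the point, here's a minimal sketch in plain JavaScript of a flat-text regex scraper. It is runnable with Node.js. The `naiveHrefs` function and the sample markup are hypothetical and do not come from the question. The scraper finds the link when the tag is written one way. It misses the identical link once the quoting and attribute layout change, even though an HTML parser treats both forms as the same element.

```javascript
// Hypothetical flat-text scraper: it only recognizes links written exactly as
// <a href="..."> with double quotes and no other attributes in the tag.
function naiveHrefs(html) {
  var re = /<a href="([^"]+)">/g;
  var out = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    out.push(m[1]); // captured href value
  }
  return out;
}

// The same link, written two ways that a browser treats identically.
var original = '<a href="/post/view/1">First post</a>';
var reformatted = "<a class='title'\n   href='/post/view/1'>First post</a>";

console.log(naiveHrefs(original));    // [ '/post/view/1' ]
console.log(naiveHrefs(reformatted)); // []
```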

    The batch language isn't well suited to parsing structured formats like HTML, XML or JSON. In such cases, it can be extremely helpful to use a hybrid script and borrow JScript or PowerShell methods to scrape the data you need. Here's an example demonstrating a batch + JScript hybrid script. Save it with a .bat extension and run it.

    @if (@CodeSection == @Batch) @then
    @echo off & setlocal
    
    set "url=http://www.domain.com/data/page/1"
    
    for /f "delims=" %%I in ('cscript /nologo /e:JScript "%~f0" "%url%"') do (
        rem // do something useful with %%I
        echo Link found: %%I
    )
    
    goto :EOF
    @end // end batch / begin JScript hybrid code
    
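    // Downloads the page at url and returns it loaded into an "htmlfile"
    // COM object (a DOM root), so it can be queried like a browser document.
    function fetch(url) {
        var XHR = WSH.CreateObject("Microsoft.XMLHTTP"),
            DOM = WSH.CreateObject('htmlfile');
    
        // Asynchronous GET; poll until readyState reaches 4 (request complete).
        XHR.open("GET",url,true);
        XHR.setRequestHeader('User-Agent','XMLHTTP/1.0');
        XHR.send('');
        while (XHR.readyState!=4) {WSH.Sleep(25)};
        // Write the downloaded markup into the htmlfile document.
        DOM.write('');
        DOM.write(XHR.responseText);
        return DOM;
    }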
    
    var DOM = fetch(WSH.Arguments(0)),
        links = DOM.getElementsByTagName('a');
    
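    // Index loop over the collection; print only hrefs that point at /post/view/ pages.
    for (var i = 0; i < links.length; i++)
        if (links[i].href && /\/post\/view\//i.test(links[i].href))
            WSH.Echo(links[i].href);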
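The link-filtering step at the end of the script can be tried outside of Windows Script Host. Below is a sketch in ES3-style JavaScript, so it is also valid JScript, and it is runnable with Node.js. The `links` object is a hypothetical stand-in for the collection that `getElementsByTagName('a')` returns, and `postLinks` is an illustrative helper name. It applies the same case-insensitive `/post/view/` test. It also shows that on this plain array-like object, a `for...in` loop visits the `length` key as well as the element indices, while an index loop bounded by `length` visits only the elements.

```javascript
// Hypothetical stand-in for the <a> collection: indexed elements plus length.
var links = {
  0: { href: "http://www.domain.com/post/view/101" },
  1: { href: "http://www.domain.com/about" },
  2: { href: "" },
  3: { href: "http://www.domain.com/Post/View/102" },
  length: 4
};

// Keep non-empty hrefs containing /post/view/ (case-insensitive),
// mirroring the test used in the hybrid script.
function postLinks(collection) {
  var out = [];
  for (var i = 0; i < collection.length; i++) {
    var href = collection[i].href;
    if (href && /\/post\/view\//i.test(href)) out.push(href);
  }
  return out;
}

// for...in, by contrast, also enumerates the non-index key "length".
var keys = [];
for (var k in links) keys.push(k);

console.log(postLinks(links)); // the two /post/view/ URLs
console.log(keys);             // [ '0', '1', '2', '3', 'length' ]
```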
    
