Is it possible to parse data from a web page's HTML using a Windows batch file?
Let's say I have a web page, www.domain.com/data/page/1, with this HTML source:
...
It's better to parse structured markup as a hierarchical object than to scrape it as flat text. That way you aren't so dependent on the formatting of the data you're parsing (whether it's minified, its whitespace has changed, or whatever).
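To make that fragility concrete, here is a small hypothetical illustration in plain JavaScript (runnable in Node.js, not part of the hybrid script). A regex written against one exact formatting of a link misses the same link once the quoting or spacing changes. The markup strings and the `scrapeHrefs` name are made up for the example.

```javascript
// Hypothetical illustration only: a flat-text regex tied to one exact formatting.
var naive = /<a href="([^"]+)">/g;

function scrapeHrefs(html) {
  var out = [];
  var m;
  naive.lastIndex = 0; // a global regex keeps state between calls, so reset it
  while ((m = naive.exec(html)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// The same link written three ways (made-up markup).
var variants = [
  '<a href="/post/view/1">one</a>',           // exact format the regex expects
  "<a href='/post/view/1'>one</a>",           // single quotes
  '<a  class="x" href="/post/view/1">one</a>' // extra attribute and spacing
];

for (var i = 0; i < variants.length; i++) {
  // prints ["/post/view/1"], then [], then []
  console.log(JSON.stringify(scrapeHrefs(variants[i])));
}
```

A DOM parser sees all three as the same `a` element with the same `href`, which is why parsing the markup as a tree holds up better.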
The batch language isn't well suited to parsing markup languages like HTML, XML, or JSON. In such cases it helps a lot to write a hybrid script that borrows JScript or PowerShell methods to scrape the data you need. Here's an example of a batch + JScript hybrid script: it fetches the page, loads the response into an htmlfile DOM object, and prints every link whose href contains /post/view/. Save it with a .bat extension and give it a run.
@if (@CodeSection == @Batch) @then
@echo off & setlocal
set "url=http://www.domain.com/data/page/1"
for /f "delims=" %%I in ('cscript /nologo /e:JScript "%~f0" "%url%"') do (
    rem // do something useful with %%I
    echo Link found: %%I
)
goto :EOF
@end // end batch / begin JScript hybrid code
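// fetches url and returns the page loaded into an htmlfile DOM object
function fetch(url) {
    var XHR = WSH.CreateObject("Microsoft.XMLHTTP"),
        DOM = WSH.CreateObject('htmlfile');
    XHR.open("GET", url, true);
    XHR.setRequestHeader('User-Agent', 'XMLHTTP/1.0');
    XHR.send('');
    // wait for the asynchronous request to finish (readyState 4 = complete)
    while (XHR.readyState != 4) { WSH.Sleep(25); }
    DOM.write(XHR.responseText);
    return DOM;
}

var DOM = fetch(WSH.Arguments(0)),
    links = DOM.getElementsByTagName('a');

// walk the collection by index; for..in can also visit non-element members such as length
for (var i = 0; i < links.length; i++) {
    // print only links whose href contains /post/view/
    if (links[i].href && /\/post\/view\//i.test(links[i].href))
        WSH.Echo(links[i].href);
}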
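If you want to check the link filter without fetching a page, the test inside the final loop can be pulled out into a plain function. This is a minimal sketch, assuming the hrefs have already been collected into an ordinary array of strings; the name `filterPostLinks` and the sample URLs are hypothetical. It sticks to ES3-era syntax so the function body should also work in the JScript half of the hybrid (use `WSH.Echo` there instead of `console.log`), and it runs as-is in Node.js.

```javascript
// Hypothetical helper: keep only hrefs that point at /post/view/ pages.
// ES3-compatible (var, classic for loop) so it can be pasted into the JScript section.
function filterPostLinks(hrefs) {
  var pattern = /\/post\/view\//i; // same case-insensitive test as the hybrid script
  var matches = [];
  for (var i = 0; i < hrefs.length; i++) {
    // skip empty or missing hrefs, as the hybrid script's loop does
    if (hrefs[i] && pattern.test(hrefs[i])) {
      matches.push(hrefs[i]);
    }
  }
  return matches;
}

// Made-up sample input.
var sample = [
  "http://www.domain.com/post/view/123",
  "http://www.domain.com/about",
  "",
  "http://www.domain.com/POST/VIEW/456"
];

var found = filterPostLinks(sample);
for (var j = 0; j < found.length; j++) {
  console.log("Link found: " + found[j]);
}
```

Because the regex has no `g` flag, `test()` keeps no state between calls, so reusing one pattern object across the loop is safe.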