I need to scrape a piece of text from a webpage. I am using DOM and XPath to find the data, but I can't seem to select the exact information I need. He
Your XPath is fine when I use it in Firefox, but it won't work with DOM, which is not surprising. I assume you got your XPath from some sort of browser plugin that returns the path for a selected element. However, you should not trust XPaths returned by browser plugins, because browsers modify the DOM through JavaScript and add implied values where necessary. Use the raw source code instead.
Your XPath evaluates to "Home delivery within 2 days" in Firefox, which is not what I would expect in a variable called "stock_data". But anyway, this should do it:
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('http://www.argos.co.uk/static/Product/partNumber/9282197/Trail/searchtext%3EIPOD+TOUCH.htm');
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$nodes = $xpath->query(
'/html/body//div[@id="deliveryInformation"]/ul/li[@class="home"]/span'
);
echo $nodes->item(0)->nodeValue; // "Home delivery within 2 days"
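To illustrate the point about browser-reported paths, here is a minimal sketch. The markup and the `stock` id are made up for the example, and the implied `<tbody>` is just one commonly cited case of an element a browser adds that is absent from the raw source; it is not taken from the page in question.

```php
<?php
// Hypothetical markup for illustration: the raw source has no <tbody>.
$html = '<html><body><table id="stock"><tr><td>In stock</td></tr></table></body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Path as a browser tool might report it, including the implied <tbody>.
$fromBrowser = $xpath->query('//table[@id="stock"]/tbody/tr/td');

// Path written against the raw source.
$fromSource = $xpath->query('//table[@id="stock"]/tr/td');

echo $fromBrowser->length, "\n"; // number of matches for the browser-style path
echo $fromSource->length, "\n";  // number of matches for the raw-source path
```

With DOMDocument::loadHTML() the browser-style path should find nothing, while the path written against the raw source finds the cell.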
Running your code, I first get:
Notice: Undefined variable: expr_argos
Warning: DOMXPath::query() [domxpath.query]: Invalid expression
So, first of all, make sure you pass a defined, valid expression to your XPath query. For example, you should have this:
$nodes_argos = $xpath_argos->query($expr_currys);
instead of what you currently have:
$nodes_argos = $xpath_argos->query($expr_argos);
Then, you get the following error:
Notice: Trying to get property of non-object
on the following line:
$argos_stock_data = $nodes_argos->item(0)->nodeValue;
Basically, this means you are trying to read a property, nodeValue, on something that is not an object: $nodes_argos->item(0).
I'm guessing your XPath query doesn't match anything in the document, so the call to the query() method returns an empty node list and item(0) is NULL.
You should check your XPath query (which is quite a bit too long to be easy to understand), making sure it matches something in your HTML page.
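One way to avoid the "non-object" notice is to check the result of query() before reading nodeValue. The following is only a sketch: the helper name `firstNodeValue`, the inline sample markup, and the `noSuchId` id are assumptions made for the example, not part of your code.

```php
<?php
// Hypothetical helper (not from the original code): returns the text of the
// first node matched by $expr, or null when nothing can be read.
function firstNodeValue(DOMXPath $xpath, string $expr): ?string
{
    $nodes = $xpath->query($expr);

    // query() returns false when the expression itself is malformed.
    if ($nodes === false) {
        return null;
    }

    // A well-formed expression that matches nothing yields an empty list,
    // so item(0) would be null and reading nodeValue would raise the notice.
    if ($nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->nodeValue);
}

// Inline sample markup, assumed for illustration only.
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('<html><body><div id="deliveryInformation"><ul>'
    . '<li class="home"><span>Home delivery within 2 days</span></li>'
    . '</ul></div></body></html>');
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$found = firstNodeValue($xpath, '//div[@id="deliveryInformation"]/ul/li[@class="home"]/span');
$missing = firstNodeValue($xpath, '//div[@id="noSuchId"]/span');

echo $found === null ? "no match\n" : $found . "\n";
echo $missing === null ? "no match\n" : $missing . "\n";
```

Returning null for both failure cases keeps the calling code simple. If you need to tell a malformed expression apart from an empty result, handle the false case separately.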