domdocument | 易学教程

php spider breaks in middle (Domdocument, xpath, curl) - help needed

Question: I am a beginner programmer designing a spider that crawls pages. The logic goes like this: get $url with curl; create a DOM document; parse out href tags using XPath; store the href attributes in $totalurls (those that aren't already there); update $url from $totalurls. The problem is that after the 10th crawled page the spider says it does not find ANY links on the page, none on the next one, and so on. But if I begin with the page that was 10th in the previous example, it finds all links with no problem but

how to print only one tag with curl

Question: I have 2 or 3 <p> tags on my web page, but I just want to print the first and second <p>. How can I do that? Here is my code: <?php $url = "http://www.web.org/dorama/1401143633/momikeshite-fuyu--wagaya-no-mondai-nakatta-koto-ni"; $ch = curl_init(); $timeout = 5; curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); $html = curl_exec($ch); curl_close($ch); $dom = new DOMDocument(); @$dom->loadHTML($html); foreach($dom-
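One way to pick out only the first two paragraphs, sketched under assumptions: the curl fetch is replaced by a stand-in $html string, and the names $wanted and $paragraphs are made up for illustration. getElementsByTagName() returns a DOMNodeList, and item() reads an entry by its zero-based index.

```php
<?php
// Stand-in markup; in the question $html is the body returned by curl_exec().
$html = '<html><body><p>first</p><p>second</p><p>third</p></body></html>';

$dom = new DOMDocument();
// Collect libxml parse warnings instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$paragraphs = $dom->getElementsByTagName('p');

// Keep at most the first two <p> elements, if they exist.
$wanted = [];
for ($i = 0; $i < 2 && $i < $paragraphs->length; $i++) {
    $wanted[] = $paragraphs->item($i)->nodeValue;
}

echo implode("\n", $wanted), "\n";
```

The length check keeps the loop safe on pages that contain fewer than two paragraphs.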

Fetching all images src from specific div

Question: Suppose I have an HTML structure like: <div> <div class="content"> <p>This is dummy text</p> <p><img src="a.jpg"></p> <p>This is dummy text</p> <p><img src="b.jpg"></p> </div> </div> I want to fetch every image src from the .content div. I tried: <?php // a new dom object $dom = new domDocument; // load the html into the object $dom->loadHTML("example.com/article/2345"); // discard white space $dom->preserveWhiteSpace = false; //get element by class $finder = new DomXPath($dom); $classname =
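A point worth noting about the snippet: DOMDocument::loadHTML() parses a string of markup, so passing "example.com/article/2345" makes it parse that text rather than fetch the page. The sketch below is a minimal illustration under assumptions: the markup is inlined as $html (for a real page it would first have to be downloaded, for example with curl as in the other questions on this page), and the value 'content' for $classname is inferred from the question text, since the original line is cut off.

```php
<?php
// Stand-in for the downloaded page; the structure is copied from the question.
$html = '<div><div class="content"><p>This is dummy text</p><p><img src="a.jpg"></p>'
      . '<p>This is dummy text</p><p><img src="b.jpg"></p></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings quietly
$dom->loadHTML($html);              // loadHTML() takes markup, not a URL
libxml_clear_errors();

$finder = new DOMXPath($dom);
$classname = 'content';             // assumed value; truncated in the question

// Match the class as a whole word so that "content-wide" would not match.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " '
       . $classname . ' ")]//img';

$sources = [];
foreach ($finder->query($query) as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```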

DOMElement replace HTML value

Question: I have this HTML string in a DOMElement: <h1>Home</h1> test{{test}} I want to replace this content so that only <h1>Home</h1> test remains (that is, I want to remove the {{test}}). At the moment, my code looks like this: $node->nodeValue = preg_replace( '/(?<replaceable>{{([a-z0-9_]+)}})/mi', '' , $node->nodeValue); This doesn't work because nodeValue doesn't contain the HTML value of the node. I can't figure out how to get the HTML string of the node other than using $node->C14N() , but

PHP Split html string into array

Question: I hope I can get some help from you guys. This is what I'm struggling with: I have a string of HTML that will look like this: <h4>Some title here</h4> <p>Lorem ipsum dolor</p> (some other HTML here) <h4>Some other title here</h4> <p>Lorem ipsum dolor</p> (some other HTML here) I need to split all the <h4> elements from the rest of the content, but the content after the first <h4> and before the second <h4>, for example, needs to be related to the first <h4>, something like this: Array { [0] => <h4>Some

PHP DOMDocument how to get that content of this tag?

Question: I am using DOMDocument, hoping to parse this little piece of HTML. I am looking for a specific span tag with a specific id. <span id="CPHCenter_lblOperandName">Hello world</span> My code: $dom = new domDocument; @$dom->loadHTML($html); // the @ is to silence errors and misconfigures of HTML $dom->preserveWhiteSpace = false; $nodes = $dom->getElementsByTagName('//span[@id="CPHCenter_lblOperandName"'); foreach($nodes as $node){ echo $node->nodeValue; } But for some reason I think something is wrong
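A likely cause, offered as a sketch rather than a confirmed diagnosis: getElementsByTagName() expects a plain tag name such as span, not an XPath expression, and the expression in the snippet is also missing its closing bracket. XPath expressions are evaluated with DOMXPath::query() instead. The $html value below is a stand-in for the page being parsed, and $texts is a name introduced here for illustration.

```php
<?php
// Stand-in markup; in the question $html comes from elsewhere.
$html = '<html><body><span id="CPHCenter_lblOperandName">Hello world</span></body></html>';

$dom = new DOMDocument();
// Collect libxml parse warnings instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// XPath expressions go through DOMXPath, not getElementsByTagName().
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[@id="CPHCenter_lblOperandName"]');

$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->nodeValue;
}

echo implode("\n", $texts), "\n";
```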

HTML DOM Document parsing

Question: I am new to DOMDocument. I have this HTML: <tr class="calendar_row" data-eventid="39657"> <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 9</div></td> <td class="alt1 smallfont" align="center">3:34am</td> <td class="alt1 smallfont" align="center">USD</td> </tr> <tr class="calendar_row" data-eventid="39658"> <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 10</div></td> <td class="alt1 smallfont" align="center

WebKit2 and DomDocument/JavaScriptCore (Python3)

Question: I am converting a Python3 application to use WebKit2 instead of WebKit (which is no longer available in Debian Buster). In the application the user can (de)select check boxes, which I read from the Python3 application. In the original code I could simply get the DomDocument of the Webview and iterate through the child objects to return the value of the object with a given name (sample code below). In WebKit2 the get_dom_document function is no longer available and the WebKit2 documentation is

get a complete table with php domdocument and print it

Question: I would like to get a complete HTML table having id = 'myid' from a given URL using PHP DOMDocument and print it on our web page. How can I do this? I am trying the code below to get the table, but I can't get the trs (table rows), tds (table data) and other inner HTML. $xml = new DOMDocument(); @$xml->loadHTMLFile($url); foreach($xml->getElementById('myid') as $table) { // now how to get tr and td and other element ? // i am getting other element like :- $links = $table->getElementsByTagName(
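One thing that stands out in the snippet: getElementById() returns a single element (or null), not a list of matches, so looping over its result with foreach does not visit a table. The following is a minimal sketch under assumptions: the page is replaced by a stand-in $html string instead of loadHTMLFile($url), the table is located with an XPath query on its id, and $tableHtml and $cells are names introduced here. Passing a node to saveHTML() serialises that node together with its rows and cells.

```php
<?php
// Stand-in for the remote page; the question loads it with loadHTMLFile($url).
$html = '<html><body><p>intro</p>'
      . '<table id="myid"><tr><td>A</td><td>B</td></tr></table>'
      . '</body></html>';

$xml = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings quietly
$xml->loadHTML($html);
libxml_clear_errors();

// Locate the table by its id attribute; item(0) is null when nothing matches.
$xpath = new DOMXPath($xml);
$table = $xpath->query('//table[@id="myid"]')->item(0);

$tableHtml = '';
$cells = [];
if ($table !== null) {
    // Serialise the whole table, including its tr and td children.
    $tableHtml = $xml->saveHTML($table);

    // Individual cells are reachable from the table element as well.
    foreach ($table->getElementsByTagName('td') as $td) {
        $cells[] = $td->nodeValue;
    }
}

echo $tableHtml, "\n";
```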