domdocument

PHP DOMDocument getElementsByTagname?

跟風遠走 submitted on 2019-12-29 06:47:19
Question: This is driving me bonkers... I just want to add another img node. $xml = <<<XML <?xml version="1.0" encoding="UTF-8"?> <gallery> <album tnPath="tn/" lgPath="imm/" fsPath="iml/" > <img src="004.jpg" caption="4th caption" /> <img src="005.jpg" caption="5th caption" /> <img src="006.jpg" caption="6th caption" /> </album> </gallery> XML; $xmlDoc = new DOMDocument(); $xmlDoc->loadXML($xml); $album = $xmlDoc->getElementsByTagname('album')[0]; // Parse error: syntax error, unexpected '[' in
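The parse error is consistent with a PHP version older than 5.4, where the result of a function call cannot be indexed directly with `[0]`. A minimal sketch of one way around it, assuming the goal is simply to append one more img element to the first album: use `item(0)` on the node list. The `007.jpg` and `7th caption` values are invented for illustration.

```php
<?php
// Same gallery document as in the question.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<gallery>
  <album tnPath="tn/" lgPath="imm/" fsPath="iml/">
    <img src="004.jpg" caption="4th caption" />
    <img src="005.jpg" caption="5th caption" />
    <img src="006.jpg" caption="6th caption" />
  </album>
</gallery>
XML;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

// item(0) avoids indexing the return value of a function call.
$album = $xmlDoc->getElementsByTagName('album')->item(0);

// Build the new node; the attribute values here are illustrative only.
$img = $xmlDoc->createElement('img');
$img->setAttribute('src', '007.jpg');
$img->setAttribute('caption', '7th caption');
$album->appendChild($img);

// Count the img children after the append.
$imgCount = $album->getElementsByTagName('img')->length;

echo $xmlDoc->saveXML();
```

Nodes must be created through the owning document (`$xmlDoc->createElement`) before they can be appended to one of its elements.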

Is there a way to keep entities intact while parsing html with DomDocument?

旧时模样 submitted on 2019-12-28 16:03:30
Question: I have this function to ensure every img tag has an absolute URL: function absoluteSrc($html, $encoding = 'utf-8') { $dom = new DOMDocument(); // Workaround to use proper encoding $prehtml = "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset={$encoding}\"></head><body>"; $posthtml = "</body></html>"; if($dom->loadHTML( $prehtml . trim($html) . $posthtml)){ foreach($dom->getElementsByTagName('img') as $img){ if($img instanceof DOMElement){ $src = $img->getAttribute('src')

echo innerHTML, without outer node tags

我只是一个虾纸丫 submitted on 2019-12-28 07:06:31
Question: I'm using the DOMDocument class to parse a fairly unpredictable string of markup. It's not all that well formed and I need some data from it. Regexes are right out, of course. So far, I've got this: $dom = new DOMDocument; $dom->loadHTML($str); $contents = $dom->getElementsByTagName('body')->item(0); echo $dom->saveXML($contents); Now this gives me: <body> <p>What I'm really after</p> <ul><li>Foo</li><li>Bar</li></ul> <h6>And so on</h6> </body> What really annoys me are those <body> tags. I
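One common way to drop the wrapping body tags, sketched here under the assumption that only the serialized children are wanted, is to serialize each child node of body separately and concatenate the results. The `$str` value below is a stand-in for the real input.

```php
<?php
// Stand-in for the unpredictable markup string from the question.
$str = "<p>What I'm really after</p><ul><li>Foo</li><li>Bar</li></ul><h6>And so on</h6>";

$dom = new DOMDocument;
$dom->loadHTML($str);
$body = $dom->getElementsByTagName('body')->item(0);

// Serialize every child of <body> on its own, so the <body> tags never appear.
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveXML($child);
}

echo $inner;
```

`saveXML()` accepts a node argument and serializes just that subtree, which is what makes the per-child loop work.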

PHP parsing invalid html

浪尽此生 submitted on 2019-12-28 06:46:11
Question: I'm trying to parse some HTML that is not on my server: $dom = new DOMDocument(); $dom->loadHTMLfile("http://www.some-site.org/page.aspx"); echo $dom->getElementById('his_id')->item(0); but PHP returns an error, something like "ID his_id already defined in http://www.some-site.org/page.aspx, line: 33". I think that is because DOMDocument is dealing with invalid HTML. So how can I parse it even though it is invalid? Answer 1: You should run HTML Tidy on it to clean it up before parsing it. $html = file
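Separately from the Tidy answer, which is cut off here, a minimal sketch of another approach: `libxml_use_internal_errors(true)` makes libxml collect parse errors instead of emitting PHP warnings, so the document still loads. The inline `$html` string is a stand-in for the remote page and deliberately repeats the id. Note also that `getElementById()` returns a single element or null, so `->item(0)` does not apply to it.

```php
<?php
// Stand-in for the remote page: the id "his_id" is deliberately duplicated.
$html = '<html><body><div id="his_id">first</div><div id="his_id">second</div></body></html>';

// Collect libxml errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Inspect the collected errors if needed, then clear the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// getElementById() returns one DOMElement (or null), not a node list.
$node = $dom->getElementById('his_id');
$text = $node !== null ? $node->textContent : null;

echo $text, "\n";
```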

problem with adding root path using php domdocument

跟風遠走 submitted on 2019-12-28 04:38:11
Question: I would like to add the site's root path to those anchor tags that do not already have it, using PHP's DOMDocument. So far I have written a function that does this with str_replace, but for some links it adds the root path three or four times. What should I change in this function? Problem: the root path is added three or four times to anchor tags, though not to all of them. The $HTML variable has many anchor tags, more than 200 links. The same happens with images. I know that its
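The asker's function is not shown, so the following is only a sketch of the DOMDocument route the question mentions, not a fix to that function. It visits each a and img element once and prefixes the root only when the URL has no scheme or host, which avoids the repeated prefixes that a global `str_replace` can produce. `withRoot()`, `$root`, and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: prefix $root only when the URL has no scheme or host yet.
function withRoot($url, $root)
{
    // Leave empty values, fragment links, "scheme:" URLs and "//host" URLs alone.
    if ($url === '' || $url[0] === '#' || preg_match('#^([a-z][a-z0-9+.-]*:|//)#i', $url)) {
        return $url;
    }
    return rtrim($root, '/') . '/' . ltrim($url, '/');
}

$root = 'http://example.com'; // assumed site root
$HTML = '<p><a href="/about">About</a> <a href="http://example.com/x">X</a> <img src="img/a.png"></p>';

$dom = new DOMDocument();
$dom->loadHTML($HTML);

// Each tag is paired with the attribute that carries its URL.
$targets = array('a' => 'href', 'img' => 'src');
foreach ($targets as $tag => $attr) {
    foreach ($dom->getElementsByTagName($tag) as $node) {
        if ($node->hasAttribute($attr)) {
            $node->setAttribute($attr, withRoot($node->getAttribute($attr), $root));
        }
    }
}

echo $dom->saveHTML();
```

Because each attribute is rewritten exactly once and already-absolute URLs are skipped, running the loop again leaves the markup unchanged.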

Using PHP's DOMDocument::preserveWhiteSpace = false and still getting whitespace

本小妞迷上赌 submitted on 2019-12-25 18:21:36
Question: I'm scraping this page: http://kat.ph/search/example/?field=seeders&sorder=desc like this: ... curl_setopt( $curl, CURLOPT_URL, $url ); $header = array ( 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'Accept-Encoding:gzip,deflate,sdch', 'Accept-Language:en-US,en;q=0.8', 'Cache-Control:max-age=0', 'Connection:keep-alive', 'Host:kat.ph', 'User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19

How to append element to another element using php [duplicate]

偶尔善良 submitted on 2019-12-25 13:35:47
Question: This question already has an answer here: DOMDocument append already fixed html from string (1 answer). Closed 6 years ago. I am trying to append elements to my newly created node in DOMDocument. I have something like $dom = new DomDocument(); $dom->loadHTML($html); $xpath=new DOMXpath($dom); $result = $xpath->query('//tbody'); if($result->length > 0){ $tbody = $dom->getElementsByTagName('tbody'); $table=$dom->createElement('table'); $table->appendChild($tbody); } My tbody doesn't have
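In the snippet above, `getElementsByTagName('tbody')` returns a DOMNodeList, while `appendChild()` expects a single DOMNode. A minimal sketch of the append with one node taken from the XPath result follows; the sample `$html` and the decision to attach the new table to body are assumptions, since the rest of the question is cut off.

```php
<?php
// Assumed sample input; the real $html is not shown in the question.
$html = '<table id="old"><tbody><tr><td>cell</td></tr></tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = $xpath->query('//tbody');

if ($result->length > 0) {
    // Take one node out of the list; appendChild() needs a DOMNode.
    $tbody = $result->item(0);

    $table = $dom->createElement('table');
    // appendChild() moves the node, so it leaves its previous parent.
    $table->appendChild($tbody);

    // Attach the new table to the document (placement is an assumption).
    $dom->getElementsByTagName('body')->item(0)->appendChild($table);
}

echo $dom->saveHTML();
```

If the original table should keep its tbody, append `$tbody->cloneNode(true)` instead of the node itself.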

Need help scraping webpage — getting specific content…

社会主义新天地 submitted on 2019-12-25 08:49:33
Question: I have a table whose number of columns can change depending on the configuration of the scraped page (I have no control over it). I want to get only the information from a specific column, identified by the column's heading. Here is a simplified table: <table> <tbody> <tr class='header'> <td>Image</td> <td>Name</td> <td>Time</td> </tr> <tr> <td><img src='someimage.png' /></td> <td>Name 1</td> <td>13:02</td> </tr> <tr> <td><img src='someimage.png' /></td> <td>Name 2</td> <td>13:43</td> </tr>
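One way to approach this, sketched with the simplified table from the question: find the position of the wanted heading in the header row, then select the cell at that position in every other row with XPath. The `$wanted` variable and the closing tags added to complete the table are assumptions.

```php
<?php
// The simplified table from the question, closed off so it parses cleanly.
$html = <<<HTML
<table><tbody>
<tr class='header'><td>Image</td><td>Name</td><td>Time</td></tr>
<tr><td><img src='someimage.png' /></td><td>Name 1</td><td>13:02</td></tr>
<tr><td><img src='someimage.png' /></td><td>Name 2</td><td>13:43</td></tr>
</tbody></table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$wanted = 'Time'; // heading of the column to extract (assumed)

// Step 1: locate the heading; XPath positions are 1-based.
$index = null;
$position = 0;
foreach ($xpath->query("//tr[@class='header']/td") as $cell) {
    $position++;
    if (trim($cell->textContent) === $wanted) {
        $index = $position;
        break;
    }
}

// Step 2: collect the cell at that position from every non-header row.
$values = array();
if ($index !== null) {
    foreach ($xpath->query("//tr[not(@class='header')]/td[{$index}]") as $cell) {
        $values[] = trim($cell->textContent);
    }
}

print_r($values);
```

Looking the index up at run time means the extraction keeps working when columns are added or reordered, as long as the heading text stays the same.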

Parse boolean attributes with DOMDocument

淺唱寂寞╮ submitted on 2019-12-25 07:59:11
Question: I am trying to parse a simple config file with minimized Boolean attributes, and DOMDocument is not having it. I am trying to load the following: <config> <text id='name' required> <label>Name:</label> </text> </config> with the following code: $dom = new DOMDocument(); $dom->preserveWhiteSpace=FALSE; if($dom->LoadXML($template) === FALSE){ throw new Exception("Could not parse template"); } I am getting the warning: Warning: DOMDocument::loadXML(): Specification mandate value for
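XML requires every attribute to have a value, so a bare `required` is not well-formed and `loadXML()` rejects it. A minimal sketch, assuming the config file can be changed: give the attribute an explicit value and test for its presence with `hasAttribute()`. The value `required='required'` is a convention chosen here, not something the question specifies.

```php
<?php
// Same config as in the question, with the Boolean attribute given a value.
$template = <<<XML
<config>
  <text id='name' required='required'>
    <label>Name:</label>
  </text>
</config>
XML;

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
if ($dom->loadXML($template) === false) {
    throw new Exception("Could not parse template");
}

// Presence of the attribute is what signals "true".
$text = $dom->getElementsByTagName('text')->item(0);
$isRequired = $text->hasAttribute('required');

var_dump($isRequired);
```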

How to know if a DOMDocument is owned by a parser

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-25 06:23:04
Question: I am looking into an issue where an API is called from two different sources. We have an API called dispatch. Its signature is as follows: DOMDocument* dispatch( DOMDocument * requestDocument ) We observed that this API can be called with a DOMDocument object that is either (1) a stand-alone DOMDocument object created using DOMImplementation::createDocument http://xerces.apache.org/xerces-c/apiDocs-3/classDOMImplementation.html or (2) a parser-owned DOMDocument object created using