Are there any good PHP libraries that can convert HTML/PHP documents into objects?

Posted by こ雲淡風輕ζ on 2019-12-22 08:05:14

Question


I see lots of PHP libraries that can parse HTML. A nice example is QueryPath, which mimics the jQuery API.

However, I am looking to analyse phtml. So the library would need to be good not only at analysing the DOM but also at analysing the PHP processing instructions, e.g. a "PHP Document Object Model", or PDOM.

A document like this:

<?php
require 'NameFinder.php';
$title = 'Wave Hello';
$name = getName();
?><html>
<head>
<title><?php echo $title ?></title>
</head>
<body>
<h1>Hello <?php echo $name ?></h1>
<p>Blah Blah Blah</p>
</body>

I'd like to be able to use this kind of PHP library to read things like:

  • the inner HTML of a DOM node, found by XPath or CSS selector.

as well as possibly offering things like:

  • a list of PHP functions/methods invoked in the script
  • values of PHP variables
  • pages required by that page
  • a list of PHP variables used before line 5
  • a list of PHP variables used before the 1st para of the body element

I could spend some time piecing something together, borrowing code from things like phpDocumentor and Zend Framework Reflection, using the built-in DOM API, introspection, string manipulation, etc.

But if there is some kind of "phtmlQuery" library out there that can do these kinds of things, it would be handy.


Answer 1:


To get the processing instructions (and other nodes) from your files, you can use DOM and XPath:

$dom = new DOMDocument;
$dom->loadHTMLFile('/path/to/your/file/or/url');
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//processing-instruction()') as $pi) {
    echo $dom->saveHTML($pi), PHP_EOL;
}

This will output:

<?php require 'NameFinder.php';
$title = 'Wave Hello';
$name = getName();
?>
<?php echo $title ?>
<?php echo $name ?>

This will work with broken HTML. You can find additional libraries in the Stack Overflow question:

  • How do you parse and process HTML/XML in PHP?

Once you have the processing instructions, you can either run them through the native Tokenizer or try some of these:

  • https://github.com/Andrewsville/PHP-Token-Reflection
  • https://github.com/manuelpichler/staticReflection
  • https://github.com/nikic/PHP-Parser

Those won't magically give you the information you seek out of the box, so you will likely need to write a few additional lines on your own.
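For the first item in the question (the inner HTML of a node found by XPath), the DOM approach at the top of this answer can be extended with a small helper. This is a minimal sketch, not part of the original answer: the innerHtml() function and the sample markup are assumptions for illustration, and the DOM extension has no built-in inner-HTML accessor on DOMNode, so the helper serialises each child node in turn.

```php
<?php
// Sample phtml, kept literal with a nowdoc so PHP does not interpolate $title.
$source = <<<'EOD'
<?php $title = 'Wave Hello'; ?><html>
<head><title><?php echo $title ?></title></head>
<body>
<h1>Hello</h1>
<p>Blah <em>Blah</em> Blah</p>
</body>
</html>
EOD;

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);

// Hypothetical helper: concatenate the serialised children of a node.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$paragraph = $xpath->query('//p')->item(0);
$inner = innerHtml($paragraph);
echo $inner, PHP_EOL;
```

CSS selectors are not supported by DOMXPath itself, so a selector would first have to be translated to XPath by another library.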




Answer 2:


There is an XML parser included in PHP core that could do this, but you would only be able to use it on valid XHTML pages, not on normal HTML or broken XHTML. You would have to set up the parser to handle the processing instructions, and it could get very complicated.

http://www.php.net/manual/en/book.xml.php

http://www.php.net/manual/en/function.xml-set-processing-instruction-handler.php
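As a rough illustration of the handler linked above, the following sketch collects the body of every PHP processing instruction in a document. It assumes the xml extension is loaded and that the input is well-formed XML (note the closing </html> tag, which the sample in the question lacks); the sample markup and the $instructions variable are made up for the example.

```php
<?php
// Well-formed XHTML sample, kept literal with a nowdoc.
$xhtml = <<<'XML'
<?php $title = 'Wave Hello'; ?><html>
<head><title><?php echo $title ?></title></head>
<body><h1>Hello <?php echo $name ?></h1></body>
</html>
XML;

$instructions = [];

$parser = xml_parser_create();

// The handler receives the parser, the PI target ("php") and the PI data.
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, string $target, string $data) use (&$instructions): void {
        if ($target === 'php') {
            $instructions[] = trim($data);
        }
    }
);

// xml_parse() returns 1 on success; anything else means the input was rejected.
if (xml_parse($parser, $xhtml, true) !== 1) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($instructions);
```

If the document is not well-formed, xml_parse() fails and the sketch throws, which is the limitation described above.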




Answer 3:


You could use PHP's token_get_all to tokenize the PHP so you could then walk the result and check for function calls and PHP values.

E.g.:

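<?php

// Use a nowdoc (quoted 'EOD') so $title and $name stay literal;
// a plain heredoc would interpolate them before tokenizing.
$src = <<<'EOD'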
<?php
require 'NameFinder.php';
$title = 'Wave Hello';
$name = getName();
?><html>
<head>
<title><?php echo $title ?></title>
</head>
<body>
<h1>Hello <?php echo $name ?></h1>
<p>Blah Blah Blah</p>
</body>
EOD;

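// Each token is either a single-character string (e.g. ';')
// or an array of [token id, text, line number].
$tokens = token_get_all($src);

var_dump($tokens);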

You still need to write a bit of code to walk over the tokens, check what each one is, and extract the value based on the token type (function name, literal string, variable assignment, etc.), but this does a lot of the PHP parsing work for you.
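The token walk described above might look something like the sketch below. It is a heuristic, not a full parser: a T_STRING followed by "(" is treated as a function call (which would also match function declarations and method calls), and only require/include statements with a plain string literal are recorded. The variable names and the nextSignificant() helper are assumptions made for the example.

```php
<?php
$src = <<<'EOD'
<?php
require 'NameFinder.php';
$title = 'Wave Hello';
$name = getName();
?><html>
<head><title><?php echo $title ?></title></head>
<body><h1>Hello <?php echo $name ?></h1></body>
EOD;

$tokens = token_get_all($src);

// Hypothetical helper: index of the next non-whitespace token after $i, or null.
function nextSignificant(array $tokens, int $i): ?int
{
    for ($j = $i + 1; $j < count($tokens); $j++) {
        if (!is_array($tokens[$j]) || $tokens[$j][0] !== T_WHITESPACE) {
            return $j;
        }
    }
    return null;
}

$variables = []; // variable name => list of line numbers where it appears
$calls = [];     // names that look like function calls
$requires = [];  // files pulled in via require/include with a literal path

foreach ($tokens as $i => $token) {
    if (!is_array($token)) {
        continue; // single-character tokens such as ';', '=' or '('
    }
    [$id, $text, $line] = $token;
    $next = nextSignificant($tokens, $i);

    if ($id === T_VARIABLE) {
        $variables[$text][] = $line;
    } elseif ($id === T_STRING && $next !== null && $tokens[$next] === '(') {
        $calls[] = $text;
    } elseif (in_array($id, [T_REQUIRE, T_REQUIRE_ONCE, T_INCLUDE, T_INCLUDE_ONCE], true)
        && $next !== null
        && is_array($tokens[$next])
        && $tokens[$next][0] === T_CONSTANT_ENCAPSED_STRING
    ) {
        $requires[] = trim($tokens[$next][1], "'\"");
    }
}

print_r(['variables' => $variables, 'calls' => $calls, 'requires' => $requires]);
```

Because each variable is stored with its line numbers, a query such as "variables used before line 5" reduces to filtering $variables on those numbers.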



Source: https://stackoverflow.com/questions/9152700/are-there-any-good-php-libraries-that-can-convert-html-php-documents-into-object
