What are practical, day-to-day usage examples of the PHP Tokenizer?
Has anyone used this?
Interesting question.
I have not used the tokenizer in any production projects myself yet, but there are several questions on Stack Overflow for which the tokenizer is the correct answer, or at least one of them:
Automatically parsing PHP to separate PHP code from HTML - extracting comments out of PHP code, e.g. to build documentation (phpDocumentor works this way)
Class exists in an external file - analyzing code, seeing whether a class exists inside a file (e.g. for a plugin management system)
Permanently write variables to a php file with php - altering PHP source code files, e.g. to fill in configuration variables. Using the tokenizer would be the first step toward doing this at the parser level.
How to create a list of all built-in PHP functions a project makes use of? - analyzing what functions are used in a PHP project
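As a rough illustration of the "does a class exist inside this file" case, here is a minimal sketch. The function name find_declared_classes() is made up for this example, and it assumes the file contents are already loaded into a string. It skips anonymous classes and Foo::class constants, and it ignores namespaces entirely.

```php
<?php
// Hypothetical helper (not from any of the linked questions):
// returns the names of the classes declared in a PHP source string.
function find_declared_classes(string $source): array
{
    $classes = [];
    $previous = null;    // ID of the last significant token seen
    $expectName = false; // true right after a "class" keyword

    foreach (token_get_all($source) as $token) {
        // Array tokens are [id, text, line]; single characters are plain strings.
        $id = is_array($token) ? $token[0] : $token;
        if (in_array($id, [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            continue;
        }
        if ($expectName) {
            // Anonymous classes have no T_STRING after "class".
            if ($id === T_STRING) {
                $classes[] = $token[1];
            }
            $expectName = false;
        }
        // Ignore the "class" in Foo::class.
        if ($id === T_CLASS && $previous !== T_DOUBLE_COLON) {
            $expectName = true;
        }
        $previous = $id;
    }
    return $classes;
}

$source = <<<'PHP'
<?php
// class Commented {}
$text = "class InString {}";
class Plugin_Manager {}
$name = Plugin_Manager::class;
PHP;

print_r(find_declared_classes($source));
```

The commented-out class and the one inside the string literal are not reported, which is exactly where a regular expression would tend to go wrong.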
I use PHP_CodeSniffer for coding style compliance, which is built on the tokeniser. Also, some frameworks (e.g. Symfony 2) use the tokeniser to generate cache files or intermediate class files of PHP code. It's also possible to use the tokeniser to build a source code formatter or syntax highlighter.
Basically, anywhere you use PHP code as data you can use the tokeniser. It's much more reliable than trying to parse PHP code with regular expressions or other string processing functions.
A friend of mine has written Überloader (a brute-force autoloader for PHP 5), which uses this very technique when it indexes class files. Its _check_file() method will be of particular interest to you.
Überloader is designed for legacy projects that have not planned or thought about their class naming conventions or file structures.
I use the class every day in legacy projects that I am fixing up or renovating.
From a comment in the PHP manual:
The tokenizer functions are quite powerful. For example, you can retrieve all of the methods in a given class using an algorithm like:
for each token:
    if token is T_FUNCTION then start buffer
    if buffer is started then add the current string to the buffer
    if token is ( then stop buffer
And the great thing is that the class methods will have the right case, so it's a good way to get around the limitations with get_class_methods returning lowercase method names. Also, since a similar algorithm lets you read the arguments of a function, you can implement Reflection-like functionality in PHP4.
Finally you can use it as a simpler method of extracting Javadoc out of a class file to generate documentation. The util/MethodTable.php class in AMFPHP (http://www.amfphp.org) uses the tokenizer functions to create a method table with all of the arguments, a description, return type, etc. and from that method table it can generate ActionScript that matches the PHP, but it could also be fitted to generate JavaScript, documentation files, or basically anything you put your mind to. I can also see that this could be the base for a class -> WSDL file generator.
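The buffer algorithm from the manual comment could look roughly like this in PHP. This is only a sketch: find_function_names() is a made-up name, it returns every named function or method in the source rather than those of one given class, and it does not handle "use function" imports or methods named after reserved words.

```php
<?php
// Hypothetical helper, not from the manual comment: returns the names of all
// named functions and methods declared in $source, in their original case.
function find_function_names(string $source): array
{
    $names = [];
    $buffering = false; // true between a T_FUNCTION token and its name or "("

    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_FUNCTION) {
            $buffering = true;
        } elseif ($buffering && $token === '(') {
            $buffering = false; // closure: "function (" has no name
        } elseif ($buffering && is_array($token) && $token[0] === T_STRING) {
            $names[] = $token[1]; // the name exactly as written in the source
            $buffering = false;
        }
    }
    return $names;
}

$source = <<<'PHP'
<?php
class Greeter {
    public function sayHello($name) { return "Hello $name"; }
    private static function buildURL() { return function () { return 1; }; }
}
PHP;

print_r(find_function_names($source));
```

Instead of accumulating a string buffer, the sketch simply takes the first T_STRING after T_FUNCTION, which amounts to the same thing for a method name.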
You can use it to gather various information about PHP code, for example all defined classes, methods and variables, and for generating documentation and similar tasks.
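For the documentation case, a minimal sketch might just pull out the doc comments. The function name extract_doc_comments() is invented for this example; a real documentation generator would also need to work out which class or function each comment belongs to.

```php
<?php
// Hypothetical helper: returns all /** ... */ doc comments in a PHP source
// string, keyed by the line number on which each comment starts.
function extract_doc_comments(string $source): array
{
    $comments = [];
    foreach (token_get_all($source) as $token) {
        // Array tokens are [id, text, line]; plain // and /* */ comments
        // are T_COMMENT and are skipped here.
        if (is_array($token) && $token[0] === T_DOC_COMMENT) {
            $comments[$token[2]] = $token[1];
        }
    }
    return $comments;
}

$source = <<<'PHP'
<?php
/** Adds two numbers. */
function add($a, $b) { return $a + $b; } // plain comment
PHP;

print_r(extract_doc_comments($source));
```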
A pretty basic use is for syntax highlighting.
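foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Named token [id, text, line]: use the token name as the CSS class.
        $class = token_name($token[0]);
        echo '<span class="' . $class . '">' . htmlspecialchars($token[1]) . '</span>';
    } else {
        // Single-character token such as ";" or "{".
        echo '<span class="T_RAW">' . htmlspecialchars($token) . '</span>';
    }
}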
The token numbers are usually converted into nicer CSS class names of course, but you could just craft a stylesheet with only .T_COMMENT, .T_ARRAY, .T_ELSEIF, .T_FUNCTION ... classes.
I'm working on a Symfony 1.2 legacy application and I use the tokenizer to get all calls of sfConfig::get() and sfConfig::set().
So basically I document all configuration parameters of my application.
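A sketch of how such a scan could work is below. It is not the code I actually run: find_sfconfig_calls() is a made-up name, and it only catches calls whose first argument is a plain string literal.

```php
<?php
// Hypothetical helper: finds sfConfig::get('...') and sfConfig::set('...')
// calls in a PHP source string and returns the method, key and line of each.
function find_sfconfig_calls(string $source): array
{
    // Drop whitespace and comments so neighbouring tokens can be compared by position.
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($t) {
            return !is_array($t)
                || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
        }
    ));

    $calls = [];
    for ($i = 0, $n = count($tokens); $i + 4 < $n; $i++) {
        [$class, $colon, $method, $paren, $arg] = array_slice($tokens, $i, 5);
        if (is_array($class) && $class[0] === T_STRING && $class[1] === 'sfConfig'
            && is_array($colon) && $colon[0] === T_DOUBLE_COLON
            && is_array($method) && $method[0] === T_STRING
            && in_array($method[1], ['get', 'set'], true)
            && $paren === '('
            && is_array($arg) && $arg[0] === T_CONSTANT_ENCAPSED_STRING
        ) {
            $calls[] = [
                'method' => $method[1],
                'key'    => trim($arg[1], '\'"'), // strip the surrounding quotes
                'line'   => $arg[2],
            ];
        }
    }
    return $calls;
}

$source = <<<'PHP'
<?php
$dir = sfConfig::get('sf_upload_dir');
sfConfig::set("app_mailer_enabled", true);
PHP;

print_r(find_sfconfig_calls($source));
```

Running something like this over every file in the project and collecting the keys gives the list of configuration parameters to document.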