What are practical, day-to-day usage examples of the PHP Tokenizer?
Has anyone used this?
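I've used the Tokenizer to compute the cyclomatic complexity number (CCN) and some other code metrics of a callback. In the snippet below, $reflection is presumably a reflection object for the callback, and $result and $key come from surrounding code that is not shown: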
    if ((isset($reflection) === true) && ($reflection->getFileName() !== false))
    {
        if (($source = file($reflection->getFileName(), FILE_IGNORE_NEW_LINES)) !== false)
        {
            $source = implode("\n", array_slice($source, $reflection->getStartLine() - 1, $reflection->getEndLine() - ($reflection->getStartLine() - 1)));

            $result[$key]['source'] = array
            (
                'ccn' => 1,
                'statements' => 0,
                'lines' => array
                (
                    'logical' => array(),
                    'physical' => substr_count($source, "\n"),
                ),
            );

            if (is_array($tokens = token_get_all(sprintf('<?php %s ?>', $source))) === true)
            {
                $points = array_map('constant', array_filter(array
                (
                    'T_BOOLEAN_AND',
                    'T_BOOLEAN_OR',
                    'T_CASE',
                    'T_CATCH',
                    'T_ELSEIF',
                    'T_FINALLY',
                    'T_FOR',
                    'T_FOREACH',
                    'T_GOTO',
                    'T_IF',
                    'T_LOGICAL_AND',
                    'T_LOGICAL_OR',
                    'T_LOGICAL_XOR',
                    'T_WHILE',
                ), 'defined'));

                foreach ($tokens as $token)
                {
                    if (is_array($token) === true)
                    {
                        if ((in_array($token[0], array(T_CLOSE_TAG, T_COMMENT, T_DOC_COMMENT, T_INLINE_HTML, T_OPEN_TAG), true) !== true) && (strlen(trim($token[1])) > 0))
                        {
                            if (in_array($token[0], $points, true) === true)
                            {
                                ++$result[$key]['source']['ccn'];
                            }

                            array_push($result[$key]['source']['lines']['logical'], $token[2]);
                        }
                    }
                    else if (strncmp($token, '?', 1) === 0)
                    {
                        ++$result[$key]['source']['ccn'];
                    }
                    else if (strncmp($token, ';', 1) === 0)
                    {
                        ++$result[$key]['source']['statements'];
                    }
                }

                $result[$key]['source']['lines']['logical'] = max(0, count(array_unique($result[$key]['source']['lines']['logical'])) - 1);
            }
        }
    }
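For readers who want to try the idea without the reflection plumbing, here is a minimal, self-contained sketch of the same counting approach. The function name estimate_ccn is hypothetical, the list of decision-point tokens is a shortened version of the one above, and the input is assumed to be a code fragment without PHP open/close tags. Treat it as a rough estimate, not a complete metric.

```php
<?php
// Hypothetical helper (not part of the snippet above): estimates the
// cyclomatic complexity of a PHP code fragment given without PHP tags.
function estimate_ccn(string $code): int
{
    // Token types counted as decision points. Filtering with 'defined'
    // skips names that do not exist in the running PHP version.
    $points = array_map('constant', array_filter(array(
        'T_BOOLEAN_AND', 'T_BOOLEAN_OR', 'T_CASE', 'T_CATCH', 'T_ELSEIF',
        'T_FOR', 'T_FOREACH', 'T_IF', 'T_LOGICAL_AND', 'T_LOGICAL_OR', 'T_WHILE',
    ), 'defined'));

    $ccn = 1; // a straight-line body has complexity 1

    foreach (token_get_all('<?php ' . $code) as $token) {
        if (is_array($token)) {
            // Named tokens arrive as array(id, text, line).
            if (in_array($token[0], $points, true)) {
                ++$ccn;
            }
        } elseif ($token === '?') {
            // Single-character tokens arrive as plain strings;
            // a lone '?' is counted as a ternary operator here.
            ++$ccn;
        }
    }

    return $ccn;
}

echo estimate_ccn('if ($a && $b) { foo(); } else { bar(); }'), "\n"; // prints 3
```

The example prints 3: one for the base path, one for the if, and one for the && operator. The else branch does not add a decision point.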
I have used it to build a PHP sandbox that tries to provide a more secure environment for executing PHP scripts.
I have also experimented a lot with preprocessing PHP; for example, I wrote prephp, an (incomplete) PHP 5.3 emulator for PHP 5.2.
Many similar tools, such as source code analyzers (for security auditing, code style analysis, ...), use the Tokenizer as well.
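To give a flavor of what such an analyzer does at its core, here is a minimal sketch of a security-audit style check that reports the lines on which eval is used. The function name find_eval_lines and the sample source are made up for illustration; real auditing tools do far more than this.

```php
<?php
// Hypothetical audit helper: returns the line numbers on which the
// eval language construct appears in the given PHP source.
function find_eval_lines(string $source): array
{
    $lines = array();

    foreach (token_get_all($source) as $token) {
        // T_EVAL only matches real uses of eval, not the word "eval"
        // inside comments or string literals.
        if (is_array($token) && $token[0] === T_EVAL) {
            $lines[] = $token[2]; // index 2 holds the line number
        }
    }

    return $lines;
}

$source = "<?php\n\$a = 1;\neval('echo \$a;');\n// eval in a comment is ignored\n";
print_r(find_eval_lines($source)); // the only match is on line 3
```

Because the check works on tokens instead of raw text, the commented-out mention of eval on line 4 is not reported, which is the main advantage over a plain string search.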
The Tokenizer is also handy for smaller tasks, not just large-scale code analyzers. For example, if you accept a PHP array as input and want to check that it is not malicious, you can do so with the Tokenizer.
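A minimal sketch of such a check, under some loudly stated assumptions: the function name is_plain_array_literal is hypothetical, only the long array(...) syntax is accepted (short [...] syntax is rejected on purpose, since square brackets also allow dereferencing and calls like array('system')[0]('ls')), and the allowlist of tokens is my own choice. It only inspects tokens and does not verify that the input is a single well-formed expression, so treat it as an illustration of the allowlist idea and not as a complete security measure.

```php
<?php
// Hypothetical validator: accepts only literal arrays written with
// array(...), containing strings, numbers, true/false/null and nesting.
function is_plain_array_literal(string $code): bool
{
    $allowedTokens = array(T_ARRAY, T_CONSTANT_ENCAPSED_STRING, T_LNUMBER, T_DNUMBER, T_DOUBLE_ARROW);
    $allowedChars = array(')', ',', '-');
    $allowedNames = array('true', 'false', 'null');

    $tokens = token_get_all('<?php ' . $code);
    array_shift($tokens); // drop the T_OPEN_TAG we added ourselves

    $previous = null; // id (or character) of the last non-whitespace token

    foreach ($tokens as $token) {
        $id = is_array($token) ? $token[0] : $token;

        if ($id === T_WHITESPACE) {
            continue;
        }

        if ($id === '(') {
            // A parenthesis may only open an array(...), never a call
            // such as 'system'('ls') or array(...)(...).
            $ok = ($previous === T_ARRAY);
        } elseif ($id === T_STRING) {
            // Bare names are limited to the constants true, false and null.
            $ok = in_array(strtolower($token[1]), $allowedNames, true);
        } else {
            $ok = in_array($id, $allowedTokens, true) || in_array($id, $allowedChars, true);
        }

        if (!$ok) {
            return false;
        }

        $previous = $id;
    }

    return $previous !== null; // empty input is not an array literal
}

var_dump(is_plain_array_literal("array('a' => 1, 'b' => array(2.5, -3, true))")); // bool(true)
var_dump(is_plain_array_literal("array('a' => system('ls'))"));                   // bool(false)
var_dump(is_plain_array_literal("array('system')[0]('ls')"));                     // bool(false)
```

Variables, function calls, interpolated strings, comments and closing tags all produce tokens outside the allowlist, so they are rejected before the input gets anywhere near execution.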
PS: I am currently switching from just tokenizing PHP to actually parsing it, using a PHP parser written in PHP that I recently published (it works, but isn't really usable in practice yet).