I'm looking for the best PHP-based way to scan a lot of text entries (classifieds) and pull out keywords. Does anyone know about Part-of-Speech tagging? Is there a PHP-ish way to do this?
Ian Barber has implemented a Brill Tagger in PHP, which he presents on his PHP/ir site where he describes using it to analyse tweets.
Yeah, I'm currently using the Brill tagger. It works to some extent, although I wish I could figure out how to contribute to its ruleset. It makes plenty of mistakes, but its output is still about 85% accurate. My only issue is that it is SLOW!
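For anyone wondering what a Brill ruleset actually contains, here is a minimal sketch of the general technique: each word first gets its most likely tag from a lexicon (unknown words default to noun), and then contextual rules rewrite tags based on their neighbours. The lexicon, the single rule, and the function names below are made up for illustration and are not taken from Ian Barber's implementation. It is written in Python for brevity, but the dictionaries and loops map directly onto PHP arrays and `foreach`.

```python
# Hypothetical toy lexicon: word -> most likely tag. A real tagger loads
# tens of thousands of entries from a tagged corpus.
LEXICON = {
    "i": "PRP", "want": "VBP", "to": "TO",
    "book": "NN", "a": "DT", "flight": "NN",
}

# Hypothetical contextual rules in the Brill style:
# (from_tag, to_tag, required_previous_tag)
RULES = [
    ("NN", "VB", "TO"),  # a "noun" right after "to" is probably a verb
]

def tag(tokens):
    """Assign initial tags from the lexicon, then apply each rule in order."""
    # Step 1: initial guess; unknown words default to NN (noun).
    tags = [LEXICON.get(t.lower(), "NN") for t in tokens]
    # Step 2: transformation rules patch the initial guesses.
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(tokens, tags))

def keywords(tokens):
    """Keep only the words tagged as nouns as candidate keywords."""
    return [word for word, t in tag(tokens) if t.startswith("NN")]
```

With this toy data, `tag("I want to book a flight".split())` tags "book" as `VB` rather than `NN`, and `keywords(...)` returns only "flight". Extending a ruleset, in this picture, means appending more tuples like the one in `RULES`.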
It gets it right where it counts, on words with double meaning. However, there are many conventions it doesn't account for, such as contrasting conjunction clauses. For instance, I might say something negative about somebody and then, after the comma, say something that reverses the polarity to positive, or not. The computer can't see idioms either.
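One rough way to handle the contrasting-conjunction case is to split the sentence at the conjunction and score each clause separately, letting the clause after "but" win when it carries any polarity at all. This is only a sketch of that idea: the polarity word list, the choice of "but" and "however" as the only contrast markers, and the function names are all assumptions for illustration. It does nothing for idioms. It is in Python for brevity; PHP's `preg_split` and `preg_match_all` would play the same roles as `re.split` and `re.findall`.

```python
import re

# Hypothetical polarity lexicon: word -> score.
POLARITY = {"rude": -1, "slow": -1, "helpful": 1, "great": 1}

# Contrast markers, with optional surrounding commas.
CONTRAST = re.compile(r",?\s+\b(?:but|however)\b,?\s+", re.IGNORECASE)

def clause_scores(sentence):
    """Split on contrast markers and sum the word polarities per clause."""
    clauses = CONTRAST.split(sentence)
    return [
        sum(POLARITY.get(w, 0) for w in re.findall(r"[a-z']+", c.lower()))
        for c in clauses
    ]

def overall_polarity(sentence):
    """Let the final clause decide if it has polarity; otherwise sum all."""
    scores = clause_scores(sentence)
    if len(scores) > 1 and scores[-1] != 0:
        return scores[-1]
    return sum(scores)
```

Here "He was rude, but the service was great" comes out positive because the second clause overrides the first, while "He was rude, but I paid anyway" stays negative because the second clause is neutral.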