1a. script that scans facebook...
How do you plan on defining 'obnoxious'? that sounds pretty difficult.
2a. similarity. doing this with syntax, semantics, and other 'linguisticky' stuff sounds difficult. people have had great success with more numerical methods though, for example using the singular value decomposition (svd). this approach is often referred to as latent semantic analysis or latent semantic mapping, and i think it has also been incorporated into software used to check for plagiarism.
svdlibc:
http://tedlab.mit.edu/~dr/svdlibc/
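to make the svd idea a bit more concrete, here is a rough sketch of latent semantic analysis on a toy corpus. it uses numpy rather than svdlibc, and the function names (`lsa_doc_vectors`, `cosine`), the four example sentences, and the choice of k=2 are all made up for illustration. a real system would use a much larger corpus and probably some term weighting such as tf-idf.

```python
import numpy as np

def lsa_doc_vectors(docs, k=2):
    """Return one k-dimensional latent vector per document (hypothetical helper)."""
    # Build a term-by-document count matrix from whitespace tokens.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the top k dimensions; each row of the result is one document.
    return (s[:k, None] * Vt[:k]).T

def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Toy corpus (assumed data): two sentences about cats, two about markets.
docs = [
    "the cat sat on the mat",
    "a cat lay on a rug",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
vecs = lsa_doc_vectors(docs, k=2)
print("cat vs cat:    ", cosine(vecs[0], vecs[1]))
print("cat vs markets:", cosine(vecs[0], vecs[2]))
```

the point is that documents are compared in the reduced space rather than by raw word overlap, so two comments that share only a few words can still come out as very similar.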
1b. fsm stuff. i'm not sure what you mean by 'proving that a transducer is minimal'. minimization is a pretty standard operation and is included in pretty much any toolkit you might encounter. if you are interested in fsms, take a look at the
AT&T toolkit:
http://www2.research.att.com/~fsmtools/fsm/
or
OpenFST toolkit:
http://www.openfst.org/
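for a feel of what minimization does, here is a small sketch for the simplest case: an unweighted, deterministic acceptor with a complete transition table. this is plain partition refinement (moore's algorithm), not what the toolkits above do internally for weighted transducers, and the names `minimize_dfa` and `accepts` plus the example machine are assumptions for illustration only.

```python
def minimize_dfa(alphabet, delta, start, finals):
    """Moore-style partition refinement for a complete DFA.

    delta maps (state, symbol) -> state and must be defined for every pair.
    Returns (min_delta, min_start, min_finals) over integer block ids.
    """
    # Drop states that cannot be reached from the start state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Initial partition: accepting vs. non-accepting states.
    block = {q: int(q in finals) for q in reachable}
    n_blocks = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for q in sorted(reachable):
            # Two states stay together only if they are in the same block
            # and move to the same blocks on every symbol.
            sig = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            refined[q] = ids.setdefault(sig, len(ids))
        block = refined
        if len(ids) == n_blocks:  # no block was split: partition is stable
            break
        n_blocks = len(ids)
    min_delta = {(block[q], a): block[delta[(q, a)]]
                 for q in reachable for a in alphabet}
    min_finals = {block[q] for q in finals if q in reachable}
    return min_delta, block[start], min_finals

def accepts(delta, start, finals, word):
    """Run the DFA on word and report whether it ends in a final state."""
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in finals

# Assumed example: 4-state DFA over {a, b} accepting strings that end in 'b'.
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 3, (3, "a"): 1, (3, "b"): 3}
min_delta, min_start, min_finals = minimize_dfa("ab", delta, 0, {2, 3})
print(len({q for q, _ in min_delta}), "states after minimization")
```

if the goal really is to show that a given machine is already minimal, one way to read that is: run minimization and check that the number of states does not shrink.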
fsms are growing in popularity as a principled, unified method for doing speech recognition. my graduate work focuses on this subject, and it is indeed very interesting.
what about building an hmm-based parser or chunker, or a simple viterbi decoder? if you put together a decent training set (you'd have to tag it yourself to begin with) you could approximate a simple version of your 'obnoxious comments' tagger, and then use it, maybe with some sort of classifier, to 'censor' or remove the obnoxious comments.
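a simple viterbi decoder is only a few lines. the sketch below assumes a two-state hmm with hand-picked probabilities; the state names, the observation symbols ('polite', 'neutral', 'rude') and all the numbers are invented for illustration, and in practice you would estimate them from your tagged training set. it also assumes every probability is nonzero, since it works in log space.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best state sequence, its log-probability) for an HMM.

    All probabilities must be nonzero because they are converted to logs.
    """
    # best[t][s]: log-prob of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][o]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical model: hidden tag per comment, observed coarse tone.
states = ["ok", "obnoxious"]
start_p = {"ok": 0.6, "obnoxious": 0.4}
trans_p = {"ok": {"ok": 0.7, "obnoxious": 0.3},
           "obnoxious": {"ok": 0.4, "obnoxious": 0.6}}
emit_p = {"ok": {"polite": 0.5, "neutral": 0.4, "rude": 0.1},
          "obnoxious": {"polite": 0.1, "neutral": 0.3, "rude": 0.6}}

path, logp = viterbi(["polite", "neutral", "rude"],
                     states, start_p, trans_p, emit_p)
print(path, math.exp(logp))
```

the same decoder works for word-level tagging: the states become your tags and the observations become words, with the three probability tables counted from the hand-tagged data.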