Generating a custom Tokenizer for the new TokenStream API using JFlex/JavaCC

Posted by 廉价感情 on 2019-11-26 18:30:29

Question


We are currently using Lucene 2.3.2 and want to migrate to 3.4.0. We have our own custom Tokenizer, generated with JavaCC, which has been in use ever since we started using Lucene, and we want to keep the same behavior. I would appreciate pointers to any resources on building a Tokenizer for the new TokenStream API from a grammar.

UPDATE:

I found the grammar used to generate StandardTokenizer at http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex?view=log&pathrev=692211. I modified the grammar to suit our requirements and generated the Java code using JFlex (http://jflex.de/).


Answer 1:


I found the grammar used to generate StandardTokenizer at http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex?view=log&pathrev=692211. I modified the grammar to suit our requirements and generated the Java code using JFlex (http://jflex.de/).
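
For readers unfamiliar with the newer API, here is a rough, self-contained sketch of the pattern a wrapper around a generated scanner follows: instead of returning a new Token from next(), the tokenizer implements incrementToken(), which overwrites reusable state and returns false at end of input. Nothing here is Lucene or JFlex code. SketchScanner is a hand-written stand-in for the generated scanner, SketchTokenizer stands in for a Tokenizer subclass, and every class and field name is hypothetical. In real Lucene 3.x code the term and offsets would presumably live in attributes such as CharTermAttribute and OffsetAttribute obtained through addAttribute(...).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hand-written stand-in for a JFlex-generated scanner (hypothetical, not generated code). */
class SketchScanner {
    static final int YYEOF = -1;
    static final int WORD = 0;

    private final Reader in;
    private final StringBuilder text = new StringBuilder();
    private int pos = 0;   // number of characters consumed so far
    private int start = 0; // start offset of the current match

    SketchScanner(Reader in) {
        this.in = in;
    }

    /** Advances to the next run of letters or digits; returns WORD or YYEOF. */
    int getNextToken() throws IOException {
        text.setLength(0);
        int c;
        while ((c = in.read()) != -1) {
            pos++;
            if (Character.isLetterOrDigit(c)) {
                if (text.length() == 0) {
                    start = pos - 1; // first character of a new match
                }
                text.append((char) c);
            } else if (text.length() > 0) {
                return WORD; // a separator ends the current match
            }
        }
        return text.length() > 0 ? WORD : YYEOF;
    }

    String yytext() {
        return text.toString();
    }

    int yychar() {
        return start;
    }
}

/** Stand-in for a Tokenizer subclass: state is exposed via reusable fields, not returned Tokens. */
class SketchTokenizer {
    private final SketchScanner scanner;

    // Assumption: these fields play the role of Lucene attributes.
    final StringBuilder term = new StringBuilder();
    int startOffset;
    int endOffset;

    SketchTokenizer(Reader input) {
        this.scanner = new SketchScanner(input);
    }

    /** Moves to the next token, overwriting the shared state; false means end of input. */
    boolean incrementToken() throws IOException {
        term.setLength(0); // plays the role of clearAttributes()
        if (scanner.getNextToken() == SketchScanner.YYEOF) {
            return false;
        }
        term.append(scanner.yytext());
        startOffset = scanner.yychar();
        endOffset = startOffset + term.length();
        return true;
    }
}

class TokenizerSketchDemo {
    public static void main(String[] args) throws IOException {
        SketchTokenizer tokenizer = new SketchTokenizer(new StringReader("Lucene 3.4 tokens"));
        while (tokenizer.incrementToken()) {
            System.out.println(tokenizer.term + " [" + tokenizer.startOffset + "," + tokenizer.endOffset + ")");
        }
    }
}
```

The point of the sketch is the division of labor: the grammar-generated class only recognizes matches, while the thin wrapper copies each match into shared state on every incrementToken() call. That should make it possible to keep the existing token rules in the grammar and change only the wrapper during the migration.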



Source: https://stackoverflow.com/questions/7846305/generating-a-custom-tokenizer-for-new-tokenstream-api-using-jflex-java-cc
