Question
I created a custom tokenizer, and it seems to work fine when I check it with admin/analysis.jsp and with System.out logging. However, when I query the field that uses this custom tokenizer, I see (from the System.out log) that the tokenizer is invoked only for the first query string. Could you help me by pointing out what I am doing wrong? This is my code:
package com.fosp.searchengine;

import java.io.Reader;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;

public class JvnTextProTokenizerFactory extends WhitespaceTokenizerFactory {
    @Override
    public WhitespaceTokenizer create(Reader input) {
        System.out.println("WhitespaceTokenizer create(Reader input)");
        Reader processedStringReader = new ProcessedStringReader(input);
        return new WhitespaceTokenizer(processedStringReader);
    }
}
package com.fosp.searchengine;

import java.io.IOException;
import java.io.Reader;

public class ProcessedStringReader extends java.io.Reader {
    private static final int BUFFER_SIZE = 1024 * 8;
    private static TextProcess m_textProcess = null;
    private char[] m_inputData = null;
    private int m_offset = 0;
    private int m_length = 0;

    public ProcessedStringReader(Reader input) {
        // Read the whole input up front so it can be processed as one string.
        char[] arr = new char[BUFFER_SIZE];
        StringBuffer buf = new StringBuffer();
        int numChars;
        try {
            while ((numChars = input.read(arr, 0, arr.length)) > 0) {
                buf.append(arr, 0, numChars);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (m_textProcess == null) {
            try {
                m_textProcess = new TextProcess();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        m_inputData = m_textProcess.processText(buf.toString()).toCharArray();
        m_offset = 0;
        m_length = m_inputData.length;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // Copy from the current read position (m_offset) into cbuf starting at off.
        // off is an index into cbuf, not into m_inputData.
        int charNumber = 0;
        while (m_offset < m_length && charNumber < len) {
            cbuf[off + charNumber] = m_inputData[m_offset];
            m_offset++;
            charNumber++;
        }
        if (charNumber == 0 && len > 0) {
            return -1; // end of stream
        }
        return charNumber;
    }

    @Override
    public void close() throws IOException {
        m_inputData = null;
        m_offset = 0;
        m_length = 0;
    }
}
schema.xml:
<fieldType name="text_jvnTextPro" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="com.fosp.searchengine.JvnTextProTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="com.fosp.searchengine.JvnTextProTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
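Answer 1:
There is nothing wrong here. The tokenizer instance that the factory creates is reused, so create() is not called again for later queries. The admin analysis page behaves differently, which is why the tokenizer appears to be invoked every time there.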
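
To make the reuse behaviour concrete, here is a minimal plain-Java sketch. It does not use the Lucene or Solr API: SketchTokenizer, PreprocessingTokenizer, ReusingAnalyzer and preprocess are hypothetical stand-ins, and the upper-casing step merely plays the role of ProcessedStringReader. It assumes the analyzer hands the cached tokenizer its next input through a reset-style call (Lucene tokenizers of that era had a reset(Reader) method for this, as far as I know), so any wrapping done only in the factory is skipped after the first query.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.function.Function;

public class TokenizerReuseSketch {

    /** Drains a Reader into a String. */
    static String readAll(Reader in) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical preprocessing step (stands in for ProcessedStringReader): upper-cases the text. */
    static Reader preprocess(Reader in) {
        return new StringReader(readAll(in).toUpperCase(Locale.ROOT));
    }

    /** Hypothetical tokenizer stand-in; it only remembers its current input. */
    static class SketchTokenizer {
        protected Reader input;

        SketchTokenizer(Reader input) {
            this.input = input;
        }

        /** Reuse path: swaps the input without going back through the factory. */
        void reset(Reader newInput) {
            this.input = newInput;
        }

        String text() {
            return readAll(input);
        }
    }

    /** Variant that repeats the preprocessing on the reuse path as well. */
    static class PreprocessingTokenizer extends SketchTokenizer {
        PreprocessingTokenizer(Reader input) {
            super(preprocess(input));
        }

        @Override
        void reset(Reader newInput) {
            super.reset(preprocess(newInput));
        }
    }

    /** Hypothetical analyzer stand-in: calls the factory once, then reuses the instance. */
    static class ReusingAnalyzer {
        private final Function<Reader, SketchTokenizer> factory;
        private SketchTokenizer cached;
        int createCalls = 0;

        ReusingAnalyzer(Function<Reader, SketchTokenizer> factory) {
            this.factory = factory;
        }

        SketchTokenizer tokenizerFor(Reader reader) {
            if (cached == null) {
                createCalls++;
                cached = factory.apply(reader);
            } else {
                cached.reset(reader);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        // Wrapping only in the factory, as in the question.
        ReusingAnalyzer a = new ReusingAnalyzer(r -> new SketchTokenizer(preprocess(r)));
        System.out.println(a.tokenizerFor(new StringReader("foo")).text());
        System.out.println(a.tokenizerFor(new StringReader("bar")).text());
        System.out.println("create calls: " + a.createCalls);

        // Wrapping on the reuse path too.
        ReusingAnalyzer b = new ReusingAnalyzer(PreprocessingTokenizer::new);
        System.out.println(b.tokenizerFor(new StringReader("foo")).text());
        System.out.println(b.tokenizerFor(new StringReader("bar")).text());
    }
}
```

In the first analyzer only the first query passes through preprocess, which matches what the System.out log in the question shows. In the second, the preprocessing lives in the tokenizer's own reset path, so reused instances still see processed input. Whether that translates directly to a subclass of WhitespaceTokenizer depends on the Lucene version in use.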
Source: https://stackoverflow.com/questions/10185076/custom-tokenizer-solr-only-is-invoked-at-the-first