Question
I have a set of documents that contain mathematical expressions and formulas, and I want to search the documents by formula or expression.
Based on my research so far, I am considering converting each mathematical expression to LaTeX and storing it as a string in the database (Elasticsearch).
With this approach, will I be able to search for documents using the LaTeX string?
For example, the LaTeX form of a² + b² = c² is a^{2} + b^{2} = c^{2}. Can this string be searched in Elasticsearch?
Answer 1:
I agree with user @Lue E, with some modifications. I first tried a plain keyword approach, but it gave me some issues, so I switched to using the keyword tokenizer inside my own custom analyzer, which should cover most of your use cases.
Index definition with a custom analyzer. The keyword tokenizer keeps the whole formula as a single searchable token, the lowercase filter makes the search case-insensitive, and the trim filter strips leading and trailing whitespace.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "mathformula": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
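To make the effect of this analyzer concrete, here is a minimal Python sketch that approximates the analysis chain outside Elasticsearch. The function name `analyze` is made up for illustration, and `str.strip()` is only an approximation of the trim filter (it may treat some Unicode whitespace differently).

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Approximate my_custom_analyzer: keyword tokenizer, then lowercase, then trim."""
    token = text            # keyword tokenizer: the whole input becomes one token
    token = token.lower()   # lowercase filter: case-insensitive matching
    token = token.strip()   # trim filter: drop leading and trailing whitespace
    return [token]


# The formula is never split on spaces or operators; it stays a single token.
print(analyze("  A2+B2 = C2 "))  # ['a2+b2 = c2']
```

Note that whitespace inside the formula is preserved, so "a2 + b2" and "a2+b2" produce different tokens under this scheme.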
Index sample documents
{
  "mathformula": "(a+b)^2 = a^2 + b^2 + 2ab"
}
{
  "mathformula": "a2+b2 = c2"
}
Search query (a match query, which applies the same analyzer that was used at index time)
{
  "query": {
    "match": {
      "mathformula": {
        "query": "a2+b2 = c2"
      }
    }
  }
}
The search result contains only the matching document
"hits": [
  {
    "_index": "so_math",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.6931471,
    "_source": {
      "mathformula": "a2+b2 = c2"
    }
  }
]
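As a rough illustration of why only that document comes back, the following Python sketch simulates the matching in memory. It assumes the match query reduces to comparing the single normalized token of the query with the single normalized token of each stored formula. The names `normalize`, `search`, `doc_a`, and `doc_b` are hypothetical, and nothing here talks to a real cluster.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    # Stand-in for the analyzer: one token, lowercased, outer whitespace removed.
    return text.lower().strip()


# The two sample documents from above, keyed by made-up ids.
docs: Dict[str, str] = {
    "doc_a": "(a+b)^2 = a^2 + b^2 + 2ab",
    "doc_b": "a2+b2 = c2",
}


def search(query: str) -> List[str]:
    """Return ids of documents whose whole normalized formula equals the query."""
    wanted = normalize(query)
    return [doc_id for doc_id, formula in docs.items() if normalize(formula) == wanted]


print(search("a2+b2 = c2"))     # ['doc_b']
print(search(" A2+B2 = C2 "))   # ['doc_b']  (case and outer spaces are ignored)
print(search("a2+b2"))          # []  (a partial formula does not match)
print(search("a2 + b2 = c2"))   # []  (inner spacing must be identical)
```

The last two calls show the trade-off of this approach: it supports whole-formula lookup, so a sub-expression or a differently spaced formula would need extra normalization before indexing and searching.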
Source: https://stackoverflow.com/questions/60960265/what-is-the-best-way-to-index-documents-which-contain-mathematical-expression-in