My use case is to index two files, a metadata file and a binary PDF, under a single unique Solr id. The metadata is an XML file, and some of its elements are mapped to Solr schema fields.
Given a file record1234.pdf
and metadata like:
<metadata>
<field1>value1</field1>
<field2>value2</field2>
<field3>value3</field3>
</metadata>
Do the programmatic equivalent of
curl "http://localhost:8983/solr/update/extract?
literal.id=record1234.pdf
&literal.field1=value1
&literal.field2=value2
&literal.field3=value3
&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_txt&boost.foo_txt=3&" -F "tutorial=@tutorial.pdf"
Adapted from http://wiki.apache.org/solr/ExtractingRequestHandler#Literals .
This will create a new entry in the index containing the text output from Tika/Solr Cell as well as the fields you specify.
You should be able to perform these operations in your favorite language.
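If that language happens to be Java, a minimal sketch with SolrJ might look like the following. It is an illustration under assumptions, not a drop-in solution: it assumes a SolrJ 4.x-style API (HttpSolrServer and addFile(File, contentType); newer releases use HttpSolrClient instead), a core at http://localhost:8983/solr, and field names that simply mirror the metadata example above.

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class IndexPdfWithMetadata {
    public static void main(String[] args) throws Exception {
        // Base URL is an assumption; point it at your own Solr core.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // Same handler the curl command above posts to.
        ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");

        // Stream the PDF to Solr; Tika extracts its text on the server side.
        req.addFile(new File("record1234.pdf"), "application/pdf");

        // Literal fields carry the values read from the metadata XML.
        req.setParam("literal.id", "record1234.pdf");
        req.setParam("literal.field1", "value1");
        req.setParam("literal.field2", "value2");
        req.setParam("literal.field3", "value3");

        // Commit so the new document becomes searchable right away.
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}

Any of the other parameters from the curl line (captureAttr, capture, fmap.*, boost.*) can be passed the same way with setParam.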
The content in the metadata file is not mapped to predefined field names.
If they don't map to predefined fields, then use dynamic fields. For example, you can set *_i to be an integer field.
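Continuing the SolrJ sketch above, a field name that matches the *_i pattern (declared as an integer dynamicField in the stock example schema) is typed as an integer without any schema change; the name below is made up for illustration:

// "page_count_i" is a hypothetical name; because it matches the *_i
// dynamicField pattern, Solr stores it as an integer field even though
// it is never declared explicitly in schema.xml.
req.setParam("literal.page_count_i", "42");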
I want to avoid creating one file (by merging the PDF text and the metadata file).
That looks like programmer fatigue :-) But do you have a good reason?