Indexing multiple documents and mapping them to a unique Solr id

Asked by 误落风尘 on 2021-01-16 07:46

My use case is to index two files under a unique Solr id: a metadata file and a binary PDF file. The metadata file has content in the form of an XML file, and some schema fields are mapped to

1 Answer
  • 2021-01-16 08:36

    Given a file record1234.pdf and metadata like:

    <metadata>
    <field1>value1</field1>
    <field2>value2</field2>
    <field3>value3</field3>
    </metadata>
    

    Do the programmatic equivalent of:

    curl "http://localhost:8983/solr/update/extract?literal.id=record1234.pdf&literal.field1=value1&literal.field2=value2&literal.field3=value3&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_txt&boost.foo_txt=3" \
      -F "file=@record1234.pdf"
    

    Adapted from http://wiki.apache.org/solr/ExtractingRequestHandler#Literals.

    This will create a new entry in the index containing the text extracted by Solr Cell (Tika) as well as the fields you specify.

    You should be able to perform these operations in your favorite language.
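    To make that concrete, here is a minimal sketch in Python of the same operation: it parses the metadata XML, turns each element into a literal.<name> parameter, and builds the extract URL. The handler path and field names follow the example above; the final upload call is shown but commented out, since it needs the requests package and a running Solr instance.

    ```python
    # Sketch: build the extract request programmatically, assuming Solr's
    # ExtractingRequestHandler is mounted at /solr/update/extract and the
    # metadata file looks like the example above.
    import xml.etree.ElementTree as ET
    from urllib.parse import urlencode

    METADATA = """<metadata>
    <field1>value1</field1>
    <field2>value2</field2>
    <field3>value3</field3>
    </metadata>"""

    def build_extract_params(metadata_xml: str, doc_id: str) -> dict:
        """Map each metadata element to a literal.<tag> request parameter."""
        params = {"literal.id": doc_id}
        for child in ET.fromstring(metadata_xml):
            params["literal." + child.tag] = child.text
        return params

    params = build_extract_params(METADATA, "record1234.pdf")
    query = urlencode(params)
    url = "http://localhost:8983/solr/update/extract?" + query

    # The binary PDF would then be posted as multipart form data, e.g.:
    #   import requests
    #   requests.post(url, files={"file": open("record1234.pdf", "rb")})
    print(url)
    ```

    Because both files contribute to one request with a single literal.id, they end up in a single Solr document rather than two.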


    The content in the metadata file is not mapped to field names.

    If they don't map to a predefined field, then use dynamic fields. For example, you can declare *_i to be an integer field, so any field whose name ends in _i is indexed as an integer without being listed in the schema.
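    Such a rule is declared in schema.xml; the *_i pattern below is the conventional integer rule from Solr's example schema (the exact type name may differ between Solr versions):

    ```xml
    <!-- Any field name matching *_i (e.g. pagecount_i) is an indexed, stored integer -->
    <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
    ```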

    I want to avoid creating one file (by merging the PDF text and the metadata file).

    That looks like programmer fatigue :-) But do you have a good reason?
