dataimporthandler | 易学教程

unsupported type Exception on importing documents from Database with Solr 4.0

Question: I looked up the information provided on a related question to set up an import of all documents stored in a MySQL database. You can find the original question here. Thanks to the steps provided, I was able to make it work with a MySQL DB. My config looks identical to the one mentioned at the above link: <dataConfig> <dataSource name="db" jndiName="java:jboss/datasources/somename" type="JdbcDataSource" convertType="false" /> <dataSource name="dastream" type="FieldStreamDataSource" />

splitting multivalued field while importing data into solr

Question: I'm having a bit of trouble getting my head around Solr 3.4 when it comes to multiple values. I have this DIH config: <dataConfig> <dataSource type="JdbcDataSource" name="********" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lokal" user="****" password="******" /> <document> <entity name="Search" transformer="RegexTransformer" query="select b_id, b_navn, b_cats, b_info, b_keyword, b_critical, b_geo, b_adress from searchbiz"> <field column="b_id" name="b_id" /> <field column="b_info"
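For the multiple-values case described in this question, the RegexTransformer already named in the entity supports a `splitBy` attribute on a field. The following is only a sketch: it assumes that `b_cats` holds comma-separated values (the excerpt does not show the actual delimiter) and that `b_cats` is declared with `multiValued="true"` in schema.xml.

```xml
<!-- Sketch only: the comma delimiter and the choice of b_cats are assumptions. -->
<entity name="Search" transformer="RegexTransformer"
        query="select b_id, b_cats from searchbiz">
  <field column="b_id" name="b_id" />
  <!-- splitBy is a regular expression; each piece becomes one value
       of the multivalued target field -->
  <field column="b_cats" name="b_cats" splitBy="," />
</entity>
```

If the target field is not multivalued in the schema, indexing a row that splits into several values would be expected to fail, so the schema declaration matters as much as the transformer.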

Speed up solr indexing

Question: Solr indexing takes too long. I am using MySQL with more than 30 million records and two-level sub-queries. Please suggest best practices for indexing data so that I can speed up the process.

Answer 1: Check out SolrPerformanceFactors with Indexing_Performance and ImproveIndexingSpeed.

Source: https://stackoverflow.com/questions/12328969/speed-up-solr-indexing
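Two DIH-level options that are often tried for nested sub-queries against MySQL are row streaming (`batchSize="-1"` on the JdbcDataSource) and caching the child rows with CachedSqlEntityProcessor, so that the child query runs once instead of once per parent row. The sketch below is not taken from the answer; the database, table, and column names are hypothetical, and caching trades memory for speed, so it may not fit a very large child table.

```xml
<dataConfig>
  <!-- batchSize="-1" asks the MySQL driver to stream rows
       instead of buffering the whole result set in memory -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr_user" password="solr_password"
              batchSize="-1" />
  <document>
    <!-- hypothetical parent/child tables -->
    <entity name="parent" query="select id, title from parent_table">
      <!-- The child query runs once; its rows are cached and looked up
           by parent_id for every parent row -->
      <entity name="child" processor="CachedSqlEntityProcessor"
              query="select parent_id, value from child_table"
              where="parent_id=parent.id" />
    </entity>
  </document>
</dataConfig>
```

Where the data model allows it, collapsing the sub-queries into a single SQL join is another commonly suggested alternative.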

Data-config.xml and mysql - I can load only “id” column

Question: I've got Solr 5.0.0 on Windows Server 2012. I would like to load all data from my table into the Solr engine. My data-config.xml looks like this: <?xml version="1.0" encoding="UTF-8" ?> <dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/database" user="root" password="root"/> <document> <entity name="my_table" pk="id" query="SELECT ID, LASTNAME FROM my_table limit 2"> <field column="ID" name="id" type="string

How can I do indexing .html files in SOLR

Question: The files I want to index are stored on the server under /path/to/files/ (I don't need to crawl). The sample HTML file is: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <meta name="product_id" content="11"/> <meta name="assetid" content="10001"/> <meta name="title" content="title of the article"/> <meta name="type" content="0xyzb"/> <meta name="category" content="article category"/> <meta name="first" content="details of the article"/> <h4>title of the article</h4> <p class

Solr : data import handler and solr cell

Question: Is it possible to index rich documents (PDF, Office, ...) with the DataImportHandler using Solr Cell? I use Solr 3.2. Thanks.

Answer 1: Solr Cell, aka ExtractingRequestHandler, uses Apache Tika behind the scenes, and the latter can easily be integrated into a DataImportHandler: <dataConfig> <dataSource type="BinURLDataSource"/> <document> <!-- The value of format can be text|xml|html|none. This is the format in which the body is emitted (the 'text' field)
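The excerpt above is cut off before the entity definition. As an illustration of the kind of integration the answer describes, here is a minimal sketch using TikaEntityProcessor. It is not the answer's original configuration: the URL and the target field names are hypothetical, and it assumes the DIH extras and Tika jars are on Solr's classpath.

```xml
<dataConfig>
  <dataSource type="BinURLDataSource" />
  <document>
    <!-- url is a hypothetical location of one rich document -->
    <entity name="tika" processor="TikaEntityProcessor"
            url="http://example.com/docs/sample.pdf" format="text">
      <!-- meta="true" reads the value from Tika's metadata, not the body -->
      <field column="title" name="title" meta="true" />
      <!-- the extracted body is exposed as the 'text' column -->
      <field column="text" name="text" />
    </entity>
  </document>
</dataConfig>
```

To index many documents, this entity is typically nested under an outer entity that supplies one URL or file path per row.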

Solr DIH regextransformer - processes only one CSV line

Question: Hi, I have the following CSV file: 132 1536130302256087040 133 1536130302256087041 134 1536130302256087042. The fields are separated by a tab. I have set up the DataImportHandler (DIH) for Solr and try to import the CSV, but only the first line ends up in Solr. This is the result; the other lines from the CSV are missing: "response": { "numFound": 1, "start": 0, "maxScore": 1, "docs": [ { "string": "1536130302256087040", "id": "132", "_version_": 1536202153221161000 } ] } Here is
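The asker's configuration is cut off, so the cause of the single-document result cannot be read from the excerpt. One common way to read a tab-separated file in DIH is LineEntityProcessor, which emits one row per line in a column named `rawLine`, combined with RegexTransformer to split each line into fields. A minimal sketch, assuming a hypothetical file path and the `id` and `string` field names seen in the response above:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- url is a hypothetical path; rootEntity="true" makes every line
         of the file its own Solr document -->
    <entity name="line" processor="LineEntityProcessor"
            url="/path/to/data.csv" rootEntity="true"
            transformer="RegexTransformer">
      <!-- group 1 goes to 'id', group 2 to 'string'; \t matches the tab -->
      <field column="rawLine" regex="^(\d+)\t(\d+)$"
             groupNames="id,string" />
    </entity>
  </document>
</dataConfig>
```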

Solr DataImportHandler configuration

I want to get data from a MySQL database with the help of DataImportHandler so I can create indexes. I've configured my Solr instance so that it works on Tomcat (the example admin page), but if I try to change the solrconfig.xml file I get an error message. I'm working with Solr 3.6. My configuration is as follows. In solrconfig.xml I added: <dataDir>${solr.data.dir:/usr/share/tomcat7/solr2}</dataDir> to specify my working directory and then <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">/usr/share/tomcat7
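For comparison, a typical handler registration in solrconfig.xml looks like the sketch below. The file name `data-config.xml` is an assumption (the path in the excerpt above is cut off); a relative name like this is resolved against the core's conf directory. The DataImportHandler jar also has to be on Solr's classpath, otherwise loading the core fails with a class-not-found error.

```xml
<!-- Sketch only: data-config.xml is a hypothetical file name -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```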

Solr - DataImportHandler: When attempting to use column values as field names, multivalued fields only retain the first result

I'm trying to perform a full-import with a document configuration similar to the following: <document> <entity name="parent" query="select * from parent_table" > <field name="id" column="ID" /> <entity name="child" query="select * from child_table where PARENT_ID = ${parent.ID}" transformer="ClobTransformer" > <field name="${child.FIELD_COLUMN}" column="VALUE_COLUMN" clob="true" /> </entity> </entity> </document> Let's say the field/value results from the child_table for parent.ID=1 look like this:

FIELD_COLUMN  VALUE_COLUMN
fieldA        value1
fieldB        value2
fieldB        value3

And the schema configuration