dataimporthandler

unsupported type Exception on importing documents from Database with Solr 4.0

ぐ巨炮叔叔 submitted on 2019-12-11 05:41:35
Question: I looked up the information provided in a related question to set up an import of all documents stored in a MySQL database (you can find the original question here). Thanks to the steps provided there, I was able to make it work with my MySQL DB. My config looks identical to the one at the link above:

```xml
<dataConfig>
  <dataSource name="db" jndiName="java:jboss/datasources/somename" type="JdbcDataSource" convertType="false" />
  <dataSource name="dastream" type="FieldStreamDataSource" />
```
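For context, the usual way to read a BLOB column through `FieldStreamDataSource` is a nested `TikaEntityProcessor` entity whose `dataField` points at the outer entity's column. The sketch below reuses the `db` and `dastream` data sources declared above, but the table `documents` and its columns `id`, `title` and `content` are hypothetical; it is not the questioner's actual config. One plausible source of an "unsupported type" error is that the referenced column does not come back from JDBC as binary data (a BLOB or byte array), so it cannot be opened as a stream; that is an assumption the question does not confirm.

```xml
<!-- Sketch only: table and column names (documents, id, title, content) are hypothetical -->
<document>
  <entity name="doc" dataSource="db" query="SELECT id, title, content FROM documents">
    <field column="id" name="id" />
    <field column="title" name="title" />
    <!-- Opens doc.content as an InputStream via the FieldStreamDataSource named "dastream"
         and lets Tika extract plain text from it -->
    <entity name="tika" processor="TikaEntityProcessor" dataSource="dastream"
            dataField="doc.content" format="text">
      <field column="text" name="text" />
    </entity>
  </entity>
</document>
```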

Solr - DataImportHandler: When attempting to use column values as field names, multivalued fields only retain the first result

[亡魂溺海] submitted on 2019-12-08 01:01:33
Question: I'm trying to perform a full-import with a document configuration similar to the following:

```xml
<document>
  <entity name="parent" query="select * from parent_table">
    <field name="id" column="ID" />
    <entity name="child" query="select * from child_table where PARENT_ID = ${parent.ID}" transformer="ClobTransformer">
      <field name="${child.FIELD_COLUMN}" column="VALUE_COLUMN" clob="true" />
    </entity>
  </entity>
</document>
```

Let's say the field/value results from the child_table for parent.ID=1 look like

splitting multivalued field while importing data into solr

ε祈祈猫儿з submitted on 2019-12-07 19:49:23
Question: I'm having a bit of trouble getting my head around Solr 3.4 when it comes to multiple values. I have this DIH config:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="********" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lokal" user="****" password="******" />
  <document>
    <entity name="Search" transformer="RegexTransformer" query="select b_id, b_navn, b_cats, b_info, b_keyword, b_critical, b_geo, b_adress from searchbiz">
      <field column="b_id" name="b_id" />
      <field column="b_info"
```
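`RegexTransformer`, already declared on the entity above, can split one delimited column into several values through the `splitBy` attribute. A minimal sketch, assuming `b_cats` stores comma-separated categories (the delimiter is an assumption) and that `b_cats` is declared `multiValued="true"` in schema.xml (the `string` type shown in the comment is also an assumption):

```xml
<entity name="Search" transformer="RegexTransformer"
        query="select b_id, b_navn, b_cats, b_info, b_keyword, b_critical, b_geo, b_adress from searchbiz">
  <field column="b_id" name="b_id" />
  <!-- splitBy takes a regular expression; each piece becomes a separate value.
       The comma is an assumed delimiter. Matching schema.xml entry (assumed type):
       <field name="b_cats" type="string" indexed="true" stored="true" multiValued="true" /> -->
  <field column="b_cats" name="b_cats" splitBy="," />
</entity>
```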

Speed up solr indexing

这一生的挚爱 submitted on 2019-12-07 17:48:59
Question: Solr indexing takes too long. I am using MySQL with more than 30 million records and two levels of sub-queries. Please suggest best practices for indexing data so that I can speed up the process.
Answer 1: Check out SolrPerformanceFactors (its Indexing_Performance section) and ImproveIndexingSpeed.
Source: https://stackoverflow.com/questions/12328969/speed-up-solr-indexing
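Two DIH-level settings are commonly suggested for this situation. `batchSize="-1"` on a MySQL `JdbcDataSource` makes the driver stream rows instead of buffering the whole result set, and `CachedSqlEntityProcessor` on a sub-entity runs the child query once and matches rows in memory instead of issuing one query per parent row. The sketch uses hypothetical table, column and connection values (`parent_table`, `child_table`, `PARENT_ID`, `TAG`, the URL and credentials). The cache trades query count for heap, so with 30 million records it only helps if the cached child table fits in memory.

```xml
<dataConfig>
  <!-- Hypothetical connection values; batchSize="-1" asks the MySQL driver to stream rows -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db" user="user" password="pass" batchSize="-1" />
  <document>
    <entity name="parent" query="select ID, NAME from parent_table">
      <field column="ID" name="id" />
      <!-- Child rows are fetched once and matched in memory on PARENT_ID = parent.ID,
           instead of running one SQL query per parent row -->
      <entity name="child" processor="CachedSqlEntityProcessor"
              query="select PARENT_ID, TAG from child_table"
              where="PARENT_ID=parent.ID">
        <field column="TAG" name="tag" />
      </entity>
    </entity>
  </document>
</dataConfig>
```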

Data-config.xml and mysql - I can load only “id” column

£可爱£侵袭症+ submitted on 2019-12-07 07:38:23
Question: I've got Solr 5.0.0 on Windows Server 2012. I would like to load all the data from my table into Solr. My data-config.xml looks like this:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!--# define data source -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/database" user="root" password="root"/>
  <document>
    <entity name="my_table" pk="id" query="SELECT ID, LASTNAME FROM my_table limit 2">
      <field column="ID" name="id" type="string
```
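Two things in the config above are worth checking, offered as likely suspects rather than a confirmed diagnosis. DIH skips result columns that do not map to a field defined in the schema (or matching a dynamic field), and field types are declared in the schema, not through a `type` attribute in data-config.xml. A minimal sketch, assuming a hypothetical `lastname` field is added to the schema; `limit 2` is dropped because the goal is to load all rows:

```xml
<entity name="my_table" query="SELECT ID, LASTNAME FROM my_table">
  <field column="ID" name="id" />
  <!-- "lastname" is a hypothetical field; it must exist in the core's schema, e.g.
       <field name="lastname" type="string" indexed="true" stored="true" /> -->
  <field column="LASTNAME" name="lastname" />
</entity>
```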

How can I index .html files in Solr

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-07 07:37:04
Question: The files I want to index are stored on the server (I don't need to crawl them) under /path/to/files/. A sample HTML file is:

```html
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="product_id" content="11"/>
<meta name="assetid" content="10001"/>
<meta name="title" content="title of the article"/>
<meta name="type" content="0xyzb"/>
<meta name="category" content="article category"/>
<meta name="first" content="details of the article"/>
<h4>title of the article</h4>
<p class
```
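Since the files are already on the server, one option is DIH's `FileListEntityProcessor` to enumerate them, combined with `TikaEntityProcessor` to parse each file. The sketch assumes Solr 3.1 or later with the DIH extras and Tika jars on the classpath, and schema fields named `id`, `title`, `content` and `product_id`; those field names are assumptions. Mapping custom `<meta>` tags such as `product_id` with `meta="true"` relies on Tika reporting them as document metadata, which its HTML parser is expected to do but which has not been verified against this file.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <!-- Lists every *.html file under /path/to/files/; rootEntity="false" means this
         entity emits no document itself, its fields are merged into the child's documents -->
    <entity name="files" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/files/" fileName=".*\.html" recursive="true" rootEntity="false">
      <field column="fileAbsolutePath" name="id" />
      <!-- Parses each listed file with Tika; one Solr document per file -->
      <entity name="page" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content" />
        <field column="title" name="title" meta="true" />
        <!-- Assumption: Tika exposes <meta name="product_id"> as a metadata entry -->
        <field column="product_id" name="product_id" meta="true" />
      </entity>
    </entity>
  </document>
</dataConfig>
```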

Solr : data import handler and solr cell

柔情痞子 submitted on 2019-12-06 15:58:50
Question: Is it possible to index rich documents (PDF, Office, ...) with the DataImportHandler using Solr Cell? I use Solr 3.2. Thanks.
Answer 1: Solr Cell, aka ExtractingRequestHandler, uses Apache Tika behind the scenes, and Tika can easily be integrated into a DataImportHandler:

```xml
<dataConfig>
  <!-- use any of type DataSource<InputStream> -->
  <dataSource type="BinURLDataSource"/>
  <document>
    <!-- The value of format can be text|xml|html|none. This is the format in which the body is emitted (the 'text' field)
```
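In Solr 3.x, `TikaEntityProcessor` ships in the DIH "extras" jar rather than the core DIH jar, and it needs the Tika libraries that come with Solr Cell under `contrib/extraction/lib`. A sketch of the `<lib>` directives for solrconfig.xml, assuming the stock distribution layout in which the core's instanceDir sits two levels below the Solr install (adjust the relative paths to your own layout):

```xml
<!-- Paths are relative to the core's instanceDir; this layout is an assumption.
     The regex matches both apache-solr-dataimporthandler-*.jar and the -extras jar -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
<!-- Tika and its parser dependencies, as shipped for Solr Cell -->
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
```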

Solr DIH regextransformer - processes only one CSV line

主宰稳场 submitted on 2019-12-06 14:17:48
Question: Hi, I have the following CSV file:

```text
132	1536130302256087040
133	1536130302256087041
134	1536130302256087042
```

The fields are separated by a tab. I am using the DataImportHandler (DIH) to import the CSV into Solr, but only the first line ends up in Solr. This is the result; the other lines from the CSV are missing:

```json
"response": {
  "numFound": 1,
  "start": 0,
  "maxScore": 1,
  "docs": [
    {
      "string": "1536130302256087040",
      "id": "132",
      "_version_": 1536202153221161000
    }
  ]
}
```

Here is
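The question's DIH config is cut off, so this is not a diagnosis of the one-line result. For a tab-separated file, one way to get one document per line is `LineEntityProcessor`, which emits each line as a `rawLine` column, combined with `RegexTransformer` to split that line into fields. The file path is a placeholder, and the regex assumes every line holds exactly two tab-separated numbers, mapped to the `id` and `string` fields seen in the result above.

```xml
<dataConfig>
  <dataSource type="FileDataSource" name="file" encoding="UTF-8" />
  <document>
    <!-- One row (and so one Solr document) per line of the file; the path is a placeholder -->
    <entity name="line" processor="LineEntityProcessor" dataSource="file"
            url="/path/to/data.csv" transformer="RegexTransformer">
      <!-- Splits rawLine on the tab: group 1 goes to "id", group 2 to "string" -->
      <field column="rawLine" regex="^(\d+)\t(\d+)$" groupNames="id,string" />
    </entity>
  </document>
</dataConfig>
```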

Solr DataImportHandler configuration

≡放荡痞女 submitted on 2019-12-06 13:57:15
I want to get data from a MySQL database with the help of the DataImportHandler so I can create indexes. I've configured my Solr instance so that it runs on Tomcat (the example admin page works), but if I try to change the solrconfig.xml file I get an error message. I'm working with Solr 3.6. My configuration is as follows. In solrconfig.xml I added

```xml
<dataDir>${solr.data.dir:/usr/share/tomcat7/solr2}</dataDir>
```

to specify my working directory, and then

```xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/usr/share/tomcat7
```
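The error message itself isn't included, so the following is only a sketch of a working registration for Solr 3.6. A frequent cause of errors after adding the `/dataimport` handler is that the DIH jar is not on the core's classpath: in 3.x it ships as a separate jar under `dist/` rather than inside the Solr webapp. The sketch assumes the stock distribution layout and a data-config.xml placed in the core's conf directory. Under Tomcat the Solr home often lives elsewhere (here apparently under /usr/share/tomcat7), in which case `dir` must point at wherever the jar actually is, or the jar can be copied into the Solr home's `lib` directory.

```xml
<!-- Loads the DIH jar; the relative path assumes the stock distribution layout -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- A relative name is looked up in the core's conf directory -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```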

Solr - DataImportHandler: When attempting to use column values as field names, multivalued fields only retain the first result

一个人想着一个人 submitted on 2019-12-06 11:50:31
I'm trying to perform a full-import with a document configuration similar to the following:

```xml
<document>
  <entity name="parent" query="select * from parent_table">
    <field name="id" column="ID" />
    <entity name="child" query="select * from child_table where PARENT_ID = ${parent.ID}" transformer="ClobTransformer">
      <field name="${child.FIELD_COLUMN}" column="VALUE_COLUMN" clob="true" />
    </entity>
  </entity>
</document>
```

Let's say the field/value results from the child_table for parent.ID=1 look like this:

```text
FIELD_COLUMN  VALUE_COLUMN
fieldA        value1
fieldB        value2
fieldB        value3
```

And the schema configuration
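One possible workaround, offered as an untested sketch: instead of a variable field name, let a `ScriptTransformer` copy `VALUE_COLUMN` into a column named after `FIELD_COLUMN` for each child row, so the three rows above contribute `value1` to `fieldA` and `value2`, `value3` to `fieldB` as ordinary columns. This assumes `fieldA` and `fieldB` exist in the schema with `fieldB` declared `multiValued="true"`, and a JVM that ships a JavaScript engine (Rhino or Nashorn); the function name `pivot` is hypothetical.

```xml
<dataConfig>
  <!-- Hypothetical helper: copies VALUE_COLUMN into a column named by FIELD_COLUMN -->
  <script><![CDATA[
    function pivot(row) {
      var fieldName = row.get('FIELD_COLUMN');
      if (fieldName != null) {
        row.put(fieldName, row.get('VALUE_COLUMN'));
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="parent" query="select * from parent_table">
      <field name="id" column="ID" />
      <!-- ClobTransformer runs first, so VALUE_COLUMN is already a String when pivot() copies it -->
      <entity name="child"
              query="select * from child_table where PARENT_ID = ${parent.ID}"
              transformer="ClobTransformer,script:pivot">
        <field column="VALUE_COLUMN" clob="true" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

Because `fieldA` and `fieldB` are not listed as `<field>` elements, this relies on DIH mapping undeclared columns whose names match schema fields; with `fieldB` multi-valued, each child row should then add one value rather than keeping only the first.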