dataimporthandler

Solr Indexing MySQL Timestamp or Date Time field

荒凉一梦 posted on 2019-12-06 11:49:39
To index a date in Solr, the date should be in ISO format. Can we index a MySQL Timestamp or DateTime field without modifying the SQL SELECT statement? I have used:
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<field name="CreatedDate" type="tdate" indexed="true" stored="true" />
CreatedDate is of type DateTime in MySQL. I am getting the following exception:
11:23:39,117 WARN [org.apache.solr.handler.dataimport.DateFormatTransformer] (Thread- 72) Could not parse a Date field : java.text.ParseException: Unparseable date: "2013-04-14 11:22:48
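The warning is raised by DateFormatTransformer, which suggests the entity applies that transformer with a pattern that does not match MySQL's `yyyy-MM-dd HH:mm:ss` output. A minimal sketch of a matching mapping, assuming a hypothetical table `my_table` with columns `id` and `CreatedDate` (the table, the entity name, and the connection details are placeholders, not taken from the question):

```xml
<dataConfig>
  <!-- Connection details are placeholders for illustration only -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="secret"/>
  <document>
    <entity name="item" transformer="DateFormatTransformer"
            query="SELECT id, CreatedDate FROM my_table">
      <field column="id" name="id"/>
      <!-- dateTimeFormat is a java.text.SimpleDateFormat pattern; it must match the incoming value -->
      <field column="CreatedDate" name="CreatedDate" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
    </entity>
  </document>
</dataConfig>
```

If the JDBC driver already hands the column over as a date object, dropping the transformer for that field may be enough, since the pattern only matters when a string has to be parsed.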

How to use the Solr Data Import Handler to index a MySQL table?

南笙酒味 posted on 2019-12-06 10:38:50
Question: When I try to import a MySQL table by loading this in the browser:
http://192.168.136.129:8983/solr/dataimport?command=full-import
I get this error:
HTTP ERROR 404 Problem accessing /solr/dataimport. Reason: NOT_FOUND Powered by Jetty://
I'm following this tutorial from the official Solr wiki to get started with the DIH: http://wiki.apache.org/solr/DIHQuickStart
As per the tutorial, I added this to my solrconfig.xml:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport

Splitting a multivalued field while importing data into Solr

▼魔方 西西 posted on 2019-12-06 09:14:11
I'm having a bit of trouble getting my head around Solr 3.4 when it comes to multiple values. I have this DIH config:
<dataConfig>
  <dataSource type="JdbcDataSource" name="********" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lokal" user="****" password="******" />
  <document>
    <entity name="Search" transformer="RegexTransformer" query="select b_id, b_navn, b_cats, b_info, b_keyword, b_critical, b_geo, b_adress from searchbiz">
      <field column="b_id" name="b_id" />
      <field column="b_info" name="b_info" />
      <field column="b_cats" name="b_cats" splitBy=","/>
    </entity>
  </document>
</dataConfig>
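With `splitBy`, RegexTransformer emits a list of values for `b_cats`, so the target field in schema.xml has to accept more than one value. A minimal sketch of such a field definition, assuming a plain string field (the type and the indexed/stored flags are assumptions, since the question's schema is not shown):

```xml
<!-- schema.xml: multiValued="true" lets the field store each comma-separated value separately -->
<field name="b_cats" type="string" indexed="true" stored="true" multiValued="true"/>
```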

Does Solr data import handler support custom variables?

做~自己de王妃 posted on 2019-12-06 04:43:44
I currently have an issue with my data import handler where ${dataimporter.last_index_time} is not granular enough to capture two events that happen within a second of each other, so a record in my database can be skipped. I am thinking of replacing last_index_time with a simple atomically incrementing value instead of a datetime, but to do that I need to be able to set and read custom variables through Solr that can be referenced in my data-config.xml file. Alternatively, if I could find some way to set dataimporter.last_index_time, that would work just as
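DIH exposes HTTP request parameters to data-config.xml as `${dataimporter.request.<name>}`, which is one way to pass in a caller-managed value. A minimal sketch, assuming a hypothetical table `item` with a monotonically increasing column `change_seq` and a hypothetical request parameter `last_seq` (none of these names come from the question):

```xml
<dataConfig>
  <!-- Connection details are placeholders for illustration only -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="secret"/>
  <document>
    <!-- last_seq is read from the request URL, e.g. ...&last_seq=1042 -->
    <entity name="item"
            query="SELECT id, name FROM item WHERE change_seq &gt; ${dataimporter.request.last_seq}">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

The import would then be triggered with something like `/dataimport?command=full-import&clean=false&last_seq=1042`, leaving the caller responsible for remembering the highest sequence value it has already imported.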

DataImportHandler is not indexing MySQL table in Solr admin

喜欢而已 posted on 2019-12-06 03:29:06
I am trying to index a MySQL table in Solr using DataImportHandler, but it does not seem to be indexing.
data-config.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/solr_tut" user="root" password=""/>
  <document>
    <entity name="product_id" query="select product_id,name,description from products">
    </entity>
  </document>
</dataConfig>
solrconfig.xml:
<lib dir="../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<requestHandler

Speed up solr indexing

吃可爱长大的小学妹 posted on 2019-12-05 18:22:56
Solr indexing takes too long. I am using MySQL with more than 30 million records, and two levels of subqueries. Please suggest best practices for indexing data so that I can speed up the process.
Answer: Check out SolrPerformanceFactors with Indexing_Performance and ImproveIndexingSpeed.
Source: https://stackoverflow.com/questions/12328969/speed-up-solr-indexing
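For a large MySQL import with nested entities, two DIH settings that are often worth trying are streaming the result set and caching sub-entity lookups. A minimal sketch, assuming hypothetical tables `parent` and `child` joined on `parent_id` (table, column, and connection names are placeholders, not from the question):

```xml
<dataConfig>
  <!-- batchSize="-1" asks the MySQL JDBC driver to stream rows instead of buffering the whole result set -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="secret"
              batchSize="-1"/>
  <document>
    <entity name="parent" query="SELECT id, title FROM parent">
      <!-- CachedSqlEntityProcessor runs the child query once and serves lookups from memory,
           rather than issuing one query per parent row -->
      <entity name="child" processor="CachedSqlEntityProcessor"
              query="SELECT parent_id, label FROM child"
              where="parent_id=parent.id"/>
    </entity>
  </document>
</dataConfig>
```

The cache lives in memory, so this approach only helps when the child table fits comfortably in the heap; otherwise a single SQL join in the parent query may be the simpler option.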

Data-config.xml and MySQL - I can load only “id” column

大憨熊 posted on 2019-12-05 10:05:20
I've got Solr 5.0.0 on Windows Server 2012. I would like to load all data from my table into Solr. My data-config.xml looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<!--# define data source -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/database" user="root" password="root"/>
  <document>
    <entity name="my_table" pk="id" query="SELECT ID, LASTNAME FROM my_table limit 2">
      <field column="ID" name="id" type="string" indexed="true" stored="true" required="true" />
      <field column="LASTNAME" name="lastname" type="string

Near duplicate detection in Solr

若如初见. posted on 2019-12-05 09:28:14
Question: Solr is being used to search through a database of user-generated listings. These listings are imported into Solr from MySQL via the DataImportHandler.
Problem: Quite often, users submit the same listing to the database more than once, sometimes with minor changes to the post so that it is not easily detected as a duplicate. How should I implement near-duplicate detection with Solr? I do not mind having near-duplicate listings in the Solr index as long as the search results do not contain
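Solr ships a de-duplication update processor whose `TextProfileSignature` computes a fuzzy signature, which is one possible starting point. A minimal sketch for solrconfig.xml, assuming hypothetical listing fields `title` and `description` and a string field named `signature` in the schema (all three names are assumptions):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Field that receives the computed signature; it must exist in the schema -->
    <str name="signatureField">signature</str>
    <!-- false keeps near-duplicates in the index instead of overwriting them -->
    <bool name="overwriteDupes">false</bool>
    <!-- Hypothetical source fields used to compute the signature -->
    <str name="fields">title,description</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain still has to be attached to the handler that receives the documents, typically through the `update.chain` parameter. Near-duplicates could then be folded together at query time by grouping on the signature field, for example with `group=true&group.field=signature`.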

Solr: Data Import Handler and Solr Cell

北战南征 posted on 2019-12-04 22:55:52
Is it possible to index rich documents (PDF, Office, ...) with the Data Import Handler using Solr Cell? I use Solr 3.2. Thanks.
Answer: Solr Cell, aka ExtractingRequestHandler, uses Apache Tika behind the scenes, and the latter can easily be integrated into a DataImportHandler:
<dataConfig>
  <!-- use any of type DataSource<InputStream> -->
  <dataSource type="BinURLDataSource"/>
  <document>
    <!-- The value of format can be text|xml|html|none. This is the format in which the body is emitted (the 'text' field). The implicit field 'text' will have that format. The default value is 'text' (if not specified). format=

Solr safe dataimport and core swap on high-traffic website

你说的曾经没有我的故事 posted on 2019-12-04 18:54:28
Question: Hello fellow technicians, Let's assume we have a (PHP) website with millions of visitors a month, and we are running a Solr index for the website that holds 4 million documents. Solr runs on 4 separate servers, where one server is the master and the other 3 are replicas. Thousands of documents can be inserted into Solr every 5 minutes. Besides that, users can update their accounts, which should also trigger a Solr update. I am looking for a safe strategy to rebuild the index fast