dataimporthandler

solr dataimport error: Indexing failed. Rolled back all changes

北城余情 submitted on 2019-12-04 05:30:33
Question: When I run the "Full import with cleaning" command, the error is "Indexing failed. Rolled back all changes". My dataimport config file: <dataConfig> <dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://my.ip/my_db" user="my_db_user" password="my_password" readOnly="True"/> <document> <entity name="videos" pk="ID" transformer="TemplateTransformer" dataSource="ds-1" query="SELECT * FROM videos LIMIT 100"> <field column="id" name="unid" indexed="true" stored=

Near duplicate detection in Solr

旧街凉风 submitted on 2019-12-03 21:36:53
Solr is being used to search through a database of user-generated listings. These listings are imported into Solr from MySQL via the DataImportHandler. Problem: Quite often, users repost the same listing to the database, sometimes with minor changes to the post so that it is not easily detected as a duplicate. How should I implement near-duplicate detection with Solr? I do not mind having near-duplicate listings in the Solr index, as long as the search results do not contain them. I guess there are 4 possible places to do this near-duplicate detection
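
One common way to measure how close two listings are is to compare their word shingles with Jaccard similarity. The sketch below is only an illustration of that idea, not the approach from the question: the function names, the shingle size `k=3`, and the `0.5` threshold are all assumptions chosen for the example, and a real deployment would need tuning.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.5, k=3):
    """Hypothetical helper: flag two listings whose shingle overlap meets the threshold."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold
```

A check like this could run before indexing or when collapsing search results; which of those places fits best depends on the setup described in the question.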

Solr: DIH for multilingual index & multiValued field?

不羁的心 submitted on 2019-12-03 20:57:44
I have a MySQL table: CREATE TABLE documents ( id INT NOT NULL AUTO_INCREMENT, language_code CHAR(2), tags CHAR(30), text TEXT, PRIMARY KEY (id) ); I have 2 questions about Solr DIH: 1) The language_code field indicates what language the text field is in. Depending on the language, I want to index text to different Solr fields. # pseudo code if language_code == "en": index "text" to Solr field "text_en" elif language_code == "fr": index "text" to Solr field "text_fr" elif language_code == "zh": index "text" to Solr field "text_zh" ... Can DIH handle a use case like this? How do I configure
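
The pseudo code in the question can be written as a small runnable sketch. This is only an illustration of the routing logic in plain Python, not a DIH configuration; in DIH itself, per-row logic of this kind would typically live in a transformer. The function name `to_solr_doc`, the `SUPPORTED_LANGUAGES` set, and the choice to skip rows with an unknown language are assumptions made for the example.

```python
# Languages named in the question; anything else is an assumption to handle.
SUPPORTED_LANGUAGES = {"en", "fr", "zh"}


def to_solr_doc(row):
    """Map one `documents` row to a Solr document, routing `text` by language.

    Rows with a missing or unsupported language_code keep only their id
    (an assumption; a fallback field would work equally well).
    """
    lang = (row.get("language_code") or "").strip().lower()
    doc = {"id": row["id"]}
    if lang in SUPPORTED_LANGUAGES:
        # "text" goes to text_en, text_fr, or text_zh depending on the row.
        doc["text_" + lang] = row["text"]
    return doc
```

For example, a row with `language_code` set to `"fr"` produces a document whose only text field is `text_fr`.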

Solr safe dataimport and core swap on high-traffic website

本秂侑毒 submitted on 2019-12-03 12:21:31
Hello fellow technicians, Let's assume we have a (PHP) website with millions of visitors a month, and we are running a Solr index of 4 million documents on the website. Solr is running on 4 separate servers, where one server is the master and the other 3 servers are replicas. Thousands of documents can be inserted into Solr every 5 minutes. Besides that, users can update their accounts, which should also trigger a Solr update. I am looking for a safe strategy to rebuild the index quickly without missing any documents, and for a safe delta/update strategy. I have thought about
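
Since the title mentions a dataimport followed by a core swap, here is a minimal sketch of the two HTTP requests such a rebuild usually involves: a DIH `full-import` into a spare core, then a CoreAdmin `SWAP`. It only builds the URLs and sends nothing. The base URL and the core names `live` and `rebuild` are hypothetical, and the `/dataimport` path assumes the handler is registered under that name in solrconfig.xml.

```python
from urllib.parse import urlencode


def dih_url(base_url, core, command, **params):
    """Build a DataImportHandler request URL (assumes the handler lives at /dataimport)."""
    query = urlencode({"command": command, **params})
    return f"{base_url}/{core}/dataimport?{query}"


def swap_url(base_url, core, other):
    """Build a CoreAdmin SWAP request URL that exchanges two cores."""
    query = urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{base_url}/admin/cores?{query}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # hypothetical host and port
    # 1) Rebuild into the spare core, 2) swap it with the live one.
    print(dih_url(base, "rebuild", "full-import", clean="true"))
    print(swap_url(base, "live", "rebuild"))
```

Documents written to the live core while the rebuild runs would still need to be replayed into the rebuilt core (for example with a delta import) before the swap; the sketch does not cover that step.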

How can I index XML files stored on another server in Solr 4

痞子三分冷 submitted on 2019-12-02 06:40:08
Question: All my XML files are stored on one server, and I have installed and configured Solr on a different server. How can I index those XML files into Solr? I have looked at Nutch, but its main purpose is to crawl HTML pages and index them. I don't need to crawl; all the files are at a specific path on the other server. I just need to index those XML files in Solr. I have installed and configured Solr 4. If anyone has done something like this, please let me know how to do it.

Solr DataImportHandler CachedSqlEntityProcessor ClassCastException

风流意气都作罢 submitted on 2019-12-02 04:22:23
Question: I am using Solr 4.6.0 and trying to import my data using CachedSqlEntityProcessor, but somehow I end up getting a ClassCastException. Schema: <fields> <field name="_version_" type="long" indexed="true" stored="true"/> <field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" /> <field name="conference" type="string" indexed="true" stored="true" /> <field name="year" type="int" indexed="true" stored="true" /> <field name="doi" type="string" indexed="false"

solr dataimport error: Indexing failed. Rolled back all changes

旧巷老猫 submitted on 2019-12-02 04:11:59
When I run the "Full import with cleaning" command, the error is "Indexing failed. Rolled back all changes". My dataimport config file: <dataConfig> <dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://my.ip/my_db" user="my_db_user" password="my_password" readOnly="True"/> <document> <entity name="videos" pk="ID" transformer="TemplateTransformer" dataSource="ds-1" query="SELECT * FROM videos LIMIT 100"> <field column="id" name="unid" indexed="true" stored="true" /> <field column="title" name="baslik" indexed="true" stored="true" /> <field column="video_img"

Solr DataImportHandler CachedSqlEntityProcessor ClassCastException

好久不见. submitted on 2019-12-02 02:16:44
I am using Solr 4.6.0 and trying to import my data using CachedSqlEntityProcessor, but somehow I end up getting a ClassCastException. Schema: <fields> <field name="_version_" type="long" indexed="true" stored="true"/> <field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" /> <field name="conference" type="string" indexed="true" stored="true" /> <field name="year" type="int" indexed="true" stored="true" /> <field name="doi" type="string" indexed="false" stored="true" /> <field name="text" type="text_en_shingling" indexed="true" stored="true" /> </fields>

Solr Facet Multiple Words with Comma Separated Values

只愿长相守 submitted on 2019-12-01 11:01:55
I'm pulling data into Solr from MySQL. One of the fields is generated using a group_concat function, which results in a comma-separated field that lists all the bands for an event. At the time, I believed this was the best way to store multiple bands for one event. However, I'm finding that I cannot facet this query against all events. I've set the band field to string and multiValued to true. <field name="bands" type="string" indexed="true" stored="true" multiValued="true"/> The result is as expected: the string is faceted as one long string. "Pearl Jam,Alice,Screaming Trees,Everclear",1,
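
The facet shows one long string because a multiValued field only helps if each band arrives as its own value. The sketch below illustrates that split in plain Python; it is not a DIH configuration, and the function names are made up for the example. On the DIH side, RegexTransformer's `splitBy` attribute is meant for this kind of splitting.

```python
from collections import Counter


def split_bands(value, separator=","):
    """Split a group_concat-style string into a list of trimmed, non-empty band names."""
    if not value:
        return []
    return [part.strip() for part in value.split(separator) if part.strip()]


def facet_counts(rows):
    """Count how many events each band appears in, given one concatenated string per event."""
    counts = Counter()
    for row in rows:
        counts.update(split_bands(row))
    return counts
```

With the values split this way, "Pearl Jam" and "Everclear" become separate facet entries instead of one combined string.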

Solr Facet Multiple Words with Comma Separated Values

烈酒焚心 submitted on 2019-12-01 07:50:15
Question: I'm pulling data into Solr from MySQL. One of the fields is generated using a group_concat function, which results in a comma-separated field that lists all the bands for an event. At the time, I believed this was the best way to store multiple bands for one event. However, I'm finding that I cannot facet this query against all events. I've set the band field to string and multiValued to true. <field name="bands" type="string" indexed="true" stored="true" multiValued="true"/> The result is as