Solr delta-import not working

Submitted by 断了今生、忘了曾经 on 2019-11-29 12:37:05

Things that might be worth checking:

1. Timestamp saved in the dataimport.properties config file

This has happened to me before.

Running a delta-import successfully updates the {dataimporter.last_index_time} value in the conf/dataimport.properties file. The next time, your query runs against the new timestamp, which may return zero rows unless the database has been updated in the meantime.

2. dataimporter.delta.id and dataimporter.last_index_time

dataimporter.delta.id should be dih.delta.id

last_index_time remains in the dataimporter namespace. **dataimporter.last_index_time** works at least in Solr 4.2.0. dih.last_index_time might work too, as it is mentioned in the Solr wiki, but I haven't tested it.

3. The timestamp needs to be converted to the proper datetime data type, depending on the database.

In case of SQL server:

LAST_MODIFIED_DATETIME > convert(datetime,'${dataimporter.last_index_time}')
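As a rough sketch of where that condition would sit, here is a minimal entity definition. The table name item and the ID column are hypothetical. The optional style argument 120 (SQL Server's ODBC canonical format, yyyy-mm-dd hh:mi:ss) is an addition for illustration, not part of the condition above; it assumes last_index_time is written in the default yyyy-MM-dd HH:mm:ss format.

```xml
<!-- Sketch only: "item" and "ID" are hypothetical names. -->
<!-- The third argument (120) tells convert() to read the string as
     yyyy-mm-dd hh:mi:ss; drop it to fall back to the session's date settings. -->
<entity name="item" pk="ID"
        query="select * from item"
        deltaQuery="select ID from item where LAST_MODIFIED_DATETIME &gt; convert(datetime, '${dataimporter.last_index_time}', 120)"
        deltaImportQuery="select * from item where ID = '${dih.delta.ID}'">
    <field column="ID" name="id" />
</entity>
```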

There are some bugs in certain versions with last_index_time. You haven't indicated which Solr version you're on, but most people these days are on 4.x.

Also, there are some bugs where the old dataimporter property namespace doesn't work. With 4.x you should be using the dih property namespace, which means dih.last_index_time and dih.delta.id instead of dataimporter.* for the property names.
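For illustration, a minimal data-config sketch that uses the dih namespace throughout might look like the following. The connection details, the item table and the ID and last_modified columns are all placeholders, not taken from any real setup.

```xml
<dataConfig>
    <!-- Placeholder connection details; replace with your own driver, URL and credentials. -->
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/exampledb"
                user="example_user" password="example_password" />
    <document>
        <!-- "item", "ID" and "last_modified" are hypothetical names. -->
        <!-- deltaQuery returns the keys of changed rows; deltaImportQuery then
             loads each changed row using the key exposed as dih.delta.ID. -->
        <entity name="item" pk="ID"
                query="select * from item"
                deltaQuery="select ID from item where last_modified &gt; '${dih.last_index_time}'"
                deltaImportQuery="select * from item where ID = '${dih.delta.ID}'">
            <field column="ID" name="id" />
        </entity>
    </document>
</dataConfig>
```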

I'm running Solr in Tomcat 7 on Windows. Tracing the ODBC connection, I see that the language is set to Norwegian ("Norsk" is Norwegian for "Norwegian" ;)

set arithabort off
set numeric_roundabort off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set cursor_close_on_commit off
set implicit_transactions off
set language Norsk
set dateformat dmy
set datefirst 1
set transaction isolation level read committed

The JVM is started with these arguments:

-Duser.region=US
-Duser.language=en
-Duser.timezone=Europe/Oslo

It didn't make any difference whether it was set to Norwegian or English.

Adding a propertyWriter tag to the configuration file fixed the problem.

<dataConfig>
<propertyWriter dateFormat="yyyy-dd-MM HH:mm:ss" type="SimplePropertiesWriter" directory="D:/tmp" filename="knowledgebase.dih.properties" locale="English (United States)" />
<dataSource name="db" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost:1433;databaseName=norway_operations;responseBuffering=adaptive;selectMethod=cursor" user="noropuser" password="noropuser" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT" />

<document>

    <entity type="a" name="knowledge" dataSource="db" pk="BASE_ID" query="select * from vKNOWLEDGE_BASE"
            deltaQuery="select BASE_ID from vKNOWLEDGE_BASE where '${dataimporter.last_index_time}' &lt; TIMESTAMP" 
            deltaImportQuery="select * from vKNOWLEDGE_BASE where BASE_ID = '${dataimporter.delta.BASE_ID}'" 
            deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS BASE_ID where PK_NAME = 'BASE_ID'" >

        <field column="BASE_ID" name="id" />
        <field column="CATEGORY_ID" name="categoryId" />
        <field column="CATEGORY_NAME" name="category" />
        <field column="DESCRIPTION" name="description" />
        <field column="SOLUTION" name="solution" />
        <field column="USER_FULL_NAME" name="author" />
        <field column="SOFTWARE_VERSION" name="software_version" />
        <field column="TIMESTAMP" name="last_modified" />

        <entity name="keywords" dataSource="db" pk="KEYWORD_ID" query="select KNOWLEDGE_KEYWORDS.* from KNOWLEDGE_KEYWORDS_TO_BASE left join KNOWLEDGE_KEYWORDS on (KNOWLEDGE_KEYWORDS_TO_BASE.KEYWORD_ID = KNOWLEDGE_KEYWORDS.KEYWORD_ID) where  KNOWLEDGE_KEYWORDS_TO_BASE.BASE_ID = '${knowledge.BASE_ID}'">                
            <field column="KEYWORD_NAME" name="keywords" />              
        </entity>      
    </entity>        
</document>
</dataConfig>

It's also possible to add a language option to the JdbcDataSource url.

jdbc:sqlserver://localhost:1433;databaseName=XXX;responseBuffering=adaptive;selectMethod=cursor;language=XXX

I did not test this, but I assume setting it to English would also have fixed the problem. In the SQL Server session the language is set to Norwegian, for which the default date format is yyyy-dd-MM HH:mm:ss, whereas the date used in the where clause to compare against the LAST_MODIFIED column was formatted as yyyy-MM-dd HH:mm:ss.

I had the same issue and figured out that deltaImportQuery is case sensitive: the name used in ${dih.delta.ID} has to match the case of the id column.

I made my id column "ID":

deltaImportQuery="select id,state,name,place,city from temp where ID='${dih.delta.ID}'"

Solr seems to save timestamps in dataimport.properties in the UTC time zone, so you need to convert the timestamps in your database to UTC before comparing them to the values in dataimport.properties.

e.g.

-- For MySQL, the following converts `update_date` to UTC before the comparison in the where clause
deltaQuery="select id from book where status = 0 and CONVERT_TZ(`update_date`, @@session.time_zone, '+00:00') &gt; '${dih.last_index_time}';"