Indexing huge table with Hibernate Search

好久不见. Posted on 2019-12-23 05:32:05

Question


I'm trying to add Hibernate Search to my project to improve search performance, but I have a problem indexing huge tables. I've added the Hibernate Search dependency, and I have a simple servlet where I trigger the indexing process:

    FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
    try {
        ftem
        .createIndexer(MyEntity.class)
        .batchSizeToLoadObjects(25)
        .cacheMode(CacheMode.NORMAL)
        .threadsToLoadObjects(5)
        .startAndWait();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }

and in my persistence.xml:

    <property name="hibernate.show_sql" value="false" />
    <property name="hibernate.dialect" value="org.hibernate.dialect.MySQL5InnoDBDialect" />
    <property name="hibernate.archive.autodetection" value="class" />
    <property name="hibernate.search.default.directory_provider" value="filesystem" />
    <property name="hibernate.search.default.indexBase" value="/var/lucene/indexes" />

The problem is that the MyEntity table has around 25 million rows, and after about 30 seconds I get out-of-memory error messages:

2015-07-28 21:16:50,168 INFO  [stdout] (default task-60) Building index

2015-07-28 21:16:55,180 INFO  [org.hibernate.search.impl.SimpleIndexingProgressMonitor] (Hibernate Search: identifierloader-1) HSEARCH000027: Going to reindex 22593085 entities
2015-07-28 21:19:47,186 ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 2) WFLYCTL0013: Operation ("read-children-resources") failed - address: ([]): java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-28 21:19:58,506 WARN  [org.jboss.jca.core.connectionmanager.listener.TxConnectionListener] (Hibernate Search: identifierloader-1) IJ000305: Connection error occured: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@15a020a3[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@446189fe connection handles=1 lastReturned=1438110947536 lastValidated=1438108373971 lastCheckedOut=1438111010224 trackByTx=true pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@3fb3ab95 mcp=SemaphoreArrayListManagedConnectionPool@496e4f29[pool=MyProjectApiDS] xaResource=LocalXAResourceImpl@4f676ce7[connectionListener=15a020a3 connectionManager=798378ab warned=false currentXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8010b:537a5b28:55b7cad0:167, node_name=1, branch_uid=0:ffffc0a8010b:537a5b28:55b7cad0:169, subordinatenodename=null, eis_name=java:/MyProjectApiDS > productName=MySQL productVersion=5.6.25-log jndiName=java:/MyProjectApiDS] txSync=null]: javax.resource.spi.ResourceAdapterInternalException: Unexpected error
    at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.broadcastConnectionError(BaseWrapperManagedConnection.java:699)
    at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.connectionError(BaseWrapperManagedConnection.java:665)
    at org.jboss.jca.adapters.jdbc.WrappedConnection.checkException(WrappedConnection.java:1669)
    at org.jboss.jca.adapters.jdbc.WrappedStatement.checkException(WrappedStatement.java:1267)
    at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:467)
    at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:82)
    at org.hibernate.loader.Loader.getResultSet(Loader.java:2066)
    at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1863)
    at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1839)
    at org.hibernate.loader.Loader.scroll(Loader.java:2627)
    at org.hibernate.loader.criteria.CriteriaLoader.scroll(CriteriaLoader.java:121)
    at org.hibernate.internal.StatelessSessionImpl.scroll(StatelessSessionImpl.java:682)
    at org.hibernate.internal.CriteriaImpl.scroll(CriteriaImpl.java:394)
    at org.hibernate.search.batchindexing.impl.IdentifierProducer.loadAllIdentifiers(IdentifierProducer.java:146)
    at org.hibernate.search.batchindexing.impl.IdentifierProducer.inTransactionWrapper(IdentifierProducer.java:111)
    at org.hibernate.search.batchindexing.impl.IdentifierProducer.run(IdentifierProducer.java:95)
    at org.hibernate.search.batchindexing.impl.OptionallyWrapInJTATransaction.runWithErrorHandler(OptionallyWrapInJTATransaction.java:97)
    at org.hibernate.search.batchindexing.impl.ErrorHandledRunnable.run(ErrorHandledRunnable.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-28 21:19:58,514 ERROR [org.jboss.remoting.remote.connection] (XNIO-1 I/O-1) JBREM000200: Remote connection failed: java.io.IOException: Istniejące połączenie zostało gwałtownie zamknięte przez zdalnego hosta
2015-07-28 21:19:58,531 INFO  [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 2) WFLYDS0019: Deployment mysql-connector-java-5.1.34-bin.jar was previously deployed by this scanner but has been removed from the server deployment list by another management tool. Marker file C:\servers\wildfly-9.0.0.Final\standalone\deployments\mysql-connector-java-5.1.34-bin.jar.undeployed is being added to record this fact.
2015-07-28 21:19:58,620 WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Hibernate Search: identifierloader-1) SQL Error: 0, SQLState: null
2015-07-28 21:19:58,621 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Hibernate Search: identifierloader-1) Error
2015-07-28 21:19:58,622 ERROR [org.hibernate.search.exception.impl.LogErrorHandler] (Hibernate Search: identifierloader-1) HSEARCH000058: HSEARCH000116: Unexpected error during MassIndexer operation: org.hibernate.exception.GenericJDBCException: could not extract ResultSet
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:54)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:126)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:112)
    at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:91)
    at org.hibernate.loader.Loader.getResultSet(Loader.java:2066)
    at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1863)
    at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1839)
    at org.hibernate.loader.Loader.scroll(Loader.java:2627)
    at org.hibernate.loader.criteria.CriteriaLoader.scroll(CriteriaLoader.java:121)
    at org.hibernate.internal.StatelessSessionImpl.scroll(StatelessSessionImpl.java:682)
    at org.hibernate.internal.CriteriaImpl.scroll(CriteriaImpl.java:394)
    at org.hibernate.search.batchindexing.impl.IdentifierProducer.loadAllIdentifiers(IdentifierProducer.java:146)
    at org.hibernate.search.batchindexing.impl.IdentifierProducer.inTransactionWrapper(IdentifierProducer.java:111)
    at org.hibernate.search.batchindexing.impl.IdentifierProducer.run(IdentifierProducer.java:95)
    at org.hibernate.search.batchindexing.impl.OptionallyWrapInJTATransaction.runWithErrorHandler(OptionallyWrapInJTATransaction.java:97)
    at org.hibernate.search.batchindexing.impl.ErrorHandledRunnable.run(ErrorHandledRunnable.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Error
    at org.jboss.jca.adapters.jdbc.WrappedConnection.checkException(WrappedConnection.java:1677)
    at org.jboss.jca.adapters.jdbc.WrappedStatement.checkException(WrappedStatement.java:1267)
    at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:467)
    at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:82)
    ... 15 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-28 21:19:58,667 INFO  [org.hibernate.search.impl.SimpleIndexingProgressMonitor] (default task-60) HSEARCH000028: Reindexed 22593085 entities
2015-07-28 21:19:58,673 WARN  [com.arjuna.ats.jta] (Hibernate Search: identifierloader-1) ARJUNA016031: XAOnePhaseResource.rollback for < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8010b:537a5b28:55b7cad0:167, node_name=1, branch_uid=0:ffffc0a8010b:537a5b28:55b7cad0:169, subordinatenodename=null, eis_name=java:/MyProjectApiDS > failed with exception: org.jboss.jca.core.spi.transaction.local.LocalXAException: IJ001160: Could not rollback local transaction
    at org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl.rollback(LocalXAResourceImpl.java:253)
    at com.arjuna.ats.internal.jta.resources.arjunacore.XAOnePhaseResource.rollback(XAOnePhaseResource.java:205)
    at com.arjuna.ats.internal.arjuna.abstractrecords.LastResourceRecord.topLevelAbort(LastResourceRecord.java:126)
    at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2993)
    at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2972)
    at com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1675)
    at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:127)
    at com.arjuna.ats.arjuna.AtomicAction.abort(AtomicAction.java:186)
    at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.rollbackAndDisassociate(TransactionImple.java:1282)
    at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.rollback(BaseTransaction.java:143)
    at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.rollback(BaseTransactionManagerDelegate.java:114)
    at org.hibernate.search.batchindexing.impl.OptionallyWrapInJTATransaction.cleanUpOnError(OptionallyWrapInJTATransaction.java:123)
    at org.hibernate.search.batchindexing.impl.ErrorHandledRunnable.run(ErrorHandledRunnable.java:54)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.jboss.jca.core.spi.transaction.local.LocalResourceException: No operations allowed after connection closed.
    at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.rollback(LocalManagedConnection.java:139)
    at org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl.rollback(LocalXAResourceImpl.java:248)
    ... 15 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
    at com.mysql.jdbc.Util.getInstance(Util.java:360)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:935)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:870)
    at com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1232)
    at com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1225)
    at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4568)
    at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.rollback(LocalManagedConnection.java:132)
    ... 16 more

So the question is: how can I index huge tables automatically?


Answer 1:


Which version of Hibernate Search are you using? If you are using the latest 5.4 release, you can configure the transaction timeout just for the indexing. Something like this:

    fullTextSession
        .createIndexer( User.class )
        .batchSizeToLoadObjects( 25 )
        .cacheMode( CacheMode.NORMAL )
        .threadsToLoadObjects( 12 )
        .idFetchSize( 150 )
        .transactionTimeout( 1800 ) // in seconds
        .startAndWait();

If you can, I would recommend using the latest version.




Answer 2:


MySQL's JDBC driver developers made some nasty decisions: by default the driver tries to load the whole result set into memory. You need to force it to use lazy paging, as Hibernate is asking it to do, by setting the JDBC fetch size to Integer.MIN_VALUE.

    ftem
        .createIndexer(MyEntity.class)
        .batchSizeToLoadObjects(25)
        .cacheMode(CacheMode.NORMAL)
        .idFetchSize(Integer.MIN_VALUE) // Important on MySQL!
        .transactionTimeout(timeout) // also useful; value in seconds, pick your own
        .threadsToLoadObjects(5)
        .startAndWait();
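
For anyone wondering what that fetch size means at the JDBC level, here is a minimal sketch using plain `java.sql`. The class and method names (`StreamingIdReader`, `prepareStreaming`, `forEachId`) are made up for illustration, and the SQL you pass in is your own. It relies on the behaviour documented for MySQL Connector/J: rows are streamed one at a time only when the statement is forward-only, read-only, and has a fetch size of `Integer.MIN_VALUE`; otherwise the driver buffers the whole result set in memory.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of Hibernate Search or the MySQL driver.
final class StreamingIdReader {

    private StreamingIdReader() {
    }

    /** Prepares a statement with the settings Connector/J needs to stream rows. */
    static PreparedStatement prepareStreaming(Connection conn, String sql) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Integer.MIN_VALUE is the driver's signal for row-by-row streaming.
        ps.setFetchSize(Integer.MIN_VALUE);
        return ps;
    }

    /** Passes the first column of every row to the consumer and returns the row count. */
    static long forEachId(Connection conn, String sql, LongConsumer consumer) throws SQLException {
        long count = 0;
        try (PreparedStatement ps = prepareStreaming(conn, sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                consumer.accept(rs.getLong(1));
                count++;
            }
        }
        return count;
    }
}
```

Keep in mind that while a streaming result set is open, the same connection cannot run other statements, so the ids have to be read to the end (or the result set closed) before that connection is reused.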



Answer 3:


According to the stack trace you provided, I guess the problem is that the transaction you are using is timing out. Increase the timeout setting in your configuration and try again, but this is not recommended, because increasing the default timeout will apply to transactions used throughout the application.

If increasing the timeout helps, then you should try other approaches such as ScrollMode or batch processing.

Consider this post; I hope it helps.




Answer 4:


I've always had problems with startAndWait(). The most recent version is good but still has some issues. Setting the transaction timeout high is sometimes not feasible with a really large dataset.

One approach you could try is to fetch the results in batches. It doesn't have the same integrity so you might miss new records, but it won't time out.

The code would look something like this:

    int offset = 0;
    int batchSize = 500;
    boolean indexComplete = false;
    while (!indexComplete) {
        // One transaction per batch, so no single transaction runs long enough to time out.
        utx.begin();
        FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
        // ORDER BY keeps the pages stable between queries (assumes the id attribute is named "id").
        TypedQuery<User> query = fullTextEntityManager.createQuery("SELECT u FROM User u ORDER BY u.id", User.class);
        query.setFirstResult(offset);
        query.setMaxResults(batchSize);

        LOGGER.info("Indexing {} users from offset {}", batchSize, offset);
        List<User> results = query.getResultList();
        if (results.isEmpty()) {
            indexComplete = true;
        } else {
            offset += results.size();
            for (User user : results) {
                fullTextEntityManager.index(user);
            }
            // Write the batch to the index, then detach the entities to free memory.
            fullTextEntityManager.flushToIndexes();
            fullTextEntityManager.clear();
        }
        utx.commit();
    }
    LOGGER.info("Indexed {} objects", offset);
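
One caveat with the offset loop above: on a table this size, each page usually makes the database walk past all the rows it skips, so later pages tend to get slower. A common alternative is keyset paging, where you remember the last id you saw and ask only for rows with a larger id. The sketch below only illustrates the shape of that loop. A `TreeMap` stands in for the table, and `KeysetBatcher`, `fetchIdsAfter` and `indexAll` are hypothetical names. In real code `fetchIdsAfter` would be a query along the lines of `SELECT u FROM User u WHERE u.id > :lastId ORDER BY u.id` with `setMaxResults(batchSize)`, assuming a numeric id attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical illustration of keyset paging; the TreeMap is an in-memory stand-in for the table.
final class KeysetBatcher {

    private KeysetBatcher() {
    }

    /** Stand-in for "WHERE id > :lastId ORDER BY id" limited to batchSize rows. */
    static List<Long> fetchIdsAfter(NavigableMap<Long, String> table, long lastId, int batchSize) {
        List<Long> ids = new ArrayList<>(batchSize);
        for (Long id : table.tailMap(lastId, false).keySet()) {
            if (ids.size() == batchSize) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }

    /** Hands every batch to indexBatch and returns the total number of rows seen. */
    static long indexAll(NavigableMap<Long, String> table, int batchSize, Consumer<List<Long>> indexBatch) {
        long lastId = Long.MIN_VALUE;
        long total = 0;
        while (true) {
            List<Long> batch = fetchIdsAfter(table, lastId, batchSize);
            if (batch.isEmpty()) {
                return total;
            }
            indexBatch.accept(batch); // in real code: index the batch inside its own transaction
            total += batch.size();
            lastId = batch.get(batch.size() - 1); // resume after the last id of this batch
        }
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> table = new TreeMap<>();
        for (long id = 1; id <= 10; id++) {
            table.put(id * 3, "user-" + id); // ids with gaps, as in a real table
        }
        List<Integer> sizes = new ArrayList<>();
        long total = indexAll(table, 4, batch -> sizes.add(batch.size()));
        System.out.println(total + " rows in batches " + sizes); // prints "10 rows in batches [4, 4, 2]"
    }
}
```

Like the offset version, this can still miss rows that are inserted with a smaller id after the loop has passed them, so it has the same integrity trade-off.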


Source: https://stackoverflow.com/questions/31685101/indexing-huge-table-with-hibernate-search
