Using predicates on a large database?

Submitted by 三世轮回 on 2019-12-14 01:20:59

Question


I have a 50,000,000-document database, and I'd like to write the base-uri of each document to a file. Running over the entire 50,000,000 takes too long (the query times out), so I thought I'd use predicates to break the database into more manageable batches. To get a handle on the performance, I tried the following:

for $i in ( 49999000 to 50000000 )
return fn:base-uri( /mainDoc[position()=$i] )

But performance was very slow even for these 1000 base URIs; in fact, the query timed out. I tried a similar query and got similar results (or lack of results):

for $i in ( /mainDoc ) [ 49999000 to 50000000 ]
return fn:base-uri( $i ) 

Is there a more performant method of looping through a large database, where documents at the end of the database are equally as quick to obtain as those at the beginning of the database?


Answer 1:


If you just want the document URIs, that's easy. Ensure you have the URI lexicon enabled on the database and run a cts:uris() call.
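A minimal sketch of that approach, assuming the URI lexicon is enabled; the output path and the use of xdmp:save to persist the list are illustrative, not part of the original answer:

```xquery
(: Sketch: collect every document URI from the URI lexicon and save them,
   one per line, to a file on the server's filesystem. The path is
   illustrative; adjust for your environment. :)
let $uris := cts:uris((), ())
return xdmp:save(
  "/tmp/uris.txt",
  text { string-join($uris, "&#10;") }
)
```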

To follow your approach to jump ahead in a document list to do something with each document, you can do the work unfiltered to make it fast:

for $item in cts:search(/mainDoc, cts:and-query(()), "unfiltered")[49999000 to 50000000]
return base-uri($item)

The cts:and-query(()) is a shortcut way to pass an always-true query.




Answer 2:


The most efficient way to use cts:uris would look something like this:

subsequence(cts:uris((), 'limit=50000000'), 49999000)

It would be even more efficient if you could pass in a start value, but that requires you to know the 49,999,000th value up front:

cts:uris($start-value, 'limit=1000')

See http://docs.marklogic.com/cts:uris for more about that function.
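Building on that idea, one way to page through the whole database without ever paying for a deep skip is to feed the last URI of each batch back in as the next start value. This is a hedged sketch, not code from the original answers; local:next-batch is a hypothetical helper, and note that cts:uris returns URIs starting at the start value inclusively, so callers should skip the first item of every batch after the first:

```xquery
(: Hypothetical paging helper: $last is the final URI from the previous
   batch (pass the empty sequence for the first batch). Each call returns
   the next 1000 URIs in lexicon order, beginning at $last inclusive. :)
declare function local:next-batch($last as xs:string?) as xs:string*
{
  cts:uris($last, ("limit=1000"))
};
```

Because each batch starts from a known URI rather than a numeric offset, documents near the end of the database cost the same to reach as those at the beginning.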



Source: https://stackoverflow.com/questions/17685877/using-predicates-on-a-large-database
