How to process a large number of documents in chunks to avoid filling the expanded tree cache

淺唱寂寞╮ submitted on 2020-03-25 13:43:24

Question


I have one entity in MarkLogic under which around 98k+ documents (/someEntity/[ID].xml) are present, and I need to add a few new tags to all of those documents.

I prepared a query to add the child node, but when I ran it against that entity I received an "expanded tree cache full" error. I increased the cache by a few more gigabytes; the query then works, but takes a long time to complete. I also tried xdmp:clear-expanded-tree-cache(), which did not help either.

Any pointers on how to fetch the URIs in chunks of 10k and process them batch by batch, so the query neither spikes memory nor throws an error partway through?


Answer 1:


Hitting the expanded tree cache limit suggests you are holding the full result set somewhere, which is usually unnecessary. There may be ways to make your code smarter, so that it streams through the results and releases each item as soon as possible. As a rule of thumb: don't assign complete result sets to let statements.
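For example, one common pattern is to page through the document URIs and hand each batch to the task server as its own update transaction, so no single transaction ever materializes all 98k documents. Here is a minimal sketch, assuming the documents live in a "someEntity" collection and the URI lexicon is enabled; the someEntity root element and the newTag element are placeholders for your actual content:

xquery version "1.0-ml";

(: batch size matching the 10k chunks asked for in the question :)
declare variable $BATCH-SIZE := 10000;

(: cts:uris reads from the URI lexicon, so it never loads the documents :)
let $uris := cts:uris((), (), cts:collection-query("someEntity"))
let $total := fn:count($uris)
for $i in 1 to xs:integer(fn:ceiling($total div $BATCH-SIZE))
let $batch := fn:subsequence($uris, ($i - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
return
  (: each batch runs on the task server in its own update transaction :)
  xdmp:spawn-function(
    function() {
      for $uri in $batch
      return xdmp:node-insert-child(fn:doc($uri)/someEntity, <newTag>value</newTag>)
    },
    <options xmlns="xdmp:eval"><update>true</update></options>
  )

Because each spawned task touches only 10k documents, the expanded tree cache only ever needs to hold one batch's worth of nodes at a time.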

However, sometimes it is easier to just batch up the work externally. Corb, as suggested by Michael Gardner, is an excellent choice for this: it throttles the load on MarkLogic from the outside and can be paced down if needed.
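To sketch the Corb approach (CoRB2 here; the connection URI, file names, and element names below are assumptions, not a drop-in config): CoRB is driven by a URIs module that selects the documents and a process module that is invoked once per URI, with the concurrency controlled from a properties file.

# corb.properties (hypothetical values)
XCC-CONNECTION-URI=xcc://user:password@localhost:8000
URIS-MODULE=get-uris.xqy|ADHOC
PROCESS-MODULE=add-tag.xqy|ADHOC
THREAD-COUNT=8

(: get-uris.xqy -- CoRB expects the total count first, then the URIs :)
xquery version "1.0-ml";
let $uris := cts:uris((), (), cts:collection-query("someEntity"))
return (fn:count($uris), $uris)

(: add-tag.xqy -- CoRB binds each document URI to the external variable $URI :)
xquery version "1.0-ml";
declare variable $URI as xs:string external;
xdmp:node-insert-child(fn:doc($URI)/someEntity, <newTag>value</newTag>)

It is then typically launched with something like java -cp marklogic-xcc.jar:marklogic-corb.jar -DOPTIONS-FILE=corb.properties com.marklogic.developer.corb.Manager, and THREAD-COUNT lets you pace the load on the cluster.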

For smaller tasks like this, something like taskbot might do the trick as well, though it is harder to control its pace.

HTH!



Source: https://stackoverflow.com/questions/59055764/how-to-process-large-number-of-documents-in-chunk-to-avoid-expanded-tree-cache-f
