Question
I have one entity in MarkLogic with around 98k+ documents (/someEntity/[ID].xml
) under it, and I need to add a few new tags to all of those documents.
I prepared a query to add the child nodes, but when I run it against that entity I get an "expanded tree cache full" error. I increased the cache by a few gigabytes; the query then works, but takes a long time to complete. I also tried xdmp:clear-expanded-tree-cache(),
but that didn't help either.
Any pointers on how to fetch the URIs in chunks of 10k and process them in batches, so the query doesn't spike memory and throw an error partway through?
Answer 1:
Hitting the expanded tree cache limit suggests you are holding the full result set somewhere, which should be unnecessary. There may be ways to make your code smarter, so that it streams through the results and lets go of each item as soon as possible. As a rule of thumb: don't assign complete result sets to let variables.
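Before reaching for an external tool, you can also batch the work inside XQuery itself by spawning a separate task per chunk of URIs. Below is a minimal sketch of that idea; the element name `<someNewTag/>` and the `/someEntity/` directory are assumptions taken from the question, and it presumes the URI lexicon is enabled so `cts:uris` works:

```xquery
xquery version "1.0-ml";

(: Collect only the URIs (cheap strings), not the documents themselves,
   then spawn one task per 10k-URI chunk so no single transaction pulls
   all 98k documents into the expanded tree cache. :)
let $batch-size := 10000
let $uris := cts:uris((), (), cts:directory-query("/someEntity/", "1"))
let $total := fn:count($uris)
for $start in (1 to $total)[. mod $batch-size eq 1]
return
  xdmp:spawn-function(
    function() {
      for $uri in fn:subsequence($uris, $start, $batch-size)
      return
        (: assumed payload: insert the new tag under the root element :)
        xdmp:node-insert-child(fn:doc($uri)/*, <someNewTag>value</someNewTag>)
    },
    <options xmlns="xdmp:eval">
      <update>true</update>
    </options>
  )
```

Each spawned task runs as its own update transaction on the task server, so the cache pressure is bounded by one batch rather than the whole 98k document set. You may still need to tune the batch size down if individual documents are large.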
However, sometimes it is easier to just batch up the work. CoRB, as suggested by Michael Gardner, is an excellent choice for this. It throttles the load on MarkLogic from the outside, and can be paced down if needed.
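For reference, a CoRB job is typically driven by a properties file that names a URI-selector module and a per-document process module. This is a hedged sketch only; the module names, connection string, and thread count here are placeholders, not values from the question:

```
# corb-add-tags.properties (illustrative values)
XCC-CONNECTION-URI=xcc://user:password@localhost:8000
URIS-MODULE=get-entity-uris.xqy|ADHOC
PROCESS-MODULE=add-new-tags.xqy|ADHOC
THREAD-COUNT=8
BATCH-SIZE=1
```

The URIs module would return the count followed by the document URIs (e.g. via `cts:uris`), and the process module receives one URI at a time in the external variable `$URI`, so each update stays small and the cache never sees the full result set.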
For smaller tasks like this, something like taskbot might do the trick as well, though its pace is harder to control.
HTH!
Source: https://stackoverflow.com/questions/59055764/how-to-process-large-number-of-documents-in-chunk-to-avoid-expanded-tree-cache-f