Need help rewriting XQuery to avoid expanded tree cache full error in MarkLogic

Question

I am new to XQuery and MarkLogic. I am trying to update documents in MarkLogic and get the expanded tree cache full error. Just to get the work done I have increased the expanded tree cache size, but that is not recommended. I would like to tune this query so that it does not need to cache as much XML at once.

Here is my query: [image not preserved]

I have uploaded my query as an image because it did not look good when I pasted it into the editor. If anyone knows a better way, please suggest it.

Thanks in advance.

Answer 1:

I've just solved exactly this scenario. There are two things I did.

First, I moved the node-replace and node-insert style calls (that is, any calls that modify the XML structure) into a separate module and then called that module using xdmp:invoke, passing in any parameters required, like this:

let $update := xdmp:invoke("/app/lib/update-attribute-node.xqy", (xs:QName("newValue"), $new), <options xmlns="xdmp:eval"><modules>{xdmp:modules-database()}</modules></options>)

The reason this works is that the call to xdmp:invoke happens in its own transaction, and once it completes, the memory is released. If you don't do this, then each time you call the update or insert function it will not actually do the write until the very end, in a single transaction, meaning your memory will fill up pretty quickly.
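For reference, here is a minimal sketch of what such an invoked module might look like. Only the newValue parameter comes from the call above; the extra $uri and $attributeName parameters, the tx namespace URI, and the tx:Value element are assumptions made up for illustration, since the original query is not shown.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace URI: the thread never shows the real binding for tx. :)
declare namespace tx = "http://example.com/tx";

(: External variables are bound from the name/value pairs passed to xdmp:invoke. :)
(: $uri and $attributeName are assumed extras; only newValue appears in the call above. :)
declare variable $uri as xs:string external;
declare variable $attributeName as xs:string external;
declare variable $newValue as xs:string external;

(: Replace the (hypothetical) tx:Value child of each matching tx:AttVal in one document. :)
(: The write is committed when this module's transaction finishes. :)
for $old in fn:doc($uri)//tx:AttVal[tx:AttributeName = $attributeName]/tx:Value
return xdmp:node-replace($old, element tx:Value { $newValue })
```

With this shape, the caller would pass all three values, for example (xs:QName("uri"), $uri, xs:QName("attributeName"), $name, xs:QName("newValue"), $new). It is probably safest to keep the calling module read-only, so that it does not hold locks on the documents the invoked module is trying to update.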

Second, any time I needed to loop over a large number of documents in MarkLogic (I've only been using MarkLogic for a few days, so I may have the terminology wrong), I processed them only a few at a time, like below. I came up with an elaborate way of skipping and taking only a batch of documents at a time, but you can do it in any number of ways.

let $whatever:= xdmp:directory("/whatever/")[$start to $end]

I also put this into a separate module so that it is processed immediately and not in a single transaction.
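One possible way to drive the batches is sketched below. The batch size, the module path /app/lib/process-batch.xqy, and the start/end parameter names are all hypothetical; the sketch assumes that module declares $start and $end as external variables and uses them in the positional slice shown above.

```xquery
xquery version "1.0-ml";

(: Hypothetical batch size and module path; tune both for your data. :)
declare variable $batch-size as xs:integer := 100;
declare variable $batch-module as xs:string := "/app/lib/process-batch.xqy";

(: Count the documents from the indexes without loading them into the cache. :)
let $total := xdmp:estimate(cts:search(fn:doc(), cts:directory-query("/whatever/", "1")))
(: Round up so a final partial batch is still processed. :)
let $batches := ($total + $batch-size - 1) idiv $batch-size
for $i in 1 to $batches
let $start := ($i - 1) * $batch-size + 1
let $end := $i * $batch-size
return
  (: Each invoke runs as its own transaction, so only one batch is held at a time. :)
  xdmp:invoke($batch-module, (xs:QName("start"), $start, xs:QName("end"), $end))
```

This relies on the set of documents in the directory staying the same while the loop runs; if the batch module inserts or deletes documents there, the positions will shift between batches.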

Putting all expensive calls into separate modules and taking only a subset of large data sets at a time helped me solve my expanded tree cache full errors.

Answer 2:

Expanded tree cache errors can be caused by executing queries that select too many XML nodes at once. In your example, this is likely the culprit: /tx:AttVal[tx:AttributeName/text()=$attributeName].

It's possible that calling text() is the source of your problem, since it causes MarkLogic to evaluate that step on all of these nodes (and text() is probably not what you mean anyway; see this blog). Simply using /tx:AttVal[tx:AttributeName=$attributeName] may solve your problem.
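To illustrate why text() may not mean what you expect, here is a small sketch with a made-up sample element (no namespace, for brevity). A comment inside the element splits its content into two text nodes, so the two predicates give different answers.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample element; the comment splits the content into two text nodes. :)
let $att :=
  <AttVal><AttributeName>col<!-- split -->or</AttributeName></AttVal>
return (
  (: text() yields the two nodes "col" and "or"; neither equals "color", so this is false. :)
  fn:exists($att[AttributeName/text() = "color"]),
  (: The element's string value is "color", so this is true. :)
  fn:exists($att[AttributeName = "color"])
)
```

Comparing the element itself uses its full string value, which is usually what is intended.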

Next, I would consider adding a path range index on /tx:AttVal/tx:AttributeName and querying those nodes using cts:search and cts:path-range-query. This will be substantially faster than plain XPath without a range index. It's also possible to use XPath with a range index: MarkLogic will automatically optimize the XPath expression to use the range index; however, there can be reasons it doesn't optimize the expression correctly, so you would want to check that using xdmp:plan.
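A rough sketch of the range-index approach follows. It assumes a string path range index already exists on /tx:AttVal/tx:AttributeName with the default collation; the tx namespace URI and the lookup value are placeholders, since neither is given in the question.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace URI; it must match the one configured for the index path. :)
declare namespace tx = "http://example.com/tx";

(: Hypothetical lookup value. :)
let $attributeName := "color"
let $query := cts:path-range-query("/tx:AttVal/tx:AttributeName", "=", $attributeName)
return (
  (: How many fragments match, resolved from the indexes. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Only the first few matching documents, to keep the result set small. :)
  cts:search(fn:doc(), $query)[1 to 10]
)
```

If the index is missing or was created with a different scalar type or collation, the query should fail with an error rather than silently fall back to scanning, which makes misconfiguration easy to spot.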

Also note that the general best practice recommendation for XML in MarkLogic is to use "semantic XML". E.g., when you mean an attribute, use an attribute: <some-node AttributeName="AttVal">. MarkLogic's indexes are optimized out of the box for semantic XML design. However, if you have no option but to work with XML that is not designed this way, then that is what path range indexes are for.

Source: https://stackoverflow.com/questions/23016289/need-help-rewriting-xquery-to-avoid-expanded-tree-cache-full-error-in-marklogic

Tags

caching

xquery

marklogic