Question
I need to split this document:
<?xml version="1.0"?>
<!DOCTYPE docs SYSTEM "../rom11.dtd">
<docs>
<stwtext id="RD-10-00258" update="03.2011" seq="RQ-10-00001">
<head>
<ti>
<i>j</i>
</ti>
<ff-list>
<ff id="0103" />
</ff-list>
</head>
<p>
Symbol für die
<vw idref="RD-19-04447">Stromdichte</vw>
.
</p>
</stwtext>
<stwtext id="RD-10-00209" update="12.2007" seq="RQ-10-00223">
<head>
<ti>JZ</ti>
<ff-list>
<ff id="0932" />
</ff-list>
</head>
<p>
Abkürzung für Jod-Zahl, siehe
<vw idref="RD-06-00645">Fettkennzahlen</vw>
.
</p>
</stwtext>
</docs>
I do it with this command:
~> bin/mlcp.sh IMPORT -mode local -host localhost -port 15000 \
-username admin -password admin \
-input_file_path /media/sf_vm.shared/theme/rom-training/v10.new-ML.XML \
-output_uri_replace "/media/sf_vm.shared/theme/rom-training/keywords,'rom-data'" \
-output_collections rom-data \
-input_file_type aggregates -aggregate_record_element stwtext \
-aggregate_uri_id @id
The command runs fine, but the documents I see in MarkLogic have ids that come not from the id attribute of stwtext, but from the id of the last element. For example, for my document I expect to see
RD-10-00258
RD-10-00260
but actually it looks like this:
0103
0932
Is this a bug, or did I do something wrong? Thanks.
Answer 1:
It's a bug. If you'd like, you can download the MLCP source code and fix it yourself. Take a look at processStartElement() in AggregateXMLReader.java.
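The output in the question (each record named after the id of its nested ff element) is what a streaming reader would produce if it overwrote the record's id whenever any start element carried an id attribute. That mechanism is an assumption made for illustration, not MLCP's actual code. The Python sketch below reproduces both behaviours on a shortened copy of the sample; record_ids and its only_record_element flag are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question.
SAMPLE = """<docs>
<stwtext id="RD-10-00258"><head><ti>j</ti><ff-list><ff id="0103" /></ff-list></head></stwtext>
<stwtext id="RD-10-00209"><head><ti>JZ</ti><ff-list><ff id="0932" /></ff-list></head></stwtext>
</docs>"""


def record_ids(xml_text, record_tag="stwtext", id_attr="id", only_record_element=True):
    """Stream through the XML and return one id per record element.

    only_record_element=True  -> id is read from the record element itself.
    only_record_element=False -> any nested element with the same attribute
                                 overwrites it (assumed faulty behaviour).
    """
    ids = []
    current = None  # id chosen for the record currently being read
    depth = 0       # nesting depth inside the current record, 0 = outside
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if depth == 0 and elem.tag == record_tag:
                depth = 1
                current = elem.get(id_attr)
            elif depth > 0:
                depth += 1
                if not only_record_element and id_attr in elem.attrib:
                    current = elem.get(id_attr)  # nested id replaces record id
        elif depth > 0:
            depth -= 1
            if depth == 0:  # closing tag of the record element
                ids.append(current)
    return ids


print(record_ids(SAMPLE))                             # ['RD-10-00258', 'RD-10-00209']
print(record_ids(SAMPLE, only_record_element=False))  # ['0103', '0932']
```

A fix along these lines would read the id attribute only when the start element is the record element itself.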
Source: https://stackoverflow.com/questions/31120344/split-document-by-using-marklogic-mlcp