I'm dealing with this kind of XML sequence file. Can anyone suggest how to parse it?
That file contains a sequence of XML documents concatenated one after another. You can register a PHP stream wrapper that transparently divides the file for you; then you can process each document individually, even in a streaming fashion. Example:
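    // XMLSequenceStream and XMLReaderNode come from the XMLReaderIterator library,
    // which has to be loaded before this runs.
    stream_wrapper_register('xmlseq', 'XMLSequenceStream');

    // xmlseq:// wraps the zip:// stream, so the archive is read without unpacking it first
    $path = "xmlseq://zip://ipg140107.zip#ipg140107.xml";

    // each pass of the outer loop opens the next document in the sequence
    while (XMLSequenceStream::notAtEndOfSequence($path)) {
        $reader = new XMLReader();
        $reader->open($path);
        // just consume the whole document
        while ($reader->next()) {
            XMLReaderNode::dump($reader);
        }
    }

    XMLSequenceStream::clean();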
That stream wrapper is part of the XMLReaderIterator library. It also works with SimpleXMLElement or DOMDocument, although for larger files XMLReader is a better fit.
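To illustrate the underlying idea without the library, here is a minimal plain-PHP sketch that divides a concatenated sequence into individual documents. It is not how XMLSequenceStream is implemented. It assumes every document starts with an `<?xml` declaration at the beginning of a line, and the function name `splitXmlSequence` and the two-document sample are made up for this example.

```php
<?php
/**
 * Yield each XML document from a stream of concatenated documents.
 * Assumption: every document begins with an "<?xml" declaration at line start.
 */
function splitXmlSequence($handle): Generator
{
    $buffer = '';
    while (($line = fgets($handle)) !== false) {
        // A new declaration marks the start of the next document.
        if (strncmp($line, '<?xml', 5) === 0 && trim($buffer) !== '') {
            yield $buffer;
            $buffer = '';
        }
        $buffer .= $line;
    }
    // Emit the last document once the stream is exhausted.
    if (trim($buffer) !== '') {
        yield $buffer;
    }
}

// Hypothetical sample standing in for the real sequence file.
$sample = implode("\n", [
    '<?xml version="1.0"?>',
    '<a><x/></a>',
    '<?xml version="1.0"?>',
    '<b><y/></b>',
]) . "\n";

$handle = fopen('php://memory', 'w+');
fwrite($handle, $sample);
rewind($handle);

// Parse each document on its own and collect the root element names.
$names = [];
foreach (splitXmlSequence($handle) as $xml) {
    $doc = new SimpleXMLElement($xml);
    $names[] = $doc->getName();
}
fclose($handle);
```

Only one document is held in memory at a time, which is the same property the stream wrapper gives you, but here each document is buffered as a string before parsing.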
For the file I've taken in my example (http://storage.googleapis.com/patents/grant_full_text/2014/ipg140107.zip from https://www.google.com/googlebooks/uspto-patents-grants-text.html), this is the overall element structure, with each element counted across all the document trees in that sequence:
\-us-patent-grant (473)
  |-us-bibliographic-data-grant (473)
  | |-publication-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   |-kind (473)
  | |   \-date (473)
  | |-application-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   \-date (473)
  | |-us-application-series-code (473)
  | |-us-term-of-grant (470)
  | | |-length-of-grant (450)
  | | |-disclaimer (18)
  | | | \-text (18)
  | | \-us-term-extension (20)
  | |-classification-locarno (450)
  | | |-edition (450)
  | | \-main-classification (450)
  | |-classification-national (473)
  | | |-country (473)
  | | |-main-classification (473)
  | | \-further-classification (143)
  | |-invention-title (473)
  | | \-i (12)
  | |-us-references-cited (458)
  | | \-us-citation (11000)
  | |   |-patcit (10265)
  | |   | \-document-id (10265)
  | |   |   |-country (10265)
  | |   |   |-doc-number (10265)
  | |   |   |-kind (9884)
  | |   |   |-name (9811)
  | |   |   \-date (10264)
  | |   |-category (10999)
  | |   |-classification-national (6309)
  | |   | |-country (6309)
  | |   | \-main-classification (6309)
  | |   |-nplcit (735)
  | |   | \-othercit (735)
  | |   |   |-sub (281)
  | |   |   |-i (7)
  | |   |   \-sup (1)
  | |   \-classification-cpc-text (1)
  | |-number-of-claims (472)
  | |-us-exemplary-claim (472)
  | |-us-field-of-classification-search (472)
  | | \-classification-national (8991)
  | |   |-country (8991)
  | |   |-main-classification (8991)
  | |   \-additional-info (1205)
  | |-figures (472)
  | | |-number-of-drawing-sheets (472)
  | | \-number-of-figures (472)
  | |-us-parties (472)
  | | |-us-applicants (472)
  | | | \-us-applicant (765)
  | | |   |-addressbook (765)
  | | |   | |-last-name (573)
  | | |   | |-first-name (573)
  | | |   | |-address (765)
  | | |   | | |-city (765)
  | | |   | | |-country (765)
  | | |   | | \-state (423)
  | | |   | \-orgname (192)
  | | |   \-residence (765)
  | | |     \-country (765)
  | | |-inventors (472)
  | | | \-inventor (969)
  | | |   \-addressbook (969)
  | | |     |-last-name (969)
  | | |     |-first-name (969)
  | | |     \-address (969)
  | | |       |-city (969)
  | | |       |-country (969)
  | | |       \-state (519)
  | | \-agents (429)
  | |   \-agent (500)
  | |     \-addressbook (500)
  | |       |-orgname (361)
  | |       |-address (500)
  | |       | \-country (500)
  | |       |-last-name (139)
  | |       \-first-name (139)
  | |-assignees (385)
  | | \-assignee (391)
  | |   |-addressbook (390)
  | |   | |-orgname (386)
  | |   | |-role (390)
  | |   | |-address (390)
  | |   | | |-city (355)
  | |   | | |-country (390)
  | |   | | \-state (192)
  | |   | |-last-name (4)
  | |   | \-first-name (4)
  | |   |-orgname (1)
  | |   \-role (1)
  | |-examiners (472)
  | | |-primary-examiner (472)
  | | | |-last-name (472)
  | | | |-first-name (472)
  | | | \-department (472)
  | | \-assistant-examiner (65)
  | |   |-last-name (65)
  | |   \-first-name (65)
  | |-us-related-documents (65)
  | | |-continuation-in-part (16)
  | | | \-relation (16)
  | | |   |-parent-doc (16)
  | | |   | |-document-id (16)
  | | |   | | |-country (16)
  | | |   | | |-doc-number (16)
  | | |   | | \-date (16)
  | | |   | |-parent-status (11)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (16)
  | | |     \-document-id (16)
  | | |       |-country (16)
  | | |       \-doc-number (16)
  | | |-continuation (21)
  | | | \-relation (21)
  | | |   |-parent-doc (21)
  | | |   | |-document-id (21)
  | | |   | | |-country (21)
  | | |   | | |-doc-number (21)
  | | |   | | \-date (21)
  | | |   | |-parent-status (16)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (21)
  | | |     \-document-id (21)
  | | |       |-country (21)
  | | |       \-doc-number (21)
  | | |-division (32)
  | | | \-relation (32)
  | | |   |-parent-doc (32)
  | | |   | |-document-id (32)
  | | |   | | |-country (32)
  | | |   | | |-doc-number (32)
  | | |   | | \-date (32)
  | | |   | |-parent-grant-document (24)
  | | |   | | \-document-id (24)
  | | |   | |   |-country (24)
  | | |   | |   |-doc-number (24)
  | | |   | |   \-date (1)
  | | |   | \-parent-status (8)
  | | |   \-child-doc (32)
  | | |     \-document-id (32)
  | | |       |-country (32)
  | | |       \-doc-number (32)
  | | \-related-publication (9)
  | |   \-document-id (9)
  | |     |-country (9)
  | |     |-doc-number (9)
  | |     |-kind (9)
  | |     \-date (9)
  | |-priority-claims (140)
  | | \-priority-claim (182)
  | |   |-country (182)
  | |   |-doc-number (182)
  | |   \-date (182)
  | |-us-sir-flag (1)
  | |-classifications-ipcr (23)
  | | \-classification-ipcr (24)
  | |   |-ipc-version-indicator (24)
  | |   | \-date (24)
  | |   |-classification-level (24)
  | |   |-section (24)
  | |   |-class (24)
  | |   |-subclass (24)
  | |   |-main-group (24)
  | |   |-subgroup (24)
  | |   |-symbol-position (24)
  | |   |-classification-value (24)
  | |   |-action-date (24)
  | |   | \-date (24)
  | |   |-generating-office (24)
  | |   | \-country (24)
  | |   |-classification-status (24)
  | |   \-classification-data-source (24)
  | |-us-botanic (21)
  | | |-latin-name (21)
  | | \-variety (21)
  | \-classifications-cpc (1)
  |   \-main-cpc (1)
  |     \-classification-cpc (1)
  |       |-cpc-version-indicator (1)
  |       | \-date (1)
  |       |-section (1)
  |       |-class (1)
  |       |-subclass (1)
  |       |-main-group (1)
  |       |-subgroup (1)
  |       |-symbol-position (1)
  |       |-classification-value (1)
  |       |-action-date (1)
  |       | \-date (1)
  |       |-generating-office (1)
  |       | \-country (1)
  |       |-classification-status (1)
  |       |-classification-data-source (1)
  |       \-scheme-origination-code (1)
  |-drawings (472)
  | \-figure (3033)
  |   \-img (3033)
  |-description (472)
  | |-description-of-drawings (472)
  | | |-p (3955)
  | | | |-figref (4478)
  | | | |-b (86)
  | | | \-i (6)
  | | \-heading (22)
  | |-heading (162)
  | \-p (340)
  |   |-figref (15)
  |   |-b (250)
  |   |-i (146)
  |   |-ul (96)
  |   | \-li (444)
  |   |   |-ul (215)
  |   |   | \-li (273)
  |   |   |   |-ul (199)
  |   |   |   | \-li (1192)
  |   |   |   |   |-i (1219)
  |   |   |   |   |-b (1)
  |   |   |   |   |-sup (10)
  |   |   |   |   \-sub (2)
  |   |   |   \-i (11)
  |   |   |-sup (2)
  |   |   \-i (26)
  |   |-tables (15)
  |   | \-table (15)
  |   |   \-tgroup (49)
  |   |     |-colspec (175)
  |   |     |-thead (15)
  |   |     | \-row (27)
  |   |     |   \-entry (51)
  |   |     \-tbody (49)
  |   |       \-row (291)
  |   |         \-entry (997)
  |   |           \-sup (28)
  |   \-sup (2)
  |-us-claim-statement (472)
  |-claims (472)
  | \-claim (476)
  |   \-claim-text (476)
  |     |-figref (1)
  |     |-claim-text (5)
  |     |-claim-ref (4)
  |     \-i (15)
  \-abstract (22)
    \-p (22)
      |-i (27)
      \-ul (2)
        \-li (2)
          \-ul (2)
            \-li (11)
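A tally like this can be built by walking every document with XMLReader and counting each distinct element path. The following is only a sketch of that idea, not necessarily how the listing above was produced. The function name `countElementPaths` and the two tiny input documents are made up for illustration.

```php
<?php
/**
 * Add the element paths of one XML document to a running tally.
 * $counts maps a path such as "a/b" to its number of occurrences.
 */
function countElementPaths(string $xml, array &$counts): void
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $stack = [];
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // depth is 0 for the root element; cut the stack back to the
        // current element's ancestors before appending its own name.
        $stack = array_slice($stack, 0, $reader->depth);
        $stack[] = $reader->name;

        $path = implode('/', $stack);
        $counts[$path] = ($counts[$path] ?? 0) + 1;
    }
    $reader->close();
}

// Hypothetical stand-ins for the documents in the sequence.
$documents = [
    '<a><b/><b><c/></b></a>',
    '<a><b/></a>',
];

$counts = [];
foreach ($documents as $xml) {
    countElementPaths($xml, $counts);
}

// Sort by path so children are listed right after their parent.
ksort($counts);
foreach ($counts as $path => $n) {
    echo $path, ' (', $n, ")\n";
}
```

Because the counts accumulate across documents, the same function can be called once per document while iterating over the sequence, and the memory used depends on the number of distinct paths rather than on the file size.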