How to read multiple XML documents from a single file using PHP

轻奢々 2021-01-26 10:28

I'm dealing with this kind of XML sequence file. Can anyone suggest how to parse it?




        
3 answers
  •  傲寒 (original poster)
    2021-01-26 10:30

    That file contains a sequence of XML documents concatenated one after another. You need to register a PHP stream wrapper that transparently splits the file for you; you can then process each document individually, even in a streaming fashion. Example:

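    // Assumes the XMLReaderIterator library (which provides XMLSequenceStream
    // and XMLReaderNode) has already been loaded, e.g. through an autoloader.
    stream_wrapper_register('xmlseq', 'XMLSequenceStream');

    // The xmlseq:// wrapper is layered on top of the zip:// wrapper.
    $path = "xmlseq://zip://ipg140107.zip#ipg140107.xml";

    // Open one reader per document until the sequence is exhausted.
    while (XMLSequenceStream::notAtEndOfSequence($path)) {
        $reader = new XMLReader();
        $reader->open($path);
        // just consume the whole document
        while ($reader->read()) {
            XMLReaderNode::dump($reader);
        }
    }

    XMLSequenceStream::clean();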
    

    That stream wrapper is part of the XMLReaderIterator library. It also works with SimpleXMLElement or DOMDocument, although for larger files XMLReader is the better fit.
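
    If you only want to see what "dividing the sequence" means without pulling in the library, here is a minimal sketch in plain PHP. It is not how XMLReaderIterator is implemented; it simply assumes that every document starts with an `<?xml` declaration at the beginning of a line, and the function name `splitXmlSequence` is made up for this illustration. Each document is held in memory as a string, so it suits moderately sized documents only.

```php
<?php
/**
 * Yield each XML document from a stream of concatenated documents.
 * Assumption: every document starts with an XML declaration at the
 * beginning of a line. Hypothetical helper, not a library function.
 *
 * @param resource $handle readable stream positioned at the start
 */
function splitXmlSequence($handle): Generator
{
    $buffer = '';
    while (($line = fgets($handle)) !== false) {
        // A new declaration marks the end of the previous document.
        if (strncmp($line, '<?xml', 5) === 0 && trim($buffer) !== '') {
            yield $buffer;
            $buffer = '';
        }
        $buffer .= $line;
    }
    // Emit the last document in the sequence.
    if (trim($buffer) !== '') {
        yield $buffer;
    }
}

// Usage with a small in-memory sample instead of the real file.
$sample = "<?xml version=\"1.0\"?>\n<doc><id>1</id></doc>\n"
        . "<?xml version=\"1.0\"?>\n<doc><id>2</id></doc>\n";
$handle = fopen('php://memory', 'r+');
fwrite($handle, $sample);
rewind($handle);

$ids = [];
foreach (splitXmlSequence($handle) as $xml) {
    $doc = simplexml_load_string($xml); // one well-formed document at a time
    $ids[] = (string) $doc->id;
}
fclose($handle);
```

    For a real file you would pass a handle from `fopen()` on the extracted XML file instead of the in-memory sample.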

    For the file used in my example (http://storage.googleapis.com/patents/grant_full_text/2014/ipg140107.zip from https://www.google.com/googlebooks/uspto-patents-grants-text.html), the combined element structure of all documents in the sequence is shown below, with the number of occurrences of each element in parentheses:

    \-us-patent-grant (473)
      |-us-bibliographic-data-grant (473)
      | |-publication-reference (473)
      | | \-document-id (473)
      | |   |-country (473)
      | |   |-doc-number (473)
      | |   |-kind (473)
      | |   \-date (473)
      | |-application-reference (473)
      | | \-document-id (473)
      | |   |-country (473)
      | |   |-doc-number (473)
      | |   \-date (473)
      | |-us-application-series-code (473)
      | |-us-term-of-grant (470)
      | | |-length-of-grant (450)
      | | |-disclaimer (18)
      | | | \-text (18)
      | | \-us-term-extension (20)
      | |-classification-locarno (450)
      | | |-edition (450)
      | | \-main-classification (450)
      | |-classification-national (473)
      | | |-country (473)
      | | |-main-classification (473)
      | | \-further-classification (143)
      | |-invention-title (473)
      | | \-i (12)
      | |-us-references-cited (458)
      | | \-us-citation (11000)
      | |   |-patcit (10265)
      | |   | \-document-id (10265)
      | |   |   |-country (10265)
      | |   |   |-doc-number (10265)
      | |   |   |-kind (9884)
      | |   |   |-name (9811)
      | |   |   \-date (10264)
      | |   |-category (10999)
      | |   |-classification-national (6309)
      | |   | |-country (6309)
      | |   | \-main-classification (6309)
      | |   |-nplcit (735)
      | |   | \-othercit (735)
      | |   |   |-sub (281)
      | |   |   |-i (7)
      | |   |   \-sup (1)
      | |   \-classification-cpc-text (1)
      | |-number-of-claims (472)
      | |-us-exemplary-claim (472)
      | |-us-field-of-classification-search (472)
      | | \-classification-national (8991)
      | |   |-country (8991)
      | |   |-main-classification (8991)
      | |   \-additional-info (1205)
      | |-figures (472)
      | | |-number-of-drawing-sheets (472)
      | | \-number-of-figures (472)
      | |-us-parties (472)
      | | |-us-applicants (472)
      | | | \-us-applicant (765)
      | | |   |-addressbook (765)
      | | |   | |-last-name (573)
      | | |   | |-first-name (573)
      | | |   | |-address (765)
      | | |   | | |-city (765)
      | | |   | | |-country (765)
      | | |   | | \-state (423)
      | | |   | \-orgname (192)
      | | |   \-residence (765)
      | | |     \-country (765)
      | | |-inventors (472)
      | | | \-inventor (969)
      | | |   \-addressbook (969)
      | | |     |-last-name (969)
      | | |     |-first-name (969)
      | | |     \-address (969)
      | | |       |-city (969)
      | | |       |-country (969)
      | | |       \-state (519)
      | | \-agents (429)
      | |   \-agent (500)
      | |     \-addressbook (500)
      | |       |-orgname (361)
      | |       |-address (500)
      | |       | \-country (500)
      | |       |-last-name (139)
      | |       \-first-name (139)
      | |-assignees (385)
      | | \-assignee (391)
      | |   |-addressbook (390)
      | |   | |-orgname (386)
      | |   | |-role (390)
      | |   | |-address (390)
      | |   | | |-city (355)
      | |   | | |-country (390)
      | |   | | \-state (192)
      | |   | |-last-name (4)
      | |   | \-first-name (4)
      | |   |-orgname (1)
      | |   \-role (1)
      | |-examiners (472)
      | | |-primary-examiner (472)
      | | | |-last-name (472)
      | | | |-first-name (472)
      | | | \-department (472)
      | | \-assistant-examiner (65)
      | |   |-last-name (65)
      | |   \-first-name (65)
      | |-us-related-documents (65)
      | | |-continuation-in-part (16)
      | | | \-relation (16)
      | | |   |-parent-doc (16)
      | | |   | |-document-id (16)
      | | |   | | |-country (16)
      | | |   | | |-doc-number (16)
      | | |   | | \-date (16)
      | | |   | |-parent-status (11)
      | | |   | \-parent-grant-document (5)
      | | |   |   \-document-id (5)
      | | |   |     |-country (5)
      | | |   |     |-doc-number (5)
      | | |   |     \-date (2)
      | | |   \-child-doc (16)
      | | |     \-document-id (16)
      | | |       |-country (16)
      | | |       \-doc-number (16)
      | | |-continuation (21)
      | | | \-relation (21)
      | | |   |-parent-doc (21)
      | | |   | |-document-id (21)
      | | |   | | |-country (21)
      | | |   | | |-doc-number (21)
      | | |   | | \-date (21)
      | | |   | |-parent-status (16)
      | | |   | \-parent-grant-document (5)
      | | |   |   \-document-id (5)
      | | |   |     |-country (5)
      | | |   |     |-doc-number (5)
      | | |   |     \-date (2)
      | | |   \-child-doc (21)
      | | |     \-document-id (21)
      | | |       |-country (21)
      | | |       \-doc-number (21)
      | | |-division (32)
      | | | \-relation (32)
      | | |   |-parent-doc (32)
      | | |   | |-document-id (32)
      | | |   | | |-country (32)
      | | |   | | |-doc-number (32)
      | | |   | | \-date (32)
      | | |   | |-parent-grant-document (24)
      | | |   | | \-document-id (24)
      | | |   | |   |-country (24)
      | | |   | |   |-doc-number (24)
      | | |   | |   \-date (1)
      | | |   | \-parent-status (8)
      | | |   \-child-doc (32)
      | | |     \-document-id (32)
      | | |       |-country (32)
      | | |       \-doc-number (32)
      | | \-related-publication (9)
      | |   \-document-id (9)
      | |     |-country (9)
      | |     |-doc-number (9)
      | |     |-kind (9)
      | |     \-date (9)
      | |-priority-claims (140)
      | | \-priority-claim (182)
      | |   |-country (182)
      | |   |-doc-number (182)
      | |   \-date (182)
      | |-us-sir-flag (1)
      | |-classifications-ipcr (23)
      | | \-classification-ipcr (24)
      | |   |-ipc-version-indicator (24)
      | |   | \-date (24)
      | |   |-classification-level (24)
      | |   |-section (24)
      | |   |-class (24)
      | |   |-subclass (24)
      | |   |-main-group (24)
      | |   |-subgroup (24)
      | |   |-symbol-position (24)
      | |   |-classification-value (24)
      | |   |-action-date (24)
      | |   | \-date (24)
      | |   |-generating-office (24)
      | |   | \-country (24)
      | |   |-classification-status (24)
      | |   \-classification-data-source (24)
      | |-us-botanic (21)
      | | |-latin-name (21)
      | | \-variety (21)
      | \-classifications-cpc (1)
      |   \-main-cpc (1)
      |     \-classification-cpc (1)
      |       |-cpc-version-indicator (1)
      |       | \-date (1)
      |       |-section (1)
      |       |-class (1)
      |       |-subclass (1)
      |       |-main-group (1)
      |       |-subgroup (1)
      |       |-symbol-position (1)
      |       |-classification-value (1)
      |       |-action-date (1)
      |       | \-date (1)
      |       |-generating-office (1)
      |       | \-country (1)
      |       |-classification-status (1)
      |       |-classification-data-source (1)
      |       \-scheme-origination-code (1)
      |-drawings (472)
      | \-figure (3033)
      |   \-img (3033)
      |-description (472)
      | |-description-of-drawings (472)
      | | |-p (3955)
      | | | |-figref (4478)
      | | | |-b (86)
      | | | \-i (6)
      | | \-heading (22)
      | |-heading (162)
      | \-p (340)
      |   |-figref (15)
      |   |-b (250)
      |   |-i (146)
      |   |-ul (96)
      |   | \-li (444)
      |   |   |-ul (215)
      |   |   | \-li (273)
      |   |   |   |-ul (199)
      |   |   |   | \-li (1192)
      |   |   |   |   |-i (1219)
      |   |   |   |   |-b (1)
      |   |   |   |   |-sup (10)
      |   |   |   |   \-sub (2)
      |   |   |   \-i (11)
      |   |   |-sup (2)
      |   |   \-i (26)
      |   |-tables (15)
      |   | \-table (15)
      |   |   \-tgroup (49)
      |   |     |-colspec (175)
      |   |     |-thead (15)
      |   |     | \-row (27)
      |   |     |   \-entry (51)
      |   |     \-tbody (49)
      |   |       \-row (291)
      |   |         \-entry (997)
      |   |           \-sup (28)
      |   \-sup (2)
      |-us-claim-statement (472)
      |-claims (472)
      | \-claim (476)
      |   \-claim-text (476)
      |     |-figref (1)
      |     |-claim-text (5)
      |     |-claim-ref (4)
      |     \-i (15)
      \-abstract (22)
        \-p (22)
          |-i (27)
          \-ul (2)
            \-li (2)
              \-ul (2)
                \-li (11)
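
    A tree like the one above can be approximated by counting every element path while streaming through a document. The following is a rough sketch using only PHP's built-in XMLReader; the function name `countElementPaths` and the slash-separated path keys are my own choices for illustration, and this is not necessarily how the listing above was produced. To cover a whole sequence, you would run it once per document and add the per-path counts together.

```php
<?php
/**
 * Count how often each element path occurs in one XML document.
 * Hypothetical helper for illustration; keys look like "a/b/c".
 */
function countElementPaths(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $stack  = []; // names of the currently open elements
    $counts = []; // path => number of occurrences

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->name;
            $path = implode('/', $stack);
            $counts[$path] = ($counts[$path] ?? 0) + 1;
            // Empty elements such as <b/> never produce an END_ELEMENT node.
            if ($reader->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }
    $reader->close();

    return $counts;
}

// Usage on a tiny sample document.
$counts = countElementPaths('<a><b/><b><c>x</c></b></a>');
```

    Because only the stack of open element names is kept in memory, this approach stays cheap even for large documents.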
    
