How to extract information after a node in XML with Python?

Submitted by …衆ロ難τιáo~ on 2021-01-28 12:11:54

Question


I have the following XML structure (the real file is very large, with many more person entries):

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE population SYSTEM "http://www.matsim.org/files/dtd/population_v6.dtd">

<population desc="Switzerland Baseline">

    <attributes>
        <attribute name="coordinateReferenceSystem" class="java.lang.String" >Atlantis</attribute>
    </attributes>


<!-- ====================================================================== -->

    <person id="10">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >30</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >no</attribute>
            <attribute name="home_x" class="java.lang.Double" >2679482.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1237545.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >374775</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >281604</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000137</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240012081086</attribute>
        </attributes>

        <plan score="-8.842222222222222" selected="yes">
            <activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" end_time="07:50:56" >
            </activity>
            <leg mode="access_walk" dep_time="07:50:56" trav_time="00:03:20">
                <route type="generic" start_link="270549" end_link="617713" trav_time="00:03:20" distance="239.83275324790645"></route>
            </leg>
            <activity type="pt interaction" link="617713" x="2679299.97008475" y="1237575.0077440983" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="07:54:16" trav_time="00:47:44">
                <route type="enriched_pt" start_link="617713" end_link="586404" trav_time="00:47:44" distance="15802.787558964774">{"inVehicleTime":1980.0,"transferTime":884.0,"accessStopIndex":2,"egressStopindex":21,"transitRouteId":"07656_013","transitLineId":"PAG_line235","departureId":"77141"}</route>
            </leg>
            <activity type="pt interaction" link="586404" x="2681990.0107938214" y="1247298.9705903793" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="08:42:00" trav_time="00:00:59">
                <route type="generic" start_link="586404" end_link="4222" trav_time="00:00:59" distance="71.92245024668337"></route>
            </leg>
            <activity type="pt interaction" link="4222" x="2681934.8161827456" y="1247302.7661533705" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="08:42:59" trav_time="00:12:00">
                <route type="enriched_pt" start_link="4222" end_link="955504" trav_time="00:12:00" distance="2958.2459797004594">{"inVehicleTime":300.0,"transferTime":420.06462479443144,"accessStopIndex":6,"egressStopindex":7,"transitRouteId":"20420_001","transitLineId":"SBB_S24_8502204-8506000","departureId":"06294"}</route>
            </leg>
            <activity type="pt interaction" link="955504" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="08:55:00" trav_time="00:00:00">
                <route type="generic" start_link="955504" end_link="955475" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="pt interaction" link="955475" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="08:55:00" trav_time="00:35:00">
                <route type="enriched_pt" start_link="955475" end_link="771011" trav_time="00:35:00" distance="9555.56835806127">{"inVehicleTime":900.0,"transferTime":1200.0,"accessStopIndex":3,"egressStopindex":9,"transitRouteId":"19621_001","transitLineId":"SBB_S16_8503016-8503103","departureId":"06152"}</route>
            </leg>
            <activity type="pt interaction" link="771011" x="2687180.6471416447" y="1240073.3528400902" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="09:30:00" trav_time="00:00:26">
                <route type="generic" start_link="771011" end_link="166874" trav_time="00:00:26" distance="30.317917906969893"></route>
            </leg>
            <activity type="outside" link="166874" facility="outside_1" x="2687164.597205863" y="1240056.4327108893" end_time="08:20:56" >
            </activity>
            <leg mode="outside" dep_time="08:20:56" trav_time="00:00:00">
                <route type="generic" start_link="166874" end_link="166874" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="outside" link="166874" facility="outside_1" x="2687164.597205863" y="1240056.4327108893" end_time="08:44:00" >
            </activity>
            <leg mode="transit_walk" dep_time="08:44:00" trav_time="00:18:01">
                <route type="generic" start_link="166874" end_link="978218" trav_time="00:18:01" distance="1297.9935065965901"></route>
            </leg>
            <activity type="outside" link="978218" facility="outside_2" x="2688162.5789000895" y="1240087.2224383662" end_time="08:59:00" >
            </activity>
            <leg mode="access_walk" dep_time="08:59:00" trav_time="00:17:44">
                <route type="generic" start_link="978218" end_link="771010" trav_time="00:17:44" distance="1276.6386181523276"></route>
            </leg>
            <activity type="pt interaction" link="771010" x="2687180.6471416447" y="1240073.3528400902" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="09:16:44" trav_time="00:25:16">
                <route type="enriched_pt" start_link="771010" end_link="955474" trav_time="00:25:16" distance="9742.201043728513">{"inVehicleTime":960.0,"transferTime":556.0,"accessStopIndex":1,"egressStopindex":7,"transitRouteId":"19622_002","transitLineId":"SBB_S16_8503016-8503103","departureId":"06153"}</route>
            </leg>
            <activity type="pt interaction" link="955474" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="09:42:00" trav_time="00:00:00">
                <route type="generic" start_link="955474" end_link="955505" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="pt interaction" link="955505" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="09:42:00" trav_time="00:04:00">
                <route type="enriched_pt" start_link="955505" end_link="4223" trav_time="00:04:00" distance="2803.3418395527465">{"inVehicleTime":180.0,"transferTime":60.0,"accessStopIndex":5,"egressStopindex":6,"transitRouteId":"20423_002","transitLineId":"SBB_S24_8502204-8506000","departureId":"06297"}</route>
            </leg>
            <activity type="pt interaction" link="4223" x="2681934.8161827456" y="1247302.7661533705" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="09:46:00" trav_time="00:00:59">
                <route type="generic" start_link="4223" end_link="586407" trav_time="00:00:59" distance="71.92245024668337"></route>
            </leg>
            <activity type="pt interaction" link="586407" x="2681990.0107938214" y="1247298.9705903793" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="09:46:59" trav_time="00:32:00">
                <route type="enriched_pt" start_link="586407" end_link="617712" trav_time="00:32:00" distance="15771.43292404094">{"inVehicleTime":1800.0,"transferTime":120.06462479443144,"accessStopIndex":0,"egressStopindex":19,"transitRouteId":"07698_001","transitLineId":"PAG_line236","departureId":"77274"}</route>
            </leg>
            <activity type="pt interaction" link="617712" x="2679299.97008475" y="1237575.0077440983" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="10:19:00" trav_time="00:03:20">
                <route type="generic" start_link="617712" end_link="270549" trav_time="00:03:20" distance="239.83275324790645"></route>
            </leg>
            <activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" start_time="08:35:56" end_time="11:05:56" >
            </activity>
            <leg mode="car" dep_time="11:05:56" trav_time="00:01:59">
                <route type="links" start_link="270549" end_link="937360" trav_time="00:01:59" distance="1394.670823616876" vehicleRefId="10">270549 498972 449374 449376 449378 938421 79859 80015 937361 937360</route>
            </leg>
            <activity type="leisure" link="937360" facility="440711" x="2680684.0" y="1236911.0" start_time="11:15:56" end_time="11:35:56" >
            </activity>
            <leg mode="car" dep_time="11:35:56" trav_time="00:06:47">
                <route type="links" start_link="937360" end_link="212265" trav_time="00:06:47" distance="5821.641401777163" vehicleRefId="10">937360 80016 79860 938422 449379 449377 449375 498973 270550 617713 617701 270548 270546 449359 784762 784760 784758 79853 982197 212283 79837 212287 212289 212291 878210 878212 878230 784788 212265</route>
            </leg>
            <activity type="other" link="212265" facility="644241" x="2680896.0" y="1238970.0" start_time="11:40:56" end_time="11:50:56" >
            </activity>
            <leg mode="car" dep_time="11:50:56" trav_time="00:09:01">
                <route type="links" start_link="212265" end_link="184658" trav_time="00:09:01" distance="6431.497680911363" vehicleRefId="10">212265 756643 756578 212267 212269 212271 212273 212275 212277 212279 212281 981899 981901 212293 212297 675945 675947 716642 716638 716636 716634 386722 1045288 1045290 385944 385950 773191 773205 773206 1007698 578158 641624 773197 773199 773200 1018216 1012502 184675 184673 184671 184669 184667 184665 184663 184661 45955 184658</route>
            </leg>
            <activity type="leisure" link="184658" facility="18404" x="2681830.0" y="1241204.0" start_time="11:55:56" end_time="13:05:56" >
            </activity>
            <leg mode="car" dep_time="13:05:56" trav_time="00:02:56">
                <route type="links" start_link="184658" end_link="357020" trav_time="00:02:56" distance="907.0458689086348" vehicleRefId="10">184658 529029 184661 45954 1009116 423731 423729 423727 357013 907093 423732 423737 423738 587980 357020</route>
            </leg>
            <activity type="leisure" link="357020" facility="323838" x="2681824.0" y="1241595.0" start_time="13:07:56" end_time="13:45:56" >
            </activity>
            <leg mode="car" dep_time="13:45:56" trav_time="00:07:03">
                <route type="links" start_link="357020" end_link="976840" trav_time="00:07:03" distance="3745.7814646534584" vehicleRefId="10">357020 423733 907093 42924 357022 423720 423722 652640 652678 1010048 51924 51922 1073443 360794 565608 1073441 565592 147081 147079 147077 42264 617017 665104 665106 665108 623565 623508 338910 338912 338914 642624 1025358 1025356 1025354 976836 976838 976840</route>
            </leg>
            <activity type="other" link="976840" facility="14468" x="2682414.0" y="1243360.0" start_time="13:50:56" end_time="13:55:56" >
            </activity>
            <leg mode="car" dep_time="13:55:56" trav_time="00:10:42">
                <route type="links" start_link="976840" end_link="132606" trav_time="00:10:42" distance="4886.406231008111" vehicleRefId="10">976840 976841 976839 886439 987 1025348 591412 549475 549476 549480 923479 575308 575306 550428 195897 704626 3793 630946 587233 587235 704630 704632 846386 846388 8745 846382 145001 145003 126600 960214 960215 960216 960217 960218 960219 350682 350683 350684 643948 233747 204974 204976 204978 204980 204982 1009215 1027276 173350 173346 882895 330036 123134 123132 767425 914177 913332 927720 574585 574588 132606</route>
            </leg>
            <activity type="leisure" link="132606" facility="184091" x="2683108.0" y="1247419.0" start_time="14:00:56" end_time="15:25:56" >
            </activity>
            <leg mode="car" dep_time="15:25:56" trav_time="00:17:42">
                <route type="links" start_link="132606" end_link="708563" trav_time="00:17:42" distance="8713.024897788096" vehicleRefId="10">132606 574714 574716 574718 966200 686943 727974 313219 313214 313216 284586 416285 416308 728198 806135 728212 699596 781183 33938 754077 754079 33942 850507 399738 487552 750280 361926 852850 833253 1058950 690181 1042408 300303 300305 300302 1065844 726013 716741 986695 986690 585066 585067 986044 986043 361851 361853 432617 154439 154440 280951 157171 980758 546148 142674 142676 978664 970119 637731 637723 637725 637727 637729 637709 637711 383679 383561 383557 383673 383677 383571 383573 130947 383553 383555 637715 637717 637719 637721 782897 531973 782895 130865 130867 532023 532025 782893 142398 782903 782901 782899 1018057 771505 771507 213514 637758 224595 224597 224599 224036 224083 708564 708563</route>
            </leg>
            <activity type="other" link="708563" facility="173175" x="2689486.0" y="1246198.0" start_time="15:30:56" end_time="15:35:56" >
            </activity>
            <leg mode="car" dep_time="15:35:56" trav_time="00:03:04">
                <route type="links" start_link="708563" end_link="568677" trav_time="00:03:04" distance="2052.526989938019" vehicleRefId="10">708563 224084 224035 224600 224598 224596 637759 891418 891416 891414 891412 891410 891408 891406 891404 891402 568678 568677</route>
            </leg>
            <activity type="leisure" link="568677" facility="566088" x="2689800.0" y="1247153.0" start_time="15:40:56" end_time="17:05:56" >
            </activity>
            <leg mode="car" dep_time="17:05:56" trav_time="00:42:41">
                <route type="links" start_link="568677" end_link="270549" trav_time="00:42:41" distance="27074.55851803015" vehicleRefId="10">568677 891401 891403 891405 891407 891409 891411 891413 891415 891417 637760 637757 213513 357023 357021 423737 423738 998166 587979 619273 619301 716632 1044671 716633 716635 716637 716643 675948 675946 212298 212294 981902 981900 212282 212280 212278 212276 212274 212272 212270 212268 756579 756644 212266 784789 878231 878213 878211 212292 212290 212288 79838 212284 982198 79854 784757 784759 784761 449358 270545 270547 617700 617712 270549</route>
            </leg>
            <activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" start_time="17:25:56" end_time="21:05:56" >
            </activity>
            <leg mode="walk" dep_time="21:05:56" trav_time="00:03:43">
                <route type="generic" start_link="270549" end_link="617700" trav_time="00:03:43" distance="267.6339299525113"></route>
            </leg>
            <activity type="adpt interaction" link="617700" max_dur="00:00:09" >
            </activity>
            <leg mode="adpt" trav_time="00:15:23">
                <route type="adpt" start_link="617700" end_link="318949" trav_time="undefined" distance="NaN">{"inVehicleTime":923.7909183497832,"endZone":"3251","startZone":"3186"}</route>
            </leg>
            <leg mode="walk" dep_time="21:05:56" trav_time="00:01:46">
                <route type="generic" start_link="318949" end_link="318949" trav_time="00:01:46" distance="127.28480816006235"></route>
            </leg>
            <activity type="leisure" link="318949" facility="345976" x="2684446.0" y="1239884.0" start_time="21:15:56" end_time="24:05:56" >
            </activity>
            <leg mode="walk" dep_time="24:05:56" trav_time="00:01:46">
                <route type="generic" start_link="318949" end_link="318949" trav_time="00:01:46" distance="127.28480816006235"></route>
            </leg>
            <activity type="adpt interaction" link="318949" max_dur="00:00:12" >
            </activity>
            <leg mode="adpt" trav_time="00:14:09">
                <route type="adpt" start_link="318949" end_link="617700" trav_time="undefined" distance="NaN">{"inVehicleTime":849.6589937224489,"endZone":"3186","startZone":"3251"}</route>
            </leg>
            <leg mode="walk" dep_time="24:05:56" trav_time="00:03:43">
                <route type="generic" start_link="617700" end_link="270549" trav_time="00:03:43" distance="267.6339299525113"></route>
            </leg>
            <activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" start_time="24:15:56" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->
<person id="100">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >3</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >false</attribute>
            <attribute name="hasLicense" class="java.lang.String" >no</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >true</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >false</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >324961</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >-1</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >true</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000049</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240013385042</attribute>
        </attributes>
        <plan score="0.0" selected="no">
            <activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
            </activity>
        </plan>

        <plan score="0.0" selected="yes">
            <activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->

    <person id="1000">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >48</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >yes</attribute>
            <attribute name="home_x" class="java.lang.Double" >2678966.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1235785.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >137604</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >496052</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000745</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240009138483</attribute>
        </attributes>
        <plan score="-437.00166666666667" selected="yes">
            <activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="05:33:00" >
            </activity>
            <leg mode="transit_walk" dep_time="05:33:00" trav_time="00:00:00">
                <route type="generic" start_link="360294" end_link="360294" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="05:33:00" >
            </activity>
            <leg mode="outside" dep_time="05:33:00" trav_time="00:00:02">
                <route type="generic" start_link="360294" end_link="550470" trav_time="00:00:02" distance="2685.1084863416754"></route>
            </leg>
            <activity type="outside" link="550470" facility="outside_4" x="2676431.2510162788" y="1238710.7365533686" end_time="05:50:00" >
            </activity>
            <leg mode="transit_walk" dep_time="05:50:00" trav_time="00:21:56">
                <route type="generic" start_link="550470" end_link="404241" trav_time="00:21:56" distance="1579.8406905905463"></route>
            </leg>
            <activity type="outside" link="404241" facility="outside_5" x="2677125.839124391" y="1237713.5358423358" end_time="06:14:03" >
            </activity>
            <leg mode="access_walk" dep_time="06:14:03" trav_time="00:06:55">
                <route type="generic" start_link="404241" end_link="270558" trav_time="00:06:55" distance="497.3913068774026"></route>
            </leg>
            <activity type="pt interaction" link="270558" x="2676744.982030722" y="1237676.9668707938" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="06:20:58" trav_time="00:54:02">
                <route type="enriched_pt" start_link="270558" end_link="812194" trav_time="00:54:02" distance="16279.23249322422">{"inVehicleTime":1800.0,"transferTime":1442.0,"accessStopIndex":0,"egressStopindex":8,"transitRouteId":"06763_028","transitLineId":"PAG_line200","departureId":"76577"}</route>
            </leg>
            <activity type="pt interaction" link="812194" x="2682474.0249347393" y="1246541.0148432895" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="07:15:00" trav_time="00:01:01">
                <route type="generic" start_link="812194" end_link="588385" trav_time="00:01:01" distance="73.45759253010056"></route>
            </leg>
            <activity type="pt interaction" link="588385" x="2682500.5564242266" y="1246491.125064118" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="07:16:01" trav_time="00:13:58">
                <route type="enriched_pt" start_link="588385" end_link="368678" trav_time="00:13:58" distance="8378.187255109851">{"inVehicleTime":420.0,"transferTime":418.7853395582497,"accessStopIndex":4,"egressStopindex":5,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05362"}</route>
            </leg>
            <activity type="pt interaction" link="368678" x="2685173.595399507" y="1238953.4179927576" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="07:30:00" trav_time="00:01:10">
                <route type="generic" start_link="368678" end_link="812077" trav_time="00:01:10" distance="82.96796919207021"></route>
            </leg>
            <activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="15:52:43" >
            </activity>
            <leg mode="outside" dep_time="15:52:43" trav_time="00:00:00">
                <route type="generic" start_link="812077" end_link="812077" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="16:59:00" >
            </activity>
            <leg mode="transit_walk" dep_time="16:59:00" trav_time="01:42:47">
                <route type="generic" start_link="812077" end_link="555704" trav_time="01:42:47" distance="7401.037993401233"></route>
            </leg>
            <activity type="outside" link="555704" facility="outside_7" x="2690699.2533230074" y="1240302.4760125757" end_time="17:07:39" >
            </activity>
            <leg mode="access_walk" dep_time="17:07:39" trav_time="00:33:33">
                <route type="generic" start_link="555704" end_link="348266" trav_time="00:33:33" distance="2415.2684761259893"></route>
            </leg>
            <activity type="pt interaction" link="348266" x="2688841.9870530544" y="1240253.9986282045" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="17:41:12" trav_time="00:10:48">
                <route type="enriched_pt" start_link="348266" end_link="166875" trav_time="00:10:48" distance="3166.770768054601">{"inVehicleTime":420.0,"transferTime":228.0,"accessStopIndex":0,"egressStopindex":10,"transitRouteId":"02828_023","transitLineId":"VZO_line961","departureId":"125106"}</route>
            </leg>
            <activity type="pt interaction" link="166875" x="2687161.005729228" y="1240076.9559941967" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="17:52:00" trav_time="00:00:21">
                <route type="generic" start_link="166875" end_link="771010" trav_time="00:00:21" distance="25.959922652207396"></route>
            </leg>
            <activity type="pt interaction" link="771010" x="2687180.6471416447" y="1240073.3528400902" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="17:52:21" trav_time="00:19:38">
                <route type="enriched_pt" start_link="771010" end_link="955474" trav_time="00:19:38" distance="9742.201043728513">{"inVehicleTime":960.0,"transferTime":218.36673112316203,"accessStopIndex":1,"egressStopindex":7,"transitRouteId":"19622_002","transitLineId":"SBB_S16_8503016-8503103","departureId":"06187"}</route>
            </leg>
            <activity type="pt interaction" link="955474" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="18:12:00" trav_time="00:00:00">
                <route type="generic" start_link="955474" end_link="955504" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="pt interaction" link="955504" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="18:12:00" trav_time="00:07:00">
                <route type="enriched_pt" start_link="955504" end_link="4223" trav_time="00:07:00" distance="3304.5168456795577">{"inVehicleTime":120.0,"transferTime":300.0,"accessStopIndex":2,"egressStopindex":3,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05406"}</route>
            </leg>
            <activity type="pt interaction" link="4223" x="2681934.8161827456" y="1247302.7661533705" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="18:19:00" trav_time="00:00:59">
                <route type="generic" start_link="4223" end_link="586407" trav_time="00:00:59" distance="71.92245024668337"></route>
            </leg>
            <activity type="pt interaction" link="586407" x="2681990.0107938214" y="1247298.9705903793" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="18:19:59" trav_time="01:01:00">
                <route type="enriched_pt" start_link="586407" end_link="617712" trav_time="01:01:00" distance="15771.43292404094">{"inVehicleTime":1920.0,"transferTime":1740.0646247944242,"accessStopIndex":0,"egressStopindex":19,"transitRouteId":"07744_004","transitLineId":"PAG_line236","departureId":"77196"}</route>
            </leg>
            <activity type="pt interaction" link="617712" x="2679299.97008475" y="1237575.0077440983" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="19:21:00" trav_time="00:15:42">
                <route type="generic" start_link="617712" end_link="360294" trav_time="00:15:42" distance="1130.0689845763227"></route>
            </leg>
            <activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="17:53:00" >
            </activity>
        </plan>

    </person>
</population>

For every person, I want to take the plan that has selected="yes" and, for each leg in that plan with mode="adpt", extract the content inside the {} of its route node.

To be clearer: I want to access and extract the endZone and startZone values.

<leg mode="adpt" trav_time="00:13:51">
                <route type="adpt" start_link="318949" end_link="617700" trav_time="undefined" distance="NaN">{"inVehicleTime":831.4581089178682,"endZone":"3186","startZone":"3251"}</route>
            </leg>

In essence, I want to create a DataFrame with two columns showing these two zones, collected from the selected plans of all persons.

startZone   endZone
3186        3251
...         ...

My approach, which gives me no output:

import gzip
import xml.etree.cElementTree as ET
import pandas as pd
from collections import defaultdict


tree = ET.iterparse(gzip.open('file.xml.gz', 'r'))
zones = defaultdict(dict)
for xml_event, elem in tree:
    attributes = elem.attrib
    if elem.tag == 'plan' \
    and elem.attrib["selected"] == "yes" :
        if elem.tag == "leg"\
        and elem.attrib["mode"] == "adpt":
            if elem.tag == "route":
                zones[attributes["startZone", "endZone"]] #here I'm clueless about, what to do. 
    elem.clear()  

zones= pd.DataFrame.from_dict(links_used, orient='index')

My struggle is really to access this part of an XML. Thank you very much in advance!

UPDATE

data = gzip.open(file, 'r')

root = ET.parse(data).getroot()

from collections import defaultdict
d = defaultdict(list)

for ent in root.findall('./person/plan[@selected="yes"]/leg[@mode="pt"]'):
    for anoda in ent.findall('route'):
        d['start_link'].append(anoda.text)
coords=pd.DataFrame(d)
coords
0   {"inVehicleTime":120.0,"transferTime":21.0,"ac...
1   {"inVehicleTime":1260.0,"transferTime":185.0,"...
2   {"inVehicleTime":420.0,"transferTime":114.4987...
3   {"inVehicleTime":420.0,"transferTime":135.0,"a...
4   {"inVehicleTime":1140.0,"transferTime":54.4987...
... ...

This did the trick for me to at least access the data. The cleanup will have to be done in a next step.
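
The cleanup step mentioned above could look roughly like the following. This is only a minimal sketch, assuming every collected route text is a JSON object as in the sample file. `route_texts` is a hypothetical stand-in for the strings gathered in the loop, and `parse_zones` is an illustrative helper name, not part of any library.

```python
import json

def parse_zones(route_texts):
    """Turn JSON route payloads into a list of {'startZone', 'endZone'} dicts."""
    rows = []
    for text in route_texts:
        if not text:  # some routes (e.g. type="generic") have no text at all
            continue
        payload = json.loads(text)  # assumes the text is valid JSON
        # pt routes carry no zone keys, so keep only payloads that have both
        if "startZone" in payload and "endZone" in payload:
            rows.append({"startZone": payload["startZone"],
                         "endZone": payload["endZone"]})
    return rows

# Hypothetical sample input, copied from the adpt routes in the XML above
route_texts = [
    '{"inVehicleTime":923.7909183497832,"endZone":"3251","startZone":"3186"}',
    '{"inVehicleTime":849.6589937224489,"endZone":"3186","startZone":"3251"}',
    None,
]
rows = parse_zones(route_texts)
```

Passing `rows` to `pd.DataFrame(rows)` should then give the two-column frame. Routes of type "links" hold a space-separated list of link ids rather than JSON, so they would need to be filtered out before this step.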


Answer 1:


Search the XML all the way down to the route elements, build a dictionary of start and end zones, and finally create the DataFrame:

import ast
import xml.etree.ElementTree as ET
from collections import defaultdict

import pandas as pd

# data holds the XML as a string; for a file, use root = ET.parse('data.xml').getroot()
root = ET.fromstring(data)
d = defaultdict(list)

for ent in root.findall('./person/plan[@selected="yes"]/leg[@mode="adpt"]/route'):
    # the data you are interested in is the text content of the route element
    # ast.literal_eval evaluates that string as a Python literal,
    # in this case a dictionary
    dicts = ast.literal_eval(ent.text)
    d['endZone'].append(dicts['endZone'])
    d['startZone'].append(dicts['startZone'])

print(d)
defaultdict(list, {'endZone': ['3251', '3186'], 'startZone': ['3186', '3251']})

pd.DataFrame(d)

    endZone startZone
0   3251    3186
1   3186    3251
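
Since the question mentions a very large file, a streaming variant may be worth considering. The following is a minimal sketch under stated assumptions, not the answerer's method. It uses `ET.iterparse` and processes each `person` element once it has been read completely. The nested `elem.tag` checks in the question's attempt can never all be true, because a single element has only one tag. `iter_adpt_zones` is a hypothetical helper name, `json.loads` is swapped in for `ast.literal_eval` on the assumption that the route text is valid JSON, and the in-memory sample stands in for `gzip.open('file.xml.gz', 'rb')`.

```python
import io
import json
import xml.etree.ElementTree as ET

def iter_adpt_zones(source):
    """Yield (startZone, endZone) for adpt legs of selected plans, person by person."""
    # By default iterparse reports an element when its closing tag is read,
    # so a finished <person> already contains all of its plans and legs.
    for _event, elem in ET.iterparse(source):
        if elem.tag != "person":
            continue
        for route in elem.findall('./plan[@selected="yes"]/leg[@mode="adpt"]/route'):
            payload = json.loads(route.text)  # assumes the route text is JSON
            yield payload["startZone"], payload["endZone"]
        elem.clear()  # drop the processed person's children to limit memory use

# Hypothetical miniature input modelled on the XML in the question
sample = b"""<population>
  <person id="10">
    <plan selected="yes">
      <leg mode="adpt"><route type="adpt">{"endZone":"3251","startZone":"3186"}</route></leg>
      <leg mode="walk"><route type="generic"></route></leg>
    </plan>
    <plan selected="no">
      <leg mode="adpt"><route type="adpt">{"endZone":"1","startZone":"2"}</route></leg>
    </plan>
  </person>
</population>"""

zones = list(iter_adpt_zones(io.BytesIO(sample)))
```

With the real file, something like `pd.DataFrame(list(iter_adpt_zones(gzip.open('file.xml.gz', 'rb'))), columns=['startZone', 'endZone'])` should build the frame without holding the whole tree in memory at once.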


Source: https://stackoverflow.com/questions/61484876/how-to-extract-information-after-a-node-in-xml-with-python
