How to get specific values from an XML file into a CSV file using Python?

Question

I am trying to extract the name, xmin, ymin, xmax and ymax values of every object tag in the file.

XML

<annotation>
    <folder>Plates_Number</folder>
    <filename>1.png</filename>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>294</width>
        <height>60</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>2</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>40</xmin>
            <ymin>1</ymin>
            <xmax>69</xmax>
            <ymax>42</ymax>
        </bndbox>
    </object>
    <object>
        <name>10</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>67</xmin>
            <ymin>3</ymin>
            <xmax>101</xmax>
            <ymax>43</ymax>
        </bndbox>
    </object>
    <object>
        <name>1</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>122</xmin>
            <ymin>2</ymin>
            <xmax>153</xmax>
            <ymax>45</ymax>
        </bndbox>
    </object>
    <object>
        <name>10</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>151</xmin>
            <ymin>3</ymin>
            <xmax>183</xmax>
            <ymax>44</ymax>
        </bndbox>
    </object>
    <object>
        <name>2</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>186</xmin>
            <ymin>4</ymin>
            <xmax>216</xmax>
            <ymax>47</ymax>
        </bndbox>
    </object>
    <object>
        <name>5</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>214</xmin>
            <ymin>5</ymin>
            <xmax>245</xmax>
            <ymax>46</ymax>
        </bndbox>
    </object>
</annotation>

This is what I tried, but I didn't get the expected result:

python

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("1.xml")
root = tree.getroot()

# open a file for writing

data = open('test.csv', 'r+')

# create the csv writer object

csvwriter = csv.writer(data)
data_head = []

count = 0
for member in root.findall('object'):
    obj = []
    bndbox_list = []
    if count == 0:
        name = member.find('name').tag
        data_head.append(name)
        bndbox = member[4].tag
        data_head.append(bndbox)
        csvwriter.writerow(data_head)
        count = count + 1

    name = member.find('name').text
    obj.append(name)
    bndbox = member[4][0].text
    bndbox_list.append(bndbox)
    xmin = member[4][1].text
    bndbox_list.append(xmin)
    ymin = member[4][2].text
    bndbox_list.append(ymin)
    xmax = member[4][3].text
    bndbox_list.append(xmax)
    ymax = member[4][4].text
    bndbox_list.append(ymax)
    obj.append(bndbox)
    csvwriter.writerow(data)
data.close()

I expect:

Name  xmin  ymin  xmax  ymax
2     40    1     69    42
10    67    3     101   43
1     122   2     153   45
10    151   3     183   44
2     186   4     216   47
5     214   5     245   46

but I only get these two headers:

Name bndbox

and no values.
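The header-only output most likely comes from the positional indexing. The children of each <object> are name, pose, truncated, difficult and bndbox, so member[4] is <bndbox>, and <bndbox> has only four children (xmin, ymin, xmax, ymax at indices 0 to 3). member[4][0] is therefore already xmin, and member[4][4] raises IndexError on the first <object>, right after the header row has been written. Once that is fixed, two more problems remain: obj and bndbox_list are never written, because csvwriter.writerow(data) is given the file object instead of a row list, and mode 'r+' requires test.csv to exist and does not truncate it (the csv module documentation recommends opening the output with newline=''). A small check on one <object> copied from the XML above:

```python
import xml.etree.ElementTree as ET

# One <object> element copied from the annotation in the question.
member = ET.fromstring("""<object>
    <name>2</name>
    <pose>Unspecified</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>40</xmin>
        <ymin>1</ymin>
        <xmax>69</xmax>
        <ymax>42</ymax>
    </bndbox>
</object>""")

# Children of <object>: name(0), pose(1), truncated(2), difficult(3), bndbox(4)
bndbox = member[4]
print(bndbox.tag)                       # bndbox
print([child.tag for child in bndbox])  # ['xmin', 'ymin', 'xmax', 'ymax']

# <bndbox> has only indices 0-3, so member[4][4] in the original loop fails.
try:
    bndbox[4]
    out_of_range = False
except IndexError:
    out_of_range = True
print("bndbox[4] raises IndexError:", out_of_range)
```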

Answer 1:

If you can use BeautifulSoup, you could use

from bs4 import BeautifulSoup

# 'xml' selects BeautifulSoup's XML parser (this requires lxml to be installed)
soup = BeautifulSoup(input_xml_string, 'xml')
# all <object> elements in the document
tgs = soup.find_all('object')
# one (name, xmin, ymin, xmax, ymax) tuple of strings per <object>
l = [(i.find('name').string, i.xmin.string, i.ymin.string, i.xmax.string, i.ymax.string) for i in tgs]

where input_xml_string is the input XML as a string.

soup is a BeautifulSoup object, a representation of the XML tree.

Passing 'xml' as the second argument to BeautifulSoup() makes it use an XML parser rather than an HTML one.

Then find_all() finds all the <object> tags in the XML, and the result is stored in tgs.

Each element of tgs is an <object> tag. From it we select the child tags we need (which are also Tag objects) and read their values through their string attribute.

We can't do the same for <name> with i.name.string, because name is already an attribute of the Tag class: i.name gives the tag's own name, 'object'. So we first use find() to get the <name> child of <object> and then read its content.

Now if we print the values in l,

for i in l:
    print(i)

we would get,

('2', '40', '1', '69', '42')
('10', '67', '3', '101', '43')
('1', '122', '2', '153', '45')
('10', '151', '3', '183', '44')
('2', '186', '4', '216', '47')
('5', '214', '5', '245', '46')

Answer 2:

Code:

import xml.etree.ElementTree as ET

root = ET.parse('file.xml').getroot()


for type_tag in root.findall('object'):
    name = type_tag.find('name').text
    # 'bndbox/xmin' is a path relative to this <object>: its <bndbox> child, then <xmin>
    xmin = type_tag.find('bndbox/xmin').text
    ymin = type_tag.find('bndbox/ymin').text
    xmax = type_tag.find('bndbox/xmax').text
    ymax = type_tag.find('bndbox/ymax').text

    print([name, xmin, ymin, xmax, ymax])

Output:

['2', '40', '1', '69', '42']
['10', '67', '3', '101', '43']
['1', '122', '2', '153', '45']
['10', '151', '3', '183', '44']
['2', '186', '4', '216', '47']
['5', '214', '5', '245', '46']
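Neither answer writes the CSV file the question asks for. Below is a minimal sketch that does, reusing the path lookups from this answer (through findtext) together with the standard csv module. The helper name annotation_to_rows is hypothetical, the inline XML is an excerpt of two objects from the question so the example runs on its own, and test.csv is the output name used in the question.

```python
import csv
import xml.etree.ElementTree as ET

# Excerpt of 1.xml from the question (two of the six <object> entries).
XML_TEXT = """<annotation>
    <object>
        <name>2</name>
        <bndbox><xmin>40</xmin><ymin>1</ymin><xmax>69</xmax><ymax>42</ymax></bndbox>
    </object>
    <object>
        <name>10</name>
        <bndbox><xmin>67</xmin><ymin>3</ymin><xmax>101</xmax><ymax>43</ymax></bndbox>
    </object>
</annotation>"""

HEADER = ["Name", "xmin", "ymin", "xmax", "ymax"]


def annotation_to_rows(root):
    """Hypothetical helper: one [name, xmin, ymin, xmax, ymax] row per <object>."""
    rows = []
    for obj in root.findall("object"):
        row = [obj.findtext("name")]
        # Look coordinates up by tag name instead of by child index.
        row += [obj.findtext("bndbox/" + tag) for tag in HEADER[1:]]
        rows.append(row)
    return rows


# With the real file this would be: root = ET.parse("1.xml").getroot()
root = ET.fromstring(XML_TEXT)
rows = annotation_to_rows(root)

# Mode "w" creates/truncates the file; newline="" is what csv.writer expects.
with open("test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(HEADER)
    writer.writerows(rows)
```

With the full 1.xml, the file gets the header followed by one row per <object>, in the same order as the output above.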

Source: https://stackoverflow.com/questions/56020248/how-to-get-specific-values-from-a-xml-file-into-csv-file-using-python

Tags

python

xml-parsing