Convert HTML table with a header to Json - Python

日久生厌 2021-01-22 09:31

Suppose I have the following HTML table:

2 Answers
  • 2021-01-22 09:47

    You can use soup.find_all:

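    from bs4 import BeautifulSoup as soup
    # html is assumed to hold the table markup from the question
    s = soup(html, 'html.parser').table
    # Column names come from the <th> cells of the first row
    h = [i.text for i in s.tr.find_all('th')]
    # Skip the header row, then collect the <td> text of every data row
    d = [[i.text for i in b.find_all('td')] for b in s.find_all('tr')[1:]]
    result = [dict(zip(h, i)) for i in d]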
    

    Output:

    [{'Name': 'John', 'Age': '28', 'License': 'Y', 'Amount': '12.30'}, {'Name': 'Kevin', 'Age': '25', 'License': 'Y', 'Amount': '22.30'}, {'Name': 'Smith', 'Age': '38', 'License': 'Y', 'Amount': '52.20'}, {'Name': 'Stewart', 'Age': '21', 'License': 'N', 'Amount': '3.80'}]
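
The output above is a list of Python dicts in which every value is still a string, so it is not yet JSON text. A minimal sketch of the remaining step, assuming the list is exactly the one printed above; `to_number` is a hypothetical helper introduced here for illustration, not part of the answer:

```python
import json

# The list of dicts copied from the output above.
result = [
    {'Name': 'John', 'Age': '28', 'License': 'Y', 'Amount': '12.30'},
    {'Name': 'Kevin', 'Age': '25', 'License': 'Y', 'Amount': '22.30'},
    {'Name': 'Smith', 'Age': '38', 'License': 'Y', 'Amount': '52.20'},
    {'Name': 'Stewart', 'Age': '21', 'License': 'N', 'Amount': '3.80'},
]


def to_number(text):
    """Return an int or float when text parses as one, else the string itself."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


# Convert numeric-looking cells so they serialize as JSON numbers.
typed = [{key: to_number(value) for key, value in row.items()} for row in result]

# json.dumps turns the list of dicts into a JSON string.
json_text = json.dumps(typed, indent=2)
print(json_text)
```

Converting `Amount` to a float drops trailing zeros (`'12.30'` becomes `12.3`), so if the exact formatting matters it may be better to skip `to_number` and serialize `result` directly.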
    
  • 2021-01-22 10:12

    The following script collects the rows into a list of dicts and prints it as JSON:

    from bs4 import BeautifulSoup
    import json
    
    xml_data = """
    [[your xml data]]"""
    
    
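    if __name__ == '__main__':
        # The 'lxml' parser requires the lxml package to be installed
        model = BeautifulSoup(xml_data, features='lxml')
        fields = []
        table_data = []
        # First pass: collect column names from the <th> cells.
        # recursive=False matches only <tr> elements directly under <table>.
        for tr in model.table.find_all('tr', recursive=False):
            for th in tr.find_all('th', recursive=False):
                fields.append(th.text)
        # Second pass: map each row's <td> cells onto the column names
        for tr in model.table.find_all('tr', recursive=False):
            datum = {}
            for i, td in enumerate(tr.find_all('td', recursive=False)):
                datum[fields[i]] = td.text
            # The header row has no <td> cells, so its empty dict is skipped
            if datum:
                table_data.append(datum)
    
        print(json.dumps(table_data, indent=4))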
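
Because the script searches with `recursive=False`, it finds rows only when each `<tr>` sits directly under `<table>`, and `fields[i]` raises an `IndexError` if a row has more cells than there are headers. A hedged sketch of a variant that tolerates both cases; `table_to_records` and the `sample` markup are hypothetical, introduced only for illustration, and the sketch does not try to handle nested tables:

```python
import json
from bs4 import BeautifulSoup


def table_to_records(html):
    """Return one dict per data row of the first <table> found in html."""
    table = BeautifulSoup(html, "html.parser").table
    # A recursive search also finds rows wrapped in <thead> or <tbody>.
    rows = table.find_all("tr")
    fields = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            # zip stops at the shorter sequence, so extra cells are ignored.
            records.append(dict(zip(fields, cells)))
    return records


# Hypothetical markup: rows wrapped in <thead>/<tbody>, one row with an extra cell.
sample = """<table>
<thead><tr><th>Name</th><th>Age</th></tr></thead>
<tbody>
<tr><td>John</td><td>28</td></tr>
<tr><td>Kevin</td><td>25</td><td>extra</td></tr>
</tbody>
</table>"""

records = table_to_records(sample)
print(json.dumps(records, indent=4))
```

Using the built-in `html.parser` here avoids the dependency on lxml; silently dropping extra cells is a design choice, and raising an error may be preferable when the data must be complete.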
    
Name Age License