Parse XML to a pandas DataFrame in Python

北恋 2021-01-21 00:27

I am trying to read an XML file and convert it to a pandas DataFrame. However, the resulting DataFrame is empty.

This is a sample of the XML structure:


         


        
2 answers
  •  悲哀的现实
    2021-01-21 00:44

    There are several issues:

    • Calling .find on the loop variable, node, looks for a child of that node, as in current_node.find('child_of_current_node'). Here, however, every node is a direct child of the root and has no children of its own, so those lookups find nothing and no loop is required;
    • Not checking for None: find() returns None when a node is missing, and accessing .tag, .text, or any other attribute on None raises an AttributeError;
    • Not retrieving the node content with .text, so the Element object itself is stored instead of its text.
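
The three points above can be seen in a small sketch. The XML string here is a hypothetical stand-in (the sample from the question is not available), and the names `present`, `missing`, and `comments` are illustrative only:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-child document, used only for illustration.
xroot = ET.fromstring('<Instance ID="1"><StudentID>S1</StudentID></Instance>')

present = xroot.find("StudentID")   # child exists
missing = xroot.find("Comments")    # child does not exist

print(present)        # an Element object, not the text
print(present.text)   # S1
print(missing)        # None

# missing.text would raise AttributeError, so guard before dereferencing.
comments = missing.text if missing is not None else None

# The children of the root have no children of their own in this sample,
# so calling find() on the loop variable returns None every time.
for node in xroot:
    print(node.tag, node.find("StudentID"))
```

Looking things up on `xroot` directly, with a None check, avoids both the empty result and the AttributeError.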

    Consider this adjustment, which uses the conditional expression a if condition else b so that every variable gets a value (None when the node is missing):

    rows = []
    
    s_name = xroot.attrib.get("ID")
    s_student = xroot.find("StudentID").text if xroot.find("StudentID") is not None else None
    s_task = xroot.find("TaskID").text if xroot.find("TaskID") is not None else None      
    s_source = xroot.find("DataSource").text if xroot.find("DataSource") is not None else None
    s_desc = xroot.find("ProblemDescription").text if xroot.find("ProblemDescription") is not None else None
    s_question = xroot.find("Question").text if xroot.find("Question") is not None else None    
    s_ans = xroot.find("Answer").text if xroot.find("Answer") is not None else None
    s_label = xroot.find("Label").text if xroot.find("Label") is not None else None
    s_contextrequired = xroot.find("ContextRequired").text if xroot.find("ContextRequired") is not None else None
    s_extraInfoinAnswer = xroot.find("ExtraInfoInAnswer").text if xroot.find("ExtraInfoInAnswer") is not None else None
    s_comments = xroot.find("Comments").text if xroot.find("Comments") is not None else None
    s_watch = xroot.find("Watch").text if xroot.find("Watch") is not None else None
    s_referenceAnswers = xroot.find("ReferenceAnswers").text if xroot.find("ReferenceAnswers") is not None else None
    
    # Keys must match the names in df_cols, otherwise the column comes out empty
    rows.append({"ID": s_name, "StudentID": s_student, "TaskID": s_task,
                 "DataSource": s_source, "ProblemDescription": s_desc,
                 "Question": s_question, "Answer": s_ans, "Label": s_label,
                 "ContextRequired": s_contextrequired, "ExtraInfoInAnswer": s_extraInfoinAnswer,
                 "Comments": s_comments, "Watch": s_watch, "ReferenceAnswers": s_referenceAnswers
                })
    
    out_df = pd.DataFrame(rows, columns = df_cols)
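
As a possible shortening of the repeated conditional expressions, ElementTree's `findtext()` returns the text of the first matching child, or a default (None) when the child is absent. The sketch below assumes a hypothetical XML string and an illustrative subset of tags in `tags`; it is not the code from the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only some of the expected tags are present.
xroot = ET.fromstring(
    '<Instance ID="1"><StudentID>S1</StudentID><Answer>4</Answer></Instance>'
)

tags = ["StudentID", "TaskID", "Answer"]  # illustrative subset of the columns

# findtext() returns the child's text, or None when no such child exists.
row = {"ID": xroot.attrib.get("ID")}
row.update({tag: xroot.findtext(tag) for tag in tags})

print(row)
# {'ID': '1', 'StudentID': 'S1', 'TaskID': None, 'Answer': '4'}
```

One difference to keep in mind: for a child that exists but has no text, `findtext()` returns an empty string, whereas `.text` is None.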
    

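    Alternatively, run a more dynamic version that fills a single inner dictionary from the iterator variable, one key per child node:

    rows = []
    inner = {}                       # one dictionary for the whole record
    for node in xroot: 
        inner[node.tag] = node.text  # tag becomes the column, text the value
    
    rows.append(inner)               # append once, after the loop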
    
    out_df = pd.DataFrame(rows, columns = df_cols)
    

    Or, with a dict comprehension that builds the same single row:

    rows = [{node.tag: node.text for node in xroot}]
    out_df = pd.DataFrame(rows, columns = df_cols)
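
For reference, here is a self-contained sketch of the dynamic approach from parsing to DataFrame. The XML string and the `df_cols` list are hypothetical, since the sample from the question is not available; "Comments" is deliberately left out of the XML to show how a missing node ends up as a missing value:

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample; the real file has more child elements.
xml_text = """<Instance ID="1">
  <StudentID>S1</StudentID>
  <TaskID>T1</TaskID>
  <Question>What is 2 + 2?</Question>
  <Answer>4</Answer>
</Instance>"""

xroot = ET.fromstring(xml_text)

# One record per root element: start with the attribute, then add each child.
row = {"ID": xroot.attrib.get("ID")}
row.update({node.tag: node.text for node in xroot})

# Assumed column list; a column with no matching key is filled with NaN.
df_cols = ["ID", "StudentID", "TaskID", "Question", "Answer", "Comments"]
out_df = pd.DataFrame([row], columns=df_cols)
print(out_df)
```

To read from a file instead of a string, `ET.parse(path).getroot()` gives the same kind of root element.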
    
