python, lxml and xpath - html table parsing

抹茶落季 2021-02-06 13:19

I am new to lxml and quite new to Python, and could not find a solution to the following:

I need to import a few tables with 3 columns and an undefined number of rows star

2 Answers
  •  后悔当初
    2021-02-06 13:54

    This is a generator:

    def process_row(row):
        # Yields the text of each <td> cell in the row, one at a time.
        for cell in row.xpath('./td'):
            print(cell.text_content())
            yield cell.text_content()
    

    You're calling it as though you thought it returns a list. It doesn't. There are contexts in which it behaves like a list:

    print([r for r in process_row(row)])
    

    but that's only because a generator and a list both expose the same interface to for loops. Using it in a context where it gets evaluated just one time, e.g.:

    return [process_row(row) for row in table.xpath('./tr')]
    

    just creates a new generator object for each value of row and never iterates over it, so you get back a list of generator objects rather than a list of cell texts.
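
    A minimal sketch of the difference, using plain lists of strings as hypothetical stand-ins for lxml row elements so that it runs without lxml. The names rows, lazy and table_data are illustrative, not from the question:

```python
def process_row(row):
    # Generator: yields one cell's text at a time.
    # `row` is a plain list of strings here, a stand-in for an lxml <tr> element.
    for cell in row:
        yield cell.strip()


rows = [[" a ", "b", "c"], ["d", " e", "f "]]

# Calling the generator function only creates generator objects; no cell is read yet.
lazy = [process_row(row) for row in rows]
print(lazy)

# Wrapping each call in list() drains the generator into a real list of cell texts.
table_data = [list(process_row(row)) for row in rows]
print(table_data)
```

    Wrapping the call in list() is probably the smallest change to the original code; replacing yield with a list that is built up and returned would work as well.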

    So that's your first problem. Your second one is that you're expecting:

    tbl = doc.xpath("//body/table[2]//tr[position()>2]")[0]
    

    to give you the third and all subsequent rows, and it's only setting tbl to the third row. The call to xpath does return the third and all subsequent rows, as a list. It's the [0] at the end that's messing you up: it picks out the first element of that list. Drop it and iterate over the whole result.
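
    Putting both fixes together might look like the sketch below. The HTML in PAGE is made up for illustration (the question's actual page is not shown here), and it assumes lxml is installed:

```python
from lxml import html  # third-party package: pip install lxml

# Hypothetical page: the second table has two header rows, then the data rows.
PAGE = """
<html><body>
<table><tr><td>ignored</td></tr></table>
<table>
<tr><td>title</td></tr>
<tr><td>h1</td><td>h2</td><td>h3</td></tr>
<tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>4</td><td>5</td><td>6</td></tr>
</table>
</body></html>
"""

doc = html.fromstring(PAGE)

# No [0] at the end: keep the whole list of matching <tr> elements.
rows = doc.xpath("//body/table[2]//tr[position()>2]")

# Build one list of cell texts per row.
data = [[cell.text_content() for cell in row.xpath("./td")] for row in rows]
print(data)
```

    Here rows is an ordinary Python list of elements, so len(rows) and slicing work on it as usual.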
