Python BeautifulSoup Getting a column from table - IndexError List index out of range

Submitted by 限于喜欢 on 2019-12-02 07:00:27

Question


Python newbie here, using Python 2.7 with BeautifulSoup 4.

I am trying to parse a webpage and extract table columns using BeautifulSoup. The webpage has tables nested inside tables; table 4 is the one that I want, and it does not have any headers or th tags. I want to get the data one column at a time.

from bs4 import BeautifulSoup
import urllib2

url = 'http://finance.yahoo.com/q/op?s=aapl+Options'
htmltext = urllib2.urlopen(url).read()
soup = BeautifulSoup(htmltext)

# Table 8 has the data needed, although it is nested under other tables.
# A fully specific reference works:
print soup.findAll('table')[8].findAll('tr')[2].findAll('td')[2].contents

# The loop below errors out:
for row in soup.findAll('table')[8].findAll('tr'):
    column2 = row.findAll('td')[2].contents
    print column2

# "IndexError: list index out of range" is raised on the second line of the for loop.

I saw this as a working solution in another example, but it didn't work for me. I also tried iterating over the tr elements:

mytr = soup.findAll('table')[8].findAll('tr')

for row in mytr:
    print row.find('td')  # works, but gives only the first td, as expected
    print row.findAll('td')[2]

which raises an error saying that the list index is out of range.

So:

  1. The first findAll('table') works.
  2. The second findAll('tr') works.
  3. The third findAll('td') works only if ALL of the [ ] indexes are literal numbers, not loop variables.

e.g.

print soup.findAll('table')[8].findAll('tr')[2].findAll('td')[2].contents

The line above works because every index is a specific reference, but the same lookup fails when the row comes from a loop variable. I need it inside a loop to get the full column.


Answer 1:


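I took a look: the first row in the table is actually a header, so the first tr contains th elements rather than td. For that row findAll('td') returns an empty list, and indexing it with [2] raises the IndexError. Skipping the first row should work: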

>>> mytr = soup.findAll('table')[9].findAll('tr')
>>> for i,row in enumerate(mytr):
...     if i:
...         print i,row.findAll('td')[2]
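
A slightly more defensive variant is to check how many td cells each row has before indexing, rather than assuming only the first row is a header. The following is a minimal sketch, not the original poster's code: the sample markup, its values, and the helper name extract_column are hypothetical and stand in for the real page, and it uses the built-in 'html.parser' backend. It is written with single-argument print(...) so that it runs on both Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page: the first row
# holds <th> header cells, so it contains no <td> elements at all.
SAMPLE_HTML = """
<table>
  <tr><th>Strike</th><th>Symbol</th><th>Last</th></tr>
  <tr><td>500.00</td><td>AAPL-A</td><td>12.30</td></tr>
  <tr><td>505.00</td><td>AAPL-B</td><td>9.85</td></tr>
  <tr><td colspan="3">footer note</td></tr>
</table>
"""


def extract_column(table, index):
    """Return the text of cell `index` from every row with enough <td> cells."""
    values = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        # Header rows (only <th>) and short rows are skipped instead of
        # raising IndexError.
        if len(cells) > index:
            values.append(cells[index].get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
third_column = extract_column(soup.find_all('table')[0], 2)
print(third_column)
```

The length check also covers rows that span several columns or are otherwise shorter than expected, which the "skip row 0" approach would not catch.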

As in most cases of HTML parsing, consider a more elegant solution such as lxml with XPath, for example:

>>> from lxml import html
>>> print html.parse(url).xpath('//table[@class="yfnc_datamodoutline1"]//td[2]')
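
One detail worth keeping in mind with the XPath approach: positional predicates in XPath are 1-based, so td[2] selects the second cell of each row, whereas findAll('td')[2] in BeautifulSoup is the third. The sketch below illustrates the difference on hypothetical inline markup (the sample rows and values are made up; only the class name is taken from the answer above), and assumes lxml is installed.

```python
from lxml import html

# Hypothetical sample markup; only the class name comes from the answer above.
SAMPLE_HTML = """
<html><body>
<table class="yfnc_datamodoutline1">
  <tr><th>Strike</th><th>Symbol</th><th>Last</th></tr>
  <tr><td>500.00</td><td>AAPL-A</td><td>12.30</td></tr>
  <tr><td>505.00</td><td>AAPL-B</td><td>9.85</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)
table_path = '//table[@class="yfnc_datamodoutline1"]'

# td[2] is the SECOND <td> of each row (XPath counts from 1).
second_cells = [td.text_content().strip()
                for td in tree.xpath(table_path + '//td[2]')]

# td[3] matches what findAll('td')[2] returns in BeautifulSoup.
third_cells = [td.text_content().strip()
               for td in tree.xpath(table_path + '//td[3]')]

print(second_cells)
print(third_cells)
```

Because the header row contains only th elements, it yields no td matches and drops out of the result without any explicit skipping.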


Source: https://stackoverflow.com/questions/21494085/python-beautifulsoup-getting-a-column-from-table-indexerror-list-index-out-of
