Extracting an attribute value with BeautifulSoup

故里飘歌 2020-11-22 04:38

I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code:

import urllib
f = urll         


        
9 Answers
  • 2020-11-22 05:20

    You can try gazpacho:

    Install it using pip install gazpacho

    Get the HTML and make the Soup using:

    from gazpacho import get, Soup

    soup = Soup(get("http://ip.add.ress.here/"))  # get() returns the HTML directly

    # find() gives a single Soup object for one match or a list for several;
    # the result is falsy when nothing matches
    inputs = soup.find('input', attrs={'name': 'stainfo'})

    if inputs:
        if isinstance(inputs, list):
            for tag in inputs:
                print(tag.attrs.get('value'))
        else:
            print(inputs.attrs.get('value'))
    else:
        print('No <input> tag found with the attribute name="stainfo"')
    
  • 2020-11-22 05:22

    .find_all() returns a list of all matching elements, so:

    input_tag = soup.find_all(attrs={"name" : "stainfo"})
    

    input_tag is a list (probably containing only one element). Depending on what you need, either index into it:

    output = input_tag[0]['value']
    

    or use the .find() method, which returns only the first matching element:

    input_tag = soup.find(attrs={"name": "stainfo"})
    output = input_tag['value']
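
    As a minimal, self-contained sketch of the two approaches above: the HTML fragment and the "does-not-exist" name below are made up for illustration and stand in for the real page. It also guards against the case where nothing matches, since .find() then returns None and indexing it would fail.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for the real response body.
html = """
<form>
  <input type="hidden" name="stainfo" value="abc123">
  <input type="text" name="other" value="ignored">
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like result, so index into it (if it is non-empty).
matches = soup.find_all("input", attrs={"name": "stainfo"})
first_value = matches[0]["value"] if matches else None

# find() returns the first matching tag, or None when nothing matches.
tag = soup.find("input", attrs={"name": "stainfo"})
value = tag["value"] if tag is not None else None

# A name that is not on the page: find() gives None rather than raising.
missing = soup.find("input", attrs={"name": "does-not-exist"})

print(first_value, value, missing)
```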
    
  • 2020-11-22 05:26

    I use this with BeautifulSoup 4.8.1 to get the class attribute value of certain elements:

    from bs4 import BeautifulSoup
    
    html = "<td class='val1'/><td col='1'/><td class='val2' />"
    
    bsoup = BeautifulSoup(html, 'html.parser')
    
    for td in bsoup.find_all('td'):
        if td.has_attr('class'):
            print(td['class'][0])
    

    It's important to note that for a multi-valued attribute such as class, the attribute key returns a list even when the attribute has only a single value.
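
    To illustrate the point, here is a small sketch with made-up markup (the col attribute and the class names are only placeholders). It compares a multi-valued attribute with an ordinary one and uses .get() for an attribute that may be absent.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first cell has two classes, the second has none.
html = "<table><tr><td class='val1 extra' col='1'></td><td col='2'></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td")

classes = cells[0]["class"]       # multi-valued attribute: a list of strings
col = cells[0]["col"]             # ordinary attribute: a plain string
no_class = cells[1].get("class")  # .get() returns None instead of raising KeyError

print(classes, col, no_class)
```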

  • 2020-11-22 05:31

    If you want to retrieve the attribute values from several matching tags in the source above, you can use find_all and a list comprehension:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # Fetch the page (Python 3); the context manager closes the connection
    with urlopen("http://58.68.130.147") as f:
        s = f.read()

    soup = BeautifulSoup(s, "html.parser")

    input_tags = soup.find_all(attrs={"name": "stainfo"})
    # You may be able to narrow this to find_all("input", attrs={"name": "stainfo"})

    # Read the "value" attribute of each matching tag
    output = [x["value"] for x in input_tags]

    print(output)
    # This prints a list of the values.
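
    A hedged variation on the list-comprehension idea, using inline sample HTML (invented for this sketch) so it runs without a network connection. It skips matching tags that have no value attribute, which would otherwise raise a KeyError.

```python
from bs4 import BeautifulSoup

# Invented sample: three inputs named "stainfo", one of them without a value.
html = """
<input name="stainfo" value="first">
<input name="stainfo">
<input name="stainfo" value="second">
<input name="other" value="skip me">
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("input", attrs={"name": "stainfo"})

# Keep only the tags that actually carry a "value" attribute.
values = [t["value"] for t in tags if t.has_attr("value")]

print(values)
```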
    
  • 2020-11-22 05:31

    You could also try the requests_html package:

    from requests_html import HTMLSession
    session = HTMLSession()
    
    r = session.get("https://www.bbc.co.uk/news/technology-54448223")
    date = r.html.find('time', first=True)  # find the first <time> tag
    print(date)  # prints: <Element 'time' datetime='2020-10-07T11:41:22.000Z'>
    # To get the value of the "datetime" attribute, use:
    print(date.attrs['datetime'])  # prints: 2020-10-07T11:41:22.000Z
    
  • 2020-11-22 05:35

    If you already know which kind of tag carries the attribute, a quicker approach is to search for that tag directly.

    Suppose a tag xyz has an attribute named "stainfo":

    full_tag = soup.find_all("xyz")


    Note that full_tag is a list, so loop over it:

    for each_tag in full_tag:
        stainfo_attr_value = each_tag["stainfo"]
        print(stainfo_attr_value)


    This gives you the value of the stainfo attribute for every xyz tag.
