Extract string from HTML String

名媛妹妹 2021-01-26 04:26

I want to extract a number from an HTML string (I usually do not know the number in advance).

The crucial part looks like this:



        
5 answers
  • 2021-01-26 05:02

    In your view.py file, you can try this:

    import re
    my_string="TOTAL : 286"
    int(re.search(r'\d+', my_string).group())
    

    286

  • 2021-01-26 05:06

    You can use string partitioning to extract a "number" string from the whole HTML string like this (assuming the HTML code is in the html_string variable):

    num_string=html_string.partition("TOTAL:")[2].partition("<")[0]

    This gives you num_string, the number as a string, which you can then convert to an integer or whatever you need. Keep in mind that the separator must match the text exactly (for example "TOTAL :" if there is a space before the colon), and that this processes only the first occurrence of anything that looks like "TOTAL: anything_goes_here <", so make sure that this pattern is unique.

  • 2021-01-26 05:17

    If the string "TOTAL : number" is unique, use a regular expression to first find this substring and then extract the number from it.

    import re

    string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

    reg__expr = r'TOTAL\s:\s\d+'  # TOTAL<whitespace>:<whitespace><number>
    # find the substring
    result = re.findall(reg__expr, string)
    if result:
        substring = result[0]

        # extract the number from the substring
        reg__expr = r'\d+'  # <number>
        result = re.findall(reg__expr, substring)
        number = int(result[0])

        print(number)
    

    You can test your own regular expressions at https://regex101.com/

  • 2021-01-26 05:21

    You can try the following:

        line = "TOTAL : 286"
        prefix = 'TOTAL : '
        if line.startswith(prefix):
            print(line[len(prefix):])  # everything after the prefix
    

    Output :

        286
    
  • 2021-01-26 05:29

    If your HTML string is this:

    html_string = """<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
    <tagend>"""
    

    Try this:

    int(html_string.split("</test>")[0].split(":")[-1].replace(" ", ""))
    