Decode HTML entities in Python string?

名媛妹妹 2020-11-21 06:18

I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me:

>>> from Be         


        
6 answers
  •  离开以前
    2020-11-21 06:53

    I had a similar encoding issue: I was getting a Unicode error when using the pandas .to_html() method to export my data frame to an .html file in another directory. I fixed it with unicodedata.normalize(), like this:

        import unicodedata
        import pandas as pd
    

    The DataFrame can be whatever you like; let's call it table:

        table = pd.DataFrame(data, columns=['Name', 'Team', 'OVR / POT'])
        table.index += 1  # start row numbering at 1 instead of 0
    

    Encode the table data so that we can export it to our .html file in the templates folder (this can be whatever location you wish):

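        # this is where the magic happens: decompose characters (NFKD),
        # then drop anything that is not ASCII; the result is a bytes object
        html_data = unicodedata.normalize('NFKD', table.to_html()).encode('ascii', 'ignore')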
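
    To make the effect of that step concrete, here is a minimal sketch on a plain string rather than a DataFrame. The helper name to_ascii_bytes and the sample strings are made up for illustration; only unicodedata.normalize() and str.encode() come from the answer.

```python
import unicodedata


def to_ascii_bytes(text):
    """Decompose characters with NFKD, then drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' silently discards anything that cannot be encoded as ASCII
    return decomposed.encode("ascii", "ignore")


# An accented letter is split into a base letter plus a combining mark,
# and only the base letter survives the ASCII encoding.
accented = to_ascii_bytes("<td>Mbappé</td>")

# A character with no decomposition is dropped entirely.
no_decomposition = to_ascii_bytes("<td>Ødegaard</td>")

print(accented)
print(no_decomposition)
```

    Note that this is lossy: characters without an ASCII equivalent simply disappear, and the return value is bytes rather than str.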
    

    Write the normalized data to the HTML file:

        # html_data is bytes, so on Python 3 the file must be opened in binary mode
        with open("templates/home.html", "wb") as file:
            file.write(html_data)
    

    Reference: unicodedata documentation
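
    The normalization above works around an encoding error but leaves entities such as &amp;lt; untouched, since they are already plain ASCII. For the decoding the question asks about, one option on Python 3.4+ is the standard library's html.unescape(). The sketch below assumes you already have the text as a str; the sample strings are invented, and this is an alternative to the approach in this answer, not something Beautiful Soup 3 does for you.

```python
import html

# Named entities are replaced by the characters they stand for.
price = html.unescape("&pound;682m")

# Markup-significant entities and numeric character references are decoded too.
markup = html.unescape("&lt;b&gt; &amp; &quot;x&quot;")
numeric = html.unescape("caf&#233;")

print(price)
print(markup)
print(numeric)
```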
