Normalize whitespace with Python

野趣味 2021-01-12 11:57

I'm building a data extractor using Scrapy and want to normalize a raw string pulled out of an HTML document. Here's an example string:

  Sapphire RX460 OC           


        
3 answers
  • 2021-01-12 12:16

    You can use a function like the one below, which uses a regular expression to find runs of consecutive spaces and replace each run with a single space:

    import re
    
    def clean_data(data):
        # Trim both ends, then collapse runs of two or more spaces into one.
        return re.sub(" {2,}", " ", data.strip())
    
    product_title = clean_data(product.css('h3::text').extract_first())
    

    You can then extend clean_data however you like.

  • 2021-01-12 12:21

    You can use:

    " ".join(s.split())
    

    where s is your string.
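
To illustrate why the split/join idiom works, here is a minimal sketch. The function name normalize_whitespace and the sample string are assumptions made for this example, not part of the original answer. Calling str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines) and drops empty pieces at both ends, so no separate strip() is needed.

```python
def normalize_whitespace(s: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    # split() with no arguments treats any whitespace run as one separator
    # and discards leading/trailing empty strings.
    return " ".join(s.split())


# Hypothetical raw value, similar to text extracted from an HTML node.
raw = "\n  Sapphire RX460 OC  \t 2/4GB \n\n        "
print(repr(normalize_whitespace(raw)))  # 'Sapphire RX460 OC 2/4GB'
```

Note that this also collapses tabs and newlines, whereas a pattern that matches only the space character leaves them in place.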

  • 2021-01-12 12:25

    Instead of using a regex for this, a more efficient solution is the split/join approach. Compare:

    >>> import re, timeit
    >>> timeit.Timer((lambda:' '.join(' Sapphire RX460 OC  2/4GB'.split()))).timeit()
    0.7263979911804199
    
    >>> def f():
    ...     return re.sub(" +", ' ', "  Sapphire RX460 OC  2/4GB").split()
    ...
    >>> timeit.Timer(f).timeit()
    4.163465976715088
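
For anyone who wants to repeat the comparison, the sketch below times both approaches on the same input and first checks that they return the same string. The function names, the precompiled pattern, and the call count are assumptions chosen for this example. Timings depend on the machine and Python version, so no figures are shown.

```python
import re
import timeit

SAMPLE = "  Sapphire RX460 OC  2/4GB"
_WS_RUN = re.compile(r"\s+")  # precompiled so pattern lookup is not timed


def with_split_join(s: str = SAMPLE) -> str:
    # Split on whitespace runs, then rejoin with single spaces.
    return " ".join(s.split())


def with_regex(s: str = SAMPLE) -> str:
    # Replace each whitespace run with one space, then trim the ends.
    return _WS_RUN.sub(" ", s).strip()


# Both variants must agree before their timings are comparable.
assert with_split_join() == with_regex() == "Sapphire RX460 OC 2/4GB"

for fn in (with_split_join, with_regex):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.3f}s for 100,000 calls")
```

Checking that both functions produce identical output keeps the benchmark honest: a variant that returns a different type or value is not measuring the same work.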
    