Stripping everything but alphanumeric chars from a string in Python

不思量自难忘° 2020-11-22 10:52

What is the best way to strip all non-alphanumeric characters from a string using Python?

The solutions presented in the PHP variant of this question will probably

11 answers
  • 2020-11-22 11:23
    sent = "".join(e for e in sent if e.isalnum())  # isalnum() keeps digits; isalpha() would drop them
    
  • 2020-11-22 11:25

    You could try:

    print(''.join(ch for ch in some_string if ch.isalnum()))
    
  • 2020-11-22 11:26

    How about:

    def ExtractAlphanumeric(InputString):
        from string import ascii_letters, digits
        return "".join([ch for ch in InputString if ch in (ascii_letters + digits)])
    

    This uses a list comprehension to build a list of the characters in InputString that appear in the combined ascii_letters and digits strings, then joins that list into a string.
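
    One consequence of testing against ascii_letters + digits is that only ASCII characters survive, whereas str.isalnum() also accepts non-ASCII letters and digits. The sketch below contrasts the two; the function names and the sample string are made up for illustration, and the frozenset is just one way to avoid rebuilding the combined string for every character.

```python
from string import ascii_letters, digits

# Build the allowed set once; membership tests on a frozenset are O(1).
ASCII_ALNUM = frozenset(ascii_letters + digits)


def extract_ascii_alnum(text):
    """Keep only ASCII letters and digits."""
    return "".join(ch for ch in text if ch in ASCII_ALNUM)


def extract_unicode_alnum(text):
    """Keep every character for which str.isalnum() is true."""
    return "".join(ch for ch in text if ch.isalnum())


sample = "Caf\u00e9 #42!"  # "Café #42!"
print(extract_ascii_alnum(sample))    # Caf42
print(extract_unicode_alnum(sample))  # Café42
```

    Which of the two is appropriate depends on whether accented and non-Latin characters should count as alphanumeric for your data.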

  • 2020-11-22 11:26

    If I understood correctly, the easiest way is to use a regular expression, since it gives you a lot of flexibility. Another simple method is a for loop. The code below shows that approach on an example; it also counts the occurrences of each word and stores them in a dictionary.

    s = """An... essay is, generally, a piece of writing that gives the author's own 
    argument — but the definition is vague, 
    overlapping with those of a paper, an article, a pamphlet, and a short story. Essays 
    have traditionally been 
    sub-classified as formal and informal. Formal essays are characterized by "serious 
    purpose, dignity, logical 
    organization, length," whereas the informal essay is characterized by "the personal 
    element (self-revelation, 
    individual tastes and experiences, confidential manner), humor, graceful style, 
    rambling structure, unconventionality 
    or novelty of theme," etc.[1]"""
    
    d = {}             # maps each cleaned word to its number of occurrences
    words = s.split()  # split the string on whitespace into a list of words
    for word in words:
        new_word = ''
        for c in word:
            if c.isalnum():  # keep only alphanumeric characters
                new_word = new_word + c
        if not new_word:     # skip tokens with nothing left, e.g. a lone dash
            continue
        print(new_word, end=' ')
        d[new_word] = d.get(new_word, 0) + 1  # count this occurrence
    print()
    print(d)
    


  • 2020-11-22 11:27

    I timed a few approaches out of curiosity. In these tests I remove the non-alphanumeric characters from string.printable (part of the built-in string module). Compiling '[\W_]+' once and calling pattern.sub('', str) was the fastest.

    $ python -m timeit -s \
         "import string" \
         "''.join(ch for ch in string.printable if ch.isalnum())" 
    10000 loops, best of 3: 57.6 usec per loop
    
    $ python -m timeit -s \
        "import string" \
        "filter(str.isalnum, string.printable)"                 
    10000 loops, best of 3: 37.9 usec per loop
    
    $ python -m timeit -s \
        "import re, string" \
        "re.sub('[\W_]', '', string.printable)"
    10000 loops, best of 3: 27.5 usec per loop
    
    $ python -m timeit -s \
        "import re, string" \
        "re.sub('[\W_]+', '', string.printable)"                
    100000 loops, best of 3: 15 usec per loop
    
    $ python -m timeit -s \
        "import re, string; pattern = re.compile('[\W_]+')" \
        "pattern.sub('', string.printable)" 
    100000 loops, best of 3: 11.2 usec per loop
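
    For use in a script rather than on the timeit command line, the compiled-pattern variant can be wrapped as in the minimal sketch below. The names NON_ALNUM and strip_non_alnum are invented here. Note that on Python 3 str patterns, \W is Unicode-aware by default, so accented letters are kept; passing re.ASCII to re.compile would restrict the match to ASCII word characters.

```python
import re
import string

# [\W_]+ matches runs of non-word characters plus the underscore,
# which \w would otherwise treat as a word character and keep.
NON_ALNUM = re.compile(r"[\W_]+")


def strip_non_alnum(text):
    """Remove every character that is not a letter or a digit."""
    return NON_ALNUM.sub("", text)


# Only the digits and ASCII letters of string.printable remain.
print(strip_non_alnum(string.printable))
```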
    
  • 2020-11-22 11:27
    for char in my_string:
        if not char.isalnum():
            my_string = my_string.replace(char,"")
    