Replace strings in a list (using re.sub)

萌比男神i 2020-12-06 08:30

I am trying to replace parts of file extensions in a list of files. I would like to be able to loop through items (files), and remove the extensions. I don't know how to ap

5 Answers
  • 2020-12-06 08:59

    I prefer Python's built-in functions over importing a library where possible. Regex may be overkill for such a simple task, and this approach looks clean.

    Try this

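    file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
    file_lst_trimmed = []
    for file in file_lst:
        # Keep the part before the first '.', then drop its last character (the digit).
        # Assumes a single trailing digit and no other dots in the name.
        file_lst_trimmed.append(file.split('.')[0][:-1])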
    
  • 2020-12-06 08:59

    Your loop is actually perfectly fine! There are two other issues.

    1. You're setting file_lst_trimmed equal to your string every iteration of the loop. You want to use append as in file_lst_trimmed.append("apple").

    2. Your regular expression is '1.fa' when it should really just be '.fa' (assuming you only want to strip .fa extensions).

    EDIT: I now see that you also want to remove the trailing number. In that case, you'll want '\d+\.fa' (\d matches any digit 0-9, and \d+ matches a run of one or more digits, so this will remove 10, 11, 13254, etc. The \ before the . is needed because . is a special character that must be escaped). If you want to remove arbitrary file extensions, put \w+ in place of fa; \w+ matches a run of one or more word characters (letters, digits, and the underscore). You might want to check out the documentation for Python's re module.
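
    Putting the two fixes together, here is a minimal sketch of the corrected loop. The `$` anchor and the extra multi-digit name 'cats10.fa' are additions for illustration and are not in the question.

```python
import re

# File names from the question, plus one hypothetical multi-digit name.
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa', 'cats10.fa']

file_lst_trimmed = []
for file in file_lst:
    # Remove a run of digits followed by a literal ".fa" at the end of the name.
    # append() adds to the list instead of overwriting it on each iteration.
    file_lst_trimmed.append(re.sub(r'\d+\.fa$', '', file))

print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'dog', 'cats']
```

    The raw-string prefix (r'...') keeps Python from treating \d and \. as string escapes before the regex engine sees them.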

  • 2020-12-06 09:06

    You can try this:

    import re
    file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
    final_list = [re.sub(r'\d+\.\w+$', '', i) for i in file_lst]
    

    Output:

    ['cats', 'cats', 'dog', 'dog']
    
  • 2020-12-06 09:11

    You can use a list comprehension to construct the new list with the cleaned-up file names. \d matches a single digit, and $ matches only at the end of the string.

    import re
    file_lst_trimmed = [re.sub(r'\d\.fa$', '', file) for file in file_lst]
    

    The results:

    >>> file_lst_trimmed 
    ['cats', 'cats', 'dog', 'dog']
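
    Because \d matches exactly one digit, this pattern leaves earlier digits in place when a name ends in more than one. A small sketch of the difference, using a hypothetical name 'cats10.fa' that is not in the question:

```python
import re

name = 'cats10.fa'  # hypothetical multi-digit file name

# \d removes only the single digit directly before ".fa".
single = re.sub(r'\d\.fa$', '', name)
# \d+ removes the whole run of digits before ".fa".
run = re.sub(r'\d+\.fa$', '', name)

print(single)  # cats1
print(run)     # cats
```

    For the single-digit names in the question, both patterns give the same result.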
    
  • 2020-12-06 09:11

    No need for regex; use os.path.splitext from the standard library for this. The documentation describes it as follows:

    Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').

    import os.path
    
    l = ['hello.fa', 'images/hello.png']
    
    [os.path.splitext(filename)[0] for filename in l]
    

    Returns

    ['hello', 'images/hello']
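
    Note that splitext removes only the extension. If the trailing number should go as well, as in the question's file names, one possible sketch is to chain str.rstrip with string.digits. This combination is an assumption about what the asker wants, not part of the answer above.

```python
import os.path
import string

# File names from the question.
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']

# splitext drops the extension; rstrip then strips any trailing digits.
# Caveat: a name made up only of digits would be reduced to an empty string.
trimmed = [os.path.splitext(name)[0].rstrip(string.digits) for name in file_lst]

print(trimmed)  # ['cats', 'cats', 'dog', 'dog']
```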
    