Replace strings in a list (using re.sub)


I am trying to replace parts of the file names in a list of files. I would like to loop through the items (files) and remove the extensions. I don't know how to ap


5 answers

故里飘歌

2020-12-06 08:59
I prefer Python's built-in functions to importing a library where possible. Using regex for such a simple task might not be the best way to do it, and this approach looks clean.

Try this
```
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed =[]
for file in file_lst:
    file_lst_trimmed.append(file.split('.')[0][:-1])
```
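One caveat with the slice above: `[:-1]` drops exactly one trailing character, so it assumes every name ends in a single digit. Below is a minimal sketch that stays with built-in string methods but copes with numbers of any length. The helper name `strip_ext_and_digits` and the sample name `cats10.fa` are made up for illustration and are not part of the question.

```python
def strip_ext_and_digits(filename):
    """Drop the extension, then any trailing digits, using only str methods."""
    # rpartition splits on the LAST dot; if there is no dot, the separator
    # slot is empty and the whole name lands in the third slot.
    stem, dot, _ext = filename.rpartition('.')
    if not dot:
        stem = filename
    # rstrip with a character set removes every trailing digit, not just one.
    return stem.rstrip('0123456789')


file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'cats10.fa']
file_lst_trimmed = [strip_ext_and_digits(name) for name in file_lst]
print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'cats']
```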
余生分开走

2020-12-06 08:59
Your loop is actually perfectly fine! There are two other issues.
1. You're setting file_lst_trimmed equal to your string on every iteration of the loop. You want to use append, as in file_lst_trimmed.append("apple").
2. Your regular expression is '1.fa' when it should really just be '\.fa' (assuming you only want to strip .fa extensions).
EDIT: I now see that you also want to remove the trailing number. In that case, you'll want '\d+\.fa'. Here \d stands for any digit 0-9, and \d+ means a run of one or more digits, so this will remove 10, 11, 13254, etc. The \ before the . is needed because . is a special character (it matches any character) and has to be escaped to match a literal dot. If you want to remove arbitrary file extensions, put \w+ in place of fa; \w+ matches a run of letters, digits, or underscores of any length. You might want to check out the documentation for regex.
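Putting both fixes together, the loop might look like the sketch below. The asker's original loop is not shown here, so the variable names are borrowed from the other answers and should be read as assumptions. The `$` anchor is also an addition, used so that the pattern only matches at the end of the name.

```python
import re

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = []
for file in file_lst:
    # \d+ matches one or more digits, \. a literal dot, and $ anchors the
    # match to the end of the name.
    trimmed = re.sub(r'\d+\.fa$', '', file)
    # append adds to the list instead of overwriting it on each pass.
    file_lst_trimmed.append(trimmed)

print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'dog']
```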
0 讨论(0)
发布评论:

提交评论
- 加载中...

被撕碎了的回忆

2020-12-06 09:06

You can try this:

import re
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
final_list = [re.sub(r'\d+\.\w+$', '', i) for i in file_lst]

Output:

['cats', 'cats', 'dog', 'dog']


有刺的猬

2020-12-06 09:11
You can use a list comprehension to construct the new list with the cleaned-up file names. \d is the regex that matches a single digit, and $ only matches at the end of the string.
```
import re
file_lst_trimmed = [re.sub(r'\d\.fa$', '', file) for file in file_lst]
```
The results:
```
>>> file_lst_trimmed 
['cats', 'cats', 'dog', 'dog']
```
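Note that `\d` without a quantifier removes only one digit. Here is a quick sketch of the difference on a two-digit name (`cats10.fa` is a made-up example, not one of the files from the question):

```python
import re

name = 'cats10.fa'  # hypothetical two-digit file name

# \d matches exactly one digit, so the leading '1' is left behind.
single = re.sub(r'\d\.fa$', '', name)
# \d+ matches the whole run of digits before the extension.
multi = re.sub(r'\d+\.fa$', '', name)

print(single)  # cats1
print(multi)   # cats
```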
广开言路

2020-12-06 09:11
No need for regex; use os.path.splitext from the standard library for this.

Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').
```
import os.path

l = ['hello.fa', 'images/hello.png']

[os.path.splitext(filename)[0] for filename in l]
```
Returns
```
['hello', 'images/hello']
```
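The documented behaviour quoted above is easy to check. Here is a small sketch, using made-up file names, showing that only the last extension is split off and that a leading dot is not treated as an extension:

```python
import os.path

# Only the final extension is removed, since ext holds at most one period.
tar_parts = os.path.splitext('archive.tar.gz')
print(tar_parts)  # ('archive.tar', '.gz')

# A leading period on the basename is not treated as an extension.
dotfile_parts = os.path.splitext('.cshrc')
print(dotfile_parts)  # ('.cshrc', '')

# The root keeps everything before the extension, digits included.
numbered_root = os.path.splitext('cats1.fa')[0]
print(numbered_root)  # cats1
```

As the last line shows, splitext leaves the trailing digit in place, so removing the number as well still needs an extra step such as one of the approaches in the other answers.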