I'm writing a script that splits a list of file paths into sublists. Suppose I have 400 file paths saved in a list, and every path follows the pattern Ci_whatever.csv.
One way is to create a dictionary that uses the Ci prefix as the key and the list of file names starting with that prefix as the value. For example, given pathlist = ['C1_01.csv', 'C1_02.csv', 'C2_01.csv', 'C3_01.csv', 'C2_02.csv'], we build a dictionary that stores
{'C1': ['C1_01.csv', 'C1_02.csv'], 'C2': ['C2_01.csv', 'C2_02.csv'], 'C3': ['C3_01.csv']}
Here is the code:
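pathlist = ['C1_01.csv', 'C1_02.csv', 'C2_01.csv', 'C3_01.csv', 'C2_02.csv']
d = {}
for path in pathlist:
    key = path[:2]  # the two-character Ci prefix, e.g. 'C1'
    if key not in d:
        d[key] = [path]  # first file seen for this prefix
    else:
        d[key].append(path)
# Collect the grouped lists into a list of sublists
pathlistf = []
for key in d:
    pathlistf.append(d[key])
print(pathlistf)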
# Output (Python 3.7+, where dicts keep insertion order): [['C1_01.csv', 'C1_02.csv'], ['C2_01.csv', 'C2_02.csv'], ['C3_01.csv']]
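With 400 files, the index in Ci may well run past one digit (C10, C123, ...), and in that case path[:2] would lump C1 and C10 into the same group. Below is a minimal sketch of one way around that, assuming the prefix is everything before the first underscore. The helper name group_by_prefix is made up for illustration, and collections.defaultdict replaces the explicit "if key not in d" check.

```python
from collections import defaultdict


def group_by_prefix(paths):
    """Group paths by the text before the first underscore.

    Assumption: that text is the Ci part of the file name.
    """
    groups = defaultdict(list)
    for path in paths:
        prefix = path.split('_', 1)[0]  # 'C10_01.csv' -> 'C10'
        groups[prefix].append(path)
    # Groups come out in first-seen order on Python 3.7+
    return list(groups.values())


pathlist = ['C1_01.csv', 'C10_01.csv', 'C1_02.csv', 'C2_01.csv', 'C10_02.csv']
print(group_by_prefix(pathlist))
# [['C1_01.csv', 'C1_02.csv'], ['C10_01.csv', 'C10_02.csv'], ['C2_01.csv']]
```

If the file names have no underscore at all, split returns the whole name, so each such file would end up in its own group.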
Hope this solves the problem. Feel free to ask if anything is unclear.
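If pathlist is sorted, you can use the following code based on itertools.groupby. Note that groupby only groups consecutive items that share the same key, which is why the list is sorted first.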
from itertools import groupby
pathlist=['Cn_01.csv', 'C1_02.csv', 'C9_01.csv', 'C9_02.csv', 'Ca_01.csv', 'C9_03.csv', 'Ca_02.csv', 'C1_01.csv', 'Cn_02.csv']
pathlist.sort()
groupedfilenames = (list(g) for _, g in groupby(pathlist, key=lambda a: a[:2]))
print(list(groupedfilenames))
Output:
[['C1_01.csv', 'C1_02.csv'], ['C9_01.csv', 'C9_02.csv', 'C9_03.csv'], ['Ca_01.csv', 'Ca_02.csv'], ['Cn_01.csv', 'Cn_02.csv']]
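To see why the sort matters, here is a small sketch comparing groupby on an unsorted and a sorted list. The prefix helper and the sample file names are made up for illustration, and the key is assumed to be the first two characters, as in the code above.

```python
from itertools import groupby


def prefix(path):
    # Assumption: the group key is the first two characters of the name
    return path[:2]


unsorted_paths = ['C1_01.csv', 'C2_01.csv', 'C1_02.csv']

# Without sorting, the two C1 files are not adjacent, so they land in separate groups
fragmented = [list(g) for _, g in groupby(unsorted_paths, key=prefix)]
print(fragmented)  # [['C1_01.csv'], ['C2_01.csv'], ['C1_02.csv']]

# Sorting by the same key first brings equal prefixes next to each other
grouped = [list(g) for _, g in groupby(sorted(unsorted_paths, key=prefix), key=prefix)]
print(grouped)  # [['C1_01.csv', 'C1_02.csv'], ['C2_01.csv']]
```

Sorting with the same key function that groupby uses is the safe choice. A plain sort() happens to work here only because the prefix is at the start of each name.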