Python - sorting strings numerically

风格不统一 submitted on 2021-02-05 10:44:07

Question


I'm using Python to merge two files into a new one. Every line in both files starts with an id, and I want to sort on it so that both files are in the same order and can be merged. So far I've used .sort(), which puts both files in the same order so that the comments match the details. However, I'd now like to reorder them so that they go 1, 2, 3, 4... instead of 1, 10, 100, 1000, 1001, 1002 etc. I'm having difficulty because the number is at the start of a string, and Python won't convert the first four characters of a string to an integer. If it helps, the file is tab-delimited and the next piece of information after the id is the date.

Any ideas would be appreciated. Ideally I would prefer not to import any libraries.

My code is:

comments = R'C:\Pythonfile\UFOGB_Comments.txt'
details = R'C:\Pythonfile\UFOGB_Details.txt'
mydest = R'C:\Pythonfile\UFOGB_sorted.txt'

with open(details,'rt') as src:
    readdetails = src.readlines()
    readdetails.sort()

with open(comments,'rt') as src:
    readcomments = src.readlines()
    readcomments.sort()

with open(mydest, 'w') as dest:
    for i in range(len(readdetails)):
        cutcomm = readcomments[i][readcomments[i].find('"'):]
        dest.write('{}\t{}'.format(readdetails[i].strip('\n'),cutcomm))

Answer 1:


You could try to parse the first field as int with:

readdetails.sort(key=lambda x: int(x.split()[0]))

This will work well if all lines are in a consistent format.
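For example, here is a minimal sketch of the difference the key makes. The sample lines are hypothetical (an id, a tab, then a date in an assumed format), not taken from your files:

```python
# Hypothetical sample lines: id, tab, date (the date format is an assumption).
lines = ["10\t2009-01-02\n", "2\t2009-01-01\n", "1\t2008-12-31\n", "100\t2009-03-04\n"]

# A plain sort compares text character by character, so "10" sorts before "2".
as_text = sorted(lines)

# With an int key the ids are compared as numbers instead.
as_numbers = sorted(lines, key=lambda x: int(x.split()[0]))

print([line.split()[0] for line in as_text])     # ['1', '10', '100', '2']
print([line.split()[0] for line in as_numbers])  # ['1', '2', '10', '100']
```

The lines themselves are left unchanged; only the value used for comparison differs.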

Otherwise use a more complex function as a key function for list.sort(), e.g.:

def extract_id(line):
    # Take the text before the first tab and return it as an integer.
    # Another kind of value (e.g. a tuple) also works as a sort key.
    return int(line.split('\t', 1)[0])

and pass it to the sort function:

readdetails.sort(key=extract_id)
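As a rough sketch of what a "more complex" key function might look like, the version below tolerates lines that do not start with a number (blank lines, a header row). The name safe_id_key and the choice to push such lines to the end are my assumptions, not something your data requires:

```python
def safe_id_key(line):
    """Sort key: lines with a numeric id come first, in numeric order."""
    first_field = line.split('\t', 1)[0]
    try:
        return (0, int(first_field))
    except ValueError:
        # Assumption: blank or malformed lines should sink to the end.
        return (1, 0)

# Hypothetical rows, including a blank line and a header.
rows = ["10\tb\n", "\n", "2\ta\n", "id\tdate\n", "1\tc\n"]
rows.sort(key=safe_id_key)
print(rows)
```

Because the key is a tuple, Python compares the first element (valid or not) before the id, and since list.sort() is stable, the malformed lines keep their original relative order.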



Answer 2:


I tried to recreate your data according to your explanation. Tell me if this is correct:

lines = """
123   foobar
1000  foobar
432   foobar
22    foobar
987   foobar
""".strip().split('\n')

print(lines)
lines.sort(key=lambda s: int(s[:4]))
print(lines)

Result:

['123   foobar', '1000  foobar', '432   foobar', '22    foobar', '987   foobar'] # initial
['22    foobar', '123   foobar', '432   foobar', '987   foobar', '1000  foobar'] # final

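I assumed that your integer id is at most 4 digits, as you said in the OP. If the id length varies (or a short id is followed by a tab and a date, in which case s[:4] would pick up characters that are not part of the id), you can simply replace the key function: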

lines.sort(key=lambda s: int(s.split()[0]))
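Putting this together with the merge from your question, a minimal sketch might look like the following. The two lists are hypothetical stand-ins for the result of readlines() on your files, and the id check is an extra safeguard I added, not something from your code:

```python
def numeric_id(line):
    # Assumes every line starts with an integer id followed by a tab.
    return int(line.split('\t', 1)[0])

# Hypothetical stand-ins for the lines read from the two files.
details = ["10\t2009-01-02\tLondon\n", "2\t2009-01-01\tLeeds\n", "1\t2008-12-31\tYork\n"]
comments = ['2\t"second"\n', '1\t"first"\n', '10\t"tenth"\n']

details.sort(key=numeric_id)
comments.sort(key=numeric_id)

merged = []
for detail, comment in zip(details, comments):
    # Guard against the two files getting out of step.
    if numeric_id(detail) != numeric_id(comment):
        raise ValueError('id mismatch: {!r} vs {!r}'.format(detail, comment))
    # Keep the comment from its first double quote onward, as in the question.
    cut = comment[comment.find('"'):]
    merged.append('{}\t{}'.format(detail.rstrip('\n'), cut))

print(''.join(merged))
```

In the real script, each entry of merged would be passed to dest.write() instead of being collected in a list.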



Answer 3:


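If your difficulty is sorting a list by the id at the start of each entry, try the key-function approach from https://wiki.python.org/moin/HowTo/Sorting. Note that the key has to convert the id to an integer; sorting on the raw slice detail[:4] would still compare text and give 1, 10, 100:

with open(details,'rt') as src:
    read_details = src.readlines()
    # Sort on the integer before the first tab, not on the raw text.
    read_details = sorted(read_details, key=lambda detail: int(detail.split('\t', 1)[0]))

with open(comments,'rt') as src:
    read_comments = src.readlines()
    read_comments = sorted(read_comments, key=lambda comment: int(comment.split('\t', 1)[0]))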

I'm not entirely sure what you're trying to achieve with the last part - an example of what you have in the comments and details files with an example of what you want an entry to look like in the destination would be useful.



Source: https://stackoverflow.com/questions/49895423/python-sorting-strings-numerically
