问题
I have a file with almost 900 lines in excel that I've saved as a tab deliminated .txt file. I'd like to sort the text file by the numbers given in the first column (they range between 0 and 2250). The other columns are both numbers and letters of varying length eg.
myfile.txt:
0251 abcd 1234,24 bcde
2240 efgh 2345,98 ikgpppm
0001 lkjsi 879,09 ikol
I've tried
sort -k1 -n myfile.txt > myfile_num.txt
but I just get an identical file with new name. I'd like to get:
myfile_num.txt
0001 lkjsi 879,09 ikol
0251 abcd 1234,24 bcde
2240 efgh 2345,98 ikgpppm
What am I doing wrong? I'm guessing that it's quite simple, but I'd appreciate any help I can get! I only know a little bash scripting, so it'd be nice if the script is a very simple one-liner that I can understand :)
Thanks :)
回答1:
Use this to convert old Mac OS carriage return to newline:
tr '\r' '\n' < myfile.txt | sort
回答2:
As stated here you can have problems with this (and in the other pseudo-follow-up-duplicate question you asked, yes, you did)
tr '\r' '\n' < myfile.txt | sort -n
It works fine here on MSYS but on some platforms you may have to add:
export LC_CTYPE=C
or tr
will consider the file as a text file, and probably will tag it as corrupt after having reached the max line limit.
Obviously I could not test it, but I'm confident it will solve the problem given what I read on the linked answer.
回答3:
A python approach (python 2 & 3 compatible), immune to all shell problems. Works great, and portable. I noticed that the input file has some '0x8C' chars (exotic dots), probably confusing tr
command.
That is handled properly below:
import csv,sys
# read the file as binary, as it is not really text
with open("Proteins.txt","rb") as f:
data = bytearray(f.read())
# replace 0x8c char by classical dots
for i,c in enumerate(data):
if c>0x7F: # non-ascii: replace by dot
data[i] = ord(".")
# convert to list of ASCII strings (split using the old MAC separator)
lines = "".join(map(chr,data)).split("\r")
# treat our lines as input for CSV reader
cr = csv.reader(lines,delimiter='\t',quotechar='"')
# read all the lines in a list
rows = list(cr)
# perform the sort (tricky)
# on first row, numerical, removing the leading 0 which is illegal
# in python 3, and if not numerical, put it at the top
rows = sorted(rows,key=lambda x : x[0].isdigit() and int(x[0].strip("0")))
# write back the file as a nice, legal, ASCII tsv file
if sys.version_info < (3,):
f = open("Proteins_sorted_2.txt","wb")
else:
f = open("Proteins_sorted_2.txt","w",newline='')
cw = csv.writer(f,delimiter='\t',quotechar='"')
cw.writerows(rows)
f.close()
来源:https://stackoverflow.com/questions/39058356/bash-sort-numerical-osx