I have a file with almost 900 lines in Excel that I've saved as a tab-delimited .txt file. I'd like to sort the text file by the numbers given in the first column (they rang
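As stated here, you can run into problems with this (and, judging by the near-duplicate follow-up question you asked, you did). The file uses bare carriage returns as line separators, so convert them to newlines before sorting numerically: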
tr '\r' '\n' < myfile.txt | sort -n
It works fine here on MSYS, but on some platforms you may have to add:
export LC_CTYPE=C
Otherwise tr will treat the file as a text file and will probably flag it as corrupt once it reaches the maximum line length.
Obviously I could not test it, but I'm confident it will solve the problem given what I read on the linked answer.
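To confirm that a file really uses bare carriage returns before reaching for tr, the line-ending styles can be counted directly. This is only a minimal sketch: the function name count_line_endings and the sample bytes are made up for illustration.

```python
def count_line_endings(data):
    """Count CRLF, bare CR, and bare LF line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf   # carriage returns not followed by a line feed
    lf = data.count(b"\n") - crlf   # line feeds not preceded by a carriage return
    return {"crlf": crlf, "cr": cr, "lf": lf}

# Hypothetical sample: three records separated by bare carriage returns.
sample = b"3\tgamma\r1\talpha\r2\tbeta\r"
print(count_line_endings(sample))  # {'crlf': 0, 'cr': 3, 'lf': 0}
```

For a real file, read it in binary mode (open("myfile.txt", "rb")) and pass the result of read() to the function. A large "cr" count with zero "lf" suggests old Mac OS line endings.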
A Python approach (compatible with Python 2 and 3) that avoids the shell problems and is portable. I noticed that the input file contains some 0x8C characters (exotic dots), which probably confuse the tr command.
The code below handles them properly:
import csv, sys
# read the file as binary, as it is not really text
with open("Proteins.txt", "rb") as f:
    data = bytearray(f.read())
# replace every non-ASCII byte (such as 0x8C) with a classical dot
for i, c in enumerate(data):
    if c > 0x7F:
        data[i] = ord(".")
# convert to a list of ASCII strings (split using the old Mac separator)
lines = "".join(map(chr, data)).split("\r")
# treat our lines as input for the CSV reader
cr = csv.reader(lines, delimiter='\t', quotechar='"')
# read all the rows into a list, skipping empty ones (e.g. after a trailing "\r")
rows = [row for row in cr if row]
# sort key: numeric on the first column; int() accepts leading zeros
# in strings, and non-numeric rows (e.g. a header) go to the top
def sort_key(row):
    if row[0].isdigit():
        return (1, int(row[0]))
    return (0, 0)
rows = sorted(rows, key=sort_key)
# write back the file as a nice, legal, ASCII TSV file
# (the csv module wants binary mode on Python 2, text mode with newline='' on Python 3)
if sys.version_info < (3,):
    f = open("Proteins_sorted_2.txt", "wb")
else:
    f = open("Proteins_sorted_2.txt", "w", newline='')
cw = csv.writer(f, delimiter='\t', quotechar='"')
cw.writerows(rows)
f.close()
Use this to convert old Mac OS carriage returns to newlines:
tr '\r' '\n' < myfile.txt | sort
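Note that plain sort compares lines as text, while the question asks for ordering by the numbers in the first column, which is what sort -n does. The difference can be sketched in Python as follows. The values are hypothetical, and Python's string comparison only roughly mirrors what sort does under a given locale.

```python
# Hypothetical first-column values, as strings read from a text file.
values = ["10", "9", "100", "2"]

# Text comparison goes character by character, roughly like plain `sort`.
text_order = sorted(values)

# Converting to int first compares by value, like `sort -n`.
numeric_order = sorted(values, key=int)

print(text_order)     # ['10', '100', '2', '9']
print(numeric_order)  # ['2', '9', '10', '100']
```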