Sorting numerically by first column

一整个雨季 2021-01-26 13:22

I have a file with almost 900 lines in Excel that I've saved as a tab-delimited .txt file. I'd like to sort the text file by the numbers given in the first column (they rang

3 answers
  • 2021-01-26 13:50

    As stated here, you can run into problems with this (and in the other pseudo-follow-up-duplicate question you asked, yes, you did). Convert the carriage returns to newlines before sorting:

    tr '\r' '\n' < myfile.txt | sort -n
    

    It works fine here on MSYS, but on some platforms you may have to add:

    export LC_CTYPE=C
    

    or tr will treat the file as text and will probably flag it as corrupt once it reaches the maximum line length.
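The kind of input that makes text tools reject a file can be checked directly. A minimal sketch, assuming the problem bytes are ones outside 7-bit ASCII (such as the 0x8C bytes mentioned in the next answer): it lists their offsets and reports whether the data decodes as UTF-8. A lone 0x8C byte is not valid UTF-8, which is one common reason tools running in a UTF-8 locale complain about a file; with LC_CTYPE=C they typically process it byte by byte instead. The function name `find_non_ascii` and the sample data are hypothetical.

```python
def find_non_ascii(data: bytes):
    """Return the (offset, value) of every byte above 0x7F, and whether the
    whole buffer decodes as UTF-8."""
    non_ascii = [(i, b) for i, b in enumerate(data) if b > 0x7F]
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return non_ascii, valid_utf8


# made-up sample: old Mac line endings plus one stray 0x8C byte
sample = b"12\tabc\x8cdef\r7\txyz\r"
print(find_non_ascii(sample))  # ([(6, 140)], False)

# on a real file (the path is hypothetical):
# with open("myfile.txt", "rb") as f:
#     print(find_non_ascii(f.read()))
```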

    Obviously I could not test it, but I'm confident it will solve the problem given what I read in the linked answer.
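What the `tr '\r' '\n' < myfile.txt | sort -n` pipeline does can be sketched in Python, which may help when tr or sort behave differently across platforms. This is a rough emulation under stated assumptions, not a reimplementation: it assumes the first field is a plain non-negative integer, while `sort -n` also accepts a minus sign, a decimal point, and locale-specific separators. The helper names are hypothetical.

```python
import re

# optional blanks followed by digits: a simplified version of the numeric
# prefix that sort -n reads from the start of each line
_LEADING_INT = re.compile(r"[ \t]*(\d+)")


def numeric_prefix(line: str) -> int:
    m = _LEADING_INT.match(line)
    # sort -n treats a line with no leading number as 0
    return int(m.group(1)) if m else 0


def tr_then_sort_n(raw: str) -> list:
    # tr '\r' '\n': turn old Mac line endings into Unix newlines
    lines = raw.replace("\r", "\n").split("\n")
    # a trailing separator leaves one empty string at the end; drop it
    if lines and lines[-1] == "":
        lines.pop()
    # order by the numeric prefix; ties fall back to comparing the whole
    # line as a string, similar to sort's last-resort comparison
    return sorted(lines, key=lambda s: (numeric_prefix(s), s))


print(tr_then_sort_n("10\tx\r9\ty\r100\tz\r"))
# ['9\ty', '10\tx', '100\tz']
```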

  • 2021-01-26 13:59

    A Python approach (compatible with Python 2 and 3), immune to all shell problems. Works great and is portable. I noticed that the input file contains some 0x8C characters (exotic dots), which are probably confusing the tr command. They are handled properly below:

    import csv, sys

    # read the file as binary, as it is not really text
    with open("Proteins.txt", "rb") as f:
        data = bytearray(f.read())

    # replace non-ASCII bytes (such as the 0x8C "exotic dots") by classical dots
    for i, c in enumerate(data):
        if c > 0x7F:
            data[i] = ord(".")

    # convert to a list of ASCII strings; splitlines() handles the old Mac
    # separator (\r) as well as \n and \r\n, and leaves no trailing empty line
    lines = "".join(map(chr, data)).splitlines()

    # treat our lines as input for the CSV reader and read all rows in a list
    rows = list(csv.reader(lines, delimiter='\t', quotechar='"'))

    # perform the sort (tricky): numerical on the first column. int() accepts
    # leading zeros in a string ("007" -> 7) in both Python 2 and 3, so nothing
    # needs stripping. Rows whose first cell is empty or not numerical go to
    # the top, in their original order (the sort is stable).
    def sort_key(row):
        if row and row[0].strip().isdigit():
            return (1, int(row[0]))
        return (0, 0)

    rows.sort(key=sort_key)

    # write back the file as a nice, legal, ASCII tsv file
    if sys.version_info < (3,):
        f = open("Proteins_sorted_2.txt", "wb")
    else:
        f = open("Proteins_sorted_2.txt", "w", newline='')

    cw = csv.writer(f, delimiter='\t', quotechar='"')
    cw.writerows(rows)
    f.close()
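The sort key is the tricky part, so here is the same idea in isolation on a few made-up rows. The data and the name `first_column_key` are hypothetical, and the input is assumed to be ASCII, as it is after the byte clean-up step. Rows with a numeric first cell sort by value, leading zeros included; everything else, such as a header line or a blank row, stays at the top in its original order because Python's sort is stable.

```python
def first_column_key(row):
    """Numeric first cells sort by value; empty or non-numeric ones sort first."""
    cell = row[0].strip() if row else ""
    if cell.isdigit():
        # int() parses "007" as 7: leading zeros are fine in a string
        return (1, int(cell))
    return (0, 0)


rows = [["10", "b"], ["ID", "name"], ["007", "a"], ["9", "c"], []]
for row in sorted(rows, key=first_column_key):
    print(row)
# ['ID', 'name']
# []
# ['007', 'a']
# ['9', 'c']
# ['10', 'b']
```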
    
  • 2021-01-26 14:05

    Use this to convert old Mac OS carriage returns to newlines:

    tr '\r' '\n' < myfile.txt | sort
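This pipeline uses plain `sort`, which compares lines as strings rather than numbers, while the question asks for a numeric order; `sort -n` gives that. The difference is sketched in Python with made-up lines, assuming the byte-wise comparison that `sort` uses in the C locale (other locales may order whitespace and punctuation differently).

```python
lines = ["10\tx", "9\ty", "100\tz", "2\tw"]

# string comparison, character by character: "1..." comes before "2...",
# and "10\t" before "100" because a tab sorts below "0"
print(sorted(lines))
# ['10\tx', '100\tz', '2\tw', '9\ty']

# numeric comparison on the first tab-separated field, which is the order
# sort -n produces for these lines
print(sorted(lines, key=lambda s: int(s.split("\t", 1)[0])))
# ['2\tw', '9\ty', '10\tx', '100\tz']
```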
    