Trying to remove non-printable characters (junk values) from a UNIX file

囚心锁ツ 2021-01-18 13:40

I am trying to remove non-printable characters (e.g. ^@) from the records in my file. Since the volume of records in the file is too big, using cat is not an opti

3 answers
  • 2021-01-18 13:41

    Perhaps you could delete the complement of [:print:], which contains all printable characters. Newline is not in [:print:], so add \n to the set to keep the records on separate lines:

    tr -cd '[:print:]\n' < file > newfile
    

    If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):

    sed 's/[^[:print:]]//g' file
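
A minimal sketch of the tr approach on a throwaway sample, assuming a POSIX shell with printf, mktemp and tr available. The sample contents and the variable names (sample, cleaned) are made up for illustration, and LC_ALL=C is an assumption that forces byte-wise handling.

```shell
# Build a small sample: two records, each with an embedded NUL byte
# (cat -v would display the NUL as ^@).
sample=$(mktemp)
printf 'rec\0ord1\nrec\0ord2\n' > "$sample"

# Delete every byte that is neither printable nor a newline.
cleaned=$(LC_ALL=C tr -cd '[:print:]\n' < "$sample")
printf '%s\n' "$cleaned"

# Remove the temporary sample file.
rm -f "$sample"
```

Because tr reads from standard input as a stream, the same command should scale to a large file without loading it into memory.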
    
  • 2021-01-18 13:58
    strings -1 file... > outputfile
    

    seems to work

  • 2021-01-18 14:05

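    Remove the NUL bytes and most other control characters first (the set below keeps \007 through \015, which includes tab, newline and carriage return):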

    tr -dc '\007-\011\012-\015\040-\376' < file > newfile
    

    Then try your string (a literal ] has to come first in a bracket expression, because a backslash does not escape it there):

    sed -i 's/[^]a-zA-Z 0-9`~!@#$%^&*()_+[\\{}|;'\'':",.\/<>?]//g' newfile
    

    I believe that the ^@ you see is in fact a NUL byte (\0).
    The tr filter above will remove those as well.
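
One way to check this, sketched under the assumption of a POSIX shell with printf, od and tr: od -c prints a NUL byte as \0, and the tr filter from this answer drops it. The sample string and the variable name no_nul are hypothetical, and LC_ALL=C is an assumption that keeps the octal ranges byte-oriented.

```shell
# Show the raw bytes of a sample record; the NUL appears as \0 in od -c output.
printf 'ab\0cd\n' | od -c

# Run the same sample through the tr filter and capture the result.
no_nul=$(printf 'ab\0cd\n' | LC_ALL=C tr -dc '\007-\011\012-\015\040-\376')
printf '%s\n' "$no_nul"
```

Running od -c on a few lines of the real file (for example via head) is a cheap way to confirm which bytes are actually present before choosing a filter.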
