OSX perl to batch write filename as first line in txt file in UTF-16LE

Submitted by 那年仲夏 on 2019-12-24 03:24:07

Question


I found a really useful bit of Perl here that writes the filename of a text file to the first line of the file. I am running it from Terminal on OS X Yosemite:

perl -i -pe 'BEGIN{undef $/;} s/^/\nFilename:$ARGV\n/' `find . -name '*.TXT'`

With some modification I thought it had solved my specific problem. However, the files I'm picking up are UTF-16LE, and I've since discovered that this command writes in UTF-8, which makes a real mess of the output (the text looks correct but is not recognised in calculations in Excel, FileMaker, etc.).

After several attempts, I need help getting this script to write the filename in UTF-16LE at the start of the file. (Note: I do have a workaround now, batch-converting the files to UTF-8 and then running this, but I'd prefer to do the whole workflow in one step.)


Answer 1:


reinierpost was correct: it was more about removing the original Unicode byte order mark (BOM). What worked in the end was:

perl -i -pe 'BEGIN{undef $/;} s/\xFF\xFE/Filename:$ARGV\n/' `find . -name '*.TXT'`

where the UTF-16LE BOM \xFF\xFE is replaced by my new string. For reference, some other BOMs are:

- iso-10646-1 > \xFE\xFF
- UTF-16BE > \xFE\xFF
- UTF-8 > \xEF\xBB\xBF

I was also able to write the new text into UTF-16LE with:

perl -i -pe 'BEGIN{binmode STDIN,":encoding(utf8)";binmode STDOUT,":encoding(utf16)"; undef $/;} s/\xFF\xFE/\xFF\xFE\nFilename:$ARGV\n/' `find . -name '*.TXT'`

however, I now believe that my source data is a mixed bag of UTF-8 and UTF-16, as this last version produces a jumble of mixed characters between the new header and the data. Thanks to reinierpost for steering me in the right direction. I'm still interested if others can improve on this.



Source: https://stackoverflow.com/questions/31337086/osx-perl-to-batch-write-filename-as-first-line-in-txt-file-in-utf-16le
