General string quoting for TCL

暗喜 2021-02-05 09:25

I'm writing a utility (which happens to be in Python) which is generating output in the form of a TCL script. Given some arbitrary string variable (not unicode) in the Python,

3 answers
  • 2021-02-05 09:48

    To do it right, you should also specify the encoding your Python string is in, typically sys.getdefaultencoding(). Otherwise you might garble characters when translating the string to Tcl.

    If you have binary data in your string and want Tcl binary strings as a result this will always work:

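    # mystring is a byte string; each byte becomes a Tcl \u00XX escape
    data = "".join("\\u00%02x" % b for b in bytearray(mystring))
    # the double quotes keep the command valid even if mystring is empty
    tcltxt = 'set x "%s"' % data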
    

    The result looks like a hex dump, but then, it is a hex dump...

    If your string is in a specific encoding such as UTF-8, you can improve on that by using Tcl's encoding convertfrom/convertto together with the appropriate Python idiom.

    # myutf8string holds UTF-8 encoded bytes; Tcl decodes them on its side
    data = "".join("\\u00%02x" % b for b in bytearray(myutf8string))
    tcltext = 'set x [encoding convertfrom utf-8 "%s"]' % data
    

    You can of course refine this a bit by leaving the non-special characters unescaped, but the above is safe in any case.
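
    The refinement mentioned above could look roughly like the sketch below. It is written for Python 3 bytes objects, and the names tcl_escape_bytes, tcl_set_binary and SAFE are hypothetical, as is the choice of which characters count as safe.

```python
import string

# Assumption of this sketch: ASCII letters, digits and a few punctuation
# marks are safe to emit unescaped inside a double-quoted Tcl word.
SAFE = set(string.ascii_letters + string.digits + "_-./")


def tcl_escape_bytes(data: bytes) -> str:
    """Return data with every byte outside SAFE written as \\u00XX."""
    return "".join(
        chr(b) if chr(b) in SAFE else "\\u00%02x" % b
        for b in data
    )


def tcl_set_binary(name: str, data: bytes) -> str:
    """Build a Tcl 'set' command; the quotes keep it valid for empty data."""
    return 'set %s "%s"' % (name, tcl_escape_bytes(data))
```

    For example, tcl_set_binary("x", b"a b") produces set x "a\u0020b", which is far easier to read than the full hex dump while still escaping everything that is not known to be harmless.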

  • 2021-02-05 09:51

    You really only need 2 rules,

    • Escape curly braces
    • Wrap the output in curly braces

    You don't need to worry about newlines, non-printable characters, etc. They are valid in a literal string, and Tcl has excellent Unicode support.

    set s { 
    this is
    a 
    long 
    string. I have $10 [10,000 cents] only curly braces \{ need \} to be escaped.
    \t is not a real tab, but '    ' is. "quoting something" :
    {matching curly braces are okay, list = string in tcl}
    }
    

    Edit: In light of your comment, you can do the following:

    • escape [] {} and $
    • wrap the whole output in set s [subst { $output } ]

    The beauty of Tcl is that it has a very simple grammar. Apart from the backslash itself, no characters other than the three kinds above need to be escaped.

    Edit 2: One last try.

    If you pass subst some options, you will only need to escape \ and {}.

    set s [subst -nocommands -novariables { $output } ]

    However, you would still need to come up with a regex that converts non-printable characters to their escape codes.
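
    On the Python side, the subst variant could be generated along these lines. This is only a sketch: the function name tcl_subst_literal is made up, treating everything outside printable ASCII as "non-printable" is an assumption, and the wrapper omits the spaces around the value so they do not end up in s.

```python
import re

# Backslash and braces get a backslash prefix; subst removes it again.
_SPECIAL = re.compile(r"[\\{}]")
# Assumption: anything outside printable ASCII is written as \uXXXX.
_NONPRINT = re.compile(r"[^\x20-\x7e]")


def _escape_nonprint(match):
    cp = ord(match.group())
    if cp > 0xFFFF:
        # \uXXXX only carries four hex digits, so give up on non-BMP input.
        raise ValueError("cannot write U+%X as a \\uXXXX escape" % cp)
    return "\\u%04x" % cp


def tcl_subst_literal(text: str) -> str:
    """Return a Tcl command that sets s to text via subst."""
    # Escape backslashes and braces first, so that the backslashes added
    # by the \uXXXX step below are not escaped a second time.
    escaped = _SPECIAL.sub(lambda m: "\\" + m.group(), text)
    escaped = _NONPRINT.sub(_escape_nonprint, escaped)
    return "set s [subst -nocommands -novariables {%s}]" % escaped
```

    Because of -nocommands and -novariables, $ and [] pass through untouched, so "$10 [x]" stays readable in the generated script.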

    Good luck!

  • 2021-02-05 09:52

    Tcl has very few metacharacters once you're inside a double-quoted string, and all of them can be quoted by putting a backslash in front of them. The characters you must quote are \ itself, $, [ and the double quote that would otherwise end the string, but it's considered good practice to also quote ], { and } so that the script itself is embeddable. (Tcl's own list command does this, except that it doesn't actually wrap the result in double quotes, so it handles backslashes as well, and it will also try to use other techniques on “nice” strings. There's an algorithm for doing this, but I advise not bothering with that much complexity in your code; simple universal rules are much better for correct coding.)
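
    As a rough illustration of that simple universal rule in Python (the helper name tcl_quote is hypothetical), every metacharacter gets a backslash and the result is wrapped in double quotes:

```python
import re

# The characters that must be quoted (\ $ [ and the closing double quote)
# plus the ones quoted for good practice (] { }).
_META = re.compile(r'[\\$\[\]{}"]')


def tcl_quote(text: str) -> str:
    """Return text as a double-quoted Tcl word with metacharacters escaped."""
    return '"' + _META.sub(lambda m: "\\" + m.group(), text) + '"'
```

    A generator would then emit something like "set s " + tcl_quote(value). Newlines and other ordinary characters can stay as they are inside the quotes.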

    The second step is to get the data into Tcl. If you are generating a file, your best option is to write it as UTF-8 and use the -encoding option to tclsh/wish or to the source command to explicitly state what the encoding is. (If you're inside the same process, write UTF-8 data into a string and evaluate that. Job Done.) That option (introduced in Tcl 8.5) is specifically for dealing with this sort of problem:

    source -encoding "utf-8" theScriptYouWrote.tcl
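
    On the generating side, that just means writing the script out as UTF-8 explicitly instead of relying on the platform default. A minimal sketch for Python 3, with write_tcl_script as a made-up helper name:

```python
def write_tcl_script(path: str, script_text: str) -> None:
    """Write the generated Tcl script as UTF-8 with plain \\n line endings."""
    # encoding is given explicitly so the file matches what
    # 'source -encoding utf-8' will expect on the Tcl side.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(script_text)
```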
    

    If that's not possible, you're going to have to fall back to adding additional quoting. The best thing is then to assume you've only got ASCII support available (a good lowest common denominator) and quote everything else as a separate step from the quoting described in the first paragraph. To quote, convert every Unicode character from U+0080 upwards to an escape sequence of the form \uXXXX, where XXXX are exactly four hex digits[1] and the leading \u are literal characters. Don't use the \xXX form, as that has some “surprising” misfeatures (alas).
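
    A sketch of that ASCII-only fallback in Python might look like this. The name tcl_ascii_escape is hypothetical, and raising an error for characters beyond U+FFFF is a choice made here because the \u form cannot represent them. It is meant to run after the metacharacter quoting, so the backslashes it adds are not quoted again.

```python
def tcl_ascii_escape(text: str) -> str:
    """Replace every character from U+0080 upwards with a \\uXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)  # exactly four hex digits
        else:
            raise ValueError("cannot write U+%X as a \\uXXXX escape" % cp)
    return "".join(out)
```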


    [1] There's an open bug in Tcl about handling characters outside the Basic Multilingual Plane, part of which is that the \u form isn't able to cope. Fortunately, non-BMP characters are still reasonably rare in practice.
