General string quoting for TCL

暗喜 2021-02-05 09:25

I'm writing a utility (which happens to be in Python) which is generating output in the form of a TCL script. Given some arbitrary string variable (not unicode) in the python,

3 answers
  •  悲&欢浪女
    2021-02-05 09:52

    Tcl has very few metacharacters once you're inside a double-quoted string, and all of them can be quoted by putting a backslash in front of them. The characters you must quote are \ itself, ", $ and [, but it's considered good practice to also quote ], { and } so that the script itself is embeddable. (Tcl's own list command does this, except that it doesn't actually wrap the result in double quotes, so it also handles backslashes, and it will also try to use other techniques on “nice” strings. There's an algorithm for doing this, but I advise not bothering with that much complexity in your code; simple universal rules are much better for correct coding.)
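The backslash rule above can be sketched on the Python side roughly as follows. This is a minimal sketch, assuming Python 3 and a text (str) input; the helper name tcl_quote is hypothetical and not part of any library.

```python
import re

# Characters that get a backslash prefix inside a Tcl double-quoted word:
# backslash, double quote, dollar, square brackets and braces.
_TCL_SPECIAL = re.compile(r'([\\"$\[\]{}])')


def tcl_quote(text: str) -> str:
    """Return text wrapped in double quotes with Tcl metacharacters escaped.

    Hypothetical helper for illustration only.
    """
    return '"' + _TCL_SPECIAL.sub(r'\\\1', text) + '"'


if __name__ == "__main__":
    # Emit a Tcl command whose argument is an arbitrary Python string.
    print("set greeting " + tcl_quote('costs $5 [approx] "today"'))
```

Escaping the closing bracket and the braces is not strictly required inside double quotes, but it follows the "embeddable" advice above and keeps the rule uniform.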

    The second step is to get the data into Tcl. If you are generating a file, your best option is to write it as UTF-8 and use the -encoding option to tclsh/wish or to the source command to explicitly state what the encoding is. (If you're inside the same process, write UTF-8 data into a string and evaluate that. Job Done.) That option (introduced in Tcl 8.5) is specifically for dealing with this sort of problem:

    source -encoding "utf-8" theScriptYouWrote.tcl
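On the generating side, the file only needs to be written as UTF-8 so that it matches the -encoding value given to source (or to tclsh/wish). A minimal sketch, assuming Python 3; the function name write_tcl_script and the file name are illustrative assumptions.

```python
def write_tcl_script(path: str, script: str) -> None:
    """Write generated Tcl source to path, encoded as UTF-8.

    Hypothetical helper; newline="\n" keeps line endings unchanged
    regardless of the platform running the generator.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(script)


if __name__ == "__main__":
    # The resulting file is meant to be loaded with an explicit encoding,
    # e.g.  source -encoding "utf-8" theScriptYouWrote.tcl
    write_tcl_script("theScriptYouWrote.tcl", 'puts "caf\u00e9"\n')
```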
    

    If that's not possible, you're going to have to fall back to adding additional quoting. The best thing is to then assume you've only got ASCII support available (a good lowest common denominator) and quote everything else as a separate step to the quoting described in the first paragraph. To quote, convert every Unicode character from U+0080 upward into an escape sequence of the form \uXXXX, where XXXX is exactly four hex digits[1] and the leading \u is two literal characters. Don't use the \xXX form, as that has some “surprising” misfeatures (alas).
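The ASCII-only fallback can be sketched as a second pass that runs after the backslash quoting, so that the backslash in each \uXXXX is not doubled. This is a minimal sketch, assuming Python 3; tcl_ascii_escape is a hypothetical name, and rejecting non-BMP characters is one possible choice given the limitation noted in the footnote, not something the answer prescribes.

```python
def tcl_ascii_escape(quoted: str) -> str:
    """Replace every character from U+0080 upward with a \\uXXXX escape.

    Hypothetical helper. Expects text that already went through the
    backslash-quoting step, and leaves ASCII characters untouched.
    """
    out = []
    for ch in quoted:
        code_point = ord(ch)
        if code_point < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        elif code_point <= 0xFFFF:
            out.append("\\u%04x" % code_point)  # exactly four hex digits
        else:
            # Assumption: fail loudly rather than emit an escape that
            # the \u form may not be able to represent.
            raise ValueError("non-BMP character U+%06X" % code_point)
    return "".join(out)


if __name__ == "__main__":
    print(tcl_ascii_escape('puts "caf\u00e9 \u4e2d"'))
```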


    [1] There's an open bug in Tcl about handling characters outside the Basic Multilingual Plane, part of which is that the \u form isn't able to cope. Fortunately, non-BMP characters are still reasonably rare in practice.
