General string quoting for TCL

Submitted by 前提是你 on 2019-12-03 01:45:22

You really only need two rules:

  • Escape curly braces
  • Wrap the output in curly braces

You don't need to worry about newlines, non-printable characters, etc. They are valid in a literal string, and Tcl has excellent Unicode support.

set s { 
this is
a 
long 
string. I have $10 [10,000 cents] only curly braces \{ need \} to be escaped.
\t is not  a real tab, but '    ' is. "quoting something" :
{matching curly braces are okay, list = string in tcl}
}

Edit: In light of your comment, you can do the following:

  • escape [] {} and $
  • wrap the whole output in set s [subst { $output } ]

The beauty of Tcl is that it has a very simple grammar. No characters other than the three kinds above need to be escaped.

Edit 2: One last try.

If you pass subst some options, you only need to escape \ and {}:

set s [subst -nocommands -novariables { $output } ]

You would, however, need to come up with a regex to convert non-printable characters to their escape codes.
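On the generating side, the "Edit 2" recipe could be sketched in Python roughly as follows. This is only a sketch under a few assumptions: the function name tcl_subst_literal and the var parameter are made up for illustration, "non-printable" is taken to mean the ASCII control characters, and the spaces around $output in the template above are dropped so they don't end up in the value.

```python
import re

# ASCII control characters (C0 range plus DEL); treating exactly these as
# "non-printable" is an assumption made for this sketch.
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")

def tcl_subst_literal(text, var="s"):
    """Build a Tcl 'set' command whose value is text, decoded by subst."""
    # Backslashes first, so the escapes added below are not doubled again.
    escaped = text.replace("\\", "\\\\")
    # Escaped braces do not count towards brace matching in the {...} word.
    escaped = escaped.replace("{", "\\{").replace("}", "\\}")
    # Control characters become \uXXXX escapes that subst turns back.
    escaped = _CONTROL.sub(lambda m: "\\u%04x" % ord(m.group()), escaped)
    return "set %s [subst -nocommands -novariables {%s}]" % (var, escaped)
```

Since -nocommands and -novariables are given, $ and [ ] pass through untouched; only backslash substitution runs on the Tcl side.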

Good luck!

Tcl has very few metacharacters once you're inside a double-quoted string, and all of them can be quoted by putting a backslash in front of them. The characters you must quote are \ itself, $ and [ (plus any literal ", which would otherwise end the string), but it's considered good practice to also quote ], { and } so that the script itself is embeddable. (Tcl's own list command does this, except that it doesn't actually add the surrounding double quotes, so it also handles backslashes, and it will try other techniques on “nice” strings. There's an algorithm for doing this, but I advise against taking on that much complexity in your code; simple universal rules are much better for correct coding.)
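As a rough illustration of that "simple universal rule", a Python helper might look like the sketch below. The name tcl_quote is hypothetical, and the double-quote character is escaped as well because the result is wrapped in double quotes here.

```python
import re

# Backslash, $, [ and ] plus braces; the double quote is included too
# because the result is wrapped in "..." below.
_TCL_META = re.compile(r'([\\$\[\]{}"])')

def tcl_quote(text):
    """Return text as a double-quoted Tcl word with metacharacters escaped."""
    # Put one backslash in front of every metacharacter.
    return '"' + _TCL_META.sub(r"\\\1", text) + '"'
```

The returned word can then be dropped into a generated script, e.g. "set s " + tcl_quote(value).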

The second step is to get the data into Tcl. If you are generating a file, your best option is to write it as UTF-8 and use the -encoding option to tclsh/wish or to the source command to explicitly state what the encoding is. (If you're inside the same process, write UTF-8 data into a string and evaluate that. Job Done.) That option (introduced in Tcl 8.5) is specifically for dealing with this sort of problem:

source -encoding "utf-8" theScriptYouWrote.tcl
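On the Python side, writing the generated script as UTF-8 could look something like this sketch. The function name write_tcl_script is made up for illustration, and the returned argument list simply mirrors the -encoding option of tclsh mentioned above; it is not executed here.

```python
def write_tcl_script(path, script_text):
    """Write script_text to path as UTF-8; return a tclsh command line."""
    # Fix the encoding explicitly so the file does not depend on the
    # platform default, and keep plain \n line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(script_text)
    # Argument list suitable for e.g. subprocess.run (not run here).
    return ["tclsh", "-encoding", "utf-8", path]
```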

If that's not possible, you're going to have to fall back to adding additional quoting. The best thing is then to assume you've only got ASCII support available (a good lowest common denominator) and quote everything else as a separate step after the quoting described in the first paragraph. To quote, convert every Unicode character from U+0080 up to an escape sequence of the form \uXXXX, where XXXX are exactly four hex digits[1] and the leading \u is literal. Don't use the \xXX form, as that has some “surprising” misfeatures (alas).


[1] There's an open bug in Tcl about handling characters outside the Basic Multilingual Plane, part of which is that the \u form isn't able to cope. Fortunately, non-BMP characters are still reasonably rare in practice.
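A minimal Python sketch of that ASCII-only fallback step is shown below. The name tcl_ascii_escape is hypothetical, and rejecting non-BMP characters outright is a choice made for this sketch (because of the limitation in the footnote), not something Tcl prescribes.

```python
def tcl_ascii_escape(quoted):
    """Replace every character from U+0080 up with a \\uXXXX escape.

    Meant to run after the metacharacter quoting, as a separate step.
    """
    out = []
    for ch in quoted:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                   # plain ASCII passes through
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)     # exactly four hex digits
        else:
            # Outside the BMP: \u cannot express this, so fail loudly.
            raise ValueError("non-BMP character U+%06X" % code)
    return "".join(out)
```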

To do it right, you should also specify the encoding your Python string is in, typically sys.getdefaultencoding(). Otherwise you might garble encodings when translating it to Tcl.

If you have binary data in your string and want Tcl binary strings as a result, this will always work:

# Assumes every character of mystring is a single byte value (0-255).
data = "".join("\\u00%02x" % ord(c) for c in mystring)
tcltxt = "set x %s" % data

The result will look like a hex dump, but then, it is a hex dump...

If your data is in a specific encoding such as UTF-8, you can improve on that a bit by using encoding convertfrom/convertto and the appropriate Python idiom:

data = "".join("\\u00%02x" % ord(c) for c in myutf8string)
tcltext = "set x [encoding convertfrom utf-8 %s]" % data

You can of course refine this a bit by not \u-encoding all the non-special characters, but the above is safe in any case.
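One possible refinement along those lines, written for Python 3, is sketched here. The name tcl_utf8_literal and the choice of "safe" characters (ASCII letters and digits only) are assumptions for illustration; the data is also wrapped in double quotes, which the snippets above don't do, so that an empty string still yields a valid argument.

```python
import string

# Characters emitted as-is; everything else becomes a \u00XX escape.
_SAFE = set(string.ascii_letters + string.digits)

def tcl_utf8_literal(text, var="x"):
    """Build a Tcl command that sets var to text via its UTF-8 bytes."""
    parts = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Each escape has exactly four hex digits, so a following
        # letter or digit cannot be absorbed into it.
        parts.append(ch if ch in _SAFE else "\\u00%02x" % byte)
    return 'set %s [encoding convertfrom utf-8 "%s"]' % (var, "".join(parts))
```

The output is much easier to read for mostly-ASCII input, while anything remotely special still goes through the always-safe escape form.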
