I am running into a parsing problem when loading JSON files that seem to have the TAB character in them.
When I go to http://jsonlint.com/, and I e
Tabs are legal as delimiting whitespace outside of values, but not within strings. Use \t instead.
EDIT: Based on your comments, I see some confusion about what a tab actually is. The tab character is just a normal character, like 'a' or '5' or '.' or any other character that you enter by pressing a key on your keyboard. It takes up a single byte, whose numeric value is 9. There are no backslashes or lowercase 't's involved.
What puts tab in a different category from 'a' or '5' or '.' is the fact that you, as a human using your eyeballs, generally can't look at a display of text and identify or count tab characters. Visually, a sequence of tabs is identical to a sequence of (a usually larger but still visually indeterminate number of) spaces.
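To make that concrete, here is a small Python 3 sketch (the variable names are just for illustration). It checks that a tab is a single character with numeric value 9, and shows that repr() makes tabs visible where a plain print does not:

```python
# A tab is one ordinary character whose numeric value is 9.
tab = "\t"
print(len(tab))   # 1
print(ord(tab))   # 9

# Printed directly, tabs and spaces are hard to tell apart by eye.
sample = "a\tb  c"   # one tab after 'a', two spaces after 'b'
print(sample)

# repr() and count() make the tab explicit.
print(repr(sample))        # 'a\tb  c'
print(sample.count("\t"))  # 1
```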
In order to unambiguously represent tabs inside text meant for computer processing, we have various syntactic methods to say "Hey, some piece of software! Replace this junk with a tab character later, OK?".
In the history of programming languages there have been two main approaches; if you go back to the 1950s, you get both approaches existing side by side, one in each of two of the oldest high-level languages. Lisp had named character literals like #\Tab; these were converted as soon as they were read from the program source. Fortran only had the CHAR function, which was called at runtime and returned the character whose number matched the argument: CHAR(9) returned a tab. (Of course, if it were really CHAR(9) and not CHAR(some expression that works out to 9), an optimizing compiler might notice that and replace the function call with a tab at compile time, putting us back over in the other camp.)
In general, with both solution types, if you wanted to stick the special character inside a larger string, you had to do the concatenation yourself; for instance, a kid hacking BASIC in the 80's might write something like this:
10 PRINT "This is a tab ->"; CHR$(9); "<- That was a tab"
But some languages - most notably the family that began with the language BCPL - introduced the ability to include these characters directly inside a string literal:
printf("This is a tab -> *t <- That was a tab");
B retained the * syntax, but the next language in the series, C, replaced it with the backslash, probably because they needed to read and write literal asterisks a lot more often than literal backslashes.
Anyway, a whole host of languages, including both Python and Javascript, have borrowed or inherited C's conventions here. So in both languages, the two expressions "\t" and '\t' each result in a one-character string where that one character is a tab.
JSON is based on Javascript's syntax, but it only allows a restricted subset of it. For example, strings have to be enclosed in double quotation marks (") instead of single ones ('), and literal tabs are not allowed inside them.
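A quick way to check both restrictions is to hand a few candidate texts to Python's json module. This is only a sketch for Python 3.5 or later; is_valid_json is a helper made up for this example, not part of any library:

```python
import json

def is_valid_json(text):
    """Hypothetical helper: True if Python's json module accepts text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

double_quoted = '{"key": "value"}'
single_quoted = "{'key': 'value'}"    # single quotes: not JSON
literal_tab = '{"key": "a\tb"}'       # Python puts a real tab in the string
escaped_tab = r'{"key": "a\tb"}'      # raw literal: backslash + t reach JSON

for name, text in [("double_quoted", double_quoted),
                   ("single_quoted", single_quoted),
                   ("literal_tab", literal_tab),
                   ("escaped_tab", escaped_tab)]:
    print(name, is_valid_json(text))
```

Only double_quoted and escaped_tab should be accepted.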
That means that this Python string from your update:
foo = '{"My_string": "Foo bar.\t Bar foo."}'
is not valid JSON. The Python interpreter turns the \t sequence into an actual tab character as soon as it reads the string - long before the JSON processor ever sees it.
You can tell Python to put a literal \t in the string instead of a tab character by doubling the backslash:
foo = '{"My_string": "Foo bar.\\t Bar foo."}'
Or you can use the "raw" string syntax, which doesn't interpret the special backslash sequences at all:
foo = r'{"My_string": "Foo bar.\t Bar foo."}'
Either way, the JSON processor will see a string containing a backslash followed by a 't', rather than a string containing a tab.
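Here is a short Python 3 sketch comparing the three spellings. The names plain, doubled and raw are only for illustration; the strings themselves are the ones from above:

```python
import json

plain = '{"My_string": "Foo bar.\t Bar foo."}'     # Python inserts a real tab
doubled = '{"My_string": "Foo bar.\\t Bar foo."}'  # backslash + t survive
raw = r'{"My_string": "Foo bar.\t Bar foo."}'      # same text as doubled

print("\t" in plain)    # True: the JSON text contains a real tab
print("\t" in doubled)  # False
print(doubled == raw)   # True

# The JSON parser turns the escape into a tab in the decoded value.
value = json.loads(raw)["My_string"]
print(repr(value))      # 'Foo bar.\t Bar foo.'

# The version with a real tab is rejected.
try:
    json.loads(plain)
except json.JSONDecodeError as exc:
    print("rejected:", exc)
```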
From the JSON standard:
Insignificant whitespace is allowed before or after any token. The whitespace characters are: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.
It means that a literal tab character is not allowed inside a JSON string. You need to escape it as \t (in a .json file):
{"My_string": "Foo bar.\t Bar foo."}
In addition, if the JSON text is written inside a Python string literal, then you need to double-escape the tab:
foo = '{"My_string": "Foo bar.\\t Bar foo."}' # in a Python source
Or use a Python raw string literal:
foo = r'{"My_string": "Foo bar.\t Bar foo."}' # in a Python source
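If the JSON is generated by your own code rather than typed by hand, one way to sidestep the escaping question is to let the json module write it. A minimal Python 3 sketch, assuming a throwaway file named example.json in a temporary directory (both are arbitrary choices for this example):

```python
import json
import os
import tempfile

data = {"My_string": "Foo bar.\t Bar foo."}   # the value holds a real tab

# json.dumps escapes the tab for us: the output has backslash + t.
text = json.dumps(data)
print(text)   # {"My_string": "Foo bar.\t Bar foo."}

# Round-trip through a .json file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    with open(path, encoding="utf-8") as fh:
        on_disk = fh.read()

loaded = json.loads(on_disk)
print(loaded == data)   # True
```

The file on disk contains no tab character at all, only the two-character escape, and loading it gives back the original value with the real tab.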
Just to share my experience:
I am using snakemake with a config file written in JSON. The file uses tabs for indentation, which is legal in JSON. But I get this error message: snakemake.exceptions.WorkflowError: Config file is not valid JSON or YAML. I believe this is a bug in snakemake, but I could be wrong; please comment. After replacing all the tabs with spaces, the error message is gone.
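For what it's worth, tabs used purely as indentation do parse with Python's own json module, as the sketch below shows (Python 3; the config content is made up for this example). It says nothing about how snakemake itself loads the file. One thing that may or may not be related: YAML, unlike JSON, does not allow tabs for indentation.

```python
import json

# Tabs appear only between tokens, never inside a string value.
config_text = '{\n\t"samples": [\n\t\t"a",\n\t\t"b"\n\t]\n}'

config = json.loads(config_text)
print(config)   # {'samples': ['a', 'b']}
```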
In a node-red flow I am facing the same type of problem:
flow.set("delimiter",'"\t"');
error:
{ "status": "ERROR", "result": "Cannot parse config: String: 1: in value for key 'delimiter': JSON does not allow unescaped tab in quoted strings, use a backslash escape" }
solution:
I just used \\t in the code instead:
flow.set("delimiter",'"\\t"');
You can include tabs within values (instead of as whitespace) in JSON files by escaping them. Here's a working example with the json module in Python 2.7:
>>> import json
>>> obj = json.loads('{"MY_STRING": "Foo\\tBar"}')
>>> obj['MY_STRING']
u'Foo\tBar'
>>> print obj['MY_STRING']
Foo Bar
While not escaping the '\t' causes an error:
>>> json.loads('{"MY_STRING": "Foo\tBar"}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 19 (char 18)
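If you cannot fix the files themselves, the json module has a strict flag that can be turned off so that control characters such as tabs are accepted inside strings. A sketch in Python 3 syntax; note that the input is still not valid JSON, this only makes Python's parser more lenient:

```python
import json

text = '{"MY_STRING": "Foo\tBar"}'   # Python turns \t into a real tab here

# Default (strict) parsing rejects the control character.
try:
    json.loads(text)
except ValueError as exc:            # JSONDecodeError subclasses ValueError
    print("strict parse failed:", exc)

# With strict=False the tab is allowed inside the string.
obj = json.loads(text, strict=False)
print(repr(obj["MY_STRING"]))        # 'Foo\tBar'
```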