I am looking for a Regex that allows me to validate json.
I am very new to Regex\'s and i know enough that parsing with Regex is bad but can it be used to validate?
Here my regexp for validate string:
^\"([^\"\\]*|\\(["\\\/bfnrt]{1}|u[a-f0-9]{4}))*\"$
Was written usign original syntax diagramm.
I tried @mario's answer, but it didn't work for me, because I've downloaded test suite from JSON.org (archive) and there were 4 failed tests (fail1.json, fail18.json, fail25.json, fail27.json).
I've investigated the errors and found out, that fail1.json
is actually correct (according to manual's note and RFC-7159 valid string is also a valid JSON). File fail18.json
was not the case either, cause it contains actually correct deeply-nested JSON:
[[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]
So two files left: fail25.json
and fail27.json
:
[" tab character in string "]
and
["line
break"]
Both contains invalid characters. So I've updated the pattern like this (string subpattern updated):
$pcreRegex = '/
(?(DEFINE)
(?<number> -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
(?<boolean> true | false | null )
(?<string> " ([^"\n\r\t\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
(?<array> \[ (?: (?&json) (?: , (?&json) )* )? \s* \] )
(?<pair> \s* (?&string) \s* : (?&json) )
(?<object> \{ (?: (?&pair) (?: , (?&pair) )* )? \s* \} )
(?<json> \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
)
\A (?&json) \Z
/six';
So now all legal tests from json.org can be passed.
Most modern regex implementations allow for recursive regexpressions, which can verify a complete JSON serialized structure. The json.org specification makes it quite straightforward.
$pcre_regex = '
/
(?(DEFINE)
(?<number> -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
(?<boolean> true | false | null )
(?<string> " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
(?<array> \[ (?: (?&json) (?: , (?&json) )* )? \s* \] )
(?<pair> \s* (?&string) \s* : (?&json) )
(?<object> \{ (?: (?&pair) (?: , (?&pair) )* )? \s* \} )
(?<json> \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
)
\A (?&json) \Z
/six
';
It works quite well in PHP with the PCRE functions . Should work unmodified in Perl; and can certainly be adapted for other languages. Also it succeeds with the JSON test cases.
A simpler approach is the minimal consistency check as specified in RFC4627, section 6. It's however just intended as security test and basic non-validity precaution:
var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
eval('(' + text + ')');
I created a Ruby implementation of Mario's solution, which does work:
# encoding: utf-8
module Constants
JSON_VALIDATOR_RE = /(
# define subtypes and build up the json syntax, BNF-grammar-style
# The {0} is a hack to simply define them as named groups here but not match on them yet
# I added some atomic grouping to prevent catastrophic backtracking on invalid inputs
(?<number> -?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?){0}
(?<boolean> true | false | null ){0}
(?<string> " (?>[^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " ){0}
(?<array> \[ (?> \g<json> (?: , \g<json> )* )? \s* \] ){0}
(?<pair> \s* \g<string> \s* : \g<json> ){0}
(?<object> \{ (?> \g<pair> (?: , \g<pair> )* )? \s* \} ){0}
(?<json> \s* (?> \g<number> | \g<boolean> | \g<string> | \g<array> | \g<object> ) \s* ){0}
)
\A \g<json> \Z
/uix
end
########## inline test running
if __FILE__==$PROGRAM_NAME
# support
class String
def unindent
gsub(/^#{scan(/^(?!\n)\s*/).min_by{|l|l.length}}/u, "")
end
end
require 'test/unit' unless defined? Test::Unit
class JsonValidationTest < Test::Unit::TestCase
include Constants
def setup
end
def test_json_validator_simple_string
assert_not_nil %s[ {"somedata": 5 }].match(JSON_VALIDATOR_RE)
end
def test_json_validator_deep_string
long_json = <<-JSON.unindent
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"id": 1918723,
"boolean": true,
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
JSON
assert_not_nil long_json.match(JSON_VALIDATOR_RE)
end
end
end
As was written above, if the language you use has a JSON-library coming with it, use it to try decoding the string and catch the exception/error if it fails! If the language does not (just had such a case with FreeMarker) the following regex could at least provide some very basic validation (it's written for PHP/PCRE to be testable/usable for more users). It's not as foolproof as the accepted solution, but also not that scary =):
~^\{\s*\".*\}$|^\[\n?\{\s*\".*\}\n?\]$~s
short explanation:
// we have two possibilities in case the string is JSON
// 1. the string passed is "just" a JSON object, e.g. {"item": [], "anotheritem": "content"}
// this can be matched by the following regex which makes sure there is at least a {" at the
// beginning of the string and a } at the end of the string, whatever is inbetween is not checked!
^\{\s*\".*\}$
// OR (character "|" in the regex pattern)
// 2. the string passed is a JSON array, e.g. [{"item": "value"}, {"item": "value"}]
// which would be matched by the second part of the pattern above
^\[\n?\{\s*\".*\}\n?\]$
// the s modifier is used to make "." also match newline characters (can happen in prettyfied JSON)
if I missed something that would break this unintentionally, I'm grateful for comments!
I realize that this is from over 6 years ago. However, I think there is a solution that nobody here has mentioned that is way easier than regexing
function isAJSON(string) {
try {
JSON.parse(string)
} catch(e) {
if(e instanceof SyntaxError) return false;
};
return true;
}