Regex to validate JSON

挽巷 2020-11-22 11:37

I am looking for a regex that allows me to validate JSON.

I am very new to regexes, and I know enough to know that parsing with regex is a bad idea, but can it be used to validate?

12 Answers
  • 2020-11-22 11:57

    Looking at the documentation for JSON, it seems that the regex can simply be three parts if the goal is just to check for fitness:

    1. The string starts and ends with either [] or {}
      • [{\[]{1}...[}\]]{1}
    2. and
      1. The character is a single allowed JSON structural or literal character
        • ...[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]...
      2. or a run of characters enclosed in double quotes
        • ...".*?"...

    All together: [{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}

    If the JSON string contains newline characters, use the singleline switch of your regex flavor so that . matches newlines. Note that this will not reject all bad JSON, but it will fail if the basic JSON structure is invalid, which makes it a straightforward sanity check before passing the string to a parser.
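A minimal Python sketch of the pre-check described above. The pattern is the one from this answer with the redundant `{1}` quantifiers dropped. `re.DOTALL` is assumed to be Python's equivalent of the singleline switch, and the function name `looks_like_json` is made up for illustration.

```python
import re

# Pattern from the answer above, minus the redundant {1} quantifiers.
# re.DOTALL makes . match newlines (the "singleline" switch in other flavors).
SANITY_RE = re.compile(
    r'[{\[]([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]',
    re.DOTALL,
)


def looks_like_json(text: str) -> bool:
    """Cheap structural pre-check; this is NOT a full JSON validator."""
    return SANITY_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ['{"a": [1, 2.5e3, true, null]}', 'not json', '{"a": tru}', '{}']:
        print(repr(sample), looks_like_json(sample))
```

Two limitations show up in this sketch: a misspelled literal such as `tru` still passes, because each of its letters is in the allowed character class, and the empty documents `{}` and `[]` are rejected, because the `+` demands at least one token between the brackets. A real parser is still needed after the pre-check.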

  • 2020-11-22 11:58

    For "strings and numbers", I think that the partial regular expression for numbers:

    -?(?:0|[1-9]\d*)(?:\.\d+)(?:[eE][+-]\d+)?
    

    should be instead:

    -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?
    

    since the fractional part of a number is optional (as is the sign of the exponent). It is also probably safer to escape the - symbol in [+-], since it has a special meaning between brackets.
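The difference between the two number patterns can be checked with a short Python sketch. The names `ORIGINAL`, `CORRECTED` and `is_json_number` are illustrative, and `re.ASCII` is an added assumption so that `\d` matches only the digits 0-9.

```python
import re

# First pattern quoted in the answer: the fraction and the exponent sign are mandatory.
ORIGINAL = re.compile(r'-?(?:0|[1-9]\d*)(?:\.\d+)(?:[eE][+-]\d+)?', re.ASCII)
# Corrected pattern from the answer: the fraction and the exponent sign are optional.
CORRECTED = re.compile(r'-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?', re.ASCII)


def is_json_number(text: str, pattern: re.Pattern = CORRECTED) -> bool:
    """Return True if the whole string matches the given number pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["42", "-0.5", "1e5", "1.5e+5", "01", "1."]:
        print(sample, is_json_number(sample, ORIGINAL), is_json_number(sample))
```

With the original pattern, plain integers such as `42` and unsigned exponents such as `1e5` are rejected. The corrected pattern accepts both and still rejects leading zeros (`01`) and a dangling decimal point (`1.`).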

  • 2020-11-22 12:01

    A regex that validates a simple JSON object (not a top-level JSON array).

    It validates key (string) : value pairs, where the value is a string, an integer, [{key:value},{key:value}], or {key:value}.

    ^\{(\s|\n\s)*(("\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))*(\s|\n)*\}$
    

    Sample data that this regex validates:

    {
    "key":"string",
    "key": 56,
    "key":{
            "attr":"integer",
            "attr": 12
            },
    "key":{
            "key":[
                {
                    "attr": 4,
                    "attr": "string"
                }
            ]
         }
    }
    
  • 2020-11-22 12:04

    A trailing comma in a JSON array caused my Perl 5.16 to hang, possibly because it kept backtracking. I had to add a backtrack-terminating directive:

    (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) )(*PRUNE) \s* )
                                                                                       ^^^^^^^^
    

    This way, once it identifies a construct that is not 'optional' (* or ?), it shouldn't try backtracking over it to try to identify it as something else.

  • 2020-11-22 12:05

    Because of the recursive nature of JSON (nested {...}s), regex is not well suited to validating it. Sure, some regex flavours can match patterns recursively* (and can therefore match JSON), but the resulting patterns are horrible to look at and should never be used in production code, IMO!

    * Beware, though: many regex implementations do not support recursive patterns. Of the popular programming languages, these support recursive patterns: Perl, .NET (through balancing groups rather than true recursion), PHP and Ruby 1.9.2.
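Python is not in the list above. As a small hedged check, the sketch below asks Python's standard `re` module to compile a pattern that uses the `(?R)` recursion construct. The helper name `builtin_re_supports_recursion` is made up for illustration.

```python
import re


def builtin_re_supports_recursion() -> bool:
    """Return True if the standard `re` module accepts the (?R) construct."""
    try:
        # Nested-braces matcher that relies on whole-pattern recursion.
        re.compile(r'\{(?:[^{}]|(?R))*\}')
    except re.error:
        # The built-in engine reports (?R) as an unknown extension.
        return False
    return True


if __name__ == "__main__":
    print("recursive patterns supported:", builtin_re_supports_recursion())
```

The third-party `regex` package does accept recursive patterns in Python, but for JSON validation this mostly reinforces the point above that a parser is the better tool.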

  • 2020-11-22 12:07

    Yes, it's a common misconception that regular expressions can match only regular languages. In fact, the PCRE functions can match much more than regular languages; they can even match some non-context-free languages! Wikipedia's article on RegExps has a special section about it.

    JSON can be recognized using PCRE in several ways! @mario showed one great solution using named subpatterns and back-references. He then noted that there should be a solution using recursive patterns (?R). Here is an example of such a regexp, written in PHP:

    $regexString = '"([^"\\\\]*|\\\\["\\\\bfnrt\/]|\\\\u[0-9a-f]{4})*"';
    $regexNumber = '-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?';
    $regexBoolean= 'true|false|null'; // these are actually copied from Mario's answer
    $regex = '/\A('.$regexString.'|'.$regexNumber.'|'.$regexBoolean.'|';    //string, number, boolean
    $regex.= '\[(?:(?1)(?:,(?1))*)?\s*\]|'; //arrays
    $regex.= '\{(?:\s*'.$regexString.'\s*:(?1)(?:,\s*'.$regexString.'\s*:(?1))*)?\s*\}';    //objects
    $regex.= ')\Z/is';
    

    I'm using (?1) instead of (?R) because the latter references the entire pattern, and the pattern contains the \A and \Z anchors, which should not be used inside subpatterns. (?1) refers to the subpattern enclosed by the outermost parentheses (this is why the outermost ( ) does not start with ?:). So, the RegExp becomes 268 characters long :)

    /\A("([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"|-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?|true|false|null|\[(?:(?1)(?:,(?1))*)?\s*\]|\{(?:\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1)(?:,\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1))*)?\s*\})\Z/is
    

    Anyway, this should be treated as a "technology demonstration", not as a practical solution. In PHP I would validate the JSON string by calling the json_decode() function (just as @Epcylon noted). If I'm going to use that JSON once it is validated, then this is the best method.
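The same "let the parser decide" approach, sketched in Python rather than PHP, assuming the standard `json` module stands in for json_decode(). The names `is_valid_json` and `_reject_constant` are illustrative. The `parse_constant` hook is used only because `json.loads` would otherwise accept `NaN` and `Infinity`, which are not valid JSON.

```python
import json


def _reject_constant(name: str):
    # json.loads calls this hook for NaN, Infinity and -Infinity,
    # which Python accepts by default although they are not valid JSON.
    raise ValueError(f"invalid JSON constant: {name}")


def is_valid_json(text: str) -> bool:
    """Validate by parsing: True if the text parses as strict JSON."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


if __name__ == "__main__":
    for sample in ['{"a": [1, 2, {"b": null}]}', '{"a": tru}', '[1, 2,]', 'NaN']:
        print(repr(sample), is_valid_json(sample))
```

If the data is going to be used afterwards, the parsed value can be kept instead of being thrown away, so validation costs nothing extra.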
