Question
This article on how browsers work explains how CSS is context free, while HTML is not. But what about JavaScript: is JavaScript context free?
I am learning about CFGs and formal proofs, but am a long way from understanding how to figure this out. Does anyone know whether JavaScript is context free or not?
Answer 1:
No, JavaScript is not a context-free language.
It is very close to one, and the ECMAScript 5 specification does indeed use a context-free grammar[1] to describe the language's syntax (you can find all productions in Annex A).
Of course, it does make some extensions to pure context-free grammatical productions, and describes extra behaviour of the parser. One particular thing is the use of lookahead restrictions, which still yield a context-free language but would complicate the grammar a lot if they couldn't be used for some rules. Not allowing certain things to appear in strict mode code is similar: it could be done by adjusting the grammar (with far more productions), but the rule is much easier to express outside the BNF.
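The strict-mode point can be observed directly. The sketch below is only an illustration, not anything defined by the specification: it uses the built-in `Function` constructor as a syntax checker (the helper name `parses` is made up) and shows that the legacy octal literal `010` is accepted in sloppy mode but rejected as soon as the body begins with a `"use strict"` directive.

```javascript
// Hypothetical helper: true if `body` compiles as a function body.
// The Function constructor parses its last argument without running it,
// so only syntax (including early errors) is checked.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything other than a syntax error is unexpected
  }
}

// The legacy octal literal 010 (Annex B) is accepted in sloppy mode...
const sloppyOctal = parses("return 010;");
// ...but a "use strict" directive turns the same body into a SyntaxError.
const strictOctal = parses('"use strict"; return 010;');

console.log(sloppyOctal); // true
console.log(strictOctal); // false
```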
However, there are also some[2] rules that do make the language not context-free. You'll find an overview in the description of early errors, which can make program code invalid. That object literals must not contain duplicate property names and that function parameter lists must not contain duplicate identifiers (both in ES5 strict mode code; ES2015 later dropped the object-literal rule) are two rules that cannot be expressed using (finite) context-free grammars.
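A sketch of how these two early errors behave in a current engine such as Node.js (assumed here; an ES5-era engine would differ on the object literal). Duplicate parameter names are still rejected in strict code, while duplicate property names are accepted since ES2015. The hypothetical `firstDuplicate` helper shows how a parser typically enforces such a rule outside the grammar: with a side table of names already seen rather than with extra productions.

```javascript
// Hypothetical helper: true if the parameters + body compile without a SyntaxError.
function compiles(params, body) {
  try {
    new Function(...params, body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Duplicate parameter names: allowed in sloppy mode, an early error in strict mode.
const sloppyDupParams = compiles(["a", "a"], "return a;");
const strictDupParams = compiles(["a", "a"], '"use strict"; return a;');

// Duplicate property names: an error in ES5 strict mode, accepted since ES2015.
const strictDupProps = compiles([], '"use strict"; return { x: 1, x: 2 };');

// Hypothetical checker: remembers every name seen so far in a Set and
// reports the first one that appears a second time.
function firstDuplicate(identifiers) {
  const seen = new Set();
  for (const id of identifiers) {
    if (seen.has(id)) return id;
    seen.add(id);
  }
  return null;
}

console.log(sloppyDupParams, strictDupParams, strictDupProps); // true false true
console.log(firstDuplicate(["a", "b", "c", "b"])); // prints: b
```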
My gut tells me that automatic semicolon insertion falls into the same box, but I think its rules are too complicated to even attempt a proof here.
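Whatever its formal status, ASI makes line breaks significant in places where the grammar otherwise ignores whitespace. A minimal demonstration (function names are arbitrary): a line terminator directly after `return` triggers semicolon insertion, so the value on the next line is never returned.

```javascript
function sameLine() {
  return 42;
}

function nextLine() {
  return   // ASI inserts a semicolon here, giving `return;`
    42;    // now an unreachable expression statement
}

console.log(sameLine()); // 42
console.log(nextLine()); // undefined
```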
[1] Actually it uses two grammars, a lexical and a syntactic one; the first disambiguates between division operators and regular expression literals, and produces the tokens that are the input to the second grammar.
[2] Rather few, actually, compared to other programming languages.
Answer 2:
No programming language is (completely) context-free (I would say including CSS), even though context-free grammars (CFGs) may be used to define/generate compilers/parsers for the language.
The simple fact (for example) that variables need to be defined before they are used, or that declarations involving identifiers should be unique, makes the language "context-sensitive".
A grammar for a (programming) language is supposed to describe (and generate) only the strings that are valid programs in that language (syntactically, but also semantically). Yet a CFG can describe and generate strings which are not valid programs (given the language semantics and specification). Conditions which describe valid programs (for example: 1. a class needs to be defined before using new class(); 2. ids must match; etc.) require context-sensitivity.
No CFG (with any finite number of productions) can correctly represent only the valid strings of this language: $\{ a^n b^n c^n : n \geq 1 \}$, where $n$ should be the same for a, b and c (it should match). Note that one can indeed define a CFG for (a superset of) this language, but it will also accept non-valid strings along with valid ones (which then have to be filtered out by other means); this is not what a grammar specification for a language is supposed to do. It should accept only the valid strings and reject the non-valid ones. In an analogy with statistics, one could say that a grammar specification for a language should eliminate/minimise both Type I (rejecting valid strings) and Type II (accepting non-valid strings) errors, not just one of them.
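The limitation is about grammars, not about computation in general: an ordinary program can recognize this language by counting. A sketch with a hypothetical `isAnBnCn` function; the three-way equality it checks is exactly the kind of constraint that, by the pumping lemma for context-free languages, no CFG can enforce.

```javascript
// Recognizes { a^n b^n c^n : n >= 1 } by measuring three consecutive runs.
function isAnBnCn(s) {
  let i = 0;
  // Advance past a run of `ch` and return its length.
  const run = (ch) => {
    const start = i;
    while (i < s.length && s[i] === ch) i++;
    return i - start;
  };
  const a = run("a");
  const b = run("b");
  const c = run("c");
  // Whole string consumed, at least one 'a', and all three runs equal.
  return i === s.length && a >= 1 && a === b && b === c;
}

console.log(isAnBnCn("aabbcc")); // true
console.log(isAnBnCn("aabbc"));  // false
console.log(isAnBnCn("abcabc")); // false
```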
Let me give a simple example in the context of JavaScript (since variables may seem to pose no problem for JavaScript).
In JavaScript strict mode, two function declarations with the same name in the same block are not valid. So this is not valid:
"use strict";
{
  function duplicateFunc(){}
  function duplicateFunc(){} // duplicate named function declaration
}
So the program is not correct, yet a CFG cannot handle this type of condition.
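In practice the rule is narrower than it might look. A sketch, assuming a modern engine such as Node.js (the helper name is hypothetical): in strict mode, duplicate function declarations in the same block are a SyntaxError, but at the top level of a function body they are treated like `var` declarations and are still allowed.

```javascript
// Hypothetical helper: true if `body` compiles as a function body.
function bodyCompiles(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const dup = "function duplicateFunc(){} function duplicateFunc(){}";

// Strict mode, same block: block-level functions are lexical, so this is an early error.
const inStrictBlock = bodyCompiles(`"use strict"; { ${dup} }`);

// Strict mode, top level of a function body: var-like, so duplicates are allowed.
const atStrictTopLevel = bodyCompiles(`"use strict"; ${dup}`);

console.log(inStrictBlock);    // false
console.log(atStrictTopLevel); // true
```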
Even turning on strict mode itself is context-sensitive.
(A subset of the strict mode rules can be handled by splitting the CFG into cases and parsing accordingly, as per @Bergi's answer; strict mode examples removed.)
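One concrete way to see this: a `"use strict"` string only switches modes when it appears in the directive prologue at the very start of a body. The same string after another statement is just an expression statement, so the code after it keeps sloppy-mode rules. The helper `acceptsWith` is hypothetical and uses the `Function` constructor as a parser.

```javascript
// Hypothetical helper: does `prefix` followed by a `with` statement compile?
function acceptsWith(prefix) {
  try {
    new Function(`${prefix} with ({}) {}`);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Directive prologue: strict mode is on, so `with` is rejected.
const asDirective = acceptsWith('"use strict";');

// Not the first statement: just a string expression, strict mode stays off.
const afterStatement = acceptsWith('0; "use strict";');

console.log(asDirective);    // false
console.log(afterStatement); // true
```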
[UPDATE]
I will try to give a couple of examples of non-context-free JavaScript code which do not require "strict mode" (open to suggestions/corrections).
The use of reserved words/keywords is an extension (or limitation) on the grammar. It is an extraneous feature, so the following examples should count as examples of non-CF behaviour.
var var; // identifier using reserved name
var function; // identifier using reserved name
obj.var; // reserved name used as (explicit) property: an error in ES3, but allowed since ES5
obj["var"]; // this is fine!!
Object++; // built-in name used as numeric variable: parses fine, but overwrites the global Object with NaN at run time
[/UPDATE]
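The snippet below asks a modern engine's parser which of these lines it accepts, again using the `Function` constructor so that nothing is executed (`syntaxOk` is a made-up helper, and `obj` is never evaluated).

```javascript
// Hypothetical helper: true if `code` compiles as a function body.
function syntaxOk(code) {
  try {
    new Function(code); // compiles only; nothing is executed
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const samples = [
  "var var;",
  "var function;",
  "obj.var;",
  'obj["var"];',
  "Object++;",
];

const results = samples.map((code) => syntaxOk(code));
console.log(results); // [ false, false, true, true, true ]
```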
So context plays a part in the correct parsing of the program. As the saying goes, "context is everything"!
However, this context-sensitivity can (hopefully) be handled by only slight extensions to context-free grammars (for example Attribute Grammars, Affix Grammars, Tree-Adjoining Grammars (TAG) and so on), which still allow efficient parsing (meaning in polynomial time).
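A toy illustration of the attribute-grammar idea, under loud assumptions: the two-statement language (`let x;` and `use x;`) and all function names are hypothetical, and this is hand-written recursive descent, not a real attribute-grammar tool. The parsing functions follow a plain CFG, and a set of declared names is threaded through them like an attribute, which is what rejects use-before-declaration.

```javascript
// Toy CFG:  Program -> Stmt*      Stmt -> ("let" | "use") Ident ";"
// "Attribute": the set of names declared so far, passed into each Stmt
// (inherited) and returned from it (synthesized).
function checkProgram(source) {
  // Toy tokenizer: identifiers and ";" only; any other characters are skipped.
  const tokens = source.match(/[A-Za-z_]\w*|;/g) || [];
  let pos = 0;
  const next = () => tokens[pos++];

  function parseStmt(declared) {
    const keyword = next();
    const name = next();
    if (name === undefined || name === ";" || next() !== ";") {
      throw new SyntaxError(`malformed statement near token ${pos}`);
    }
    if (keyword === "let") {
      return new Set([...declared, name]); // add the new name to the attribute
    }
    if (keyword === "use") {
      if (!declared.has(name)) {
        throw new ReferenceError(`'${name}' used before declaration`);
      }
      return declared;
    }
    throw new SyntaxError(`unknown keyword '${keyword}'`);
  }

  let declared = new Set(); // nothing is declared before the first statement
  while (pos < tokens.length) declared = parseStmt(declared);
  return true;
}

// Hypothetical wrapper: in this toy, any error means the program is rejected.
function acceptsProgram(source) {
  try {
    return checkProgram(source);
  } catch (e) {
    return false;
  }
}

console.log(acceptsProgram("let x; use x;")); // true
console.log(acceptsProgram("use x; let x;")); // false
console.log(acceptsProgram("let x use x;"));  // false
```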
[UPDATE]
"i would say including CSS"
To elaborate a little on this statement: CSS1 would be CF, but as the CSS specification adds more features, including variable support (e.g. CSS counters), it makes CSS code context-sensitive in the sense described above (e.g. variables need to be defined before they are used). In the following CSS code, for example, a counter is used without ever being defined. The browser parses it, and CSS 2.1 says to behave as though a counter-reset had reset the counter to 0 on that element, so whether the rule refers to an existing counter or implicitly creates one depends on context elsewhere (other rules and the document tree), which a CFG cannot describe:
body { }
h3::before {
  counter-increment: section; /* no counter-reset for "section" is in scope */
  content: "Section" counter(section) ": "; /* Display the counter */
}
[/UPDATE]
Answer 3:
I'm pretty certain JS is not context free — given an arbitrary code artefact, you cannot necessarily determine its exact meaning without knowing its context.
The first example that comes to mind is {}. Does this represent an empty object literal or an empty statement block? It's impossible to decide without context, but because the language allows semicolons to be omitted from statements ending in '}' (as do most languages with C-like syntax) it may also be undecidable with context! Consider {x: {}}. This could be an object literal with the "x" field containing an empty object, or a statement block with a labelled sub-statement (where the label is 'x' and the sub-statement is {}). Perhaps the language specification has some rules for selecting the correct interpretation in such scenarios, but in any case the language does not appear to be context-free, judging by these examples alone.
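For what it is worth, the specification does settle these particular cases: an expression statement may not begin with `{` (a lookahead restriction in the grammar), so a statement that starts with `{` is always a block. A small sketch makes the difference observable through `eval`, which returns the completion value of the code it runs; parentheses force the same text into expression position.

```javascript
// As a statement, a leading `{` always starts a block.
const asBlock = eval("{}");       // empty block: completion value is undefined
const labelled = eval("{x: {}}"); // block containing label `x:` and an empty block

// Parentheses put the same text in expression position: object literals.
const asObject = eval("({})");
const nested = eval("({x: {}})");

console.log(asBlock, labelled); // undefined undefined
console.log(asObject, nested);  // Node prints: {} { x: {} }
```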
JavaScript's 'automatic semicolon insertion' feature certainly doesn't help in distinguishing expressions and statements.
Here's another one to think about: function x() {}. What does this do? If it's a statement, it declares a new hoisted variable 'x' with this function as its value. If it's an expression, it simply evaluates to a function which has a binding 'x', visible only inside the function, that refers to the same function (for self-reference).
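A small demonstration of the two readings (all identifiers are arbitrary). As a declaration, the name is hoisted into the enclosing scope; as a named function expression, the name is bound only inside the function's own body.

```javascript
// Declaration: usable before the line where it appears, thanks to hoisting.
const hoistedType = typeof declared;
function declared() {}

// Named function expression: `inner` is visible only inside its own body.
const expr = function inner() {
  return inner; // self-reference works here
};

console.log(hoistedType);     // function
console.log(expr() === expr); // true
console.log(typeof inner);    // undefined
```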
Source: https://stackoverflow.com/questions/30697267/is-javascript-a-context-free-language