I'm confused about how context-sensitivity and ambiguity influence each other.
What I think is correct is:
Ambiguity:
An ambiguous grammar leads to the
Context-sensitivity and ambiguity are entirely orthogonal. There exist ambiguous context-free languages and unambiguous context-sensitive languages.
A context-sensitive language is a formal language that can be described by a context-sensitive grammar (CSG). Every context-free language is also a context-sensitive language, since context-free grammars are a restricted form of context-sensitive grammars. Not every formal language is context-sensitive, though; there are languages that even a CSG cannot describe.
If you want to parse a context-sensitive language with a context-free parser, you define a context-free grammar that accepts a superset of the context-sensitive language (because context-free grammars are less powerful). Because you accept a superset, you might get ambiguities or false-positive results, which must be resolved after parsing.
Example One: an XML-like language that allows any tag name. Because a context-free grammar cannot describe the sentences ww, in which a word w ∈ {a,b}+ is repeated twice, it cannot describe <ID></ID> with the requirement that both IDs are equal either. Thus one defines a context-free grammar that accepts a superset:
start -> elem
elem -> open elem* close
open -> '<' ID '>'
close -> '</' ID '>'
ID -> ('a' / 'b')+
This obviously also parses sentences that one doesn't want; therefore an extra check that the IDs in open and close are equal has to be done.
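The two-step idea (parse the context-free superset, then check the IDs afterwards) can be sketched in Python. This is only a minimal illustration under some assumptions: the function names `tokenize`, `parse`, `parse_elem` and `names_match` are made up for this sketch, whitespace between tags is not allowed, and the tree is represented as a plain `(open_name, close_name, children)` tuple.

```python
import re

# Tokens of the superset grammar: '<ID>' opens, '</ID>' closes, ID -> ('a' / 'b')+
TOKEN = re.compile(r"<(/?)([ab]+)>")

def tokenize(text):
    """Split the input into (is_close, name) pairs; reject anything else."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append((m.group(1) == "/", m.group(2)))
        pos = m.end()
    return tokens

def parse_elem(tokens, i):
    """elem -> open elem* close. Returns (tree, next_index)."""
    if i >= len(tokens) or tokens[i][0]:
        raise SyntaxError("expected an opening tag")
    open_name = tokens[i][1]
    i += 1
    children = []
    # Every further opening tag starts a nested elem.
    while i < len(tokens) and not tokens[i][0]:
        child, i = parse_elem(tokens, i)
        children.append(child)
    if i >= len(tokens):
        raise SyntaxError("expected a closing tag")
    close_name = tokens[i][1]
    return (open_name, close_name, children), i + 1

def parse(text):
    """start -> elem. Context-free step only; tag names are NOT compared here."""
    tokens = tokenize(text)
    tree, i = parse_elem(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("trailing input after the root element")
    return tree

def names_match(tree):
    """Post-parse check: every open/close pair must carry the same ID."""
    open_name, close_name, children = tree
    return open_name == close_name and all(names_match(c) for c in children)
```

With this sketch, `parse("<ab></ba>")` succeeds because the input belongs to the superset, and only `names_match` reports that it is not a sentence of the intended language.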
Example Two: a C-like typedef in a very simple language. Imagine a language that contains only typedefs, pointers and multiplications. It has only two IDs, a and b. An example of such a language:
typedef a;
b * a; // multiplication
a * b; // b is pointer to type a
The context-free grammar would be like:
start -> typedef multiplication-or-pointer+
typedef -> 'typedef' ID ';'
multiplication-or-pointer -> ID '*' ID ';'
ID -> 'a'
ID -> 'b'
This grammar does not accept a superset, but it cannot tell whether it sees a multiplication or a pointer declaration, so it is ambiguous. Therefore one has to go through the parse result and decide whether each statement is a multiplication or a pointer declaration, depending on which type was defined in the typedef.
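The "parse first, decide afterwards" step for this toy language might look like the following Python sketch. The names `parse` and `classify` are invented for illustration, comments in the input are not handled, and the rule "a type name on the left means a pointer declaration" is this sketch's reading of the example above, not a general C rule.

```python
import re

# typedef -> 'typedef' ID ';'
TYPEDEF = re.compile(r"\s*typedef\s+([ab])\s*;")
# multiplication-or-pointer -> ID '*' ID ';'
STAR = re.compile(r"\s*([ab])\s*\*\s*([ab])\s*;")

def parse(source):
    """Context-free step: returns (type_name, [(left, right), ...]).

    Every 'ID * ID ;' statement is kept undecided at this point.
    """
    m = TYPEDEF.match(source)
    if m is None:
        raise SyntaxError("expected 'typedef ID ;'")
    type_name, pos = m.group(1), m.end()
    statements = []
    while True:
        m = STAR.match(source, pos)
        if m is None:
            break
        statements.append((m.group(1), m.group(2)))
        pos = m.end()
    if not statements or source[pos:].strip():
        raise SyntaxError("expected one or more 'ID * ID ;' statements")
    return type_name, statements

def classify(type_name, statements):
    """Context-dependent step: use the typedef to resolve each statement."""
    return [
        ("pointer" if left == type_name else "multiplication", left, right)
        for left, right in statements
    ]
```

For the input `typedef a; b * a; a * b;` the parser returns both statements in the same undecided shape, and `classify` then labels the first one a multiplication and the second one a pointer declaration.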
With context-sensitive grammar one can do much more. Very roughly (and imprecisely):
start -> typedef multiplication-or-pointer+
typedef -> 'typedef' ID ';'
multiplication-or-pointer -> pointer / multiplication
'typedef' 'a' ';' WHATEVER pointer -> 'a' '*' ID
'typedef' 'b' ';' WHATEVER pointer -> 'b' '*' ID
'typedef' 'b' ';' WHATEVER multiplication -> 'a' '*' ID
'typedef' 'a' ';' WHATEVER multiplication -> 'b' '*' ID
ID -> 'a'
ID -> 'b'
Please note that what I show here is not precise, because I limited the number of IDs. In general, there can be an infinite number of IDs. You can write a context-sensitive grammar for the general case (even though it would be very unintuitive), but you can't write a context-free grammar for it.
Regarding your Edit 1: I hope the previous example answers that.
Regarding your Edit 2: There are other tricks for expressing this so that the rules are not so limited, but they are usually mind-blowing, and IMO that is the reason why nobody uses the CSG formalism.
NB: a context-sensitive grammar is equivalent in power to a linear bounded automaton, and a context-free grammar is equivalent to a pushdown automaton. It is not right to say that a context-free parser is the opposite of a context-sensitive parser.
Compilers do not use "pure" (whatever that may mean) grammars to do their parsing - they are real-world programs that do what all real-world programs do - apply heuristics in certain situations. This is why C++ compilers (and compilers for most other languages, except for undergrad exercises) are not produced using compiler generators.