i can´t see my error here .. this rule parse some stuff ok but the last two samples not. Could somebody please give me a hint ..
Goal is a parser than can identify
Try changing
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')'))
to
>> *(*(lit('.') >> name_pure) >> lit('(') > paralistopt > lit(')'))
You provide very little information to go at. Let me humor you with my entry into this guessing game:
Let's assume you want to parse a simple "language" that merely allows member expressions and function invocations, but chained.
Now, your grammar says nothing about the parameters (though it's clear the param list can be empty), so let me go the next mile and assume that you want to accept the same kind of expressions there (so foo(a)
is okay, but also bar(foo(a))
or bar(b.foo(a))
).
Since you accept chaining of function calls, it appears that functions are first-class objects (and functions can return functions), so foo(a)(b, c, d)
should be accepted as well.
You didn't mention it, but parameters often include literals (sqrt(9)
comes to mind, or println("hello world")
).
Other items:
iter_pos
(ab)use it seems you're interested in tracking the original source location inside the resulting AST.We should keep it simple as ever:
namespace Ast {
using Identifier = boost::iterator_range<It>;
struct MemberExpression;
struct FunctionCall;
using Expression = boost::variant<
double, // some literal types
std::string,
// non-literals
Identifier,
boost::recursive_wrapper<MemberExpression>,
boost::recursive_wrapper<FunctionCall>
>;
struct MemberExpression {
Expression object; // antecedent
Identifier member; // function or field
};
using Parameter = Expression;
using Parameters = std::vector<Parameter>;
struct FunctionCall {
Expression function; // could be a member function
Parameters parameters;
};
}
NOTE We're not going to focus on showing source locations, but already made one provision, storing identifiers as an iterator-range.
NOTE Fusion-adapting the only types not directly supported by Spirit:
BOOST_FUSION_ADAPT_STRUCT(Ast::MemberExpression, object, member) BOOST_FUSION_ADAPT_STRUCT(Ast::FunctionCall, function, parameters)
We will find that we don't use these, because Semantic Actions are more convenient here.
Grammar() : Grammar::base_type(start) {
using namespace qi;
start = skip(space) [expression];
identifier = raw [ (alpha|'_') >> *(alnum|'_') ];
parameters = -(expression % ',');
expression
= literal
| identifier >> *(
('.' >> identifier)
| ('(' >> parameters >> ')')
);
literal = double_ | string_;
string_ = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
BOOST_SPIRIT_DEBUG_NODES(
(identifier)(start)(parameters)(expression)(literal)(string_)
);
}
In this skeleton most rules benefit from automatic attribute propagation. The one that doesn't is expression
:
qi::rule<It, Expression()> start;
using Skipper = qi::space_type;
qi::rule<It, Expression(), Skipper> expression, literal;
qi::rule<It, Parameters(), Skipper> parameters;
// lexemes
qi::rule<It, Identifier()> identifier;
qi::rule<It, std::string()> string_;
So, let's create some helpers for the semantic actions.
NOTE An important take-away here is to create your own higher-level building blocks instead of toiling away with
boost::phoenix::construct<>
etc.
Define two simple construction functions:
struct mme_f { MemberExpression operator()(Expression lhs, Identifier rhs) const { return { lhs, rhs }; } };
struct mfc_f { FunctionCall operator()(Expression f, Parameters params) const { return { f, params }; } };
phx::function<mme_f> make_member_expression;
phx::function<mfc_f> make_function_call;
Then use them:
expression
= literal [_val=_1]
| identifier [_val=_1] >> *(
('.' >> identifier) [ _val = make_member_expression(_val, _1)]
| ('(' >> parameters >> ')') [ _val = make_function_call(_val, _1) ]
);
That's all. We're ready to roll!
Live On Coliru
I created a test bed looking like this:
int main() {
using It = std::string::const_iterator;
Parser::Grammar<It> const g;
for (std::string const input : {
"a()", "a(para)", "x.a()", "x.a(para)", "x.a(para).g(para).j()", "x.y", "x.y.z",
"x.y.z()",
"y.z.z(para)",
// now let's add some funkyness that you didn't mention
"bar(foo(a))",
"bar(b.foo(a))",
"foo(a)(b, c, d)", // first class functions
"sqrt(9)",
"println(\"hello world\")",
"allocate(strlen(\"aaaaa\"))",
"3.14",
"object.rotate(180)",
"object.rotate(event.getAngle(), \"torque\")",
"app.mainwindow().find_child(\"InputBox\").font().size(12)",
"app.mainwindow().find_child(\"InputBox\").font(config().preferences.baseFont(style.PROPORTIONAL))"
}) {
std::cout << " =========== '" << input << "' ========================\n";
It f(input.begin()), l(input.end());
Ast::Expression parsed;
bool ok = parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << parsed << "\n";
}
else
std::cout << "Parse failed\n";
if (f != l)
std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Incredible as it may appear, this already parses all the test cases and prints:
=========== 'a()' ========================
Parsed: a()
=========== 'a(para)' ========================
Parsed: a(para)
=========== 'x.a()' ========================
Parsed: x.a()
=========== 'x.a(para)' ========================
Parsed: x.a(para)
=========== 'x.a(para).g(para).j()' ========================
Parsed: x.a(para).g(para).j()
=========== 'x.y' ========================
Parsed: x.y
=========== 'x.y.z' ========================
Parsed: x.y.z
=========== 'x.y.z()' ========================
Parsed: x.y.z()
=========== 'y.z.z(para)' ========================
Parsed: y.z.z(para)
=========== 'bar(foo(a))' ========================
Parsed: bar(foo(a))
=========== 'bar(b.foo(a))' ========================
Parsed: bar(b.foo(a))
=========== 'foo(a)(b, c, d)' ========================
Parsed: foo(a)(b, c, d)
=========== 'sqrt(9)' ========================
Parsed: sqrt(9)
=========== 'println("hello world")' ========================
Parsed: println(hello world)
=========== 'allocate(strlen("aaaaa"))' ========================
Parsed: allocate(strlen(aaaaa))
=========== '3.14' ========================
Parsed: 3.14
=========== 'object.rotate(180)' ========================
Parsed: object.rotate(180)
=========== 'object.rotate(event.getAngle(), "torque")' ========================
Parsed: object.rotate(event.getAngle(), torque)
=========== 'app.mainwindow().find_child("InputBox").font().size(12)' ========================
Parsed: app.mainwindow().find_child(InputBox).font().size(12)
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
You're right. I cheated. I didn't show you this code required to debug print the parsed AST:
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, MemberExpression const& me) {
return os << me.object << "." << me.member;
}
static inline std::ostream& operator<<(std::ostream& os, FunctionCall const& fc) {
os << fc.function << "(";
bool first = true;
for (auto& p : fc.parameters) { if (!first) os << ", "; first = false; os << p; }
return os << ")";
}
}
It's only debug printing, as string literals aren't correctly roundtripped. But it's only 10 lines of code, that's a bonus.
This had your interest, so let's show it working. Let's add a simple loop to print all locations of identifiers:
using IOManip::showpos;
for (auto& id : all_identifiers(parsed)) {
std::cout << " - " << id << " at " << showpos(id, input) << "\n";
}
Of course, this begs the question, what are showpos
and all_identifiers
?
namespace IOManip {
struct showpos_t {
boost::iterator_range<It> fragment;
std::string const& source;
friend std::ostream& operator<<(std::ostream& os, showpos_t const& manip) {
auto ofs = [&](It it) { return it - manip.source.begin(); };
return os << "[" << ofs(manip.fragment.begin()) << ".." << ofs(manip.fragment.end()) << ")";
}
};
showpos_t showpos(boost::iterator_range<It> fragment, std::string const& source) {
return {fragment, source};
}
}
As for the identifier extraction:
std::vector<Identifier> all_identifiers(Expression const& expr) {
std::vector<Identifier> result;
struct Harvest {
using result_type = void;
std::back_insert_iterator<std::vector<Identifier> > out;
void operator()(Identifier const& id) { *out++ = id; }
void operator()(MemberExpression const& me) { apply_visitor(*this, me.object); *out++ = me.member; }
void operator()(FunctionCall const& fc) {
apply_visitor(*this, fc.function);
for (auto& p : fc.parameters) apply_visitor(*this, p);
}
// non-identifier expressions
void operator()(std::string const&) { }
void operator()(double) { }
} harvest { back_inserter(result) };
boost::apply_visitor(harvest, expr);
return result;
}
That's a tree visitor that harvests all identifiers recursively, inserting them into the back of a container.
Live On Coliru
Where output looks like (excerpt):
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
- app at [0..3)
- mainwindow at [4..14)
- find_child at [17..27)
- font at [40..44)
- config at [45..51)
- preferences at [54..65)
- baseFont at [66..74)
- style at [75..80)
- PROPORTIONAL at [81..93)