Boost Spirit: parse boolean expression and reduce to canonical normal form

眉间皱痕 submitted on 2021-02-19 08:21:21

Question


I want to parse a common Boolean expression with just the or, and, and not operators, which I think I have done using Boost Spirit below. In phase 2 (or perhaps as part of the parsing itself), I wish to transform the AST of the Boolean expression to disjunctive canonical normal form, which essentially "flattens" the expression and removes all grouping operators.

In one of my attempts, I created the Boost static_visitor below, named transformer. I started by trying to eliminate double not operators by just assigning a child node to its grandparent if the child and the parent are both not operators. My problem is referring to the parent of the current node. It seems like there is no way to refer to the current node's parent, because once a node is visited, the visit function overloads on the inner type of the 'variant', thus discarding the variant nature of the object. Any help appreciated.

struct op_or  {};
struct op_and {};
struct op_not {};

typedef std::string var;
template <typename tag> struct binop;
template <typename tag> struct uniop;

typedef boost::variant
    <
        var,
        boost::recursive_wrapper<uniop<op_not>>,
        boost::recursive_wrapper<binop<op_and>>,
        boost::recursive_wrapper<binop<op_or>>
    >
    expr;

template <typename tag> struct uniop
{
    explicit uniop(expr const& o) : exp_u(o) { }
    expr exp_u;
};

template <typename tag> struct binop
{
    explicit binop(expr const& l, expr const& r) : exp_l(l), exp_r(r) { }
    expr exp_l, exp_r;
};

struct transformer : boost::static_visitor<void>
{
    std::deque<std::reference_wrapper<expr>> stk;

    transformer(expr & e)
    {
        stk.push_back(e);
    }

    void operator()(var const& v) const { }

    void operator()(uniop<op_not> & u)
    {
        if (boost::get<uniop<op_not>>(&stk.back().get()) != nullptr)
        {
            stk.back() = u.exp_u;
        }
        else
        {
            stk.push_back(std::ref(u));  // <<=== Fails with "no matching function for call"
            boost::apply_visitor(*this, u.exp_u);
            stk.pop_back();
        }
    }
    void operator()(binop<op_and> & b)
    {
        stk.push_back(std::ref(u));
        boost::apply_visitor(*this, b.exp_l);
        boost::apply_visitor(*this, b.exp_r);
        stk.pop_back();
    }
    void operator()(binop<op_or> & b)
    {
        stk.push_back(std::ref(u));
        boost::apply_visitor(*this, b.exp_l);
        boost::apply_visitor(*this, b.exp_r);
        stk.pop_back();
    }
};

template <typename It, typename Skipper = boost::spirit::qi::space_type>
struct parser : boost::spirit::qi::grammar<It, expr(), Skipper>
{
    parser() : parser::base_type(expr_)
    {
        using namespace boost::phoenix;
        using namespace boost::spirit::qi;

        using boost::spirit::qi::_1;

        expr_  = or_.alias();

        or_  = and_ [ _val = _1 ] >> *("or" >> and_ [ _val = construct<binop<op_or>>(_val, _1) ]);
        and_ = not_ [ _val = _1 ] >> *("and" >> not_ [ _val = construct<binop<op_and>>(_val, _1) ]);
        not_ = "not" > simple [ _val = construct<uniop<op_not>>(_1) ] | simple [ _val = _1 ];

        simple =  '(' > expr_ > ')' | var_;
        var_ = lexeme[ +alpha ];
    }

private:
    boost::spirit::qi::rule<It, var() , Skipper> var_;
    boost::spirit::qi::rule<It, expr(), Skipper> not_, and_, or_, simple, expr_;
};

Answer 1:


It appears that the conversion to DCNF is NP-complete. Therefore you can expect to make concessions.
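Whatever the exact complexity classification, one concrete reason to expect concessions is size: distributing "and" over "or" can multiply the number of terms. The following is a minimal sketch of that effect, independent of the AST in the question. Term, Dnf and expand_pairs are hypothetical names introduced only for this illustration. It expands (a1 or b1) and ... and (an or bn) into a disjunction of conjunctions:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Term = std::vector<std::string>;  // one conjunction of literals
using Dnf  = std::vector<Term>;         // a disjunction of such conjunctions

// Expand (a1 or b1) and (a2 or b2) and ... and (an or bn) by distribution.
// Assumes n is small (well below 64), since the result has 2^n terms.
Dnf expand_pairs(std::size_t n) {
    Dnf result(1);                       // start with a single empty conjunction
    for (std::size_t i = 1; i <= n; ++i) {
        const std::string choices[] = {"a" + std::to_string(i),
                                       "b" + std::to_string(i)};
        Dnf next;
        for (Term const& term : result) {
            for (std::string const& literal : choices) {
                Term extended = term;    // copy, then append one literal
                extended.push_back(literal);
                next.push_back(std::move(extended));
            }
        }
        result = std::move(next);
    }
    assert(result.size() == (std::size_t{1} << n));
    return result;
}
```

An input with 20 literals (n = 10) already produces 1024 terms, so a practical transformation usually has to bound or prune the result.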

Your highly simplified subtask just eliminates double negations. It looks like you were trying to keep a stack of parent expression references (stk) but:

  1. you do not explicitly show a way to extract or return the simplified expression (the original expression would be unaltered)
  2. you try to push a uniop<> node as a reference to an expr node which is a type mismatch:

    stk.push_back(std::ref(u));  // <<=== Fails with "no matching function for call"
    

    To me this is just another symptom of the fact that

    transformer(expr & e)
    {
        stk.push_back(e);
    }
    

    fails to recurse into the sub-expressions. If it did, you could trust that the surrounding expr& would already be on the stack. The same goes for the binop handlers, which attempt to push std::ref(u) even though no u is in scope there; they were likely meant to push the current node, which runs into the same kind of type mismatch.
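The type mismatch from point 2 can be checked at compile time. This is a minimal sketch under stated assumptions: it uses C++17 std::variant instead of boost::variant, and Leaf, Neg, Node, Stack and top_holds_neg are hypothetical stand-ins (Neg for uniop<op_not>, Node for expr). A reference_wrapper<Node> can only bind to an actual Node object, never to the node type stored inside it:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <type_traits>
#include <variant>

struct Leaf { std::string name; };
struct Neg  { std::string name; };       // stand-in for uniop<op_not>
using Node  = std::variant<Leaf, Neg>;   // stand-in for expr
using Stack = std::deque<std::reference_wrapper<Node>>;

// A Node& can go on the stack...
static_assert(std::is_constructible_v<std::reference_wrapper<Node>, Node&>);
// ...but a reference to the contained Neg cannot: converting it to Node
// would only yield a temporary, which reference_wrapper refuses to bind.
static_assert(!std::is_constructible_v<std::reference_wrapper<Node>, Neg&>);
// std::ref(neg) has type reference_wrapper<Neg>, which is unrelated as well.
static_assert(!std::is_convertible_v<std::reference_wrapper<Neg>,
                                     std::reference_wrapper<Node>>);

// With the enclosing variant on the stack, its content can still be inspected.
bool top_holds_neg(Stack const& stk) {
    assert(!stk.empty());
    return std::holds_alternative<Neg>(stk.back().get());
}
```

In other words, the visitor has to push the enclosing variant before dispatching on its content; once inside a handler for the inner type, that reference is no longer reachable.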

First: simplify

I think it's much easier to write these in functional style: instead of "manipulating" an object graph, let the transformation return the transformed result.

This at once means you can leave all node types untouched, unless the node is a nested negation. Here's how it looks (the snippets in this answer spell the question's uniop as unop):

struct simplifier { // named to match its use in the test program below
    typedef expr result_type;

    // in general, just identity transform
    template <typename E> auto operator()(E const& e) const { return e; }

    // only handle these:
    auto operator()(expr const& e) const { return apply_visitor(*this, e); }
    expr operator()(unop<op_not> const& e) const {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            return nested_negation->exp_u;
        }
        return e;
    }
};

A simple test program exercising it would be:

Live On Coliru

std::vector<expr> tests {
    "a",
    NOT{"a"},
    AND{"a", "b"},
    OR{"a","b"},
    AND{NOT{"a"},NOT{"b"}},
    NOT{{NOT{"a"}}},
};

const simplifier simplify{};

for (expr const& expr : tests) {
    std::cout << std::setw(30) << str(expr) << " -> " << simplify(expr) << "\n";
}

Printing:

                       "a" -> "a"
                  NOT{"a"} -> NOT{"a"}
              AND{"a","b"} -> AND{"a","b"}
               OR{"a","b"} -> OR{"a","b"}
    AND{NOT{"a"},NOT{"b"}} -> AND{NOT{"a"},NOT{"b"}}
             NOT{NOT{"a"}} -> "a"
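The snippets above rely on Boost and on helpers (NOT, AND, OR, str) that are not shown. For readers who want something that compiles on its own, here is a minimal sketch of the same top-level rewrite. It is not the answer's code: it assumes C++17 and swaps boost::variant for std::variant with shared_ptr nodes, and Expr, Not, And, Or, the make_* helpers, show and simplify_top are all hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical AST: std::variant plus shared_ptr stands in for
// boost::variant plus recursive_wrapper.
struct Not; struct And; struct Or;
using Var  = std::string;
using Expr = std::variant<Var, std::shared_ptr<Not>,
                          std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr operand; };
struct And { Expr lhs, rhs; };
struct Or  { Expr lhs, rhs; };

// Illustrative factory helpers (not part of the original code).
Expr make_var(std::string name) { return Expr(std::move(name)); }
Expr make_not(Expr e)           { return std::make_shared<Not>(Not{std::move(e)}); }
Expr make_and(Expr l, Expr r)   { return std::make_shared<And>(And{std::move(l), std::move(r)}); }
Expr make_or(Expr l, Expr r)    { return std::make_shared<Or>(Or{std::move(l), std::move(r)}); }

std::string show(Expr const& e);  // forward declaration for recursion

struct Printer {
    std::string operator()(Var const& v) const { return '"' + v + '"'; }
    std::string operator()(std::shared_ptr<Not> const& n) const {
        assert(n);  // nodes are never null in this sketch
        return "NOT{" + show(n->operand) + "}";
    }
    std::string operator()(std::shared_ptr<And> const& a) const {
        return "AND{" + show(a->lhs) + "," + show(a->rhs) + "}";
    }
    std::string operator()(std::shared_ptr<Or> const& o) const {
        return "OR{" + show(o->lhs) + "," + show(o->rhs) + "}";
    }
};
std::string show(Expr const& e) { return std::visit(Printer{}, e); }

// Pure function: returns a new expression and leaves the input untouched.
// Only a double negation at the very top is removed.
Expr simplify_top(Expr const& e) {
    if (auto outer = std::get_if<std::shared_ptr<Not>>(&e))
        if (auto inner = std::get_if<std::shared_ptr<Not>>(&(*outer)->operand))
            return (*inner)->operand;  // NOT{NOT{x}} -> x
    return e;                          // everything else: identity
}
```

For example, show(simplify_top(make_not(make_not(make_var("a"))))) yields "a" in quotes, while a lone negation or a conjunction comes back unchanged.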

Using The Stack / Mutating

The analogous approach using a stack would *seem* similarly easy:

HERE BE DRAGONS

struct stack_simplifier {
    typedef void result_type;
    std::deque<std::reference_wrapper<expr>> stk;

    void operator()(expr& e) {
        stk.push_back(e);
        apply_visitor(*this, e);
        stk.pop_back();
    }

    template <typename Other>
    void operator()(Other&) {}

    void operator()(unop<op_not>& e) {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            stk.back().get() = nested_negation->exp_u;
        }
    }
};

The usage is no longer const (because the functions are impure), and neither is the expr argument (which will be mutated):

for (expr expr : tests) {
    std::cout << std::setw(30) << str(expr);

    stack_simplifier{}(expr);
    std::cout << " -> " << expr << "\n";
}

It /does/ seem to work (Live On Coliru), but there are visible downsides:

  • the stack serves no real purpose, only the top element is ever inspected (you could replace it with a pointer to the current expression node)
  • the functor object is non-pure/non-const
  • the expression tree is being mutated while traversing. This is just a timebomb ticking for you to invoke Undefined Behaviour: in

    void operator()(unop<op_not>& e) {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            stk.back().get() = nested_negation->exp_u;
        }
    }
    

    after the assignment to the expression on top of the stack, the reference to e is dangling. So is nested_negation. Dereferencing either beyond that point is UB.

  • Now in this simple scenario (collapsing double negations) it doesn't seem too hard to mentally check that this is actually ok. WRONG

    It turns out that operator= on a variant calls variant_assign, which looks like this:

    void variant_assign(const variant& rhs)
    {
        // If the contained types are EXACTLY the same...
        if (which_ == rhs.which_)
        {
            // ...then assign rhs's storage to lhs's content:
            detail::variant::assign_storage visitor(rhs.storage_.address());
            this->internal_apply_visitor(visitor);
        }
        else
        {
            // Otherwise, perform general (copy-based) variant assignment:
            assigner visitor(*this, rhs.which());
            rhs.internal_apply_visitor(visitor); 
        }
    }
    

    The assigner visitor has a deadly detail (showing just one of the nothrow-aware overloads):

    template <typename RhsT, typename B1, typename B2>
    void assign_impl(
          const RhsT& rhs_content
        , mpl::true_ // has_nothrow_copy
        , B1 // is_nothrow_move_constructible
        , B2 // has_fallback_type
        ) const BOOST_NOEXCEPT
    {
        // Destroy lhs's content...
        lhs_.destroy_content(); // nothrow
    
        // ...copy rhs content into lhs's storage...
        new(lhs_.storage_.address())
            RhsT( rhs_content ); // nothrow
    
        // ...and indicate new content type:
        lhs_.indicate_which(rhs_which_); // nothrow
    }
    

    OOPS. It turns out that the left-hand side is destroyed first. However, in

        stk.back().get() = nested_negation->exp_u;
    

    the right-hand side is a sub-object of the left-hand side (!!!). The unintuitive way to avoid UB here is to take a temporary copy¹:

        expr tmp = nested_negation->exp_u;
        stk.back().get() = tmp;
    
  • Imagine you were applying a transformation like De Morgan's law. What if there was (also) a nested negation involved in a sub-expression?

It seems to me that the mutating approach is simply unnecessarily error-prone.
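If you do mutate in place, the copy-before-assign rule can at least be made explicit. The following is a minimal sketch, not the answer's code: it assumes C++17 and swaps boost::variant for std::variant with shared_ptr nodes (whose assignment may likewise destroy the old content before reading the source). Expr, Not, And, Or, the make_* helpers, show and collapse_in_place are hypothetical names. Because every call receives the enclosing Expr&, no explicit stack is needed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical AST standing in for the boost::variant one.
struct Not; struct And; struct Or;
using Var  = std::string;
using Expr = std::variant<Var, std::shared_ptr<Not>,
                          std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr operand; };
struct And { Expr lhs, rhs; };
struct Or  { Expr lhs, rhs; };

Expr make_var(std::string name) { return Expr(std::move(name)); }
Expr make_not(Expr e)           { return std::make_shared<Not>(Not{std::move(e)}); }
Expr make_and(Expr l, Expr r)   { return std::make_shared<And>(And{std::move(l), std::move(r)}); }

std::string show(Expr const& e);  // forward declaration for recursion
struct Printer {
    std::string operator()(Var const& v) const { return '"' + v + '"'; }
    std::string operator()(std::shared_ptr<Not> const& n) const {
        assert(n);  // nodes are never null in this sketch
        return "NOT{" + show(n->operand) + "}";
    }
    std::string operator()(std::shared_ptr<And> const& a) const {
        return "AND{" + show(a->lhs) + "," + show(a->rhs) + "}";
    }
    std::string operator()(std::shared_ptr<Or> const& o) const {
        return "OR{" + show(o->lhs) + "," + show(o->rhs) + "}";
    }
};
std::string show(Expr const& e) { return std::visit(Printer{}, e); }

// Mutates e. Note: nodes are shared_ptr-owned, so other handles to the
// same nodes would observe the changes made to nested operands.
void collapse_in_place(Expr& e) {
    // Strip double negations at this node until none is left.
    while (auto outer = std::get_if<std::shared_ptr<Not>>(&e)) {
        auto inner = std::get_if<std::shared_ptr<Not>>(&(*outer)->operand);
        if (!inner) break;
        Expr tmp = (*inner)->operand;  // copy out FIRST: the source lives inside e
        e = std::move(tmp);            // only now overwrite e; outer/inner dangle
    }
    // Then descend into whatever e holds now.
    if (auto n = std::get_if<std::shared_ptr<Not>>(&e)) {
        collapse_in_place((*n)->operand);
    } else if (auto a = std::get_if<std::shared_ptr<And>>(&e)) {
        collapse_in_place((*a)->lhs);
        collapse_in_place((*a)->rhs);
    } else if (auto o = std::get_if<std::shared_ptr<Or>>(&e)) {
        collapse_in_place((*o)->lhs);
        collapse_in_place((*o)->rhs);
    }
}
```

Even in this small form, correctness hinges on never touching outer or inner after the assignment, which is exactly the kind of bookkeeping the functional style avoids.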

Recursive, Immutable Transformation a.k.a. Joy

There's another problem with the approaches so far: nested sub-expressions are not transformed. E.g.

  NOT{NOT{AND{"a",NOT{NOT{"b"}}}}} -> AND{"a",NOT{NOT{"b"}}}

instead of the desired AND{"a","b"}. This is easily fixed in the pure-functional approach:

struct simplifier {
    typedef expr result_type;

    template <typename T> auto operator()(T const& v) const { return call(v); }

  private:
    auto call(var const& e) const { return e; }
    auto call(expr const& e) const {
        auto s = apply_visitor(*this, e);
        return s;
    }
    expr call(unop<op_not> const& e) const {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            return call(nested_negation->exp_u);
        }

        return unop<op_not> {call(e.exp_u)};
    }
    template <typename Op> auto call(binop<Op> const& e) const {
        return binop<Op> {call(e.exp_l), call(e.exp_r)};
    }
};

Everything is still immutable, but we handle all types of expressions to recurse their sub-expressions. Now it prints:

Live On Coliru

                               "a" -> "a"
                          NOT{"a"} -> NOT{"a"}
                      AND{"a","b"} -> AND{"a","b"}
                       OR{"a","b"} -> OR{"a","b"}
            AND{NOT{"a"},NOT{"b"}} -> AND{NOT{"a"},NOT{"b"}}
                     NOT{NOT{"a"}} -> "a"
  NOT{NOT{AND{"a",NOT{NOT{"b"}}}}} -> AND{"a","b"}
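As a self-contained counterpart to the recursive version, here is a minimal sketch that does not depend on Boost. It is not the answer's code: it assumes C++17 and uses std::variant with shared_ptr nodes, and Expr, Not, And, Or, the make_* helpers, show, Simplifier and simplify are hypothetical names. Every handler rebuilds its node from simplified children, so the input is never modified.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical AST standing in for the boost::variant one.
struct Not; struct And; struct Or;
using Var  = std::string;
using Expr = std::variant<Var, std::shared_ptr<Not>,
                          std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr operand; };
struct And { Expr lhs, rhs; };
struct Or  { Expr lhs, rhs; };

Expr make_var(std::string name) { return Expr(std::move(name)); }
Expr make_not(Expr e)           { return std::make_shared<Not>(Not{std::move(e)}); }
Expr make_and(Expr l, Expr r)   { return std::make_shared<And>(And{std::move(l), std::move(r)}); }
Expr make_or(Expr l, Expr r)    { return std::make_shared<Or>(Or{std::move(l), std::move(r)}); }

std::string show(Expr const& e);  // forward declaration for recursion
struct Printer {
    std::string operator()(Var const& v) const { return '"' + v + '"'; }
    std::string operator()(std::shared_ptr<Not> const& n) const {
        assert(n);  // nodes are never null in this sketch
        return "NOT{" + show(n->operand) + "}";
    }
    std::string operator()(std::shared_ptr<And> const& a) const {
        return "AND{" + show(a->lhs) + "," + show(a->rhs) + "}";
    }
    std::string operator()(std::shared_ptr<Or> const& o) const {
        return "OR{" + show(o->lhs) + "," + show(o->rhs) + "}";
    }
};
std::string show(Expr const& e) { return std::visit(Printer{}, e); }

Expr simplify(Expr const& e);     // forward declaration for recursion

struct Simplifier {
    Expr operator()(Var const& v) const { return Expr(v); }
    Expr operator()(std::shared_ptr<Not> const& n) const {
        if (auto inner = std::get_if<std::shared_ptr<Not>>(&n->operand))
            return simplify((*inner)->operand);  // drop both negations, keep going
        return make_not(simplify(n->operand));
    }
    Expr operator()(std::shared_ptr<And> const& a) const {
        return make_and(simplify(a->lhs), simplify(a->rhs));
    }
    Expr operator()(std::shared_ptr<Or> const& o) const {
        return make_or(simplify(o->lhs), simplify(o->rhs));
    }
};
Expr simplify(Expr const& e) { return std::visit(Simplifier{}, e); }
```

Since the result is a fresh tree, further rewrite passes can be chained as plain function composition without worrying about dangling references.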

For completeness, a similar transformation applied to the "stack_simplifier": http://coliru.stacked-crooked.com/a/cc5627aa37f0c969


¹ actually move semantics might be used, but I'm ignoring that for clarity



Source: https://stackoverflow.com/questions/60387155/boost-spirit-parse-boolean-expression-and-reduce-to-canonical-normal-form
