This is pseudo homework (it's extra credit). I've got a BST which is an index of words that point to the lines (stored somewhere else) that contain the words. I need to implement a way to search using s-expressions so I can combine and (&) and or (|).
At the command prompt a user could type something like:
QUERY ((((fire)&(forest))|((ocean)&(boat)))&(water))
Essentially that should return all lines that contain the words fire, forest and water as well as all lines that contain ocean, boat and water.
What I really need help with is the logic for parsing and inserting nodes into the tree to properly represent the expression more than the actual code. The only thing I have worked out that makes sense to me is returning a set of lines for each word in the expression. Then depending on if it's an "or" or "and" operation I would perform a union or intersection type operation on those sets to create a new set and pass that on up the tree.
I am kind of lost on how to parse the line that contains the expression. After some thought it appears that the "farther" out one of the sub-expressions is the higher it should be in my s-expression tree? I think if I could just get a push in the right direction as far as parsing and inserting the expressions in the tree I should be OK.
The sample tree that I came up with for the query above looks something like this:
                &
               / \
              |   water
            /   \
          &       &
         / \     / \
     fire forest ocean boat
This makes sense as fire would return a set of lines that all contain fire and forest would return a set of lines that all contain forest. Then at the "&" level I would take those two sets and create another set that contained only the lines that were in both sets thus giving me a set that only has lines which contain both fire and forest.
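As a rough sketch of that set logic, assuming the index can be viewed as a mapping from each word to a set of line numbers (the `index` contents and the `lines_for` helper below are made up for illustration), "and" becomes an intersection and "or" becomes a union:

```python
# Hypothetical index: word -> set of line numbers that contain the word.
index = {
    "fire": {1, 2, 5},
    "forest": {2, 5, 7},
    "ocean": {3, 4},
    "boat": {4, 8},
    "water": {2, 4, 9},
}


def lines_for(word):
    """Return the set of line numbers for a word (empty set if the word is absent)."""
    return index.get(word, set())


# "&" is set intersection: only lines present in both sets survive.
fire_and_forest = lines_for("fire") & lines_for("forest")
ocean_and_boat = lines_for("ocean") & lines_for("boat")

# "|" is set union: lines present in either set survive.
either = fire_and_forest | ocean_and_boat

# The outermost "&" is applied last, at the root of the tree.
result = either & lines_for("water")
```

Each intermediate set corresponds to one internal node of the tree, computed bottom-up.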
My other stumbling block is how to represent everything in the tree once I get past the parsing hurdle. I have an ExpTreeNode class that will serve as the node type for my ExpTree (the BST), and I have two subclasses, operator and operand, but I'm not sure whether this is a good approach.
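One possible shape for that class layout, as a minimal sketch only: the `evaluate` method, the field names, and the `index` dictionary are assumptions for illustration, not something fixed by the assignment.

```python
from dataclasses import dataclass


class ExpTreeNode:
    """Base class: every node can produce the set of matching line numbers."""

    def evaluate(self, index):
        raise NotImplementedError


@dataclass
class Operand(ExpTreeNode):
    word: str

    def evaluate(self, index):
        # A leaf looks its word up in the index (assumed: word -> set of lines).
        return set(index.get(self.word, set()))


@dataclass
class Operator(ExpTreeNode):
    symbol: str  # "&" or "|"
    left: ExpTreeNode
    right: ExpTreeNode

    def evaluate(self, index):
        # Evaluate both children first, then combine their sets.
        left_lines = self.left.evaluate(index)
        right_lines = self.right.evaluate(index)
        if self.symbol == "&":
            return left_lines & right_lines
        if self.symbol == "|":
            return left_lines | right_lines
        raise ValueError(f"unknown operator: {self.symbol!r}")


# The sample query built by hand, against a made-up index.
index = {"fire": {1, 2}, "forest": {2, 3}, "ocean": {4}, "boat": {4, 5}, "water": {2, 4, 6}}
tree = Operator(
    "&",
    Operator(
        "|",
        Operator("&", Operand("fire"), Operand("forest")),
        Operator("&", Operand("ocean"), Operand("boat")),
    ),
    Operand("water"),
)
matches = tree.evaluate(index)
```

With this split, operands are always leaves and operators are always internal nodes with exactly two children.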
Dijkstra has done it for you already :-)
Try the shunting yard algorithm: http://en.wikipedia.org/wiki/Shunting-yard_algorithm
You can create the RPN (reverse polish notation) using the shunting yard algorithm, and once that is created, you can make a pass through it to create the binary tree.
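A minimal sketch of the RPN step for the query syntax in the question. The `tokenize` and `to_rpn` names are made up here, and giving `&` a higher precedence than `|` is an assumption (it only matters when the query is not fully parenthesized):

```python
import re

# Assumption: "&" binds tighter than "|"; both are treated as left-associative.
PRECEDENCE = {"&": 2, "|": 1}


def tokenize(query):
    """Split a query into parentheses, operators and words."""
    return re.findall(r"[()&|]|[^\s()&|]+", query)


def to_rpn(tokens):
    """Shunting yard: convert infix tokens to a list in reverse Polish notation."""
    output, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence before pushing this one.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            # Flush operators back to the matching "(".
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise ValueError("unbalanced parentheses")
            ops.pop()  # discard the "("
        else:
            output.append(tok)  # a word goes straight to the output
    while ops:
        op = ops.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


rpn = to_rpn(tokenize("((((fire)&(forest))|((ocean)&(boat)))&(water))"))
```

For the sample query, `rpn` comes out as `fire forest & ocean boat & | water &`.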
Normally, the RPN is used to do the evaluation, but you can actually create a tree.
For instance, instead of evaluating, you create tree nodes and push them onto the stack.
So if you see node1, node2, operator in the RPN, you create a new node

   Operator
    /    \
node1    node2

and push it back onto the stack.
A more detailed example:
Say the expression is (apples AND oranges) OR kiwis
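The RPN for this is kiwis oranges apples AND OR (operand order does not change the result here because AND and OR are commutative; shunting yard itself would emit apples oranges AND kiwis OR).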
Now walk this while maintaining a stack.
Make a node out of kiwis and push it onto the stack. Make a node out of oranges and push it onto the stack. Do the same with apples.
So the stack is (top first):
Node:Apples
Node:Oranges
Node:Kiwis
Now you see the AND in the RPN.
You pop the top two from the stack and create a new Node with AND as parent.
Node:AND, [Node:Apples, Node:Oranges]
which is basically the tree

    AND
   /   \
Apples  Oranges

Now push this node onto stack.
So the stack is now:
Node:AND, [Node:Apples, Node:Oranges]
Node:Kiwis
Now you see the OR in the RPN, so you pop the remaining two nodes and create a node with OR as the parent and Node:AND and Node:Kiwis as its children, giving the tree

          OR
         /  \
      AND    Kiwis
     /   \
Apples    Oranges

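The walk above can be sketched in a few lines. The `Node` class, `tree_from_rpn`, and `to_sexpr` are names invented for this sketch, and it assumes the first node popped becomes the right child. It is fed the RPN in the order apples oranges AND kiwis OR, which gives the tree drawn above with AND as the left child of OR:

```python
OPERATORS = {"AND", "OR"}


class Node:
    """A tree node: operators have two children, words have none."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def tree_from_rpn(tokens):
    """Build an expression tree from RPN tokens using a stack of nodes."""
    stack = []
    for tok in tokens:
        if tok in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {tok} is missing an operand")
            right = stack.pop()  # top of the stack becomes the right child
            left = stack.pop()
            stack.append(Node(tok, left, right))
        else:
            stack.append(Node(tok))  # a word becomes a leaf
    if len(stack) != 1:
        raise ValueError("malformed RPN: leftover operands")
    return stack[0]


def to_sexpr(node):
    """Render the tree as a prefix s-expression, handy for checking the shape."""
    if node.left is None and node.right is None:
        return node.value
    return f"({node.value} {to_sexpr(node.left)} {to_sexpr(node.right)})"


root = tree_from_rpn("apples oranges AND kiwis OR".split())
shape = to_sexpr(root)
```

When the loop finishes, the single node left on the stack is the root of the tree.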
You might even be able to modify the shunting yard algorithm to create the tree, but dealing with the RPN seems easier.
Alternatively, you can try recursive descent parsing. What you are asking for is very common, and you will be able to find grammars and even code if you search the web.
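For comparison, a small recursive descent sketch for the `&` / `|` query syntax. The grammar, the `Parser` class, and the nested-tuple tree representation are all assumptions made for illustration:

```python
import re

# Assumed grammar:
#   or_expr  := and_expr ('|' and_expr)*
#   and_expr := atom ('&' atom)*
#   atom     := WORD | '(' or_expr ')'


def tokenize(query):
    """Split a query into parentheses, operators and words."""
    return re.findall(r"[()&|]|[^\s()&|]+", query)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.or_expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def or_expr(self):
        node = self.and_expr()
        while self.peek() == "|":
            self.take()
            node = ("|", node, self.and_expr())
        return node

    def and_expr(self):
        node = self.atom()
        while self.peek() == "&":
            self.take()
            node = ("&", node, self.atom())
        return node

    def atom(self):
        tok = self.take()
        if tok == "(":
            node = self.or_expr()  # a parenthesized sub-expression
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (")", "&", "|"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok  # a plain word is a leaf


tree = Parser(tokenize("((((fire)&(forest))|((ocean)&(boat)))&(water))")).parse()
```

Here each grammar rule maps to one method, and the tree falls out of the call structure directly, with no intermediate RPN.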
By the way, you just mean Binary tree right? BST (Binary Search Tree) has an extra constraint...
Source: https://stackoverflow.com/questions/5652983/parsing-and-building-s-expressions-using-sets-and-binary-search-tree