Parse string into a tree structure?

独自空忆成欢 提交于 2019-12-05 06:42:10

Don't use regular expressions for this task. An easier method would be to describe your string with a grammar (BNF or EBNF) and then write a parser to parse the string according to the grammar. You can generate a parse-tree from your EBNF and BNF and so you naturally end up with a tree structure.

You can start with something like this:

element      ::= element-type, { ["|"], element-type }
element-type ::= primitive | "{", element, "}"
primitive    ::= symbol | word
symbol       ::= "." | "!"
word         ::= character { character }
character    ::= "a" | "b" | ... | "z"

Note: I wrote this up quickly, and so it may not be completely correct. But it should give you an idea.

Trying to match the whole thing with a single regular expression isn't going to get you too far, since regular expressions output at most a list of matching substring positions, nothing tree-like. You want a lexer or grammar which does something like this:

Divide the input into tokens - atomic pieces like '{', '|', and 'world', then process those tokens in order. Start with an empty tree with a single root node.

Every time you find {, create and go to a child node.

Every time you find |, create and go to a sibling node.

Every time you find }, go up to the parent node.

Every time you find a word, put that word in the current leaf node.

if you want a quick hack:

  • replace the { chars with [
  • replace the } chars with ]
  • replace the | chars with spaces
  • hope you dont get input with spaces.

read it in so it comes up as nested arrays.

ps: I agree that a reg-ex can't do this.

pss: set * read-eval * to false (you don't want the input running it's self)

You can use amotoen to build grammar and parse this:

(ns pegg.core
  (:gen-class)
  (:use
   (com.lithinos.amotoen
    core string-wrapper))
  (:use clojure.contrib.pprint))

(def input "{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}")

(def grammar
     {
      :Start :List
      :ws #"^[ \n\r\t]*"
      :Sep "|"
      :String #"^[A-Za-z !.]+"
      :Item '(| :String :List)
      :Items [:Item '(+ [:Sep :Item])]
      :List [:ws "{" '(* (| :Items :Item)) "}" :ws]
      })

(def parser (create-parser grammar))

(defn parse
  [^String input]
  (validate grammar)
  (pprint (parser (wrap-string input))))

Result:

pegg.core> (parse input)
{:List [{:ws ""} "{" ({:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "Hello big"}} ([{:Sep "|"} {:Item {:String "Hi"}}] [{:Sep "|"} {:Item {:String "Hey"}}])]}) "}" {:ws " "}]}} {:Items [{:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "world"}} ([{:Sep "|"} {:Item {:String "earth"}}])]}) "}" {:ws ""}]}} ([{:Sep "|"} {:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "Goodbye"}} ([{:Sep "|"} {:Item {:String "farewell"}}])]}) "}" {:ws " "}]}}])]} {:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "planet"}} ([{:Sep "|"} {:Item {:String "rock"}}] [{:Sep "|"} {:Item {:String "globe"}}])]} {:Item {:List [{:ws ""} "{" ({:Items [{:Item {:String "."}} ([{:Sep "|"} {:Item {:String "!"}}])]}) "}" {:ws ""}]}}) "}" {:ws ""}]}}) "}" {:ws ""}]}

P.S. This is one of my first peg grammar and it can be better. Also see http://en.wikipedia.org/wiki/Parsing_expression_grammar

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!