Simple XML parser in bison/flex

Submitted by ぐ巨炮叔叔 on 2020-01-02 06:15:09

Question


I would like to create a simple XML parser using bison/flex. I don't need validation, comments, or arguments, only <tag>value</tag>, where value can be a number, a string, or another <tag>value</tag>.

So for example:

<div>
  <mul>
    <num>20</num>
    <add>
      <num>1</num>
      <num>5</num>
    </add>
  </mul>
  <id>test</id>
</div>

If it helps, I know the names of all tags that may occur, and I know how many sub-tags a given tag can hold. Is it possible to create a bison parser that would do something like this:

- new Tag("num", 1)           // tag1
- new Tag("num", 5)           // tag2
- new Tag("add", tag1, tag2)  // tag3
- new Tag("num", 20)          // tag4
- new Tag("mul", tag4, tag3)
...
- root = top_tag

Tag & number of sub-tags:

  • num: 1 (only value)
  • str: 1 (only value)
  • add | sub | mul | div: 2 (num | str | tag, num | str | tag)

Could you help me with a grammar that can build an AST like the one above?


Answer 1:


For your requirements, I think the yax system would work well. From the README:

The goal of the yax project is to allow the use of YACC (Gnu Bison actually) to parse/process XML documents.

The key piece of software for achieving the above goal is to provide a library that can produce an XML lexical token stream from an XML document.

This stream can be wrapped to create an instance of yylex() to feed tokens to a Bison grammar to parse and process the XML document.

Using the stream plus a Bison grammar, it is possible to carry at least the following kinds of activities.

  1. Validate XML documents,
  2. Directly parse XML documents to create internal data structures,
  3. Construct DOM trees.



Answer 2:


I do not think bison/flex is the best tool for writing an XML parser. If I had to do this job, I would do it by hand.
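To make "by hand" concrete, here is a minimal recursive-descent sketch in C. It is only an illustration under assumptions taken from the question: lowercase tag names, no attributes or comments, and at most two sub-tags per tag. The names Tag, parse_xml, parse_element and dump are hypothetical, not part of any library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: a tag name plus either text or up to two children. */
typedef struct Tag {
    char name[16];
    char text[32];            /* leaf value, empty for inner nodes */
    struct Tag *kid[2];
    int nkids;
} Tag;

static const char *p;         /* cursor into the input */

static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }

/* Read "<name>" (closing == 0) or "</name>" (closing == 1) into out. */
static int read_tag(char *out, size_t cap, int closing) {
    size_t n = 0;
    skip_ws();
    if (*p != '<') return 0;
    p++;
    if (closing) { if (*p != '/') return 0; p++; }
    while (islower((unsigned char)*p) && n + 1 < cap) out[n++] = *p++;
    out[n] = '\0';
    if (n == 0 || *p != '>') return 0;
    p++;
    return 1;
}

/* element := "<name>" (text | element element?) "</name>"
   Error paths leak memory; acceptable for a sketch only. */
static Tag *parse_element(void) {
    char close[16];
    Tag *t = calloc(1, sizeof *t);
    if (!t || !read_tag(t->name, sizeof t->name, 0)) { free(t); return NULL; }
    skip_ws();
    if (*p == '<' && p[1] != '/') {
        while (*p == '<' && p[1] != '/') {      /* nested elements */
            if (t->nkids == 2) return NULL;     /* more than two sub-tags */
            if (!(t->kid[t->nkids] = parse_element())) return NULL;
            t->nkids++;
            skip_ws();
        }
    } else {                                    /* plain number or string */
        size_t n = 0;
        while (*p && *p != '<' && n + 1 < sizeof t->text) t->text[n++] = *p++;
        t->text[n] = '\0';
    }
    /* The closing tag must carry the same name as the opening tag. */
    if (!read_tag(close, sizeof close, 1) || strcmp(close, t->name) != 0)
        return NULL;
    return t;
}

/* Parse a whole document; returns NULL on any error or trailing input. */
static Tag *parse_xml(const char *src) {
    p = src;
    Tag *root = parse_element();
    skip_ws();
    return (root && *p == '\0') ? root : NULL;
}

/* Append s to buf without writing past cap bytes. */
static void append(char *buf, size_t cap, const char *s) {
    size_t len = strlen(buf);
    if (len + 1 < cap) strncat(buf, s, cap - len - 1);
}

/* Render the tree as "(name text)" or "(name child child)". */
static void dump(const Tag *t, char *buf, size_t cap) {
    assert(t != NULL);
    append(buf, cap, "(");
    append(buf, cap, t->name);
    if (t->nkids == 0) { append(buf, cap, " "); append(buf, cap, t->text); }
    for (int i = 0; i < t->nkids; i++) {
        append(buf, cap, " ");
        dump(t->kid[i], buf, cap);
    }
    append(buf, cap, ")");
}
```

Calling parse_xml on the example from the question and passing the result to dump with an empty buffer yields (div (mul (num 20) (add (num 1) (num 5))) (id test)), which is the tree shape the question asks for.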

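The flex code will contain four tokens: NUM matches an integer in this example, STR matches any string that does not contain '<' or '>', STOP matches all closing tags, and START matches opening tags.

"<?".*"?>"      { /* skip the XML declaration */ }
"<"[a-z]+">"    { return START; /* '<' is quoted: unquoted, it opens a start condition */ }
"</"[a-z]+">"   { return STOP;  /* '/' is quoted: unquoted, it means trailing context */ }
[ \t\r\n]+      { /* skip whitespace between tags */ }
[0-9]+          { return NUM; }
[^<>]+          { return STR; }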

The bison code will look like:

%token START STOP STR NUM
%%
simple_xml : START value STOP
;
value : simple_xml 
| STR
| NUM
| value simple_xml
;
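Note that START and STOP carry no tag name in the grammar above, so a mismatched pair such as <add>...</mul> is accepted, and the sub-tag counts from the question are not enforced. A semantic action could call small helpers like the following C sketch to check both. This assumes the lexer passes the matched text (yytext) along with each token, which the rules shown do not do; tag_name, arity_of and element_ok are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Copy the name out of a lexeme such as "<add>" or "</add>". */
static void tag_name(const char *lexeme, char *out, size_t cap) {
    assert(lexeme[0] == '<' && cap > 0);
    const char *s = lexeme + (lexeme[1] == '/' ? 2 : 1);
    size_t n = strcspn(s, ">");
    if (n >= cap) n = cap - 1;      /* truncate over-long names */
    memcpy(out, s, n);
    out[n] = '\0';
}

/* Sub-tag counts as listed in the question; -1 means "unknown tag". */
static int arity_of(const char *name) {
    static const char *binary[] = { "add", "sub", "mul", "div" };
    if (strcmp(name, "num") == 0 || strcmp(name, "str") == 0) return 1;
    for (size_t i = 0; i < sizeof binary / sizeof *binary; i++)
        if (strcmp(name, binary[i]) == 0) return 2;
    return -1;
}

/* 1 if both lexemes name the same known tag and it holds `count` items. */
static int element_ok(const char *open, const char *close, int count) {
    char a[32], b[32];
    tag_name(open, a, sizeof a);
    tag_name(close, b, sizeof b);
    return strcmp(a, b) == 0 && arity_of(a) == count;
}
```

In an action for simple_xml, a failed element_ok check would typically be reported through yyerror; how the child count is tracked in the value rule is left open here.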


Source: https://stackoverflow.com/questions/3121917/simple-xml-parser-in-bison-flex
