Advanced tokenizer for a complex math expression

[亡魂溺海] submitted on 2019-12-06 00:52:30

Use Java's StreamTokenizer. Its interface is a bit strange, but one gets used to it:

http://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html

Example code that parses the input into the requested String list (you probably want to use the tokenizer directly, or at least use an Object list so that numbers can be stored directly as Double):

import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public static List<String> tokenize(String s) throws IOException {
  StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
  tokenizer.ordinaryChar('-');  // Don't parse minus as part of numbers.
  tokenizer.ordinaryChar('/');  // Don't treat slash as a comment start.
  List<String> tokBuf = new ArrayList<String>();
  while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
    switch (tokenizer.ttype) {
      case StreamTokenizer.TT_NUMBER:  // nval is a double, so "4" becomes "4.0"
        tokBuf.add(String.valueOf(tokenizer.nval));
        break;
      case StreamTokenizer.TT_WORD:  // identifiers such as x or Log
        tokBuf.add(tokenizer.sval);
        break;
      default:  // operator: ttype holds the character itself
        tokBuf.add(String.valueOf((char) tokenizer.ttype));
    }
  }
  return tokBuf;
}

Test run:

System.out.println(tokenize("4*x+5.2024*(Log(x,y)^z)-300.12"));
[4.0, *, x, +, 5.2024, *, (, Log, (, x, ,, y, ), ^, z, ), -, 300.12]
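
As suggested above, an Object list lets numbers stay as Double instead of being turned back into strings. Here is a minimal sketch of that variant. The class name ObjectTokenizer and the method name tokenizeToObjects are made up for illustration, and storing operators as Character is my own assumption, not something the question asked for:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ObjectTokenizer {
  // Hypothetical variant of tokenize(): numbers become Double, words become
  // String, and every other character becomes a Character.
  public static List<Object> tokenizeToObjects(String s) throws IOException {
    StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
    tokenizer.ordinaryChar('-');  // Don't parse minus as part of numbers.
    tokenizer.ordinaryChar('/');  // Don't treat slash as a comment start.
    List<Object> tokens = new ArrayList<>();
    while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
      switch (tokenizer.ttype) {
        case StreamTokenizer.TT_NUMBER:
          tokens.add(Double.valueOf(tokenizer.nval));
          break;
        case StreamTokenizer.TT_WORD:
          tokens.add(tokenizer.sval);
          break;
        default:  // operator, parenthesis or comma
          tokens.add(Character.valueOf((char) tokenizer.ttype));
      }
    }
    return tokens;
  }
}
```

A later stage can then tell the token kinds apart with instanceof (Double, String, Character) rather than re-parsing strings.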

http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form
http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
Example of the algorithm:
Step 1: read '4' => numeric token => keep reading characters until a non-numeric symbol is reached (here, '*'). The token just read, tokBuf[0], is a numeric token.
Step 2: read '*' => the token represents a binary operator.
Step 3: read 'x'. Probably not a function symbol => mark the next token as a var-token.
And so on.
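
The steps above could be written by hand roughly as follows. This is only a sketch under my own assumptions, not code from the answer: the names HandScanner, Kind and Token are invented, and the rule "a name directly followed by '(' is a function, otherwise a variable" is one simple way to make the function/variable decision from step 3. Number scanning is deliberately naive (it would accept "1.2.3" as one token).

```java
import java.util.ArrayList;
import java.util.List;

public class HandScanner {
  // Hypothetical token kinds; the names are illustrative only.
  public enum Kind { NUMBER, FUNCTION, VARIABLE, OPERATOR }

  public static final class Token {
    public final Kind kind;
    public final String text;
    Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + ":" + text; }
  }

  public static List<Token> scan(String s) {
    List<Token> out = new ArrayList<>();
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      if (Character.isWhitespace(c)) { i++; continue; }
      int start = i;
      if (Character.isDigit(c) || c == '.') {
        // Step 1: read characters until a non-numeric symbol is reached.
        while (i < s.length()
            && (Character.isDigit(s.charAt(i)) || s.charAt(i) == '.')) {
          i++;
        }
        out.add(new Token(Kind.NUMBER, s.substring(start, i)));
      } else if (Character.isLetter(c)) {
        // Step 3: read a name, then peek ahead to classify it.
        while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
        boolean isCall = i < s.length() && s.charAt(i) == '(';
        out.add(new Token(isCall ? Kind.FUNCTION : Kind.VARIABLE,
                          s.substring(start, i)));
      } else {
        // Step 2: any other single character (operators, parentheses, and
        // commas too in this sketch) is emitted as an OPERATOR token.
        out.add(new Token(Kind.OPERATOR, String.valueOf(c)));
        i++;
      }
    }
    return out;
  }
}
```

For example, HandScanner.scan("4*x+Log(x,y)") gives [NUMBER:4, OPERATOR:*, VARIABLE:x, OPERATOR:+, FUNCTION:Log, OPERATOR:(, VARIABLE:x, OPERATOR:,, VARIABLE:y, OPERATOR:)].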
The next step is evaluation, I guess? Reverse Polish notation or syntax trees will help...
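
To make the Reverse Polish notation idea concrete, here is a minimal sketch that reorders a token list like the one printed in the test run into RPN, using the shunting-yard algorithm (my choice of technique; the answer does not name one). Assumptions on my side: the class name ToRpn, the precedence table, '^' being right-associative, and treating any name followed by "(" as a function. Unary minus is not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ToRpn {
  // Assumed precedence table; returns -1 for anything that is not a binary operator.
  private static int precedence(String op) {
    switch (op) {
      case "+": case "-": return 1;
      case "*": case "/": return 2;
      case "^": return 3;
      default: return -1;
    }
  }

  public static List<String> toRpn(List<String> tokens) {
    List<String> output = new ArrayList<>();
    // The stack only ever holds operators, "(" and function names.
    Deque<String> stack = new ArrayDeque<>();
    for (int i = 0; i < tokens.size(); i++) {
      String t = tokens.get(i);
      boolean nextIsParen = i + 1 < tokens.size() && tokens.get(i + 1).equals("(");
      if (precedence(t) > 0) {
        // Pop operators that bind tighter, or equally tight when t is
        // left-associative (everything except '^' here).
        while (!stack.isEmpty() && precedence(stack.peek()) > 0
            && (precedence(stack.peek()) > precedence(t)
                || (precedence(stack.peek()) == precedence(t) && !t.equals("^")))) {
          output.add(stack.pop());
        }
        stack.push(t);
      } else if (t.equals("(")) {
        stack.push(t);
      } else if (t.equals(")")) {
        while (!stack.isEmpty() && !stack.peek().equals("(")) output.add(stack.pop());
        if (stack.isEmpty()) throw new IllegalArgumentException("Unbalanced ')'");
        stack.pop();  // discard the "("
        // Anything left on top that is neither an operator nor "(" is a function name.
        if (!stack.isEmpty() && precedence(stack.peek()) < 0
            && !stack.peek().equals("(")) {
          output.add(stack.pop());
        }
      } else if (t.equals(",")) {
        // Argument separator: flush operators back to the enclosing "(".
        while (!stack.isEmpty() && !stack.peek().equals("(")) output.add(stack.pop());
      } else if (nextIsParen) {
        stack.push(t);  // function name, emitted after its arguments
      } else {
        output.add(t);  // number or variable
      }
    }
    while (!stack.isEmpty()) {
      String op = stack.pop();
      if (op.equals("(")) throw new IllegalArgumentException("Unbalanced '('");
      output.add(op);
    }
    return output;
  }
}
```

Fed the tokens from the test run above, this returns [4.0, x, *, 5.2024, x, y, Log, z, ^, *, +, 300.12, -], which a simple stack machine can then evaluate from left to right.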
