Advanced tokenizer for a complex math expression

Posted by 孤街醉人 on 2020-01-13 19:05:47

Question


I would like to tokenize a string that consists of integers, floats, operators, functions, variables and parentheses. The following example should illustrate the problem:

Current state:

String infix = 4*x+5.2024*(Log(x,y)^z)-300.12

Desired state:

 String tokBuf[0]=4 
 String tokBuf[1]=* 
 String tokBuf[2]=x 
 String tokBuf[3]=+ 
 String tokBuf[4]=5.2024 
 String tokBuf[5]=* 
 String tokBuf[6]=( 
 String tokBuf[7]=Log
 String tokBuf[8]=( 
 String tokBuf[9]=x
 String tokBuf[10]=, 
 String tokBuf[11]=y 
 String tokBuf[12]=) 
 String tokBuf[13]=^ 
 String tokBuf[14]=z 
 String tokBuf[15]=) 
 String tokBuf[16]=- 
 String tokBuf[17]=300.12

All tips and solutions would be appreciated.


Answer 1:


Use Java's StreamTokenizer (java.io.StreamTokenizer). Its interface is a bit unusual, but one gets used to it:

http://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html

Example code that parses to the requested String list (you probably want to use the tokenizer directly or at least use an Object list so you can store numbers directly as Double):

import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Splits an infix expression into number, name and single-character tokens.
public static List<String> tokenize(String s) throws IOException {
  StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
  tokenizer.ordinaryChar('-');  // Don't parse minus as part of numbers.
  tokenizer.ordinaryChar('/');  // Don't treat slash as a comment start.
  List<String> tokBuf = new ArrayList<String>();
  while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
    switch (tokenizer.ttype) {
      case StreamTokenizer.TT_NUMBER:
        // nval is a double, so the integer 4 is rendered as "4.0".
        tokBuf.add(String.valueOf(tokenizer.nval));
        break;
      case StreamTokenizer.TT_WORD:
        tokBuf.add(tokenizer.sval);
        break;
      default:  // operator, comma or parenthesis
        tokBuf.add(String.valueOf((char) tokenizer.ttype));
    }
  }
  return tokBuf;
}

Test run:

System.out.println(tokenize("4*x+5.2024*(Log(x,y)^z)-300.12"));
[4.0, *, x, +, 5.2024, *, (, Log, (, x, ,, y, ), ^, z, ), -, 300.12]
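
The answer suggests using an Object list so that numbers can be stored directly as Double. A minimal sketch of that idea follows. The class name ObjectTokenizer and the method name tokenizeToObjects are hypothetical, and wrapping the IOException in an UncheckedIOException is just one possible choice, made here because a StringReader is not expected to fail:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ObjectTokenizer {
  // Hypothetical variant of tokenize(): numbers are stored as Double,
  // names as String, and every other character as Character.
  public static List<Object> tokenizeToObjects(String s) {
    StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
    tokenizer.ordinaryChar('-');  // Don't parse minus as part of numbers.
    tokenizer.ordinaryChar('/');  // Don't treat slash as a comment start.
    List<Object> tokens = new ArrayList<>();
    try {
      while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
        switch (tokenizer.ttype) {
          case StreamTokenizer.TT_NUMBER:
            tokens.add(tokenizer.nval);          // autoboxed to Double
            break;
          case StreamTokenizer.TT_WORD:
            tokens.add(tokenizer.sval);          // variable or function name
            break;
          default:
            tokens.add((char) tokenizer.ttype);  // autoboxed to Character
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Print each token together with its runtime type.
    for (Object t : tokenizeToObjects("4*x+5.2")) {
      System.out.println(t.getClass().getSimpleName() + " " + t);
    }
  }
}
```

A later evaluation step can then tell the token kinds apart with instanceof instead of re-parsing strings.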



Answer 2:


http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form
http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
Example of the algorithm:
step#1: read '4' => numeric token => keep reading characters until a non-numeric symbol is reached (here '*'). The characters just read form tokBuf[0], a numeric token.
step#2: read '*' => the token is a binary operator.
step#3: read 'x'. It is presumably not a function symbol (no '(' follows it) => mark this token as a variable token.
And so on.
The next step is presumably evaluation; Reverse Polish notation or a syntax tree will help there.
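
The character-by-character steps above can be sketched as a small hand-written scanner. This is only an illustration under stated assumptions: the class name HandTokenizer is hypothetical, a number is taken to be any run of digits and dots, a name is a letter followed by letters or digits, and anything else is a one-character token. It keeps the original number text, so 4 stays "4" as in the desired state of the question:

```java
import java.util.ArrayList;
import java.util.List;

public class HandTokenizer {
  // Scans the input left to right and cuts it into number, name and
  // single-character tokens. Malformed numbers such as "1.2.3" are NOT
  // rejected; validation is left out of this sketch.
  public static List<String> tokenize(String infix) {
    List<String> tokBuf = new ArrayList<>();
    int i = 0;
    while (i < infix.length()) {
      char c = infix.charAt(i);
      if (Character.isWhitespace(c)) {
        i++;  // skip blanks between tokens
      } else if (Character.isDigit(c) || c == '.') {
        // Numeric token: read until a non-numeric symbol is reached.
        int start = i;
        while (i < infix.length()
            && (Character.isDigit(infix.charAt(i)) || infix.charAt(i) == '.')) {
          i++;
        }
        tokBuf.add(infix.substring(start, i));
      } else if (Character.isLetter(c)) {
        // Name token: a variable such as x or a function such as Log.
        int start = i;
        while (i < infix.length() && Character.isLetterOrDigit(infix.charAt(i))) {
          i++;
        }
        tokBuf.add(infix.substring(start, i));
      } else {
        // Operator, comma or parenthesis: one character per token.
        tokBuf.add(String.valueOf(c));
        i++;
      }
    }
    return tokBuf;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("4*x+5.2024*(Log(x,y)^z)-300.12"));
    // prints [4, *, x, +, 5.2024, *, (, Log, (, x, ,, y, ), ^, z, ), -, 300.12]
  }
}
```

To separate functions from variables afterwards, one option is to treat a name token as a function when the token that follows it is "(".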



Source: https://stackoverflow.com/questions/16498649/advanced-tokenizer-for-a-complex-math-expression
