How can I tokenize this string in Java?

你的背包 2021-01-14 17:32

How can I split these simple mathematical expressions into separate strings?

I know that I basically want to use the regular expression: "[0-9]+|[*+-^()]"

  • 2021-01-14 17:37

    Here's a short Java program that tokenizes such strings. If you're looking to evaluate the expressions, I can (shamelessly) point you at this post: An Arithmetic Expressions Solver in 64 Lines

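      import java.util.ArrayList;
      import java.util.List;
    
      public class Tokenizer {
         // The part of the expression not yet tokenized (always kept trimmed).
         private String input;
    
         public Tokenizer(String input) { this.input = input.trim(); }
    
         // Returns the i-th remaining character, or '\0' past the end.
         private char peek(int i) {
            return i >= input.length() ? '\0' : input.charAt(i);
         }
    
         // Consumes the first of the given strings that the input starts with;
         // returns null if none of them matches.
         private String consume(String... arr) {
            for (String s : arr)
               if (input.startsWith(s))
                  return consume(s.length());
            return null;
         }
    
         // Removes the first numChars characters (and any whitespace after them)
         // from the input and returns them.
         private String consume(int numChars) {
            String result = input.substring(0, numChars);
            input = input.substring(numChars).trim();
            return result;
         }
    
         // Consumes a run of digits; returns "" if the input does not start with one.
         private String literal() {
            for (int i = 0; true; ++i)
               if (!Character.isDigit(peek(i)))
                  return consume(i);
         }
    
         // Reads number, operator, number, ... until the input is exhausted.
         public List<String> tokenize() {
            List<String> res = new ArrayList<>();
            if (input.isEmpty())
               return res;
    
            while (true) {
               String number = literal();
               if (number.isEmpty()) // e.g. "5+" or "+5": operator without an operand
                  throw new RuntimeException("Syntax error: expected a number at '" + input + "'");
               res.add(number);
               if (input.isEmpty())
                  return res;
    
               String s = consume("+", "-", "/", "*", "^");
               if (s == null)
                  throw new RuntimeException("Syntax error " + input);
               res.add(s);
            }
         }
    
         public static void main(String[] args) {
            Tokenizer t = new Tokenizer("578+223-5^2");
            System.out.println(t.tokenize()); // [578, +, 223, -, 5, ^, 2]
         }
      }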
    
  • 2021-01-14 17:38

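    Inside the character class you have to escape the '-' (otherwise +-^ is read as a range). Escaping the parentheses and the other operators there is harmless, and in a Java string literal each regex backslash must itself be doubled: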

    myString.split("[0-9]+|[\\*\\+\\-^\\(\\)]");
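split() treats every match of the regex as a separator and drops it, so with this pattern the numbers and operators themselves are what get thrown away. To keep them, one option is to search for the same pattern with java.util.regex.Matcher.find() and collect each match. A minimal sketch, assuming the operator set from the question with the '-' escaped (the class name RegexTokens is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens {
    // A run of digits, or one operator/parenthesis from the question.
    // The '-' is escaped so that "+-^" is not parsed as a character range.
    private static final Pattern TOKEN = Pattern.compile("[0-9]+|[*+\\-^()]");

    // Collects every match of TOKEN in order; characters that match
    // neither alternative (such as spaces) are skipped silently.
    public static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2")); // [578, +, 223, -, 5, ^, 2]
        System.out.println(tokenize("(1 + 2) * 3")); // [(, 1, +, 2, ), *, 3]
    }
}
```

Because unmatched characters are skipped rather than reported, this only tokenizes; it does not validate the expression.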

  • 2021-01-14 17:41

    Here is my tokenizer solution that allows for negative numbers (unary).

    So far it has been doing everything I needed it to:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical wrapper class so the snippet compiles on its own.
    public class ExpressionTokenizer
    {
        // Assumed helper: isOperator() is called but not shown in the original answer.
        private static boolean isOperator(String token)
        {
            return token.equals("+") || token.equals("-") || token.equals("*")
                    || token.equals("/") || token.equals("^");
        }

        public static List<String> tokenize(String expression)
        {
            List<String> tokens = new ArrayList<>();
            String previousToken = null;
            int i = 0;
            while (i < expression.length())
            {
                char c = expression.charAt(i);
                StringBuilder currentToken = new StringBuilder();

                if (c == ' ' || c == '\t') // Whitespace: skip it
                {
                    i++;
                }
                else if (c == '-' && (previousToken == null || isOperator(previousToken))
                        && (i + 1) < expression.length() && Character.isDigit(expression.charAt(i + 1)))
                {
                    // Unary minus (at the start or right after an operator) followed by a digit:
                    // read it together with the digits as one negative-number token.
                    currentToken.append(c);
                    i++;
                    while (i < expression.length() && Character.isDigit(expression.charAt(i)))
                    {
                        currentToken.append(expression.charAt(i));
                        i++;
                    }
                }
                else if (Character.isDigit(c)) // Number: read the whole run of digits
                {
                    while (i < expression.length() && Character.isDigit(expression.charAt(i)))
                    {
                        currentToken.append(expression.charAt(i));
                        i++;
                    }
                }
                else if (c == '+' || c == '*' || c == '/' || c == '^' || c == '-') // Binary operator
                {
                    currentToken.append(c);
                    i++;
                }
                else // Any other character (including parentheses) is silently skipped
                {
                    i++;
                }

                if (currentToken.length() > 0)
                {
                    tokens.add(currentToken.toString());
                    previousToken = currentToken.toString();
                }
            }
            return tokens;
        }
    }
    
  • 2021-01-14 17:44

    You need to escape the - (or move it to the first or last position in the class). Inside a character class the quantifiers (+ and *) lose their special meaning, as do the parentheses, so those don't strictly need escaping, though escaping them does no harm.
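To see why the unescaped - matters: in [*+-^()] the sequence +-^ is parsed as a range from '+' (U+002B) to '^' (U+005E), which also covers the digits, the uppercase letters and characters such as '/', '=' and '@'. A small check using only java.util.regex.Pattern (the class and constant names are illustrative):

```java
import java.util.regex.Pattern;

public class CharClassRange {
    // As written in the question: "+-^" is a range from '+' to '^',
    // so this class matches far more than the six intended characters.
    public static final String UNESCAPED = "[*+-^()]";

    // With the '-' escaped, the class contains exactly * + - ^ ( ).
    public static final String ESCAPED = "[*+\\-^()]";

    public static void main(String[] args) {
        System.out.println(Pattern.matches(UNESCAPED, "5")); // true  (digit inside the range)
        System.out.println(Pattern.matches(UNESCAPED, "A")); // true  (letter inside the range)
        System.out.println(Pattern.matches(ESCAPED, "5"));   // false
        System.out.println(Pattern.matches(ESCAPED, "-"));   // true
    }
}
```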

  • 2021-01-14 17:46

    You can't use String.split() with that pattern, since whatever characters match it are removed from the output.

    If you're willing to require spaces between the tokens, you can do...

    "578 + 223 - 5 ^ 2 ".split(" ");
    

    which yields...

    578
    +
    223
    -
    5
    ^
    2
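Splitting on a single space leaves an empty string behind wherever two spaces are adjacent. A slightly more forgiving sketch, still assuming the tokens are separated by whitespace: trim the input and split on runs of whitespace with "\\s+" (the WhitespaceSplit class is only for illustration):

```java
import java.util.Arrays;

public class WhitespaceSplit {
    // Trims leading/trailing whitespace, then splits on runs of one or more
    // whitespace characters, so repeated spaces do not produce empty tokens.
    public static String[] tokenize(String expression) {
        return expression.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // A single-space split keeps an empty string for the doubled space.
        System.out.println(Arrays.toString("578  + 223".split(" ")));           // [578, , +, 223]
        System.out.println(Arrays.toString(tokenize("  578  + 223 - 5 ^ 2 "))); // [578, +, 223, -, 5, ^, 2]
    }
}
```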
    
  • 2021-01-14 17:52

    This works for the sample string you posted:

    String s = "578+223-5^2";
    String[] tokens = s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
    

    The regex is made up entirely of lookaheads and lookbehinds; it matches a position (not a character but the "gap" between two characters) that is either preceded by a digit and followed by a non-digit, or preceded by a non-digit and followed by a digit.
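Because those lookarounds only fire where a digit meets a non-digit, two adjacent non-digit characters end up in one token. The sketch below compares that regex with a variant (not from the answer, added here for illustration) that splits before and after every operator or parenthesis; with the variant, any spaces in the input would stay attached to the neighbouring tokens:

```java
import java.util.Arrays;

public class LookaroundSplit {
    // The regex from the answer: split wherever a digit meets a non-digit.
    public static final String DIGIT_BOUNDARY = "(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)";

    // Illustrative variant: split before and after every operator or parenthesis.
    public static final String AROUND_SYMBOLS = "(?<=[-+*/^()])|(?=[-+*/^()])";

    public static void main(String[] args) {
        String s = "2*(3+4)";
        // "*(" stays together: there is no digit on either side of that gap.
        System.out.println(Arrays.toString(s.split(DIGIT_BOUNDARY))); // [2, *(, 3, +, 4, )]
        System.out.println(Arrays.toString(s.split(AROUND_SYMBOLS))); // [2, *, (, 3, +, 4, )]
    }
}
```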

    Be aware that regexes are not well suited to the task of parsing math expressions. In particular, regexes can't easily handle balanced delimiters like parentheses, especially if they can be nested. (Some regex flavors have extensions which make that sort of thing easier, but not Java's.)

    Beyond this point, you'll want to process the string using more mundane methods like charAt() and substring() and Integer.parseInt(). Or, if this isn't a learning exercise, use an existing math expression parsing library.
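A minimal sketch of that charAt()/substring() approach, assuming non-negative integer literals, the operators from the question plus '/', and parentheses. It does not handle unary minus or decimals, and the class name CharScanner is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class CharScanner {
    // Single-character tokens: the question's operators and parentheses, plus '/'.
    private static final String SYMBOLS = "+-*/^()";

    // Splits the expression into number and symbol tokens; whitespace is
    // skipped and any other character raises an IllegalArgumentException.
    public static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < expr.length() && Character.isDigit(expr.charAt(i))) {
                    i++;
                }
                tokens.add(expr.substring(start, i)); // the whole run of digits
            } else if (SYMBOLS.indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("2 * (3 + 4)");
        System.out.println(tokens); // [2, *, (, 3, +, 4, )]
        // Number tokens can then be converted with Integer.parseInt().
        System.out.println(Integer.parseInt(tokens.get(0)) + 1); // 3
    }
}
```

Unlike the regex approaches, an unexpected character is reported instead of being silently dropped or glued to a neighbouring token.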

    EDIT: ...or eval() it as @Syzygy recommends.
