String text = "[[\"item1\",\"item2\",\"item3\"], [\"some\", \"item\"], [\"far\", \"out\", \"string\"]]";
I would like to iterate over each individual
This syntax looks like a subset of JSON, and I would guess that the client side is actually encoding it as JSON. Assuming that is true, the simplest approach will be to use an off-the-shelf JSON parser, and some simple Java code to convert the resulting objects into the form that your code requires.
Sure, you could implement your own parser by hand, but it is probably not worth the effort, especially if you have to deal with string escaping, variable whitespace and so on. Don't forget that if you implement your own parser, you NEED TO IMPLEMENT UNIT TESTS to make sure that it works across the full range of expected valid input, and that it fails cleanly on invalid input as well. (Testing invalid input is important because you don't want your server to fall over if some hacker sends requests containing bad input.)
Before you go any further, you really need to confirm the exact syntax that the client is sending you. Just looking at an example is not going to answer that. You either need a document specifying what the syntax is, or you need to look at the client / application source code.
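Since you are using a string that looks like JSON, I would just use a JSON parser. One of the simplest to use is Gson. Here is an example using Gson:
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.util.ArrayList;
// Java string literals need double quotes, so the inner quotes are escaped
String text = "[[\"item1\",\"item2\",\"item3\"], [\"some\", \"item\"], [\"far\", \"out\", \"string\"]]";
Gson gson = new Gson();
ArrayList<ArrayList<String>> list = gson.fromJson(text, new TypeToken<ArrayList<ArrayList<String>>>() {}.getType());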
Here is the gson site: http://code.google.com/p/google-gson/
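Once you have the nested list, iterating over each individual item is just two nested loops. Here is a minimal sketch; the class name NestedIteration and the method describe are made up for illustration, and the list is built by hand so the snippet runs without Gson on the classpath. In practice it would be the value returned by fromJson.

```java
import java.util.Arrays;
import java.util.List;

class NestedIteration {
    /** Visits every string in order and returns one "outer.inner=value" line per item. */
    static String describe(List<? extends List<String>> lists) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lists.size(); i++) {
            List<String> inner = lists.get(i);
            for (int j = 0; j < inner.size(); j++) {
                sb.append(i).append('.').append(j)
                  .append('=').append(inner.get(j)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the parsed result (assumption: same shape as the sample input)
        List<List<String>> list = Arrays.asList(
                Arrays.asList("item1", "item2", "item3"),
                Arrays.asList("some", "item"),
                Arrays.asList("far", "out", "string"));
        System.out.print(describe(list));
    }
}
```

The wildcard parameter type is there so that an ArrayList<ArrayList<String>> can be passed in directly.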
Here's a simple parser. It should deal with all kinds of abusive nesting and is robust to both single and double quotes, but it won't care if you mix them: 'test" is treated as equivalent to "test".
edit: added comments, and now it deals with escaped quotes in strings. (and now improved string token handling even more)
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
public class StringToList {
    public static void main(String[] args) throws IOException {
        StringReader sr = new StringReader("[[\"it\\\"em1\", \"item2\",\"item3\"], [\"some\",\"item\"], [\"far\",\"out\",\"string\"]]");
        System.out.println(tokenize(sr));
    }
    @SuppressWarnings({ "rawtypes", "unchecked" })
    public static List tokenize(StringReader in) throws IOException {
        List stack = new ArrayList<Object>();
        int c;
        while ((c = in.read()) != -1) {
            switch (c) {
            case '[':
                // found a nested structure, recurse..
                stack.add(tokenize(in));
                break;
            case ']':
                // found the end of this run, return the
                // current stack
                return stack;
            case '"':
            case '\'':
                // get the next full string token
                stack.add(stringToken(in));
                break;
            }
        }
        // we artificially start with a list, though in principle I'm
        // defining the string to hold only a single list, so this
        // gets rid of the one I created artificially
        // (an input without any '[' just yields the empty list).
        return stack.isEmpty() ? stack : (List) stack.get(0);
    }
    public static String stringToken(StringReader in) throws IOException {
        StringBuilder str = new StringBuilder();
        boolean escaped = false;
        int c;
        while ((c = in.read()) != -1) {
            if (escaped) {
                // the previous char was a backslash: take this one
                // literally and clear the flag so it cannot leak
                // onto a later quote
                str.append((char) c);
                escaped = false;
            } else if (c == '\\') {
                escaped = true;
            } else if (c == '"' || c == '\'') {
                // unescaped quote: end of the string token
                break;
            } else {
                str.append((char) c);
            }
        }
        return str.toString();
    }
}
Just a couple of notes: this won't enforce your syntax to be correct, so if you do something goofy with the quotes, like I described, it might still get parsed as (un)expected. Also, I don't enforce commas at all, and you don't even need a space between the quotes, so ["item1""item2"] is just as valid using this parser as ["item1", "item2"]. Perhaps more oddly, it should also deal with ["item1"asdf"item2"], ignoring asdf.
You need to build a parser by hand. It's not hard, but it will take up time. In the previous comment you said you want an ArrayList of ArrayList... hmmm... good
Just parse the string char by char and recognize each token, after first defining recursive parsing rules. Recursive descent parser rules are usually drawn as diagrams, but I can try to use ABNF for you:
LIST = NIL / LIST_ITEM *( ',' SP LIST_ITEM)
LIST_ITEM = NIL / '[' STRING_ITEM *( ',' SP STRING_ITEM ) ']'
STRING_ITEM = '"' *ANYCHAR '"'
SP = space
ANYCHAR = you know, anything that is not double quotes
NIL = ''
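To make the rules above concrete, here is a rough sketch of a hand-written recursive descent parser in Java, with one method per rule. The class and method names (NestedListParser, parse, parseInner, parseString) are invented for this example. It also makes two assumptions that go beyond the grammar as written: the outer brackets are part of the input, as in the sample string, and any number of spaces is allowed around commas. Like ANYCHAR, it does not handle escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strict parser for input such as [["a","b"], ["c"]]. */
class NestedListParser {
    private final String src;
    private int pos = 0;

    private NestedListParser(String src) { this.src = src; }

    /** LIST: '[' followed by comma-separated inner lists and a closing ']'. */
    static List<List<String>> parse(String text) {
        NestedListParser p = new NestedListParser(text);
        List<List<String>> result = new ArrayList<>();
        p.skipSpaces();
        p.expect('[');
        p.skipSpaces();
        if (p.peek() != ']') {
            do {
                p.skipSpaces();
                result.add(p.parseInner());
                p.skipSpaces();
            } while (p.accept(','));
        }
        p.expect(']');
        p.skipSpaces();
        if (p.pos != p.src.length()) throw p.error("trailing characters");
        return result;
    }

    /** LIST_ITEM: '[' followed by comma-separated strings and a closing ']'. */
    private List<String> parseInner() {
        List<String> items = new ArrayList<>();
        expect('[');
        skipSpaces();
        if (peek() != ']') {
            do {
                skipSpaces();
                items.add(parseString());
                skipSpaces();
            } while (accept(','));
        }
        expect(']');
        return items;
    }

    /** STRING_ITEM: everything between two double quotes (no escapes). */
    private String parseString() {
        expect('"');
        int start = pos;
        while (pos < src.length() && src.charAt(pos) != '"') pos++;
        String s = src.substring(start, pos);
        expect('"');
        return s;
    }

    private int peek() { return pos < src.length() ? src.charAt(pos) : -1; }

    private boolean accept(char c) {
        if (peek() == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw error("expected '" + c + "'");
    }

    private void skipSpaces() {
        while (peek() == ' ') pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos);
    }
}
```

Calling NestedListParser.parse(text) on the sample string gives three inner lists. Malformed input such as a missing comma or an unclosed bracket throws an IllegalArgumentException instead of being silently accepted.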
Another approach is to use Regular Expressions. Here are a couple of samples. First capture outer elements by
(\[[^\]]*\])
The above regex captures everything from '[' to the first ']', so you need to either modify it or cut the outer brackets from your string first (just drop the first and last char).
Then capture inner elements by
(\"[^\"]*\")
Simple as the above
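For what it's worth, here is how the two-step regex idea could look in Java with java.util.regex. This is a sketch, not a drop-in solution: the class name RegexListSplitter and the method split are made up, and the outer pattern is a variation on the one above that excludes '[' as well as ']', so the outer brackets do not have to be removed first. The inner pattern uses a * quantifier so that items of any length match. Items that themselves contain brackets or escaped quotes will break it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexListSplitter {
    // One inner list: '[' , then anything except brackets (group 1), then ']'.
    private static final Pattern INNER_LIST = Pattern.compile("\\[([^\\[\\]]*)\\]");
    // One quoted item; group 1 is the text between the double quotes.
    private static final Pattern ITEM = Pattern.compile("\"([^\"]*)\"");

    static List<List<String>> split(String text) {
        List<List<String>> result = new ArrayList<>();
        Matcher lists = INNER_LIST.matcher(text);
        while (lists.find()) {
            // Second pass: pull the quoted items out of this inner list.
            List<String> items = new ArrayList<>();
            Matcher item = ITEM.matcher(lists.group(1));
            while (item.find()) {
                items.add(item.group(1));
            }
            result.add(items);
        }
        return result;
    }
}
```

Unlike a real parser, this never rejects anything: text that does not match either pattern is simply skipped.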