Question
Yup, you read that right. I need something that can generate random text from a regular expression. The text should be random, but it must be matched by the regular expression. It seems no such library exists, but I could be wrong.
Just an example: such a library would take '[ab]*c' as input and generate samples such as:
abc
abbbc
bac
etc.
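To make the requirement concrete, here is a minimal sketch of the acceptance check a generated sample has to pass. The class name SampleCheck and the helper allMatch are made up for illustration; only java.util.regex from the standard library is used.

```java
import java.util.regex.Pattern;

class SampleCheck {
    // Hypothetical helper: true if every sample is matched in full by the regex.
    static boolean allMatch(String regex, String... samples) {
        Pattern pattern = Pattern.compile(regex);
        for (String sample : samples) {
            if (!pattern.matcher(sample).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The three samples listed above, checked against '[ab]*c'.
        System.out.println(allMatch("[ab]*c", "abc", "abbbc", "bac"));
    }
}
```

Any generator worth using should produce only strings for which this check succeeds.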
Update: I created something myself: Xeger. Check out http://code.google.com/p/xeger/.
Answer 1:
I just created a library for doing this a minute ago. It's hosted here: http://code.google.com/p/xeger/. Carefully read the instructions before using it. (Especially the one referring to downloading another required library.) ;-)
This is the way you use it:
String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);
Answer 2:
I am not aware of such a library. If you're interested in writing one yourself, then these are probably the steps you'll need to take:
1. Write a parser for regular expressions (you may want to start out with a restricted class of regexes).
2. Use the result to construct an NFA.
3. (Optional) Convert the NFA to a DFA.
4. Randomly traverse the resulting automaton from the start state to any accepting state, while storing the characters emitted by every transition.
The result is a word which is accepted by the original regex. For more, see e.g. Converting a Regular Expression into a Deterministic Finite Automaton.
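The random traversal in the last step can be sketched as follows. This is only an illustration under stated assumptions: the parsing and NFA/DFA construction steps are skipped, and the automaton for '[ab]*c' is written out by hand. The names RandomWalkSketch, TRANSITIONS and ACCEPTING are invented for this example, and the 50% stopping rule is an arbitrary choice.

```java
import java.util.Random;

class RandomWalkSketch {
    // Hand-written DFA for [ab]*c (an assumption; a real tool would build it from the regex).
    // TRANSITIONS[state] lists {character, target state} pairs. State 0 is the start state.
    static final int[][][] TRANSITIONS = {
        { {'a', 0}, {'b', 0}, {'c', 1} }, // state 0
        { }                               // state 1: no outgoing transitions
    };
    static final boolean[] ACCEPTING = { false, true };

    // Walks randomly from the start state and returns the characters collected on the way.
    // Assumes every state can still reach an accepting state (no dead ends).
    static String generate(Random random) {
        StringBuilder out = new StringBuilder();
        int state = 0;
        while (true) {
            int[][] options = TRANSITIONS[state];
            // In an accepting state, stop if there is nowhere to go, else stop half the time.
            if (ACCEPTING[state] && (options.length == 0 || random.nextBoolean())) {
                return out.toString();
            }
            int[] transition = options[random.nextInt(options.length)];
            out.append((char) transition[0]);
            state = transition[1];
        }
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(generate(random));
        }
    }
}
```

Because every transition is chosen with equal probability, short strings come out far more often than long ones. A real implementation would probably want to weight the choices or bound the length.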
Answer 3:
Here are a few implementations of such a beast, but none of them is in Java (and all but the closed-source Microsoft one are very limited in their regexp feature support).
Answer 4:
Based on Wilfred Springer's solution together with http://www.brics.dk/~amoeller/automaton/, I built another generator. It does not use recursion. It takes as input the pattern (regular expression), a minimum string length, and a maximum string length. The result is an accepted string whose length is between min and max. It also allows some of the XML "shorthand character classes". I use this for an XML sample generator that builds valid strings for facets.
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;
import dk.brics.automaton.State;
import dk.brics.automaton.Transition;

public static final String generate(final String pattern, final int minLength, final int maxLength) {
    // Expand the shorthand character classes into explicit ones.
    final String regex = pattern
        .replace("\\d", "[0-9]")        // \d = digit
        .replace("\\w", "[A-Za-z0-9_]") // \w = word character
        .replace("\\s", "[ \t\r\n]");   // \s = whitespace
    final Automaton automaton = new RegExp(regex).toAutomaton();
    final Random random = new Random(System.nanoTime());
    // Every accepted prefix of the walk whose length lies in [minLength, maxLength].
    final List<String> validLength = new LinkedList<>();
    int len = 0;
    final StringBuilder builder = new StringBuilder();
    State state = automaton.getInitialState();
    while (true) {
        // Record the current prefix before deciding whether to continue, so that an
        // accepting state without outgoing transitions is not missed.
        if (state.isAccept() && len >= minLength && len <= maxLength) validLength.add(builder.toString());
        final Transition[] transitions = state.getSortedTransitionArray(true);
        if (len >= maxLength || transitions.length == 0) break;
        final Transition t = transitions[random.nextInt(transitions.length)]; // random transition
        // Append a random character from the transition's [min, max] range.
        builder.append((char) (t.getMin() + random.nextInt(t.getMax() - t.getMin() + 1)));
        len++;
        state = t.getDest();
    }
    if (validLength.isEmpty()) throw new IllegalArgumentException(automaton.toString() + " , " + minLength + " , " + maxLength);
    return validLength.get(random.nextInt(validLength.size()));
}
Answer 5:
Here is a Python implementation of a module like that: http://www.mail-archive.com/python-list@python.org/msg125198.html It should be portable to Java.
Source: https://stackoverflow.com/questions/1578789/how-do-i-generate-text-matching-a-regular-expression-from-a-regular-expression