I want to parse a String that contains a UUID in the following format:
"&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;"
I have tried
If the format is not going to change, I think the fastest way is to use the String.substring() method. Example:
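// Input with HTML-escaped angle brackets ("&lt;" is 4 characters long).
String val = "&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;";
// "&lt;urn:uuid:" occupies indices 0-12, so the 36-character UUID spans 13-48.
String sUuid = val.substring(13, 49);
UUID uuid = UUID.fromString(sUuid);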
Internally, the String class stores its data in a char array (in Java 8 and earlier; newer JDKs use a byte array). From the source of java.lang.String:
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence {
...
    /** The value is used for character storage. */
    private final char value[];
...
}
The method 'String substring(int beginIndex, int endIndex)' copies the array elements from the begin index (inclusive) to the end index (exclusive) and creates a new String backed by the new array. Copying an array is a very fast operation.
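The fixed offsets only hold while the input keeps exactly this shape. Here is a minimal sketch of a guarded variant, assuming the input is the HTML-escaped form &lt;urn:uuid:...&gt; (which is what offsets 13 and 49 imply). The class and method names are hypothetical:

```java
import java.util.UUID;

final class UrnUuidSubstring {
    // Assumed wrapper around the UUID; "&lt;urn:uuid:" is 13 characters long.
    private static final String PREFIX = "&lt;urn:uuid:";
    private static final String SUFFIX = "&gt;";
    private static final int UUID_LENGTH = 36;

    // Hypothetical helper: checks the shape first, then cuts out the UUID.
    static UUID parse(String input) {
        int expected = PREFIX.length() + UUID_LENGTH + SUFFIX.length();
        if (input.length() != expected
                || !input.startsWith(PREFIX)
                || !input.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Unexpected format: " + input);
        }
        int start = PREFIX.length(); // 13
        return UUID.fromString(input.substring(start, start + UUID_LENGTH));
    }

    public static void main(String[] args) {
        // prints 4324e9d5-8d1f-442c-96a4-6146640da7ce
        System.out.println(parse("&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;"));
    }
}
```

The extra checks cost a few comparisons, but they turn a silently wrong substring into an explicit error if the format ever changes.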
Your example of a faster regex is using a < where the input is &lt;, so that's confusing.
Regarding speed: first, your UUID is hexadecimal, so don't match with A-Z but rather a-f. Second, you give no indication that the case is mixed, so don't use case-insensitive matching; write the correct case in the range.
You don't explain whether you need the part preceding the UUID. If not, don't include .*?, and you may as well write the literals for re1 and re2 together in your final Pattern. There's no indication you need DOTALL either.
private static final Pattern splitter =
Pattern.compile("([a-f0-9]{8}(-[a-f0-9]{4}){4}[a-f0-9]{8})");
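As a rough illustration of how that pattern might be used, the sketch below searches the input with Matcher.find() and hands the first capture group to UUID.fromString. The class name, the firstUuid helper, and the choice to return null when nothing matches are assumptions made for the example:

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UuidRegexDemo {
    // 8 hex digits, four "-xxxx" groups, then 8 more hex digits; the last
    // group plus the trailing 8 form the 12-digit tail of the 8-4-4-4-12 layout.
    private static final Pattern splitter =
        Pattern.compile("([a-f0-9]{8}(-[a-f0-9]{4}){4}[a-f0-9]{8})");

    // Hypothetical helper: returns the first lowercase UUID found, or null.
    static UUID firstUuid(String input) {
        Matcher m = splitter.matcher(input);
        return m.find() ? UUID.fromString(m.group(1)) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstUuid("&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;"));
    }
}
```

Because the pattern is compiled once into a static final field, each call only pays for the match itself.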
Alternatively, if you have measured your regular expression and found it too slow, you might try another approach, for example:
Is each uuid preceded by "uuid:" as in your example? If so you can
Along similar lines your faster regex could be:
private static final Pattern URN_UUID_PATTERN =
Pattern.compile("^&lt;urn:uuid:(.{36})&gt;");
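A possible way to apply such an anchored pattern is sketched below, assuming the input carries the HTML-escaped wrapper &lt;urn:uuid:...&gt;. The class name and the parse helper are hypothetical. Since .{36} accepts any 36 characters, UUID.fromString is left to reject malformed values:

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrnUuidRegexDemo {
    // Anchored at the start; group 1 captures the 36 characters after the prefix.
    private static final Pattern URN_UUID_PATTERN =
        Pattern.compile("^&lt;urn:uuid:(.{36})&gt;");

    // Hypothetical helper: throws if the wrapper is missing or the UUID is malformed.
    static UUID parse(String input) {
        Matcher m = URN_UUID_PATTERN.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("Unexpected format: " + input);
        }
        return UUID.fromString(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(parse("&lt;urn:uuid:4324e9d5-8d1f-442c-96a4-6146640da7ce&gt;"));
    }
}
```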
OTOH, if all your input strings are going to start with those exact characters, there is no need to do step 1 in the previous suggestion; just use input.substring(13, 49).