I need to be able to recognise date strings. It doesn't matter if I cannot distinguish between month and day (e.g. 12/12/10); I just need to classify the string as being a date.
You can loop over all the date formats available in Java:
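import java.text.DateFormat;
import java.text.ParseException;
import java.util.Locale;

public static boolean isDate(String dateString) {
    for (Locale locale : DateFormat.getAvailableLocales()) {
        for (int style = DateFormat.FULL; style <= DateFormat.SHORT; style++) {
            DateFormat df = DateFormat.getDateInstance(style, locale);
            df.setLenient(false); // reject out-of-range values such as month 13
            try {
                df.parse(dateString); // note: trailing text after a valid date is ignored
                return true; // or return the parsed Date object instead
            } catch (ParseException ex) {
                // unparseable with this locale and style; try the next one
            }
        }
    }
    return false;
}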
However, this won't account for any custom date formats.
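If you do know which custom formats to expect, a minimal sketch using java.time (Java 8+) could try each one in turn. The three patterns below are only examples chosen for illustration, not formats from the question, and the class and method names are hypothetical. ResolverStyle.STRICT makes impossible dates such as 2010-02-30 fail instead of being silently adjusted.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Arrays;
import java.util.List;

class CustomFormats {
    // Example patterns only; replace them with the formats you actually expect.
    // "uuuu" (proleptic year) is used instead of "yyyy" because STRICT
    // resolution of "yyyy" (year-of-era) would also require an era field.
    private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            strict("MM/dd/uuuu"),
            strict("dd.MM.uuuu"),
            strict("uuuu-MM-dd"));

    private static DateTimeFormatter strict(String pattern) {
        return DateTimeFormatter.ofPattern(pattern).withResolverStyle(ResolverStyle.STRICT);
    }

    static boolean matchesCustomFormat(String s) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                LocalDate.parse(s, f); // the whole string must match the pattern
                return true;
            } catch (DateTimeParseException e) {
                // not this format; try the next one
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesCustomFormat("12/25/2010")); // true
        System.out.println(matchesCustomFormat("2010-02-30")); // false: STRICT rejects February 30
    }
}
```

Unlike DateFormat.parse, LocalDate.parse fails if any text is left over, so "12/25/2010 foo" is rejected.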
You could always check whether the string contains exactly two separator characters, such as '/'.
public static boolean isDate(String date) {
    int counter = 0;
    for (int i = 0; i < date.length(); i++) {
        if ("/-.".indexOf(date.charAt(i)) != -1) // count separators; any set of symbols can be used
            counter++;
    }
    return counter == 2; // e.g. "12/25/2010" has exactly two separators
}
You can do something similar to check that everything else is a digit.
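A minimal sketch of that idea, assuming the same /, - and . separators: split on them and require exactly three non-empty fields made only of ASCII digits. The class and method names are hypothetical.

```java
class SeparatorCheck {
    // True if s looks like three digit groups joined by '/', '-' or '.'.
    static boolean looksLikeNumericDate(String s) {
        String[] fields = s.split("[/.-]", -1); // limit -1 keeps empty trailing fields
        if (fields.length != 3) return false;    // exactly two separators
        for (String field : fields) {
            if (field.isEmpty()) return false;   // e.g. "12//2010"
            for (int i = 0; i < field.length(); i++) {
                char c = field.charAt(i);
                if (c < '0' || c > '9') return false; // ASCII digits only
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeNumericDate("12/25/2010")); // true
        System.out.println(looksLikeNumericDate("ab/cd/ef"));   // false
    }
}
```

This does not check ranges, so 99/99/99 still passes; that needs a separate check on the values themselves.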
Maybe you should use regular expressions?
Hopefully this one works for the mm-dd-yyyy format:
^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$
Here (0[1-9]|1[012]) matches the month 01..12, (0[1-9]|[12][0-9]|3[01]) matches the day 01..31, and (19|20)\d\d matches a year from 1900 to 2099.
Fields can be delimited by a dash, space, slash or dot.
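In Java the expression can be compiled once with java.util.regex.Pattern; the only change is doubling the backslashes for the string literal. The class and method names below are illustrative.

```java
import java.util.regex.Pattern;

class MdyRegex {
    // The mm-dd-yyyy expression above, with backslashes escaped for Java.
    private static final Pattern MDY = Pattern.compile(
            "^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\\d\\d$");

    static boolean isMdyDate(String s) {
        return MDY.matcher(s).matches(); // the whole string must match
    }

    public static void main(String[] args) {
        System.out.println(isMdyDate("12-25-2010")); // true
        System.out.println(isMdyDate("13-25-2010")); // false: no month 13
    }
}
```

The regex checks each field's range independently, so an impossible date such as 02/30/2010 still matches.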
Regards, Serge
What I would do is look for date characteristics rather than the dates themselves. For example, you could search for slashes (to catch dates like 1/1/1001), dashes (1 - 1 - 1001), and month names and abbreviations (Jan 1 1001 or January 1 1001). When you get a hit, collect the nearby words (two on each side should be fine) and store them in an array of strings. Once you have scanned all the input, pass this array to a function that goes into more depth and pulls out the actual date strings, using the methods found here. The important thing is just narrowing the possible dates down to a manageable number.
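A rough Java sketch of that first pass, assuming whitespace-separated words and treating any word that contains / or -, or starts with a three-letter month abbreviation, as a hit. The class name, the WINDOW constant and the matching rules are illustrative choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class DateCandidates {
    // Month prefixes: "Jan" and "January" both hit (so do words like "market").
    private static final List<String> MONTHS = Arrays.asList(
            "jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec");
    private static final int WINDOW = 2; // words kept on each side of a hit

    static boolean looksDateLike(String word) {
        if (word.contains("/") || word.contains("-")) return true;
        String lower = word.toLowerCase(Locale.ROOT);
        for (String m : MONTHS) {
            if (lower.startsWith(m)) return true;
        }
        return false;
    }

    // Returns each hit together with up to WINDOW words on either side.
    static List<String> collectCandidates(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> candidates = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (looksDateLike(words[i])) {
                int from = Math.max(0, i - WINDOW);
                int to = Math.min(words.length, i + WINDOW + 1);
                candidates.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(collectCandidates("We met on 12/25/2010 at noon")); // [met on 12/25/2010 at noon]
    }
}
```

The prefix test is deliberately loose; the second, more careful pass is where false hits such as "decide" get discarded.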
Here is a simple example using Natty:
import com.joestelmach.natty.*;
import java.util.Date;
import java.util.List;

// parse() returns a list of DateGroup objects; get(0) throws if no date was found
List<Date> dates = new Parser().parse("Start date 11/30/2013 , end date Friday, Sept. 7, 2013").get(0).getDates();
System.out.println(dates.get(0));
System.out.println(dates.get(1));
//output:
//Sat Nov 30 11:14:30 BDT 2013
//Sat Sep 07 11:14:30 BDT 2013
I did it with a large regex of my own:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static final String DATE_REGEX = "\\b([0-9]{1,2} ?([\\-/\\\\] ?[0-9]{1,2} ?| (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ?)([\\-/\\\\]? ?('?[0-9]{2}|[0-9]{4}))?)\\b";
// Case insensitive so that "mar" matches as well as "Mar" for March
public static final Pattern DATE_PATTERN = Pattern.compile(DATE_REGEX, Pattern.CASE_INSENSITIVE);
public static boolean containsDate(String str)
{
    Matcher matcher = DATE_PATTERN.matcher(str);
    // find() looks for a date anywhere in str; matches() would require the whole string to be a date
    return matcher.find();
}
This matches the following dates:
06 Sep 2010
12-5-2005
07 Mar 95
30 DEC '99
11\9\2001
It does not accept these as whole dates, although containsDate() still returns true for them because find() picks out the embedded 11/11:
444/11/11
bla11/11/11
11/11/11blah
It also matches dates between symbols like [], (), commas and colons:
Yesterday (6 nov 2010)
It also matches dates without a year:
Yesterday, 6 nov, was a rainy day...
But it also matches:
86-44/1234
00-00-0000
11\11/11
These no longer look like dates, but you can solve this by checking whether the numbers are possible values for a day, month and year.
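A minimal sketch of that final check, assuming the two leading numbers and the year have already been pulled out of the match (for example with capturing groups) and that expanding two-digit years such as '99 is left to the caller. Because the question doesn't care whether day or month comes first, it accepts either order. isValid and isPossibleDate are hypothetical names.

```java
import java.time.YearMonth;

class FieldCheck {
    // True if (day, month, year) is a real calendar date.
    static boolean isValid(int day, int month, int year) {
        if (month < 1 || month > 12 || year < 1) return false;
        int lastDay = YearMonth.of(year, month).lengthOfMonth(); // accounts for leap years
        return day >= 1 && day <= lastDay;
    }

    // Accept the two leading numbers in either day/month or month/day order.
    static boolean isPossibleDate(int first, int second, int year) {
        return isValid(first, second, year) || isValid(second, first, year);
    }

    public static void main(String[] args) {
        System.out.println(isPossibleDate(86, 44, 1234)); // false
        System.out.println(isPossibleDate(12, 5, 2005));  // true
    }
}
```

With this, 86-44/1234 and 00-00-0000 are rejected, while 11\11/11 passes, so mixed separators would still need their own check.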