Determine if a String is a valid date before parsing

半阙折子戏 2020-12-03 11:36

I have this situation where I am reading about 130K records containing dates stored as String fields. Some records contain blanks (nulls), some contain strings like this: \'

12 answers
  • 2020-12-03 11:52

    See Lazy Error Handling in Java for an overview of how to eliminate try/catch blocks using an Option type.

    Functional Java is your friend.

    In essence, what you want to do is to wrap the date parsing in a function that doesn't throw anything, but indicates in its return type whether parsing was successful or not. For example:

    import fj.F;
    import fj.F2;
    import fj.data.Option;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import static fj.Function.curry;
    import static fj.data.Option.none;
    import static fj.data.Option.some;
    ...
    
    F<String, F<String, Option<Date>>> parseDate =
      curry(new F2<String, String, Option<Date>>() {
        public Option<Date> f(String pattern, String s) {
          try {
            return some(new SimpleDateFormat(pattern).parse(s));
          }
          catch (ParseException e) {
            return none();
          }
        }
      });
    

    OK, now you've a reusable date parser that doesn't throw anything, but indicates failure by returning a value of type Option.None. Here's how you use it:

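    import fj.data.Option;
    import fj.data.Stream;
    import static fj.data.Option.isSome_;
    import static fj.data.Option.join;
    import static fj.data.Stream.stream;
    ....
    public Option<Date> parseWithPatterns(String s, Stream<String> patterns) {
      // Apply each curried parser to s lazily. find() wraps the first success
      // in another Option, so join() flattens Option<Option<Date>> to Option<Date>.
      return join(stream(s).apply(patterns.map(parseDate)).find(Option.<Date>isSome_()));
    }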
    

    That will give you the date parsed with the first pattern that matches, or a value of type Option.None, which is type-safe whereas null isn't.

    If you're wondering what Stream is... it's a lazy list. This ensures that you ignore patterns after the first successful one. No need to do too much work.

    Call your function like this:

    for (Date d : parseWithPatterns(someString, stream("dd/MM/yyyy", "dd-MM-yyyy"))) {
      // Do something with the date here.
    }
    

    Or...

    Option<Date> d = parseWithPatterns(someString,
                                       stream("dd/MM/yyyy", "dd-MM-yyyy"));
    if (d.isNone()) {
      // Handle the case where neither pattern matches.
    } 
    else {
      // Do something with d.some()
    }
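
    If you would rather not add a library, the same idea can be sketched with java.util.Optional and the standard stream API (Java 8 or later). This is only an illustration of the approach above, not a drop-in replacement for it: the class name LazyDateParsing and the null check are assumptions added for the example. Streams are lazy, so findFirst() stops at the first pattern that parses.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Optional;

// Hypothetical helper class, named for this example only.
final class LazyDateParsing {
    private LazyDateParsing() {}

    /** Parses s with one pattern; an empty Optional signals failure instead of an exception. */
    static Optional<Date> parseDate(String pattern, String s) {
        if (s == null) {
            // Assumption: null records count as "no date" rather than an error.
            return Optional.empty();
        }
        try {
            return Optional.of(new SimpleDateFormat(pattern).parse(s));
        } catch (ParseException e) {
            return Optional.empty();
        }
    }

    /** Returns the date produced by the first pattern that parses s, if any. */
    static Optional<Date> parseWithPatterns(String s, List<String> patterns) {
        return patterns.stream()
                .map(pattern -> parseDate(pattern, s))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .findFirst(); // short-circuits: later patterns are never tried
    }
}
```

    Calling LazyDateParsing.parseWithPatterns(someString, Arrays.asList("dd/MM/yyyy", "dd-MM-yyyy")) then gives you an Optional<Date> to test with isPresent() or consume with ifPresent().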
    
  • 2020-12-03 11:52

    Use regular expressions to parse your string. Make sure that you keep both regexes precompiled (store them as constants instead of creating new ones on every method call), and measure whether this is actually faster than the try-catch you use now.

    I still find it strange that your method returns null if both versions fail rather than throwing an exception.

  • 2020-12-03 11:57

    On one hand I see nothing wrong with your use of try/catch for the purpose; it’s the option I would use. On the other hand there are alternatives:

    1. Take a taste from the string before deciding how to parse it.
    2. Use optional parts of the format pattern string.

    For my demonstrations I am using java.time, the modern Java date and time API, because the Date class used in the question was always poorly designed and is now long outdated. For a date without time of day we need a java.time.LocalDate.

    try-catch

    Using try-catch with java.time looks like this:

        DateTimeFormatter ddmmmuuFormatter = DateTimeFormatter.ofPattern("dd-MMM-uu", Locale.ENGLISH);
        DateTimeFormatter ddmmuuuuFormatter = DateTimeFormatter.ofPattern("dd/MM/uuuu");
    
        String dateString = "07-Jun-09";
    
        LocalDate result;
        try {
            result = LocalDate.parse(dateString, ddmmmuuFormatter);
        } catch (DateTimeParseException dtpe) {
            result = LocalDate.parse(dateString, ddmmuuuuFormatter);
        }
        System.out.println("Date: " + result);
    

    Output is:

    Date: 2009-06-07

    Suppose instead we defined the string as:

        String dateString = "07/06/2009";
    

    Then output is still the same.

    Take a taste

    If you prefer to avoid the try-catch construct, it’s easy to make a simple check to decide which of the formats your string conforms to. For example:

        if (dateString.contains("-")) {
            result = LocalDate.parse(dateString, ddmmmuuFormatter);
        } else {
            result = LocalDate.parse(dateString, ddmmuuuuFormatter);
        }
    

    The result is the same as before.

    Use optional parts in the format pattern string

    This is the option I like the least, but it’s short and presented for some measure of completeness.

        DateTimeFormatter dateFormatter
                = DateTimeFormatter.ofPattern("[dd-MMM-uu][dd/MM/uuuu]", Locale.ENGLISH);
        LocalDate result = LocalDate.parse(dateString, dateFormatter);
    

    The square brackets denote optional parts of the format. So Java first tries to parse using dd-MMM-uu. Whether or not that succeeds, it then tries to parse the remainder of the string using dd/MM/uuuu. Given your two formats, one of the attempts will succeed, and you have parsed the date. The result is still the same as above.
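
    Since the question mentions blank and null fields, here is a minimal sketch of how the optional-section formatter could be wrapped so that such records come back as an empty Optional. The class name OptionalSectionParser, the method name tryParse and the decision to trim input are assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical wrapper, named for this example only.
final class OptionalSectionParser {
    // Same pattern as in the answer: each bracketed section is optional.
    private static final DateTimeFormatter DATE_FORMATTER =
            DateTimeFormatter.ofPattern("[dd-MMM-uu][dd/MM/uuuu]", Locale.ENGLISH);

    private OptionalSectionParser() {}

    /** Returns the parsed date, or empty for null, blank or unparseable input. */
    static Optional<LocalDate> tryParse(String dateString) {
        if (dateString == null || dateString.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(dateString.trim(), DATE_FORMATTER));
        } catch (DateTimeParseException e) {
            // Neither optional section matched the whole string.
            return Optional.empty();
        }
    }
}
```

    The try-catch is still there, but it is confined to one place and only fires for records that match neither format.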

    Link

    Oracle tutorial: Date Time explaining how to use java.time.

  • 2020-12-03 11:58

    You can take advantage of regular expressions to determine which format the string is in, and whether it matches any valid format. Something like this (not tested):

    (Oops, I wrote this in C# before checking to see what language you were using.)

    Regex test = new Regex(@"^(?:(?<formatA>\d{2}-[a-zA-Z]{3}-\d{2})|(?<formatB>\d{2}/\d{2}/\d{4}))$", RegexOptions.Compiled);
    Match match = test.Match(yourString);
    if (match.Success)
    {
        // A named group reports Success only if its alternative matched.
        if (match.Groups["formatA"].Success)
        {
            // Use format A.
        }
        else if (match.Groups["formatB"].Success)
        {
            // Use format B.
        }
        ...
    }
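
    Because the question is about Java, a rough Java equivalent of the C# snippet might look like the sketch below. The class DateFormatSniffer, the enum Format and the method sniff are names invented for the example. Note that the pattern only checks the shape of the string, so something like "99/99/9999" still passes and has to be rejected by the actual date parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for this example only.
final class DateFormatSniffer {
    enum Format { DD_MMM_YY, DD_MM_YYYY, UNKNOWN }

    // Compiled once and reused for every record.
    private static final Pattern SHAPE = Pattern.compile(
            "^(?:(?<formatA>\\d{2}-[a-zA-Z]{3}-\\d{2})|(?<formatB>\\d{2}/\\d{2}/\\d{4}))$");

    private DateFormatSniffer() {}

    /** Reports which of the two known shapes s has, without parsing it as a date. */
    static Format sniff(String s) {
        if (s == null) {
            return Format.UNKNOWN;
        }
        Matcher m = SHAPE.matcher(s);
        if (!m.matches()) {
            return Format.UNKNOWN;
        }
        // group(name) returns null when that alternative did not take part in the match.
        return m.group("formatA") != null ? Format.DD_MMM_YY : Format.DD_MM_YYYY;
    }
}
```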
    
  • 2020-12-03 12:00

    Don't be too hard on yourself about using try-catch in logic: this is one of those situations where Java forces you to, so there's not a lot you can do about it.

    But in this case you could instead use DateFormat.parse(String, ParsePosition).
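
    A minimal sketch of that approach follows. The two-argument parse returns null on failure instead of throwing, and the ParsePosition tells you how much of the string was consumed. The class name ParsePositionDates, the method parseOrNull, the non-lenient setting and the Locale.ENGLISH choice are assumptions for the example.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, named for this example only.
final class ParsePositionDates {
    private ParsePositionDates() {}

    /** Returns the date from the first pattern that consumes all of s, or null if none does. */
    static Date parseOrNull(String s, String... patterns) {
        if (s == null) {
            return null;
        }
        for (String pattern : patterns) {
            // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject values such as day 32
            ParsePosition pos = new ParsePosition(0);
            Date date = format.parse(s, pos); // null on failure, no exception
            if (date != null && pos.getIndex() == s.length()) {
                return date;
            }
        }
        return null;
    }
}
```

    For example, ParsePositionDates.parseOrNull(someString, "dd-MMM-yy", "dd/MM/yyyy") tries both formats from the question without any try-catch. Checking pos.getIndex() against the string length matters because parse(String, ParsePosition) is happy to stop before the end of the input.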

  • 2020-12-03 12:00

    It looks like you have three options if there are only two known formats:

    • Check for the presence of - or / first, then parse with the matching format.
    • Check the length, since "dd-MMM-yy" and "dd/MM/yyyy" have different lengths (9 and 10 characters).
    • Use precompiled regular expressions.

    The last option seems unnecessary.
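
    The length check could be sketched like this, here using java.time. The class name LengthDispatch and the method parseByLength are invented for the example, and returning null for anything unparseable is an assumption that mirrors the behaviour described in the question. Blanks and strings of any other length are rejected without attempting a parse.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical helper, named for this example only.
final class LengthDispatch {
    private static final DateTimeFormatter SHORT_FORMAT =
            DateTimeFormatter.ofPattern("dd-MMM-uu", Locale.ENGLISH); // 9 characters
    private static final DateTimeFormatter LONG_FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/uuuu");                // 10 characters

    private LengthDispatch() {}

    /** Picks the formatter by string length; returns null if s cannot be a date in either format. */
    static LocalDate parseByLength(String s) {
        if (s == null) {
            return null;
        }
        DateTimeFormatter formatter;
        switch (s.length()) {
            case 9:
                formatter = SHORT_FORMAT;
                break;
            case 10:
                formatter = LONG_FORMAT;
                break;
            default:
                return null; // blanks and other lengths never reach the parser
        }
        try {
            return LocalDate.parse(s, formatter);
        } catch (DateTimeParseException e) {
            // Right length, wrong content.
            return null;
        }
    }
}
```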
