Extracting dates from text in Java

旧时难觅i 2021-01-13 17:20

Is it possible to extract dates from a string in Java?

I have 500+ strings containing different data. They can include text such as:
"... period from 08.23.2011 - 09.05.2011..

5 answers
  • 2021-01-13 18:10

    I would use a simple regex to get "likely" dates out first, and then parse them more carefully (ideally with Joda Time, IMO). I'd start off with a regex of \b\d{2}\.\d{2}\.\d{4}\b (plus escaping for the Java string of course).

    (The \b bit matches a word boundary, so 12345.45.12345 won't match.)

    You can make your regex more selective, of course, but it would be very hard to make it do all the validation required (imagine trying to encode all the rules for leap years in a regex) - so if you're going to need to validate as you parse anyway, there's not a lot of point in making the regex complicated.
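
    A minimal sketch of that two-step approach, under a few assumptions: java.time is swapped in for Joda Time so the example needs no extra dependency, the class and method names are made up for illustration, and the pattern is month-first because the question's sample 08.23.2011 has no month 23.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class DateExtractor {
    // \b\d{2}\.\d{2}\.\d{4}\b with every backslash doubled for the Java string literal.
    private static final Pattern CANDIDATE =
            Pattern.compile("\\b\\d{2}\\.\\d{2}\\.\\d{4}\\b");

    // Assumption: month first (MM.dd). 'uuuu' is used for the year because
    // ResolverStyle.STRICT cannot resolve 'yyyy' without an era.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM.dd.uuuu").withResolverStyle(ResolverStyle.STRICT);

    static List<LocalDate> extractDates(String text) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher matcher = CANDIDATE.matcher(text);
        while (matcher.find()) {
            try {
                // Step 2: the careful parse validates what the regex only guessed at.
                dates.add(LocalDate.parse(matcher.group(), FORMAT));
            } catch (DateTimeParseException e) {
                // Looked like a date but is not a real calendar date (e.g. 02.30.2011); skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        // prints [2011-08-23, 2011-09-05]
        System.out.println(extractDates("... period from 08.23.2011 - 09.05.2011.."));
    }
}
```

    The regex stays deliberately loose here; leap years and month lengths are left to the strict parser.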

  • 2021-01-13 18:13

    This is a date pattern recognition algorithm that not only identifies date patterns but also returns the probable date in a Java date format. It is fast and lightweight: processing time is linear and all dates are identified in a single pass. The algorithm resolves dates by traversing a tree; custom tree data structures are built for the supported date, time and month patterns.

    The algorithm also tolerates multiple space characters between date literals. E.g. DD DD DD is considered a valid date whether the parts are separated by one space or by several.

    The following date patterns are considered valid and can be identified by this algorithm:

    - dd MM(MM) yy(yy)
    - yy(yy) MM(MM) dd
    - MM(MM) dd yy(yy)

    Here MM(MM) can also be a month literal in alphabetic form, like Jan or January.

    Allowed delimiters between dates are '/', '\', ' ', ',', '|', '-', ' '

    It also recognizes a trailing time pattern in the following formats:

    - hh(24):mm:ss.SSS am / pm
    - hh(24):mm:ss am / pm

    Resolution time is linear; no pattern matching or brute force is used. The algorithm is based on tree traversal and returns a list of dates, each with the following three components:

    - the date string identified in the text
    - the converted and formatted date string
    - the SimpleDateFormat pattern

    Using date string and the format string, users are free to convert the string into objects based on their requirements.

    The library is available on Maven Central.

    <dependency>
        <groupId>net.rationalminds</groupId>
        <artifactId>DateParser</artifactId>
        <version>0.3.0</version>
    </dependency>
    

    The sample code to use this is below.

    import java.util.List;
    import net.rationalminds.LocalDateModel;
    import net.rationalminds.Parser;

    public class Test {
        public static void main(String[] args) throws Exception {
            Parser parser = new Parser();
            // Scan the text and collect every date the parser recognizes.
            List<LocalDateModel> dates = parser.parse("Identified date :'2015-January-10 18:00:01.704', converted");
            System.out.println(dates);
        }
    }
    

    Output: [LocalDateModel{originalText=2015-january-10 18:00:01.704, dateTimeString=2015-1-10 18:00:01.704, conDateFormat=yyyy-MM-dd HH:mm:ss.SSS, start=18, end=46}]
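
    As a rough sketch of turning such a result into an object, the dateTimeString and conDateFormat values from the output above can be handed to SimpleDateFormat. The two strings are copied literally from that output rather than read through the library's accessors, and the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

// Hypothetical helper, not part of the DateParser library.
class ConvertParsedDate {
    // Parses the normalized date string with the format string reported next to it.
    static Date toDate(String dateTimeString, String format) {
        try {
            return new SimpleDateFormat(format).parse(dateTimeString);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + dateTimeString + "' as " + format, e);
        }
    }

    public static void main(String[] args) {
        // Values copied from the sample output: dateTimeString and conDateFormat.
        Date date = toDate("2015-1-10 18:00:01.704", "yyyy-MM-dd HH:mm:ss.SSS");

        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        System.out.println(calendar.get(Calendar.YEAR));        // prints 2015
        System.out.println(calendar.get(Calendar.HOUR_OF_DAY)); // prints 18
    }
}
```

    The single-digit month in 2015-1-10 is accepted by the MM pattern because the fields are separated by delimiters.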

    Detailed blog at http://coffeefromme.blogspot.com/2015/10/how-to-extract-date-object-from-given.html

    The complete source is available on GitHub at https://github.com/vbhavsingh/DateParser

  • 2021-01-13 18:14

    You can extract them with regex first: \d{2}\.\d{2}\.\d{4} and then parse each match with SimpleDateFormat - new SimpleDateFormat("dd.MM.yyyy").parse(dateString)
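
    One caveat worth illustrating: SimpleDateFormat is lenient by default, so the question's sample 08.23.2011 parsed with dd.MM.yyyy does not fail but rolls month 23 over and yields 8 November 2012. The sketch below (helper name hypothetical) turns leniency off so that a wrong field order is reported instead of silently accepted; it suggests the sample data is actually month-first.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper for illustration only.
class LenientCheck {
    // Returns the parsed Date, or null when the text is not a valid date for the pattern.
    static Date parseStrict(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject out-of-range fields instead of rolling them over
        try {
            return format.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // There is no month 23, so day-first parsing is rejected.
        System.out.println(parseStrict("dd.MM.yyyy", "08.23.2011") == null); // prints true
        // Month-first parsing accepts the same text.
        System.out.println(parseStrict("MM.dd.yyyy", "08.23.2011") != null); // prints true
    }
}
```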

  • 2021-01-13 18:15

    In essence, regex is the answer for recognition, but there are lots and lots of ways to express dates and time periods, so if you want a good solution, you probably want to use an existing, well-tuned set of regexes. There's then a second phase of interpretation, which needs more flexibility than what JodaTime will parse out of the box. So for a robust solution, you probably want to use one of the systems that have been built in the natural language processing community, such as SUTime, HeidelTime or GUTime.

  • 2021-01-13 18:17

    You mean String and not text (this is Java)

    Create a String object to represent the text and then parse it into a Date with the SimpleDateFormat class:

    Date date = new SimpleDateFormat("dd.MM.yyyy").parse(yourString);
    