Regex for splitting a German address into its parts

孤街浪徒 2021-02-09 18:09

Good evening,

I'm trying to split a German address string into its parts using Java. Does anyone know a regex or a library that does this? To split it like t…

6 answers
  •  囚心锁ツ
    2021-02-09 18:46

    I’d start from the back since, as far as I know, a city name cannot contain numbers (though it can contain spaces; the first example I’ve found is “Weil der Stadt”). The five-digit number before the city must then be the zip code.

    The number (possibly followed by a single letter) before that is the street number. Note that this can also be a range. Anything before that is the street name.
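    The back-to-front reasoning above can also be written down without a regex. The following is a minimal sketch, not the approach this answer goes on to use, and it rests on assumptions of its own: the address is one line of whitespace-separated tokens, a number range is written without inner spaces (such as “23-25”), and there is no address extension. The class and method names (BackwardAddressSplit, split) are hypothetical.

    ```java
    import java.util.Arrays;
    import java.util.List;

    public class BackwardAddressSplit {

        // Hypothetical helper illustrating the "start from the back" idea.
        // Returns {street, number, zip, city}, or null if the rules do not apply.
        // Assumes whitespace-separated tokens, no address extensions, and number
        // ranges written without inner spaces (e.g. "23-25").
        static String[] split(String address) {
            List<String> t = Arrays.asList(address.trim().split("\\s+"));

            // 1. City names contain no digits, so the last five-digit token is the zip.
            int zip = -1;
            for (int i = t.size() - 1; i >= 0; i--) {
                if (t.get(i).matches("\\d{5}")) {
                    zip = i;
                    break;
                }
            }
            if (zip < 2 || zip == t.size() - 1) {
                return null; // need street and number before the zip, city after it
            }

            // 2. Before the zip: a number or range, optionally followed by one letter.
            int start = zip - 1;
            if (t.get(start).matches("[a-zA-Z]")) {
                start--;
            }
            if (start < 1 || !t.get(start).matches("\\d+(?:-\\d+)?[a-zA-Z]?")) {
                return null; // no house number found, or no street name left
            }

            // 3. Everything before the house number is the street name.
            return new String[] {
                String.join(" ", t.subList(0, start)),
                String.join(" ", t.subList(start, zip)),
                t.get(zip),
                String.join(" ", t.subList(zip + 1, t.size()))
            };
        }

        public static void main(String[] args) {
            String[] parts = split("Straße des 17. Juni 23-25 a 12345 Berlin-Mitte");
            System.out.println(String.join(" | ", parts));
        }
    }
    ```

    With these rules an address such as “Musterweg 5 Gartenhaus 12345 Berlin” simply yields null, which matches the limitation with address extensions.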

    Anyway, here we go:

    ^((?:\p{L}| |\d|\.|-)+?) (\d+(?: ?- ?\d+)? *[a-zA-Z]?) (\d{5}) ((?:\p{L}| |-)+)(?: *\(([^\)]+)\))?$
    

    This correctly parses even arcane addresses such as “Straße des 17. Juni 23-25 a 12345 Berlin-Mitte”.
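    To use the expression from Java, every backslash has to be doubled inside the string literal. A minimal sketch (the class and field names are hypothetical) that applies it to the Berlin example with Matcher.matches() and reads the five positional groups:

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GermanAddressRegex {

        // The one-line expression, with backslashes doubled for a Java string literal.
        // Groups: 1 = street, 2 = house number, 3 = zip code, 4 = city,
        // 5 = optional text in parentheses after the city (null if absent).
        static final Pattern ADDRESS = Pattern.compile(
            "^((?:\\p{L}| |\\d|\\.|-)+?) (\\d+(?: ?- ?\\d+)? *[a-zA-Z]?) (\\d{5}) ((?:\\p{L}| |-)+)(?: *\\(([^\\)]+)\\))?$");

        public static void main(String[] args) {
            Matcher m = ADDRESS.matcher("Straße des 17. Juni 23-25 a 12345 Berlin-Mitte");
            if (m.matches()) {
                // Print each group with its number; group 5 prints as "null" here.
                for (int i = 1; i <= m.groupCount(); i++) {
                    System.out.println(i + ": " + m.group(i));
                }
            }
        }
    }
    ```

    One quirk worth knowing: when a parenthesized part is present, the greedy city group also takes the space in front of the parenthesis, so “Hauptstrasse 5 12345 Weil der Stadt (Württemberg)” yields the city “Weil der Stadt ” with a trailing space. Trimming the city group is a simple workaround.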

    Note that this doesn’t work with address extensions (such as “Gartenhaus” or “c/o …”). I have no clue how to handle those. I rather doubt that there’s a viable regular expression to express all this.

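    As you can see, this is a rather complex regular expression with lots of capture groups. If I were to use such an expression in code, I would use named groups (supported since Java 7) and break the expression up into smaller morsels using the x (free-spacing) flag. Java does accept that flag, as Pattern.COMMENTS or inline as (?x), but until text blocks arrived in Java 15 it had no multi-line string literals, so the pattern still has to be glued together from escaped string fragments. This s*cks because it makes complex regular expressions very hard to read and maintain.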
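    For what it’s worth, java.util.regex does support free-spacing mode through Pattern.COMMENTS (equivalently, an inline (?x)), and named groups work since Java 7. A minimal sketch of that approach, built from concatenated string literals rather than text blocks; the class name and the group names (street, number, zip, city, extra) are hypothetical, and the pattern is intended to match the same inputs as the one-line expression above:

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class VerboseAddressRegex {

        // The answer's expression in free-spacing mode (Pattern.COMMENTS, i.e. the x flag).
        // Plain whitespace in the pattern is ignored, so literal spaces are written as \x20,
        // and "#" starts a comment that runs to the end of the line.
        // The group names are illustrative placeholders.
        static final Pattern ADDRESS = Pattern.compile(
            "^\n" +
            "(?<street> (?:\\p{L}|\\x20|\\d|\\.|-)+? ) \\x20   # street name (lazy)\n" +
            "(?<number> \\d+ (?:\\x20?-\\x20?\\d+)? \\x20* [a-zA-Z]? ) \\x20   # number or range\n" +
            "(?<zip> \\d{5} ) \\x20   # five-digit zip code\n" +
            "(?<city> (?:\\p{L}|\\x20|-)+ )   # city name, no digits\n" +
            "(?: \\x20* \\( (?<extra> [^\\)]+ ) \\) )?   # optional part in parentheses\n" +
            "$",
            Pattern.COMMENTS);

        public static void main(String[] args) {
            Matcher m = ADDRESS.matcher("Straße des 17. Juni 23-25 a 12345 Berlin-Mitte");
            if (m.matches()) {
                System.out.println("street: " + m.group("street"));
                System.out.println("number: " + m.group("number"));
                System.out.println("zip:    " + m.group("zip"));
                System.out.println("city:   " + m.group("city"));
            }
        }
    }
    ```

    Writing spaces as \x20 avoids any ambiguity about which whitespace is ignored in comments mode; an escaped space would be an alternative.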

    Still, here’s a somewhat more legible regular expression:

    ^                                            # group names are illustrative
    (?<street>(?:\p{L}|\ |\d|\.|-)+?)\ 
    (?<number>\d+(?:\ ?-\ ?\d+)?\ *[a-zA-Z]?)\ 
    (?<zip>\d{5})\ 
    (?<city>(?:\p{L}|\ |-)+)
    (?:\ *\((?<extra>[^\)]+)\))?
    $
    

    In Java 7, the closest we can achieve is this (untested; may contain typos):

    // The group names are illustrative placeholders.
    String pattern =
        "^" +
        "(?<street>(?:\\p{L}| |\\d|\\.|-)+?) " +
        "(?<number>\\d+(?: ?- ?\\d+)? *[a-zA-Z]?) " +
        "(?<zip>\\d{5}) " +
        "(?<city>(?:\\p{L}| |-)+)" +
        "(?: *\\((?<extra>[^\\)]+)\\))?" +
        "$";
    
