Regular Expressions for City Names

Submitted by …衆ロ難τιáo~ on 2019-11-29 02:33:58
pcalcao

This can be arbitrarily complex, depending on how precise you need the match to be, and the variation you're willing to allow.

Something fairly simple like ^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$ should work.

Warning: this does not match cities like München. To handle those, adjust the [a-zA-Z] part of the expression to include whatever characters are allowed in your particular case.

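Keep in mind that it allows exactly one whitespace character or dash between words, so something like San----Francisco, or a name with several consecutive spaces, is rejected. Use [\s-]+ for the separator if you want to allow that.

It translates to something like: 1 or more letters, followed by a block of one whitespace character or dash and then 1 or more letters; this block can occur 0 or more times.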

The odd-looking part is the ?: bit. If you're not familiar with regexes it might be confusing, but it simply states that the piece of regex between the parentheses is a non-capturing group (I don't want to capture the part it matches to reuse later), so the parentheses are used only to group the expression, not to capture the match.

"New York" // passes

"San-Francisco" // passes

"San Fran Cisco" // passes (sorry, needed an example with three tokens)

"Chicago" // passes

"  Chicago" // doesn't pass, starts with spaces

"San-" // doesn't pass, ends with a dash

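The cases above can be checked with a short script. This is only a minimal sketch, assuming Python's built-in re module; the names SIMPLE_CITY_RE and is_simple_city are made up for illustration. It uses fullmatch rather than match so that a trailing newline is not silently accepted by the $ anchor.

```python
import re

# Hypothetical names for illustration; the pattern is the one from this answer.
SIMPLE_CITY_RE = re.compile(r"^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$")


def is_simple_city(value: str) -> bool:
    """Return True if value is made of ASCII words joined by a whitespace char or dash."""
    return SIMPLE_CITY_RE.fullmatch(value) is not None


if __name__ == "__main__":
    samples = ["New York", "San-Francisco", "San Fran Cisco", "Chicago",
               "  Chicago", "San-"]
    for name in samples:
        # Print each sample next to the verdict so the list above can be compared by eye.
        print(repr(name), is_simple_city(name))
```
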
This answer assumes that the letters @Manaysah refers to also include letters with diacritical marks. I've added the single quote ' since many names in Canada and France have it. I've also added the period (dot) since it's required for abbreviated names such as St. Catharines.

Building upon @UID's answer, I came up with:

^([a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*[a-zA-Z\u0080-\u024F]*$

The list of cities it accepts:

Toronto
St. Catharines
San Francisco
Val-d'Or
Presqu'ile
Niagara on the Lake
Niagara-on-the-Lake
München
toronto
toRonTo
villes du Québec
Provence-Alpes-Côte d'Azur
Île-de-France
Kópavogur
Garðabær
Sauðárkrókur
Þorlákshöfn

And what it rejects:

A----B
------
*******
&&
()
//
\\

I didn't add in the use of brackets and other marks since it didn't fall within the scope of this question.

I've stayed away from \s for whitespace. Tabs and line feeds aren't part of a city name and shouldn't be used in my opinion.
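For anyone wanting to try the accept/reject lists, here is a rough sketch in Python. It makes a few assumptions that are not part of the answer: the period is written as \. so that it only matches a literal dot, the input is normalized to NFC first (a decomposed accent such as e + U+0301 falls outside U+0080-U+024F), and the names ACCENTED_CITY_RE and is_accented_city are invented for the example.

```python
import re
import unicodedata

# The pattern from this answer, with the period written as "\." so it is a literal dot.
ACCENTED_CITY_RE = re.compile(
    r"^([a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*[a-zA-Z\u0080-\u024F]*$"
)


def is_accented_city(value: str) -> bool:
    """Check value against the pattern after composing accents (NFC)."""
    # Assumption: normalizing is acceptable for the caller. Without it,
    # "e" followed by a combining acute accent would be rejected.
    normalized = unicodedata.normalize("NFC", value)
    return ACCENTED_CITY_RE.fullmatch(normalized) is not None


if __name__ == "__main__":
    for name in ["St. Catharines", "Val-d'Or", "Île-de-France", "A----B", "&&"]:
        print(repr(name), is_accented_city(name))
```

One thing to be aware of: because every part of the pattern is optional, it also matches the empty string and a name with a trailing separator such as "Toronto-", so an explicit empty-input check may still be needed.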

Adding my answer in case anybody needs it while searching for a regex for city names, like I did.

Please use this:

^[a-zA-Z\u0080-\u024F\s\/\-\)\(\`\.\"\']+$

Many city names contain dashes, such as Soddy-Daisy, Tennessee, or special characters, like the ñ in La Cañada Flintridge, California.

Hope this helps!

Here is the one I've found works best

For regex flavours that support \p{L} (PCRE as used by PHP, .NET, Go):

/^\p{L}+(?:([\ \-\']|(\.\ ))\p{L}+)*$/u

For flavours that do not support \p{L}, replace it with [a-zA-Z\u0080-\u024F].

So for JavaScript (without the u flag) or Python's built-in re module, use:

/^[a-zA-Z\u0080-\u024F]+(?:([\ \-\']|(\.\ ))[a-zA-Z\u0080-\u024F]+)*$/

Whitelisting a bunch of characters is easy, but there are things to watch for in your regex:

  • Consecutive non-alphabetical characters should not be allowed, e.g. "Los  Angeles" should fail because it has two spaces.
  • Periods should have a space after them, e.g. "St.Albert" should fail because it's missing the space.
  • Names cannot start or end with non-alphabetical characters, e.g. "-Chicago-" should fail.
  • A whitespace character \s is not the same as a literal space (\ ): a tab or line feed character would also match \s, so a space character should be specified instead.
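The four points above can be turned into a quick check. This is a sketch only, assuming Python's re module and the [a-zA-Z\u0080-\u024F] variant of the pattern; the names LETTERS, SEPARATOR, STRICT_CITY_RE and is_strict_city are hypothetical, and the capturing groups have been made non-capturing since nothing is extracted.

```python
import re

# Letters: ASCII plus the Latin-1 Supplement and Latin Extended-A/B range.
LETTERS = r"[a-zA-Z\u0080-\u024F]"
# Separator: one space, hyphen or apostrophe, or a period followed by one space.
SEPARATOR = r"(?:[ \-']|\. )"
# A word, then any number of (separator + word) pairs.
STRICT_CITY_RE = re.compile(rf"{LETTERS}+(?:{SEPARATOR}{LETTERS}+)*")


def is_strict_city(value: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    # fullmatch anchors both ends, so no ^ or $ is needed and a trailing
    # newline is not accepted.
    return STRICT_CITY_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for name in ["Los Angeles", "Los  Angeles", "St. Albert", "St.Albert",
                 "-Chicago-", "New\tYork"]:
        print(repr(name), is_strict_city(name))
```

Building the pattern from named pieces is just a readability choice; a single literal string behaves the same.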

Note: When building regex rules, I find https://regex101.com/tests is very helpful, as you can easily create unit tests

js: https://regex101.com/r/cgJwc0/1/tests
php: https://regex101.com/r/Yo3GV2/1/tests

Use this regex:

^[a-zA-Z-\s]+$

After many hours of looking for a city regex matcher I have built this and it meets my needs 100%

(?ix)^[A-Z.-]+(?:\s+[A-Z.-]+)*$

Expression for testing a city name. It matches:

  • City
  • St. City
  • Some Silly-City
  • City St.
  • Too Many Words City
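In case the (?ix) prefix is unfamiliar: it turns on case-insensitive matching (i) and free-spacing mode (x) for the whole pattern. Below is a small sketch of how the same expression behaves, assuming Python's re module (which also accepts these inline flags at the start of a pattern); WORDS_CITY_RE and matches_city are names invented for this example.

```python
import re

# (?i): ignore case, so [A-Z] also covers a-z.
# (?x): whitespace in the pattern itself is ignored; \s still matches whitespace in the input.
WORDS_CITY_RE = re.compile(r"(?ix)^[A-Z.-]+(?:\s+[A-Z.-]+)*$")


def matches_city(value: str) -> bool:
    """Return True if value is one or more words of letters, dots and hyphens."""
    return WORDS_CITY_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for name in ["City", "St. City", "Some Silly-City", "City St.",
                 "Too Many Words City", "City 123"]:
        print(repr(name), matches_city(name))
```

Because the separator is \s+, runs of spaces (and tabs) between words are accepted by this pattern.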

It seems that there are many flavors of regex. I built this one for my Java needs and it works great:

^[a-zA-Z.-]+(?:[\s-][\/a-zA-Z.]+)*$

This will help identify city names like St. Johns, Baie-Sainte-Anne, and Grand-Sault/Grand Falls.

I like shpeley's suggestion, but it has a couple of flaws in it.

If you change shpeley's regex to this, it will not accept other special characters:

^([a-zA-Z\u0080-\u024F]{1}[a-zA-Z\u0080-\u024F\. \-']*[a-zA-Z\u0080-\u024F\.']{1})$

I use this one:

^[a-zA-Z\\u0080-\\u024F.]+((?:[ -.|'])[a-zA-Z\\u0080-\\u024F]+)*$
Nitin Khanna

You can try this:

^\p{L}+(?:[\s\-]\p{L}+)*$

The above regex will:

  • Reject leading and trailing spaces and hyphens
  • Match cities with names like Néewiller-près-lauterbourg
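Python's built-in re module does not support \p{L}, so the expression cannot be pasted in as-is there. One common workaround, sketched below, is the class [^\W\d_] (a word character that is neither a digit nor an underscore), which approximates "any Unicode letter" for str patterns. This is a substitution on my part rather than the original expression, and LETTER, UNICODE_CITY_RE and is_unicode_city are hypothetical names.

```python
import re

# Approximation of \p{L}: word characters minus digits and the underscore.
LETTER = r"[^\W\d_]"
# Letters, then any number of (one whitespace char or hyphen + letters) groups.
UNICODE_CITY_RE = re.compile(rf"{LETTER}+(?:[\s\-]{LETTER}+)*")


def is_unicode_city(value: str) -> bool:
    """Return True if the whole string is letter-words joined by a space or hyphen."""
    return UNICODE_CITY_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for name in ["Neewiller-près-Lauterbourg", "San Francisco", "-Paris", "Paris-"]:
        print(repr(name), is_unicode_city(name))
```

The third-party regex package does understand \p{L} directly, if adding a dependency is an option.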

Here's one that will work with most cities, and has been tested:

^[a-zA-Z\u0080-\u024F]+(?:. |-| |')*([1-9a-zA-Z\u0080-\u024F]+(?:. |-| |'))*[a-zA-Z\u0080-\u024F]*$

Python code below, including its test.

import re
import pytest


CITY_RE = re.compile(
    r"^[a-zA-Z\u0080-\u024F]+(?:. |-| |')*"  # a word
    r"([1-9a-zA-Z\u0080-\u024F]+(?:. |-| |'))*"
    r"[a-zA-Z\u0080-\u024F]*$"
)


def is_city(value: str) -> bool:
    valid = CITY_RE.match(value) is not None
    return valid

# Tests
@pytest.mark.parametrize(
    "value,expected",
    (
        ("1", False),
        ("Toronto", True),
        ("Saint-Père-en-Retz", True),
        ("Saint Père en Retz", True),
        ("Saint-Père en Retz", True),
        ("Paris 13e Arrondissement", True),
        ("Paris  13e  Arrondissement ", True),
        ("Bouc-Étourdi", True),
        ("Arnac-la-Poste", True),
        ("Bourré", True),
        ("Å", True),
        ("San Francisco", True),
    ),
)
def test_is_city(value, expected):
    # is_city returns a plain bool, so compare it directly with the expectation.
    assert is_city(value) is expected