How to translate “Lorem 3 ipsum dolor sit amet” into SEO friendly “Lorem-3-ipsum-dolor-sit-amet” in Java?

广开言路 2021-01-23 10:49

In my blog app, a user can enter any text as a title for their entry and then I generate a URL based on the text.

I validate their title to make sure it only co

3 answers
  •  野趣味 (original poster)
    2021-01-23 11:45

    In practice it's really not as simple as replacing spaces with hyphens. You would usually also want to make it all lowercase and normalize or replace diacritics such as á, ö and è, which are not valid URL characters. The only valid characters are listed as "Unreserved characters" in the 2nd table of this Wikipedia page.

    Here's what such a function can look like:

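    import java.text.Normalizer;
    import java.text.Normalizer.Form;

    public static String prettyURL(String string) {
        // Decompose accented letters (NFD), drop the combining marks,
        // then collapse every run of non-alphanumerics into one hyphen.
        return Normalizer.normalize(string.toLowerCase(), Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
            .replaceAll("[^\\p{Alnum}]+", "-");
    }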
    

    It basically does the following:

    • lowercases the string
    • removes combining diacritical marks (after the Normalizer has "extracted" them from the actual chars)
    • replaces each run of non-alphanumeric characters with a single hyphen
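    To see these steps on concrete input, here is a small self-contained sketch. The class name SlugDemo and the helper trimmedSlug are hypothetical and not part of the answer above; the helper only illustrates one possible way to drop the hyphen that prettyURL leaves behind when a title starts or ends with punctuation or whitespace.

```java
import java.text.Normalizer;

class SlugDemo {

    // Same steps as the prettyURL function shown above.
    static String prettyURL(String string) {
        return Normalizer.normalize(string.toLowerCase(), Normalizer.Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
            .replaceAll("[^\\p{Alnum}]+", "-");
    }

    // Hypothetical helper (an assumption, not from the answer):
    // additionally strips hyphens at the start and end of the result.
    static String trimmedSlug(String title) {
        return prettyURL(title).replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        // Accented letters are written as Unicode escapes so the file
        // compiles the same way regardless of source encoding.
        String accented = "Cr\u00e8me Br\u00fbl\u00e9e!"; // "Crème Brûlée!"

        System.out.println(prettyURL("Lorem 3 ipsum dolor sit amet")); // lorem-3-ipsum-dolor-sit-amet
        System.out.println(prettyURL(accented));                       // creme-brulee-
        System.out.println(trimmedSlug("  " + accented + "  "));       // creme-brulee
    }
}
```

    Note that the output is all lowercase, and that \p{Alnum} matches ASCII letters and digits only by default in Java, so any character that is still non-ASCII after the diacritics are removed also becomes a hyphen.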

    See also:

    • JSP 2.0 SEO friendly links encoding
