How does Stack Overflow generate its SEO-friendly URLs?

-上瘾入骨i 2020-11-22 04:27

What is a good complete regular expression or some other process that would take the title:

How do you change a title to be part of the URL like Stack

21 answers
  • 2020-11-22 04:43

    Rewrite of Jeff's code to be more concise

        public static string RemapInternationalCharToAscii(char c)
        {
            var s = c.ToString().ToLowerInvariant();
    
            var mappings = new Dictionary<string, string>
            {
                { "a", "àåáâäãą" },
                { "c", "çćčĉ" },
                { "d", "đ" },
                { "e", "èéêëę" },
                { "g", "ğĝ" },
                { "h", "ĥ" },
                { "i", "ìíîïı" },
                { "j", "ĵ" },
                { "l", "ł" },
                { "n", "ñń" },
                { "o", "òóôõöøőð" },
                { "r", "ř" },
                { "s", "śşšŝ" },
                { "ss", "ß" },
                { "th", "þ" },  // lowercase: s is lowercased before the lookup
                { "u", "ùúûüŭů" },
                { "y", "ýÿ" },
                { "z", "żźž" }
            };
    
            foreach(var mapping in mappings)
            {
                if (mapping.Value.Contains(s))
                    return mapping.Key;
            }
    
            return string.Empty;
        }
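
    One cost of the rewrite above is that it rebuilds the dictionary and scans every value on each call. A minimal Ruby sketch of the same idea, assuming the table is inverted once into a per-character lookup. The names ASCII_GROUPS, CHAR_TO_ASCII, and remap_international_char are hypothetical, not part of the original code:

```ruby
# Replacement => characters it stands in for. The groups follow the C# table
# above, written in lowercase because the input is lowercased before lookup.
ASCII_GROUPS = {
  "a"  => "àåáâäãą",
  "c"  => "çćčĉ",
  "d"  => "đ",
  "e"  => "èéêëę",
  "g"  => "ğĝ",
  "h"  => "ĥ",
  "i"  => "ìíîïı",
  "j"  => "ĵ",
  "l"  => "ł",
  "n"  => "ñń",
  "o"  => "òóôõöøőð",
  "r"  => "ř",
  "s"  => "śşšŝ",
  "ss" => "ß",
  "th" => "þ",
  "u"  => "ùúûüŭů",
  "y"  => "ýÿ",
  "z"  => "żźž"
}.freeze

# Invert once at load time: each accented character maps to its replacement.
CHAR_TO_ASCII = ASCII_GROUPS.each_with_object({}) do |(ascii, chars), table|
  chars.each_char { |ch| table[ch] = ascii }
end.freeze

# Returns the ASCII replacement, or "" when the character has no mapping.
def remap_international_char(char)
  CHAR_TO_ASCII.fetch(char.downcase, "")
end
```

    Each call is then a single hash lookup, and an unmapped character still yields an empty string, as in the C# version.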
    
  • 2020-11-22 04:48

    I don't know much about Ruby or Rails, but in Perl, this is what I would do:

    my $title = "How do you change a title to be part of the url like Stackoverflow?";
    
    my $url = lc $title;   # Change to lower case and copy to URL.
    $url =~ s/^\s+//g;     # Remove leading spaces.
    $url =~ s/\s+$//g;     # Remove trailing spaces.
    $url =~ s/\s+/\-/g;    # Change one or more spaces to single hyphen.
    $url =~ s/[^\w\-]//g;  # Remove anything that is not a word character or hyphen.
    
    print "$title\n$url\n";
    

    I just did a quick test and it seems to work. Hopefully this is relatively easy to translate to Ruby.

  • 2020-11-22 04:48

    Brian's code, in Ruby:

    title.downcase.strip.gsub(/\ /, '-').gsub(/[^\w\-]/, '')
    

    downcase turns the string to lowercase, strip removes leading and trailing whitespace, the first gsub call replaces each space with a dash, and the second removes every character that isn't a word character (letter, digit, or underscore) or a dash.
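
    Because each space is replaced on its own, a title with consecutive spaces, or with punctuation standing between spaces, can end up with runs of dashes. A hedged variant, assuming you want whitespace runs collapsed the way the Perl `\s+` does; `slugify` is a hypothetical name:

```ruby
# Hypothetical helper: lowercase, trim, turn each whitespace run into a dash,
# drop everything that is not an ASCII letter, digit, underscore, or dash,
# then collapse repeated dashes and strip a dash left at either end.
def slugify(title)
  title.downcase
       .strip
       .gsub(/\s+/, "-")
       .gsub(/[^a-z0-9_-]/, "")
       .squeeze("-")
       .gsub(/\A-|-\z/, "")
end

puts slugify("How do you change a title to be part of the url like Stackoverflow?")
```

    The squeeze step matters for input such as "a - b", where the two dashes created from the spaces would otherwise sit on either side of the original one.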

  • 2020-11-22 04:49

    I am not familiar with Ruby on Rails, but the following is (untested) PHP code. You can probably translate this very quickly to Ruby on Rails if you find it useful.

    $sURL = "This is a title to convert to URL-format. It has 1 number in it!";
    // To lower-case
    $sURL = strtolower($sURL);
    
    // Replace all non-word characters with spaces
    $sURL = preg_replace("/\W+/", " ", $sURL);
    
    // Trim leading and trailing spaces (so we won't start or end with a separator)
    $sURL = trim($sURL);
    
    // Replace spaces with separators (hyphens)
    $sURL = str_replace(" ", "-", $sURL);
    
    echo $sURL;
    // outputs: this-is-a-title-to-convert-to-url-format-it-has-1-number-in-it
    

    I hope this helps.
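
    The same replace-then-trim order carries over to Ruby almost line for line. A minimal sketch, assuming ASCII-only titles; `title_to_slug` is a hypothetical name:

```ruby
# Hypothetical helper mirroring the PHP steps above:
#   1. lowercase the title
#   2. turn each run of non-word characters into one space
#   3. trim leading and trailing spaces
#   4. turn the remaining spaces into hyphens
def title_to_slug(title)
  title.downcase
       .gsub(/\W+/, " ")
       .strip
       .tr(" ", "-")
end

puts title_to_slug("This is a title to convert to URL-format. It has 1 number in it!")
```

    Because non-word characters are replaced rather than deleted, words joined only by punctuation stay separate: "foo/bar" becomes "foo-bar" instead of "foobar".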

  • 2020-11-22 04:51

    If you are using Rails edge, you can rely on Inflector.parameterize. Here's the example from the documentation:

      class Person
        def to_param
          "#{id}-#{name.parameterize}"
        end
      end
    
      @person = Person.find(1)
      # => #<Person id: 1, name: "Donald E. Knuth">
    
      <%= link_to(@person.name, person_path(@person)) %>
      # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>
    

    Also, if you need to handle more exotic characters such as accents (éphémère) in earlier versions of Rails, you can use a mixture of PermalinkFu and DiacriticsFu:

    DiacriticsFu::escape("éphémère")
    => "ephemere"
    
    DiacriticsFu::escape("räksmörgås")
    => "raksmorgas"
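
    Without either plugin, plain Ruby (2.2 or later) can get the same result for many accented letters through Unicode normalization. This is not how DiacriticsFu is implemented; it is a sketch built on the standard String#unicode_normalize method, and `strip_diacritics` is a hypothetical name:

```ruby
# Hypothetical helper: decompose accented characters (NFD), then drop the
# combining marks, which leaves the base letters behind.
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_diacritics("éphémère")   # ephemere
puts strip_diacritics("räksmörgås") # raksmorgas
```

    Letters that have no decomposition, such as ł, ø, or ß, pass through unchanged, so an explicit mapping table is still needed for those.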
    
  • 2020-11-22 04:53

    Here's how we do it. Note that there are probably more edge conditions than you realize at first glance.

    This is the second version, unrolled for 5x more performance (and yes, I benchmarked it). I figured I'd optimize it because this function can be called hundreds of times per page.

    /// <summary>
    /// Produces optional, URL-friendly version of a title, "like-this-one". 
    /// hand-tuned for speed, reflects performance refactoring contributed
    /// by John Gietzen (user otac0n) 
    /// </summary>
    public static string URLFriendly(string title)
    {
        if (title == null) return "";
    
        const int maxlen = 80;
        int len = title.Length;
        bool prevdash = false;
        var sb = new StringBuilder(len);
        char c;
    
        for (int i = 0; i < len; i++)
        {
            c = title[i];
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            {
                sb.Append(c);
                prevdash = false;
            }
            else if (c >= 'A' && c <= 'Z')
            {
                // tricky way to convert to lowercase
                sb.Append((char)(c | 32));
                prevdash = false;
            }
            else if (c == ' ' || c == ',' || c == '.' || c == '/' || 
                c == '\\' || c == '-' || c == '_' || c == '=')
            {
                if (!prevdash && sb.Length > 0)
                {
                    sb.Append('-');
                    prevdash = true;
                }
            }
            else if ((int)c >= 128)
            {
                int prevlen = sb.Length;
                sb.Append(RemapInternationalCharToAscii(c));
                if (prevlen != sb.Length) prevdash = false;
            }
            if (i == maxlen) break;
        }
    
        if (prevdash)
            return sb.ToString().Substring(0, sb.Length - 1);
        else
            return sb.ToString();
    }
    

    The previous version of this code, which this one replaces and is functionally equivalent to (while running about 5x faster), is in the revision history of the original post.

    Also, the RemapInternationalCharToAscii method source code can be found here.
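
    For readers who want to follow the control flow in Ruby, here is a rough sketch of the same single-pass approach. It is a port for illustration, not the code Stack Overflow runs. The names url_friendly, REMAP, SEPARATORS, and MAX_LEN are hypothetical, and REMAP holds only a few sample entries in place of the full RemapInternationalCharToAscii table:

```ruby
# Hypothetical stand-in for RemapInternationalCharToAscii (sample entries only).
REMAP = { "é" => "e", "ö" => "o", "ñ" => "n", "ß" => "ss" }.freeze

# Characters that act as word separators, as in the C# condition above.
SEPARATORS = " ,./\\-_=".freeze
MAX_LEN = 80

def url_friendly(title)
  return "" if title.nil?

  slug = +""
  prev_dash = false

  title.each_char.with_index do |c, i|
    if c.match?(/[a-z0-9]/)
      slug << c
      prev_dash = false
    elsif c.match?(/[A-Z]/)
      slug << c.downcase
      prev_dash = false
    elsif SEPARATORS.include?(c)
      # At most one dash per run of separators, and never a leading dash.
      if !prev_dash && !slug.empty?
        slug << "-"
        prev_dash = true
      end
    elsif c.ord >= 128
      mapped = REMAP.fetch(c.downcase, "")
      slug << mapped
      prev_dash = false unless mapped.empty?
    end
    # Stop once input index MAX_LEN has been processed, as the C# loop does.
    break if i == MAX_LEN
  end

  # Drop a trailing dash left by a separator at the end of the title.
  prev_dash ? slug.chomp("-") : slug
end

puts url_friendly("How do you change a title to be part of the URL like Stack Overflow?")
```

    Any other character below code point 128, such as "?" or "!", matches no branch and is dropped without producing a dash, which is the same behavior as the C# loop.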
