How does Stack Overflow generate its SEO-friendly URLs?

-上瘾入骨i 2020-11-22 04:27

What is a good complete regular expression or some other process that would take the title:

How do you change a title to be part of the URL like Stack

21 Answers
  • 2020-11-22 05:00

    Assuming that your model class has a title attribute, you can simply override the to_param method within the model, like this:

    def to_param
      title.downcase.gsub(/ /, '-')
    end
    

    This Railscast episode has all the details. You can also ensure that the title only contains valid characters using this:

    # \A and \z anchor the whole string; ^ and $ only anchor a single line
    validates_format_of :title, :with => /\A[a-z0-9-]+\z/,
                        :message => 'can only contain lowercase letters, numbers and hyphens'
    
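The `to_param` override above only replaces spaces, so a title containing punctuation would not match the `[a-z0-9-]` pattern used in the validation. As a rough illustration of reducing a title to that character set, here is a minimal JavaScript sketch. The name `slugify` is hypothetical and not part of the answer above, and this is not how Rails itself builds parameters.

```javascript
// Hypothetical helper: reduce a title to lowercase ASCII letters, digits and hyphens.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // every run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '');     // drop hyphens left at either end
}

console.log(slugify("How do you change a title?")); // how-do-you-change-a-title
```

Note that this drops non-ASCII letters entirely, which is the limitation some of the later answers address.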
  • 2020-11-22 05:00

    You can use the following helper method. It keeps Unicode letters and digits, not just ASCII ones.

    public static string ConvertTextToSlug(string s)
    {
        StringBuilder sb = new StringBuilder();
    
        bool wasHyphen = true;
    
        foreach (char c in s)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(char.ToLower(c));
                wasHyphen = false;
            }
            else
                if (char.IsWhiteSpace(c) && !wasHyphen)
                {
                    sb.Append('-');
                    wasHyphen = true;
                }
        }
    
        // Avoid trailing hyphens
        if (wasHyphen && sb.Length > 0)
            sb.Length--;
    
        return sb.ToString().Replace("--","-");
    }
    
  • 2020-11-22 05:02

    The Stack Overflow solution is great, but modern browsers (excluding IE, as usual) now handle UTF-8 encoding nicely.

    So I upgraded the proposed solution:

    public static string ToFriendlyUrl(string title, bool useUTF8Encoding = false)
    {
        ...
    
            else if (c >= 128)
            {
                int prevlen = sb.Length;
                if (useUTF8Encoding )
                {
                    sb.Append(HttpUtility.UrlEncode(c.ToString(CultureInfo.InvariantCulture),Encoding.UTF8));
                }
                else
                {
                    sb.Append(RemapInternationalCharToAscii(c));
                }
        ...
    }
    

    Full Code on Pastebin

    Edit: Here's the code for RemapInternationalCharToAscii method (that's missing in the pastebin).

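The two branches in the answer above (percent-encode the character as UTF-8, or remap it to ASCII) can be illustrated with a small JavaScript sketch. Both function names are hypothetical. The second one is only a stand-in for `RemapInternationalCharToAscii`, whose code is not shown here: it uses Unicode NFD normalization instead of a lookup table, so characters without a decomposition (for example `ø` or `ß`) pass through unchanged.

```javascript
// Option 1: keep the character and percent-encode its UTF-8 bytes.
function encodeChar(c) {
  return encodeURIComponent(c);
}

// Option 2 (assumption, not the original method): decompose the character
// with NFD and drop the combining marks in U+0300..U+036F.
function stripDiacritics(c) {
  return c.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(encodeChar('\u00e9'));      // %C3%A9  (the input is "é")
console.log(stripDiacritics('\u00e9')); // e
```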
  • 2020-11-22 05:03

    T-SQL implementation, adapted from dbo.UrlEncode:

    CREATE FUNCTION dbo.Slug(@string varchar(1024))
    RETURNS varchar(3072)
    AS
    BEGIN
        DECLARE @count int, @c char(1), @i int, @slug varchar(3072)
    
        SET @string = replace(lower(ltrim(rtrim(@string))),' ','-')
    
        SET @count = Len(@string)
        SET @i = 1
        SET @slug = ''
    
        WHILE (@i <= @count)
        BEGIN
            SET @c = substring(@string, @i, 1)
    
            IF @c LIKE '[-a-z0-9]'
                SET @slug = @slug + @c
    
            SET @i = @i +1
        END
    
        RETURN @slug
    END
    
  • 2020-11-22 05:05

    You can also use this JavaScript function to generate slugs in a form (this one is based on/copied from Django):

    function makeSlug(urlString, filter) {
        // Changes, e.g., "Petty theft" to "petty-theft".
        // When filter is truthy, remove all these words from the string before URLifying.
        var removelist = [];
        if (filter) {
            removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
            "is", "in", "into", "like", "of", "off", "on", "onto", "per",
            "since", "than", "the", "this", "that", "to", "up", "via", "het", "de", "een", "en",
            "with"];
        }
        var s = urlString;
        if (removelist.length > 0) {
            var r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
            s = s.replace(r, '');
        }
        s = s.replace(/[^-\w\s]/g, ''); // Remove unneeded characters
        s = s.replace(/^\s+|\s+$/g, ''); // Trim leading/trailing spaces
        s = s.replace(/[-\s]+/g, '-'); // Convert runs of spaces and hyphens to a single hyphen
        s = s.toLowerCase(); // Convert to lowercase
        return s;
    }
    
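The function above returns the slug at whatever length the title produces. If a maximum length is wanted, one possible approach is to cut at the last hyphen that fits, so that no word is split. The helper `truncateSlug` and its `maxLength` parameter are hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: shorten a slug to at most maxLength characters,
// preferring to cut at a hyphen so no word is split in half.
function truncateSlug(slug, maxLength) {
  if (slug.length <= maxLength) return slug;
  // If the cut lands exactly on a word boundary, keep the whole prefix.
  if (slug.charAt(maxLength) === '-') return slug.slice(0, maxLength);
  var cut = slug.slice(0, maxLength);
  var lastHyphen = cut.lastIndexOf('-');
  // Fall back to a hard cut when the first word alone is too long.
  return lastHyphen > 0 ? cut.slice(0, lastHyphen) : cut;
}

console.log(truncateSlug("how-do-you-change-a-title", 12)); // how-do-you
```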
  • 2020-11-22 05:08

    No, no, no. You are all so very wrong. Except for the diacritics-fu stuff, you're getting there, but what about Asian characters? (Shame on Ruby developers for not considering their nihonjin brethren.)

    Firefox and Safari both display non-ASCII characters in the URL, and frankly they look great. It is nice to support links like 'http://somewhere.com/news/read/お前たちはアホじゃないかい'.

    So here's some PHP code that'll do it, but I just wrote it and haven't stress tested it.

    <?php
        function slug($str)
        {
            $args = func_get_args();
            $args = array_filter($args);  // remove blanks (array_filter returns a new array)
            $slug = mb_strtolower(implode('-', $args));
    
            $real_slug = '';
            $hyphen = '';
            // The original called a custom helper, SU::mb_str_split; PHP 7.4+ ships mb_str_split() built in
            foreach(mb_str_split($slug) as $c)
            {
                if (strlen($c) > 1 && mb_strlen($c)===1)
                {
                    $real_slug .= $hyphen . $c;
                    $hyphen = '';
                }
                else
                {
                    switch($c)
                    {
                        case '&':
                            $hyphen = $real_slug ? '-and-' : '';
                            break;
                        case 'a':
                        case 'b':
                        case 'c':
                        case 'd':
                        case 'e':
                        case 'f':
                        case 'g':
                        case 'h':
                        case 'i':
                        case 'j':
                        case 'k':
                        case 'l':
                        case 'm':
                        case 'n':
                        case 'o':
                        case 'p':
                        case 'q':
                        case 'r':
                        case 's':
                        case 't':
                        case 'u':
                        case 'v':
                        case 'w':
                        case 'x':
                        case 'y':
                        case 'z':
    
                        case 'A':
                        case 'B':
                        case 'C':
                        case 'D':
                        case 'E':
                        case 'F':
                        case 'G':
                        case 'H':
                        case 'I':
                        case 'J':
                        case 'K':
                        case 'L':
                        case 'M':
                        case 'N':
                        case 'O':
                        case 'P':
                        case 'Q':
                        case 'R':
                        case 'S':
                        case 'T':
                        case 'U':
                        case 'V':
                        case 'W':
                        case 'X':
                        case 'Y':
                        case 'Z':
    
                        case '0':
                        case '1':
                        case '2':
                        case '3':
                        case '4':
                        case '5':
                        case '6':
                        case '7':
                        case '8':
                        case '9':
                            $real_slug .= $hyphen . $c;
                            $hyphen = '';
                            break;
    
                        default:
                           $hyphen = $hyphen ? $hyphen : ($real_slug ? '-' : '');
                    }
                }
            }
            return $real_slug;
        }
    

    Example:

    $str = "~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 コリン ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 トーマス ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 アーノルド ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04";
    echo slug($str);
    

    Outputs: コリン-and-トーマス-and-アーノルド

    The '-and-' appears because each & is converted to '-and-'.

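The same idea of keeping letters from any script can be sketched in JavaScript with Unicode property escapes. This is a simplified illustration, not a port of the PHP function above: the name `unicodeSlug` is hypothetical, the `&` to `-and-` rule is left out, and it assumes an engine that supports `\p{...}` with the `u` flag (ES2018 or later).

```javascript
// Sketch: keep letters, combining marks and digits from any script,
// and join everything else with single hyphens.
function unicodeSlug(title) {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, '-')  // runs of other characters become one hyphen
    .replace(/^-+|-+$/g, '');               // drop hyphens left at either end
}

console.log(unicodeSlug("Hello, World!")); // hello-world
```

Non-ASCII titles such as the Japanese example in the answer pass through with only the separators replaced.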