What is a good complete regular expression or some other process that would take the title:
How do you change a title to be part of the URL like Stack
Assuming that your model class has a title attribute, you can simply override the to_param method within the model, like this:
def to_param
title.downcase.gsub(/ /, '-')
end
This Railscast episode has all the details. You can also ensure that the title only contains valid characters using this:
validates_format_of :title, :with => /^[a-z0-9-]+$/,
:message => 'can only contain letters, numbers and hyphens'
You can use the following helper method. It can convert the Unicode characters.
public static string ConvertTextToSlug(string s)
{
StringBuilder sb = new StringBuilder();
bool wasHyphen = true;
foreach (char c in s)
{
if (char.IsLetterOrDigit(c))
{
sb.Append(char.ToLower(c));
wasHyphen = false;
}
else
if (char.IsWhiteSpace(c) && !wasHyphen)
{
sb.Append('-');
wasHyphen = true;
}
}
// Avoid trailing hyphens
if (wasHyphen && sb.Length > 0)
sb.Length--;
return sb.ToString().Replace("--","-");
}
The stackoverflow solution is great, but modern browser (excluding IE, as usual) now handle nicely utf8 encoding:
So I upgraded the proposed solution:
public static string ToFriendlyUrl(string title, bool useUTF8Encoding = false)
{
...
else if (c >= 128)
{
int prevlen = sb.Length;
if (useUTF8Encoding )
{
sb.Append(HttpUtility.UrlEncode(c.ToString(CultureInfo.InvariantCulture),Encoding.UTF8));
}
else
{
sb.Append(RemapInternationalCharToAscii(c));
}
...
}
Full Code on Pastebin
Edit: Here's the code for RemapInternationalCharToAscii
method (that's missing in the pastebin).
T-SQL implementation, adapted from dbo.UrlEncode:
CREATE FUNCTION dbo.Slug(@string varchar(1024))
RETURNS varchar(3072)
AS
BEGIN
DECLARE @count int, @c char(1), @i int, @slug varchar(3072)
SET @string = replace(lower(ltrim(rtrim(@string))),' ','-')
SET @count = Len(@string)
SET @i = 1
SET @slug = ''
WHILE (@i <= @count)
BEGIN
SET @c = substring(@string, @i, 1)
IF @c LIKE '[a-z0-9--]'
SET @slug = @slug + @c
SET @i = @i +1
END
RETURN @slug
END
You can also use this JavaScript function for in-form generation of the slug's (this one is based on/copied from Django):
function makeSlug(urlString, filter) {
// Changes, e.g., "Petty theft" to "petty_theft".
// Remove all these words from the string before URLifying
if(filter) {
removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
"is", "in", "into", "like", "of", "off", "on", "onto", "per",
"since", "than", "the", "this", "that", "to", "up", "via", "het", "de", "een", "en",
"with"];
}
else {
removelist = [];
}
s = urlString;
r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
s = s.replace(r, '');
s = s.replace(/[^-\w\s]/g, ''); // Remove unneeded characters
s = s.replace(/^\s+|\s+$/g, ''); // Trim leading/trailing spaces
s = s.replace(/[-\s]+/g, '-'); // Convert spaces to hyphens
s = s.toLowerCase(); // Convert to lowercase
return s; // Trim to first num_chars characters
}
No, no, no. You are all so very wrong. Except for the diacritics-fu stuff, you're getting there, but what about Asian characters (shame on Ruby developers for not considering their nihonjin brethren).
Firefox and Safari both display non-ASCII characters in the URL, and frankly they look great. It is nice to support links like 'http://somewhere.com/news/read/お前たちはアホじゃないかい'.
So here's some PHP code that'll do it, but I just wrote it and haven't stress tested it.
<?php
function slug($str)
{
$args = func_get_args();
array_filter($args); //remove blanks
$slug = mb_strtolower(implode('-', $args));
$real_slug = '';
$hyphen = '';
foreach(SU::mb_str_split($slug) as $c)
{
if (strlen($c) > 1 && mb_strlen($c)===1)
{
$real_slug .= $hyphen . $c;
$hyphen = '';
}
else
{
switch($c)
{
case '&':
$hyphen = $real_slug ? '-and-' : '';
break;
case 'a':
case 'b':
case 'c':
case 'd':
case 'e':
case 'f':
case 'g':
case 'h':
case 'i':
case 'j':
case 'k':
case 'l':
case 'm':
case 'n':
case 'o':
case 'p':
case 'q':
case 'r':
case 's':
case 't':
case 'u':
case 'v':
case 'w':
case 'x':
case 'y':
case 'z':
case 'A':
case 'B':
case 'C':
case 'D':
case 'E':
case 'F':
case 'G':
case 'H':
case 'I':
case 'J':
case 'K':
case 'L':
case 'M':
case 'N':
case 'O':
case 'P':
case 'Q':
case 'R':
case 'S':
case 'T':
case 'U':
case 'V':
case 'W':
case 'X':
case 'Y':
case 'Z':
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
$real_slug .= $hyphen . $c;
$hyphen = '';
break;
default:
$hyphen = $hyphen ? $hyphen : ($real_slug ? '-' : '');
}
}
}
return $real_slug;
}
Example:
$str = "~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 コリン ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 トーマス ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 アーノルド ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04";
echo slug($str);
Outputs: コリン-and-トーマス-and-アーノルド
The '-and-' is because &'s get changed to '-and-'.