I am programming a blog and I want the URIs to be the title like the question title here in stackoverflow or like wordpress.
What are the rules for sanitizing a URI?
Here's how drupal does it.
In case of site goes down:
<?php
function pathauto_cleanstring($string)
{
$url = $string;
$url = preg_replace('~[^\\pL0-9_]+~u', '-', $url); // substitutes anything but letters, numbers and '_' with separator
$url = trim($url, "-");
$url = iconv("utf-8", "us-ascii//TRANSLIT", $url); // TRANSLIT does the whole job
$url = strtolower($url);
$url = preg_replace('~[^-a-z0-9_]+~', '', $url); // keep only letters, numbers, '_' and separator
return $url;
}
Many CMS's have implemented something like that, the one of Wordpress has been posted in another question. You might be interested in the question about this technique in general, too.
Generally you'll want your URL to have only 0-9 and a-z, and make sure that everything is lowercase. Replace spaces with dashes (-), and strip the rest of the gibberish.
SO pretty much has it figured out.
This might be the shortest way to replace any non alphanumeric character with a single hyphen:
trim(preg_replace('/[^a-z0-9-]+/', '-', strtolower($str)), '-')