Looking for a regex/replace function to take a user-inputted string, say "John Smith's Cool Page", and return a filename/URL-safe string like "john_smith_s_cool_page.html".
I know the original poster asked for a simple regular expression; however, there is more involved in sanitizing filenames, including filename length, reserved filenames, and, of course, reserved characters.
Take a look at the code in node-sanitize-filename for a more robust solution.
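As a rough illustration of those extra concerns, here is a minimal sketch. It is not the node-sanitize-filename code or API. The function name sanitizeFilenameSketch is hypothetical, and the rules are assumptions: Windows-reserved characters and ASCII control characters are replaced, trailing dots and spaces are stripped, Windows device names such as CON or COM1 get the replacement character as a prefix, and the result is cut to 255 UTF-8 bytes, a common limit (ext4, for example).

```javascript
// Hypothetical sanitizer sketch; NOT the node-sanitize-filename implementation.
// Assumed rules: reserved characters, control characters, Windows device
// names, trailing dots/spaces, and a 255-byte UTF-8 length limit.

// Characters reserved on Windows (/ is also reserved on POSIX), plus ASCII controls.
const RESERVED_CHARS = /[\/?<>\\:*|"\x00-\x1f]/g;
// Windows device names are reserved even when followed by an extension.
const RESERVED_NAMES = /^(con|prn|aux|nul|com[0-9]|lpt[0-9])(\..*)?$/i;
// Windows drops trailing dots and spaces; stripping them also empties "." and "..".
const TRAILING_DOTS_SPACES = /[. ]+$/;
const MAX_BYTES = 255; // assumed limit; real filesystems differ

// Cut a string to at most maxBytes of UTF-8 without splitting a code point.
function truncateUtf8(str, maxBytes) {
  const encoder = new TextEncoder();
  let out = '';
  let used = 0;
  for (const ch of str) { // for...of walks code points, not UTF-16 units
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    out += ch;
    used += size;
  }
  return out;
}

function sanitizeFilenameSketch(input, replacement = '_') {
  let name = String(input)
    .replace(RESERVED_CHARS, replacement)
    .replace(TRAILING_DOTS_SPACES, '');
  if (RESERVED_NAMES.test(name)) {
    name = replacement + name; // "CON" -> "_CON" is no longer a device name
  }
  // Truncate, then strip again in case the cut left a trailing dot or space.
  name = truncateUtf8(name, MAX_BYTES).replace(TRAILING_DOTS_SPACES, '');
  return name === '' ? replacement : name; // never return an empty name
}

console.log(sanitizeFilenameSketch('a/b:c?.txt')); // a_b_c_.txt
console.log(sanitizeFilenameSketch('com1.log'));   // _com1.log
```

Real libraries differ in the details (for instance, whether a reserved name is prefixed or replaced outright), so treat these rules as a starting point rather than a complete list.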
I think your requirement is to replace whitespace and apostrophes with _ and append .html at the end; try to find such a regex.
Refer to http://www.regular-expressions.info/javascriptexample.html
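Taking that description literally, a minimal sketch could look like this. toHtmlFilename is a hypothetical name, and the regex [\s'] covers only whitespace and the ASCII apostrophe, so other characters such as / or ? (or a typographic ’) pass through unchanged.

```javascript
// Hypothetical helper: whitespace and ASCII apostrophes become "_",
// the result is lowercased, and ".html" is appended.
function toHtmlFilename(title) {
  return title.replace(/[\s']/g, '_').toLowerCase() + '.html';
}

console.log(toHtmlFilename("John Smith's Cool Page")); // john_smith_s_cool_page.html
```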
Well, here's one that replaces anything that's not a letter or a number, and makes it all lower case, like your example.
var s = "John Smith's Cool Page";
var filename = s.replace(/[^a-z0-9]/gi, '_').toLowerCase();
Explanation:
The regular expression is /[^a-z0-9]/gi. Well, actually, the gi at the end is just a set of flags that control how the expression is applied:
- i means "ignore upper/lower case differences".
- g means "global", which really means that every match should be replaced, not just the first one.
So what we're looking at is really just [^a-z0-9]. Let's read it step-by-step:
- [ and ] define a "character class", which is a list of single characters. If you wrote [one], it would match either 'o', 'n' or 'e'.
- A ^ at the start of the list of characters means it should match only characters not in the list.
- a-z0-9 reads as "a through z and 0 through 9". It is a short way of writing abcdefghijklmnopqrstuvwxyz0123456789.
So basically, what the regular expression says is: "Find every character that is not between 'a' and 'z' or between '0' and '9'" (and, because of the i flag, not between 'A' and 'Z' either).
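To see what each flag and the character class contribute, here is a small comparison on the example string from the question. The variable names are illustrative only.

```javascript
const demo = "John Smith's Cool Page";

// Both flags: every non-alphanumeric character is replaced, case ignored.
const withGi = demo.replace(/[^a-z0-9]/gi, '_');   // "John_Smith_s_Cool_Page"
// No g: only the first match (the first space) is replaced.
const withoutG = demo.replace(/[^a-z0-9]/i, '_');  // "John_Smith's Cool Page"
// No i: uppercase letters fall outside a-z, so they are replaced too.
const withoutI = demo.replace(/[^a-z0-9]/g, '_');  // "_ohn__mith_s__ool__age"

// A plain class [one] matches any single 'o', 'n' or 'e'; a leading ^ negates it.
const plainClass = 'phone'.replace(/[one]/g, '*');    // "ph***"
const negatedClass = 'phone'.replace(/[^one]/g, '*'); // "**one"

console.log(withGi.toLowerCase()); // john_smith_s_cool_page
```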
For more flexible and robust handling of Unicode characters, etc., you could use slugify in conjunction with a regex that removes unsafe URL characters:
const urlSafeFilename = slugify(filename, { remove: /["<>#%\{\}\|\\\^~\[\]`;\?:@=&]/g });
This produces nice kebab-case filenames in your URL and allows for more characters outside the a-z0-9 range.
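If adding a dependency is not an option, a rough approximation of the accent handling can be built on String.prototype.normalize. This is a hypothetical kebabSlug helper, not the slugify package or its algorithm: it decomposes characters with NFKD and removes the combining marks in the range U+0300 to U+036F, so accented Latin letters such as é or ü become plain e and u. Characters without such a decomposition (ß, ø, or CJK text, for example) are replaced by hyphens rather than transliterated, which is where a library with a transliteration table earns its keep.

```javascript
// Hypothetical helper, NOT the slugify package: built-ins only.
function kebabSlug(input) {
  return input
    .normalize('NFKD')                // split "é" into "e" + U+0301 (combining acute)
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // each run of anything else becomes one hyphen
    .replace(/^-+|-+$/g, '');         // trim hyphens from both ends
}

console.log(kebabSlug("Crème Brûlée's Café!")); // creme-brulee-s-cafe
```

Matching runs with + avoids the doubled separators that a per-character /[^a-z0-9]/gi replacement produces (for example, "page!!" becoming "page__").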