I have a regular expression that I'm using in PHP:
$word_array = preg_split(
\'/(\\/|\\.|-|_|=|\\?|\\&|html|shtml|www|php|cgi|htm|aspx|asp|index|com|net|o
I would think that if you are trying to derive meaning from the URLs, you would want to write clean URLs in such a way that you don't need a complex regex to extract the values.
In many cases this involves using server redirect rules and a front controller or request router.
So what you build are clean URLs like
/value1/value2/value3
without any .html, .php, etc. in the URL at all.
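To illustrate how little parsing a clean URL needs, here is a minimal sketch. The helper name `clean_url_segments` is hypothetical, and it assumes the values of interest are simply the slash-separated parts of the path:

```php
<?php
// Hypothetical helper: split a clean URL path into its segments.
function clean_url_segments(string $url): array
{
    // parse_url() returns null when there is no path and false on
    // seriously malformed input; treat both as "no segments".
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }

    // Drop leading/trailing slashes, then split on the remaining ones.
    $parts = explode('/', trim($path, '/'));

    // Remove empty segments (e.g. from "//") and decode percent-escapes.
    $parts = array_filter($parts, function ($part) {
        return $part !== '';
    });

    return array_values(array_map('rawurldecode', $parts));
}

$word_array = clean_url_segments('/value1/value2/value3');
// $word_array is ['value1', 'value2', 'value3']
```

With URLs shaped like this, a plain `explode()` on the path replaces the long alternation of extensions and separators in the regex.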
It seems to me that you are not adequately addressing the problem at the point of entry into the system (i.e., the web server), which is what would make your URL parsing as simple as it should be.
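As a rough sketch of handling this at the point of entry, a front controller could look something like the following. It assumes the web server has been configured to send every request to this one script (the rewrite rule itself is not shown), and the `dispatch` function and `$routes` table are hypothetical names used only for illustration:

```php
<?php
// Hypothetical front controller: assumes the web server rewrites every
// request to this script, so the original request URI arrives intact.
function dispatch(string $requestUri, array $routes): string
{
    // Keep only the path; the query string is ignored here.
    $path = parse_url($requestUri, PHP_URL_PATH);

    // Split the path into non-empty segments.
    $segments = is_string($path)
        ? array_values(array_filter(explode('/', $path), 'strlen'))
        : [];

    // The first segment selects the handler; the rest become its arguments.
    $name = $segments[0] ?? '';
    if (!isset($routes[$name])) {
        return '404 Not Found';
    }

    return $routes[$name](array_slice($segments, 1));
}

// Illustrative route table: one handler keyed by the first path segment.
$routes = [
    'value1' => function (array $args): string {
        return 'value1 handler got: ' . implode(', ', $args);
    },
];

echo dispatch('/value1/value2/value3?x=1', $routes);
```

In a real application the request URI would typically come from `$_SERVER['REQUEST_URI']` rather than a hard-coded string.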