Example 1: domain.com/dir_1/dir_2/dir_3/./../../../
Should resolve, as it naturally does in the browser, to: domain.com/
Example 2: domain.c
This is a simpler problem than you think. All you need to do is explode() the string on the "/" character and process the individual segments using a stack. As you traverse the array from left to right: if you see ".", do nothing; if you see "..", pop an element from the stack; otherwise, push the segment onto the stack.
$str = 'domain.com/dir_1/dir_2/dir_3/./../../../';
$array = explode( '/', $str);
$domain = array_shift( $array);
$parents = array();
foreach( $array as $dir) {
    switch( $dir) {
        case '.':
            // Don't need to do anything here
            break;
        case '..':
            array_pop( $parents);
            break;
        default:
            $parents[] = $dir;
            break;
    }
}
echo $domain . '/' . implode( '/', $parents);
This will properly resolve the URLs in all of your test cases.
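Note that error checking is left as an exercise to the user, e.g. deciding what should happen when the $parents stack is empty and a ".." tries to pop something off of it. (array_pop() simply returns NULL on an empty array, so as written the extra ".." segments are silently ignored.)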
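For comparison, here is a minimal sketch of the same stack approach in C++ with the empty-stack case made explicit. The function name resolve_path is hypothetical, and the sketch assumes (as the PHP code does) that everything before the first "/" is the host and that a ".." with nothing left to pop is simply ignored:

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring the PHP snippet: split on '/', keep the
// host aside, and resolve "." / ".." segments with a stack.
std::string resolve_path(const std::string& url) {
    const std::size_t first_slash = url.find('/');
    // substr(0, npos) returns the whole string when there is no '/'.
    const std::string host = url.substr(0, first_slash);

    std::vector<std::string> stack;
    if (first_slash != std::string::npos) {
        std::size_t start = first_slash + 1;
        while (true) {
            const std::size_t end = url.find('/', start);
            const std::string seg = url.substr(
                start, end == std::string::npos ? std::string::npos : end - start);

            if (seg == ".") {
                // Current directory: nothing to do.
            } else if (seg == "..") {
                // Explicit check: never pop past the root (assumed policy).
                if (!stack.empty()) stack.pop_back();
            } else {
                // Ordinary segment; an empty trailing segment is kept so
                // that a trailing '/' survives, as in the PHP version.
                stack.push_back(seg);
            }

            if (end == std::string::npos) break;
            start = end + 1;
        }
    }

    // Re-join: host, a '/', then the remaining segments separated by '/'.
    std::string result = host + "/";
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (i > 0) result += '/';
        result += stack[i];
    }
    return result;
}
```

Calling resolve_path("domain.com/dir_1/dir_2/dir_3/./../../../") yields "domain.com/". Ignoring an over-deep ".." is only one possible policy; returning an error instead would be an equally reasonable choice.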
What you want here is a "replaceDots" function.
It works by remembering the position of the last valid segment and removing that segment when a ".." follows it. The full description is in RFC 3986, section 5.2.4, "Remove Dot Segments": http://tools.ietf.org/html/rfc3986. Search for "Remove Dot Segments" on the RFC page.
You need more than one loop. The inner loop scans ahead to the next segment, and if that segment is a dot segment the current one is skipped, and so on, but it can be trickier than that. Alternatively, consider breaking the path up into segments and then following the algorithm from the RFC:
While the input buffer is not empty, loop as follows:
A. If the input buffer begins with a prefix of "../" or "./", then remove that prefix from the input buffer; otherwise,
B. if the input buffer begins with a prefix of "/./" or "/.", where "." is a complete path segment, then replace that prefix with "/" in the input buffer; otherwise,
C. if the input buffer begins with a prefix of "/../" or "/..", where ".." is a complete path segment, then replace that prefix with "/" in the input buffer and remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,
D. if the input buffer consists only of "." or "..", then remove that from the input buffer; otherwise,
E. move the first path segment in the input buffer to the end of the output buffer, including the initial "/" character (if any) and any subsequent characters up to, but not including, the next "/" character or the end of the input buffer.
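Steps A to E above can be sketched fairly directly in standard C++ with two std::string buffers. This is only an illustrative sketch, not the in-place version: the name remove_dot_segments is hypothetical, and it assumes the argument is just the path component (no scheme or host):

```cpp
#include <string>

// Sketch of the RFC 3986 "Remove Dot Segments" loop using an input
// buffer and an output buffer, following steps A-E literally.
std::string remove_dot_segments(std::string input) {
    std::string output;

    // True when the input buffer begins with the given prefix.
    auto starts_with = [&input](const std::string& p) {
        return input.compare(0, p.size(), p) == 0;
    };

    while (!input.empty()) {
        if (starts_with("../")) {                        // A
            input.erase(0, 3);
        } else if (starts_with("./")) {                  // A
            input.erase(0, 2);
        } else if (starts_with("/./")) {                 // B: "/./" -> "/"
            input.erase(0, 2);
        } else if (input == "/.") {                      // B: "/." -> "/"
            input = "/";
        } else if (starts_with("/../") || input == "/..") {  // C
            if (input == "/..") {
                input = "/";
            } else {
                input.erase(0, 3);                       // "/../" -> "/"
            }
            // Drop the last output segment and its preceding '/', if any.
            const std::size_t pos = output.rfind('/');
            if (pos == std::string::npos) {
                output.clear();
            } else {
                output.erase(pos);
            }
        } else if (input == "." || input == "..") {      // D
            input.clear();
        } else {                                         // E
            // Move everything up to (not including) the next '/' that is
            // not the leading character; npos means "to the end".
            const std::size_t next = input.find('/', 1);
            output.append(input, 0, next);
            input.erase(0, next);
        }
    }
    return output;
}
```

With the two examples given in the RFC, "/a/b/c/./../../g" becomes "/a/g" and "mid/content=5/../6" becomes "mid/6". Using a separate output buffer keeps the logic close to the specification at the cost of extra copying.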
Here is my version of it in C++:
ortl_funcimp(len_t) _str_remove_dots(char_t* s, len_t len) {
    len_t x,yy;
    /*
    Modifies the string in place by copying parts back. Not
    sure if this is the best way to do it since it involves
    many copies for deep relatives like ../../../../../myFile.cpp
    For each ../ it does one copy back. If the loop was implemented
    using writing into a buffer, you would have to do both, so this
    seems to be the best technique.
    */
    __checklenx(s,len);
    x = 0;
    while (x < len) {
        if (s[x] == _c('.')) {
            x++;
            if (x < len) {
                if (s[x] == _c('.')) {
                    x++;
                    if (x < len) {
                        if (s[x] == _c('/')) { // ../
                            mem_move(&s[x],&s[x-2],(len-x)*sizeof(char_t));
                            len -= 2;
                            x -= 2;
                        }
                        else x++;
                    }
                    else len -= 2;// .. only
                }
                else if (s[x] == _c('/')){ // ./
                    mem_move(&s[x],&s[x-1],(len-x)*sizeof(char_t));
                    len--;
                    x--;
                }
            }
            else --len;// terminating '.', remove
        }
        else if (s[x] == _c('/')) {
            x++;
            if (x < len) {
                if (s[x] == _c('.')) {
                    x++;
                    if (x < len) {
                        if (s[x] == _c('/')) { // /./
                            mem_move(&s[x],&s[x-2],(len-x)*sizeof(char_t));
                            len -= 2;
                            x -= 2;
                        }
                        else if (s[x] == _c('.')) { // /..
                            x++;
                            if (x < len) { //
                                if (s[x] == _c('/')) {// /../
                                    yy = x;
                                    x -= 3;
                                    if (x > 0) x--;
                                    while ((x > 0) && (s[x] != _c('/'))) x--;
                                    mem_move(&s[yy],&s[x],(len-yy) * sizeof(char_t));
                                    len -= (yy - x);
                                }
                                else {
                                    x++;
                                }
                            }
                            else {// ends with /..
                                x -= 3;
                                if (x > 0) x--;
                                while (x > 0 && s[x] != _c('/')) x--;
                                s[x] = _c('/');
                                x++;
                                len = x;
                            }
                        }
                        else x++;
                    }
                    else len--;// ends with /.
                }
                else x++;
            }
        }
        else x++;
    }
    return len;
}