I\'m trying to write a regex that will parse out the directory and filename of a fully qualified path using matching groups.
so...
/
Most languages have path parsing functions that will give you this already. If you have the ability, I'd recommend using what comes to you for free out-of-the-box.
Assuming / is the path delimiter...
^(.*/)([^/]*)$
The first group will be whatever the directory/path info is, the second will be the filename. For example:
A very late answer, but hope this will help
^(.+?)/([\w]+\.log)$
This uses lazy check for /
, and I just modified the accepted answer
http://regex101.com/r/gV2xB7/1
I did a little research through trial and error method. Found out that all the values that are available in keyboard are eligible to be a file or directory except '/' in *nux machine.
I used touch command to create file for following characters and it created a file.
(Comma separated values below)
'!', '@', '#', '$', "'", '%', '^', '&', '*', '(', ')', ' ', '"', '\', '-', ',', '[', ']', '{', '}', '`', '~', '>', '<', '=', '+', ';', ':', '|'
It failed only when I tried creating '/' (because it's root directory) and filename container /
because it file separator.
And it changed the modified time of current dir .
when I did touch .
. However, file.log is possible.
And of course, a-z
, A-Z
, 0-9
, -
(hypen), _
(underscore) should work.
So, by the above reasoning we know that a file name or directory name can contain anything except /
forward slash. So, our regex will be derived by what will not be present in the file name/directory name.
/(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<filename>[^/]+))/
root
directoryA directory can start with /
when it is absolute path and directory name when it's relative. Hence, look for /
with zero or one occurrence.
/(?P<filepath>(?P<root>[/]?)(?P<rest_of_the_path>.+))/
Next, a directory and its child is always separated by /
. And a directory name can be anything except /
. Let's match /var/ first then.
/(?P<filepath>(?P<first_directory>(?P<root>[/]?)[^\/]+/)(?P<rest_of_the_path>.+))/
Next, let's match all directories
/(?P<filepath>(?P<dir>(?P<root>[/]?)(?P<single_dir>[^\/]+/)+)(?P<rest_of_the_path>.+))/
Here, single_dir is yz/
because, first it matched var/
, then it found next occurrence of same pattern i.e. log/
, then it found the next occurrence of same pattern yz/
. So, it showed the last occurrence of pattern.
Now, we know that we're never going to use the groups like single_dir, filepath, root. Hence let's clean that up.
Let's keep them as groups however don't capture those groups.
And rest_of_the_path is just the filename! So, rename it. And a file will not have /
in its name, so it's better to keep [^/]
/(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<filename>[^/]+))/
This brings us to the final result. Of course, there are several other ways you can do it. I am just mentioning one of the ways here.
^
means string starts with
(?P<dir>pattern)
means capture group by group name. We have two groups with group name dir
and file
(?:pattern)
means don't consider this group or non-capturing group.
?
means match zero or one.
+
means match one or more
[^\/]
means matches any char except forward slash (/
)
[/]?
means if it is absolute path then it can start with / otherwise it won't. So, match zero or one occurrence of /
.
[^\/]+/
means one or more characters which aren't forward slash (/
) which is followed by a forward slash (/
). This will match var/
or xyz/
. One directory at a time.
Try this:
^(.+)\/([^/]+)$
EDIT: escaped the forward slash to prevent problems when copy/pasting the Regex
In languages that support regular expressions with non-capturing groups:
((?:[^/]*/)*)(.*)
I'll explain the gnarly regex by exploding it...
(
(?:
[^/]*
/
)
*
)
(.*)
What the parts mean:
( -- capture group 1 starts
(?: -- non-capturing group starts
[^/]* -- greedily match as many non-directory separators as possible
/ -- match a single directory-separator character
) -- non-capturing group ends
* -- repeat the non-capturing group zero-or-more times
) -- capture group 1 ends
(.*) -- capture all remaining characters in group 2
To test the regular expression, I used the following Perl script...
#!/usr/bin/perl -w
use strict;
use warnings;
sub test {
my $str = shift;
my $testname = shift;
$str =~ m#((?:[^/]*/)*)(.*)#;
print "$str -- $testname\n";
print " 1: $1\n";
print " 2: $2\n\n";
}
test('/var/log/xyz/10032008.log', 'absolute path');
test('var/log/xyz/10032008.log', 'relative path');
test('10032008.log', 'filename-only');
test('/10032008.log', 'file directly under root');
The output of the script...
/var/log/xyz/10032008.log -- absolute path
1: /var/log/xyz/
2: 10032008.log
var/log/xyz/10032008.log -- relative path
1: var/log/xyz/
2: 10032008.log
10032008.log -- filename-only
1:
2: 10032008.log
/10032008.log -- file directly under root
1: /
2: 10032008.log
What about this?
[/]{0,1}([^/]+[/])*([^/]*)
Deterministic :
((/)|())([^/]+/)*([^/]*)
Strict :
^[/]{0,1}([^/]+[/])*([^/]*)$
^((/)|())([^/]+/)*([^/]*)$