Regex for parsing directory and filename

南方客 2020-11-30 03:55

I'm trying to write a regex that will parse out the directory and filename of a fully qualified path using matching groups.

so...

/         


        
8 answers
  • 2020-11-30 04:22

    Most languages have path parsing functions that will give you this already. If you have the ability, I'd recommend using what comes to you for free out-of-the-box.
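As a rough sketch of what "out-of-the-box" can look like, here is Python's standard library doing the split without any regex. Python is an assumption on my part, since the question does not name a language. `posixpath` is used rather than `os.path` so that / is the delimiter on every platform, and `split_path` is just a hypothetical helper name.

```python
import posixpath


def split_path(path):
    """Return (directory, filename) using the standard library, no regex."""
    # posixpath.split cuts at the last "/" and drops that trailing slash
    # from the directory part, unless the directory is the root itself.
    return posixpath.split(path)


if __name__ == "__main__":
    for p in ("/foo/bar/baz.log", "foo/bar.log", "/foo/bar/", "/baz.log"):
        print(p, "->", split_path(p))
```

Note that, unlike the regex approach, the directory part comes back without its trailing slash.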

    Assuming / is the path delimiter...

    ^(.*/)([^/]*)$
    

    The first group will be whatever the directory/path info is, the second will be the filename. For example:

    • /foo/bar/baz.log: "/foo/bar/" is the path, "baz.log" is the file
    • foo/bar.log: "foo/" is the path, "bar.log" is the file
    • /foo/bar: "/foo/" is the path, "bar" is the file
    • /foo/bar/: "/foo/bar/" is the path and there is no file.
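The four examples above can be checked with a short script. This is only a sketch in Python (my choice of language, not the asker's); `PATH_RE` and `split_with_regex` are hypothetical names. It also shows one case the list does not mention: a bare filename with no / does not match at all, because the first group requires at least one slash.

```python
import re

# The pattern from the answer above: a greedy directory part ending in "/",
# followed by a final segment that contains no "/".
PATH_RE = re.compile(r"^(.*/)([^/]*)$")


def split_with_regex(path):
    """Return (directory, filename), or None when the path has no "/"."""
    match = PATH_RE.match(path)
    return match.groups() if match else None


if __name__ == "__main__":
    for p in ("/foo/bar/baz.log", "foo/bar.log", "/foo/bar", "/foo/bar/", "baz.log"):
        print(p, "->", split_with_regex(p))
```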
  • 2020-11-30 04:22

    A very late answer, but I hope it helps:

    ^(.+?)/([\w]+\.log)$
    

    This uses a lazy quantifier (.+?) for the directory part and restricts the filename to word characters followed by .log. It is a modification of the accepted answer.

    http://regex101.com/r/gV2xB7/1
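A quick way to see what this pattern accepts and rejects, sketched in Python (an assumed language; `LOG_RE` and `split_log_path` are hypothetical names). Because `\w` excludes hyphens and the directory part needs at least one character, some perfectly valid paths are rejected.

```python
import re

# Lazy directory part, then a filename made only of word characters + ".log".
LOG_RE = re.compile(r"^(.+?)/([\w]+\.log)$")


def split_log_path(path):
    """Return (directory, filename) for *.log paths, else None."""
    match = LOG_RE.match(path)
    return match.groups() if match else None


if __name__ == "__main__":
    samples = (
        "/var/log/xyz/10032008.log",  # matches
        "/var/log/my-file.log",       # rejected: "-" is not a word character
        "/var/log/notes.txt",         # rejected: not a .log file
        "/app.log",                   # rejected: nothing before the last "/"
    )
    for p in samples:
        print(p, "->", split_log_path(p))
```

Here the directory group does not include the trailing slash.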

  • 2020-11-30 04:24

    Reasoning:

    I did a little research by trial and error and found that, on a *nix machine, every character available on the keyboard can appear in a file or directory name except '/'.

    I used the touch command to create a file for each of the following characters, and it succeeded every time.

    (Comma separated values below)
    '!', '@', '#', '$', "'", '%', '^', '&', '*', '(', ')', ' ', '"', '\', '-', ',', '[', ']', '{', '}', '`', '~', '>', '<', '=', '+', ';', ':', '|'

    It failed only when I tried to create '/' (because that is the root directory) and a filename containing /, because / is the path separator.

    Running touch . did not create a file either; it only changed the modification time of the current directory (.). A name that merely contains a dot, such as file.log, is fine.

    And of course, a-z, A-Z, 0-9, - (hyphen), _ (underscore) should work.

    Outcome

    So, by the above reasoning, a file or directory name can contain anything except a forward slash (/). Our regex can therefore be derived from what must not be present in a file or directory name.

    /(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<filename>[^/]+))/
    

    Step by Step regexp creation process

    Pattern Explanation

    Step-1: Start with matching root directory

    A path starts with / when it is absolute and with a directory name when it is relative. Hence, look for zero or one occurrence of / at the start.

    /(?P<filepath>(?P<root>[/]?)(?P<rest_of_the_path>.+))/
    

    Step-2: Try to find the first directory.

    Next, a directory and its child are always separated by /, and a directory name can be anything except /. Let's match /var/ first, then.

    /(?P<filepath>(?P<first_directory>(?P<root>[/]?)[^\/]+/)(?P<rest_of_the_path>.+))/
    

    Step-3: Get full directory path for the file

    Next, let's match all directories

    /(?P<filepath>(?P<dir>(?P<root>[/]?)(?P<single_dir>[^\/]+/)+)(?P<rest_of_the_path>.+))/
    

    Here, single_dir is xyz/ because it first matched var/, then found the next occurrence of the same pattern, log/, and then the next one, xyz/. A repeated capture group only keeps its last occurrence.
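The "last occurrence wins" behaviour of a repeated group is easy to confirm. Below is a small sketch in Python (an assumption; the answer uses /…/ delimiters, which are dropped here), run against the sample path /var/log/xyz/10032008.log that appears elsewhere on this page. `STEP3_RE` and `step3_groups` are hypothetical names.

```python
import re

# The Step-3 pattern, split over two string literals for readability.
STEP3_RE = re.compile(
    r"(?P<filepath>(?P<dir>(?P<root>[/]?)(?P<single_dir>[^\/]+/)+)"
    r"(?P<rest_of_the_path>.+))"
)


def step3_groups(path):
    """Return all named groups as a dict, or None if nothing matches."""
    match = STEP3_RE.match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    groups = step3_groups("/var/log/xyz/10032008.log")
    # single_dir was matched three times; only the final repetition is kept.
    for name, value in groups.items():
        print(f"{name}: {value}")
```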

    Step-4: Match filename and clean up

    Now, we know that we're never going to use groups like single_dir, filepath and root. Hence, let's clean that up.

    Let's keep them as groups, but make them non-capturing.

    And rest_of_the_path is just the filename! So, rename it. A file will not have / in its name, so it's better to use [^/]+ instead of .+

    /(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<filename>[^/]+))/
    

    This brings us to the final result. Of course, there are several other ways you can do it. I am just mentioning one of the ways here.
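To try the final pattern out, here is a minimal sketch in Python (an assumed language; `FINAL_RE` and `parse` are hypothetical names). The surrounding /…/ delimiters are dropped, and `fullmatch` is used so that the whole string has to match, which is my addition since the pattern itself is unanchored. The sample paths are borrowed from elsewhere on this page.

```python
import re

# The final pattern from Step-4, without the /.../ delimiters.
FINAL_RE = re.compile(r"(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<filename>[^/]+))")


def parse(path):
    """Return (dir, filename), or None if the path does not fit the pattern."""
    match = FINAL_RE.fullmatch(path)
    if match is None:
        return None
    return match.group("dir"), match.group("filename")


if __name__ == "__main__":
    samples = (
        "/var/log/xyz/10032008.log",  # absolute path
        "var/log/xyz/10032008.log",   # relative path
        "10032008.log",               # no directory at all
        "/10032008.log",              # file directly under root
    )
    for p in samples:
        print(p, "->", parse(p))
```

Because the pattern demands at least one named directory, the last two samples do not match; that may or may not be what you want.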

    Regex Rules used above are listed here

    ^ anchors the match at the start of the string (the final pattern above does not actually use it).
    (?P<dir>pattern) means capture the group under a name. We have two named groups, dir and filename.
    (?:pattern) is a non-capturing group: it groups the pattern but does not capture it.
    ? means match zero or one.
    + means match one or more.
    [^\/] matches any character except a forward slash (/).

    [/]? means if it is absolute path then it can start with / otherwise it won't. So, match zero or one occurrence of /.

    [^\/]+/ means one or more characters which aren't forward slash (/) which is followed by a forward slash (/). This will match var/ or xyz/. One directory at a time.

  • 2020-11-30 04:35

    Try this:

    ^(.+)\/([^/]+)$
    

    EDIT: escaped the forward slash to prevent problems when copy/pasting the Regex
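For comparison with the other answers, a small sketch in Python (an assumed language; `SPLIT_RE` and `split_path` are hypothetical names) showing what this pattern captures. Since both groups use +, a file directly under the root, a bare filename, and a path with a trailing slash all fail to match.

```python
import re

# Both groups need at least one character; the directory loses its final "/".
SPLIT_RE = re.compile(r"^(.+)\/([^/]+)$")


def split_path(path):
    """Return (directory, filename), or None when the pattern does not match."""
    match = SPLIT_RE.match(path)
    return match.groups() if match else None


if __name__ == "__main__":
    samples = (
        "/var/log/xyz/10032008.log",
        "foo/bar.log",
        "/10032008.log",  # no match: nothing before the only "/"
        "10032008.log",   # no match: no "/" at all
        "/foo/bar/",      # no match: empty filename
    )
    for p in samples:
        print(p, "->", split_path(p))
```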

  • 2020-11-30 04:37

    In languages that support regular expressions with non-capturing groups:

    ((?:[^/]*/)*)(.*)
    

    I'll explain the gnarly regex by exploding it...

    (
      (?:
        [^/]*
        /
      )
      *
    )
    (.*)
    

    What the parts mean:

    (  -- capture group 1 starts
      (?:  -- non-capturing group starts
        [^/]*  -- greedily match as many non-directory separators as possible
        /  -- match a single directory-separator character
      )  -- non-capturing group ends
      *  -- repeat the non-capturing group zero-or-more times
    )  -- capture group 1 ends
    (.*)  -- capture all remaining characters in group 2
    

    Example

    To test the regular expression, I used the following Perl script...

    #!/usr/bin/perl -w
    
    use strict;
    use warnings;
    
    sub test {
      my $str = shift;
      my $testname = shift;
    
      $str =~ m#((?:[^/]*/)*)(.*)#;
    
      print "$str -- $testname\n";
      print "  1: $1\n";
      print "  2: $2\n\n";
    }
    
    test('/var/log/xyz/10032008.log', 'absolute path');
    test('var/log/xyz/10032008.log', 'relative path');
    test('10032008.log', 'filename-only');
    test('/10032008.log', 'file directly under root');
    

    The output of the script...

    /var/log/xyz/10032008.log -- absolute path
      1: /var/log/xyz/
      2: 10032008.log
    
    var/log/xyz/10032008.log -- relative path
      1: var/log/xyz/
      2: 10032008.log
    
    10032008.log -- filename-only
      1:
      2: 10032008.log
    
    /10032008.log -- file directly under root
      1: /
      2: 10032008.log
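For readers who do not use Perl, the same test can be sketched in Python (my port, not part of the original answer). `re.VERBOSE` lets the exploded form of the regex be written directly with its comments; `EXPLODED_RE` and `split_path` are hypothetical names.

```python
import re

EXPLODED_RE = re.compile(
    r"""
    (               # capture group 1 starts
      (?:           # non-capturing group starts
        [^/]*       # as many non-separator characters as possible
        /           # a single directory separator
      )*            # repeat the non-capturing group zero or more times
    )               # capture group 1 ends
    (.*)            # capture all remaining characters in group 2
    """,
    re.VERBOSE,
)


def split_path(path):
    """Return (directory, filename); both parts are allowed to be empty."""
    # Every part of the pattern is optional, so match() never returns None.
    return EXPLODED_RE.match(path).groups()


if __name__ == "__main__":
    tests = (
        ("/var/log/xyz/10032008.log", "absolute path"),
        ("var/log/xyz/10032008.log", "relative path"),
        ("10032008.log", "filename-only"),
        ("/10032008.log", "file directly under root"),
    )
    for path, name in tests:
        directory, filename = split_path(path)
        print(f"{path} -- {name}\n  1: {directory}\n  2: {filename}\n")
```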
    
  • 2020-11-30 04:37

    What about this?

    [/]{0,1}([^/]+[/])*([^/]*)
    

    Deterministic:

    ((/)|())([^/]+/)*([^/]*)
    

    Strict:

    ^[/]{0,1}([^/]+[/])*([^/]*)$
    ^((/)|())([^/]+/)*([^/]*)$
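One thing worth checking with these patterns is what the directory group actually holds. In the sketch below (Python, an assumed language; all names are hypothetical), the first strict pattern keeps only the last directory segment in group 1, because a repeated capture group retains its final repetition. `WRAPPED_RE` is my own variation, not from the answer: it wraps the optional leading slash and the repetition in one outer group so the whole directory is captured.

```python
import re

# First "strict" pattern from the answer, unchanged.
STRICT_RE = re.compile(r"^[/]{0,1}([^/]+[/])*([^/]*)$")

# Assumed variation: one outer capture group around the whole directory part.
WRAPPED_RE = re.compile(r"^([/]{0,1}(?:[^/]+[/])*)([^/]*)$")


def groups_for(pattern, path):
    """Return the captured groups as a tuple, or None if there is no match."""
    match = pattern.match(path)
    return match.groups() if match else None


if __name__ == "__main__":
    for p in ("/var/log/xyz/10032008.log", "10032008.log", "/10032008.log"):
        print(p)
        print("  strict: ", groups_for(STRICT_RE, p))
        print("  wrapped:", groups_for(WRAPPED_RE, p))
```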
    