Question
I am writing a generic shell script that filters files based on a given regex.
My shell script:
files=$(find $path -name $regex)
In one of the cases, I want to filter folders inside a directory. The folder names are in the format below:
20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS
I am unable to arrive at the correct regex.
I am able to get the path of the files inside the folder using the regex '*data.txt', as I know the name of the file inside it.
But it gives me the full path of the file, something like
/path/20161128-20:34:33:432813246/data.txt
What I want is simply:
/path/20161128-20:34:33:432813246
Please help me in identifying the correct regex for my requirement
NOTE:
I know how to process the data after
files=$(find $path -name $regex)
But since the script needs to be generic for many use cases, I only need the correct regex that needs to be passed.
Answer 1:
Per POSIX, find's -name and -path primaries (tests) use patterns (a.k.a. wildcard expressions, globs) to match filenames and pathnames (while patterns and regular expressions are distantly related, their syntax and capabilities differ significantly; in short: patterns are syntactically simpler, but far less powerful).
- -name matches the pattern against the basename (mere filename) part of an input path only
- -path matches the pattern against the whole pathname (the full path)
Both GNU and BSD/macOS find implement nonstandard extensions:
- -iname and -ipath, which work like their standard-compliant counterparts (based on patterns), except that they match case-insensitively.
- -regex and -iregex tests for matching pathnames by regex (regular expression).
  - Caveat: Both implementations offer at least 2 regex dialects to choose from (-E activates support for extended regular expressions in BSD find, and GNU find allows selecting from several dialects with -regextype), but no two dialects are exactly the same across the two implementations - see bottom for the gory details.
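To make the difference between -name and -path concrete, here is a minimal sketch. The directory layout (logs/app/data.txt under a throwaway mktemp directory) is made up purely for illustration and is not part of the question.

```shell
#!/bin/sh
# Hypothetical sandbox: a throwaway directory tree created just for this demo.
tmp=$(mktemp -d)
mkdir -p "$tmp/logs/app"
touch "$tmp/logs/app/data.txt"

# -name is matched against the basename only.
by_name=$(find "$tmp" -name 'data.txt')

# A -name pattern containing a slash can never match a basename.
# (GNU find prints a warning about this on stderr, silenced here.)
by_name_slash=$(find "$tmp" -name 'app/data.txt' 2>/dev/null)

# -path is matched against the whole pathname as find would print it.
by_path=$(find "$tmp" -path '*/app/data.txt')

printf 'name:       %s\n' "$by_name"
printf 'name+slash: %s\n' "$by_name_slash"
printf 'path:       %s\n' "$by_path"

rm -rf "$tmp"
```

The first and third commands should report the file, while the second reports nothing.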
With your folder names following a fixed-width naming scheme, a pattern would work:
pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Of course, you can take a shortcut if you don't expect false positives:
pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'
Note how * and ?, unlike in a regex, are not duplication symbols (quantifiers) that refer to the preceding expression, but by themselves represent any sequence of characters (*) or any single character (?).
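As a rough illustration of that point, the shell's own case statement uses the same pattern syntax, so it can be used to try patterns without touching the filesystem. The helper name matches and the second sample name are invented for this sketch.

```shell
#!/bin/sh
# matches NAME PATTERN: succeed if the shell pattern matches the whole name.
# (Hypothetical helper; an unquoted expansion in a case label is treated
# as a pattern, with the same * and ? semantics as find -name.)
matches() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

d='[0-9]'
# Fixed-width pattern: 8 digits, then 2, 2, 2 and 9 digits.
strict="${d}${d}${d}${d}${d}${d}${d}${d}-${d}${d}:${d}${d}:${d}${d}:${d}${d}${d}${d}${d}${d}${d}${d}${d}"
# Shortcut pattern: * and ? stand for arbitrary characters, not "more digits".
loose='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

for name in '20161128-20:34:33:432813246' '2x-1a:2b:3c:4zzz'; do
  for kind in strict loose; do
    eval "pat=\$$kind"
    if matches "$name" "$pat"; then verdict=match; else verdict='no match'; fi
    printf '%-6s %-30s %s\n' "$kind" "$name" "$verdict"
  done
done
```

The loose pattern accepts both names, because * and ? happily match letters, while the fixed-width pattern accepts only the real timestamp. That is the false-positive risk the shortcut carries.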
If we put it all together:
files=$(find "$path" -type d -name "$pattern")
It's important to double-quote the variable references to protect their values from unwanted shell expansions, notably to preserve any whitespace in the path and to prevent premature globbing by the shell of the value of $pattern.
Note that I've added -type d to limit matching to directories (folders), which improves performance.
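For anyone who wants to try the command end to end, here is a small self-contained sketch. The sandbox contents (a notes folder and a regular file with a timestamp-like name) are assumptions added only to show what gets filtered out.

```shell
#!/bin/sh
# Hypothetical sandbox standing in for $path from the question.
tmp=$(mktemp -d)
mkdir "$tmp/20161128-20:34:33:432813246" "$tmp/notes"
touch "$tmp/20161128-20:34:33:432813246/data.txt"
# A regular FILE with a timestamp-like name; -type d should skip it.
touch "$tmp/20161129-08:00:00:000000001"

d='[0-9]'
# Fixed-width pattern: 8 digits, then 2, 2, 2 and 9 digits.
pattern="${d}${d}${d}${d}${d}${d}${d}${d}-${d}${d}:${d}${d}:${d}${d}:${d}${d}${d}${d}${d}${d}${d}${d}${d}"

# Both expansions are double-quoted so the shell passes them to find untouched.
files=$(find "$tmp" -type d -name "$pattern")
printf '%s\n' "$files"

rm -rf "$tmp"
```

Only the timestamp-named directory should be printed; the notes folder fails the pattern and the timestamp-named file fails -type d.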
Optional background information:
Below is a regex feature matrix as of GNU find v4.6.0 / BSD find as found on macOS 10.12.1:
- GNU find features are listed by the types supported by the -regextype option, with emacs being the default.
  - Note that several posix-*-named regex types are misnomers in that they support features beyond what POSIX mandates.
- BSD find features are listed by basic (using NO regex option, which implies platform-flavored BREs) and extended (using option -E, which implies platform-flavored EREs).
For cross-platform use, sticking with POSIX EREs (extended regular expressions) while using -regextype posix-extended with GNU find and using -E with BSD find is safe, but note that not all features you may expect will be supported, notably \b, \</\> and character class shortcuts such as \d.
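A sketch of how that cross-platform advice might look in a script follows. It assumes that a find which accepts --version is GNU find and that anything else is BSD/macOS find; other implementations (for example BusyBox) are not covered. The sandbox directory names are made up.

```shell
#!/bin/sh
# Hypothetical sandbox with one timestamp-named folder and one decoy.
tmp=$(mktemp -d)
mkdir "$tmp/20161128-20:34:33:432813246" "$tmp/2016-not-a-timestamp"

# Unlike -name, -regex must match the WHOLE pathname, hence the leading '.*/'.
# Only [0-9] and {n} are used, which both ERE flavors support.
regex='.*/[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}'

if find --version >/dev/null 2>&1; then
  # Assumption: only GNU find understands --version.
  # -regextype must appear before the -regex test it should affect.
  dirs=$(find "$tmp" -regextype posix-extended -type d -regex "$regex")
else
  # Assumption: otherwise this is BSD/macOS find, where -E selects EREs.
  dirs=$(find -E "$tmp" -type d -regex "$regex")
fi

printf '%s\n' "$dirs"
rm -rf "$tmp"
```

Using [0-9] rather than \d keeps the expression inside the feature set that both implementations list as supported.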
=================== GNU find ===================
== REGEX FEATURE: \{\}
TYPE: awk: -
TYPE: egrep: -
TYPE: ed: ✓
TYPE: emacs: -
TYPE: gnu-awk: -
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: -
TYPE: posix-extended: -
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: {}
TYPE: awk: -
TYPE: egrep: ✓
TYPE: ed: -
TYPE: emacs: -
TYPE: gnu-awk: ✓
TYPE: grep: -
TYPE: posix-awk: ✓
TYPE: posix-basic: -
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: -
TYPE: sed: -
== REGEX FEATURE: \+
TYPE: awk: -
TYPE: egrep: -
TYPE: ed: ✓
TYPE: emacs: -
TYPE: gnu-awk: -
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: -
TYPE: posix-extended: -
TYPE: posix-minimal-basic: -
TYPE: sed: ✓
== REGEX FEATURE: +
TYPE: awk: ✓
TYPE: egrep: ✓
TYPE: ed: -
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: -
TYPE: posix-awk: ✓
TYPE: posix-basic: -
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: -
TYPE: sed: -
== REGEX FEATURE: \b
TYPE: awk: -
TYPE: egrep: ✓
TYPE: ed: ✓
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: \< \>
TYPE: awk: -
TYPE: egrep: ✓
TYPE: ed: ✓
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: ✓
TYPE: posix-awk: -
TYPE: posix-basic: ✓
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: [:digit:]
TYPE: awk: ✓
TYPE: egrep: ✓
TYPE: ed: ✓
TYPE: emacs: -
TYPE: gnu-awk: ✓
TYPE: grep: ✓
TYPE: posix-awk: ✓
TYPE: posix-basic: ✓
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: ✓
TYPE: sed: ✓
== REGEX FEATURE: \d
TYPE: awk: -
TYPE: egrep: -
TYPE: ed: -
TYPE: emacs: -
TYPE: gnu-awk: -
TYPE: grep: -
TYPE: posix-awk: -
TYPE: posix-basic: -
TYPE: posix-egrep: -
TYPE: posix-extended: -
TYPE: posix-minimal-basic: -
TYPE: sed: -
== REGEX FEATURE: \s
TYPE: awk: ✓
TYPE: egrep: ✓
TYPE: ed: -
TYPE: emacs: ✓
TYPE: gnu-awk: ✓
TYPE: grep: -
TYPE: posix-awk: ✓
TYPE: posix-basic: -
TYPE: posix-egrep: ✓
TYPE: posix-extended: ✓
TYPE: posix-minimal-basic: -
TYPE: sed: -
=================== BSD find ===================
== REGEX FEATURE: \{\}
TYPE: basic: ✓
TYPE: extended: -
== REGEX FEATURE: {}
TYPE: basic: -
TYPE: extended: ✓
== REGEX FEATURE: \+
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: +
TYPE: basic: -
TYPE: extended: ✓
== REGEX FEATURE: \b
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: \< \>
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: [:digit:]
TYPE: basic: ✓
TYPE: extended: ✓
== REGEX FEATURE: \d
TYPE: basic: -
TYPE: extended: -
== REGEX FEATURE: \s
TYPE: basic: -
TYPE: extended: ✓
Answer 2:
When you have a full path of a file, then you don't need a regex to extract the directory name.
dirname "/path/20161128-20:34:33:432813246/data.txt"
will give you
/path/20161128-20:34:33:432813246
If you really want a regex, try this:
\d{8}-\d{2}:\d{2}:\d{2}:\d{9}
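If the dirname route is taken, one possible way to wire it into find is shown below. This is only a sketch: the sandbox is invented, and it relies on the file name data.txt being known in advance, as stated in the question.

```shell
#!/bin/sh
# Hypothetical sandbox standing in for /path from the question.
tmp=$(mktemp -d)
mkdir "$tmp/20161128-20:34:33:432813246"
touch "$tmp/20161128-20:34:33:432813246/data.txt"

# Find the known file, then let dirname strip the last path component.
# '\;' runs dirname once per match, since POSIX dirname takes a single operand.
dirs=$(find "$tmp" -type f -name 'data.txt' -exec dirname {} \;)
printf '%s\n' "$dirs"

rm -rf "$tmp"
```

Note that, according to the feature matrix in the first answer, \d is not supported by any of the listed find regex types, so the regex above would need [0-9] in place of \d before it could be passed to find -regex.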
Source: https://stackoverflow.com/questions/40867729/filter-folders-whose-name-is-a-timestamp-pattern-matching-vs-regex-matching-u