I need to extract just the filename (no file extension) from the following path:
\\my-local-server\path\to\this_file may_contain-any&character.pdf
This is specific to the pdf extension:
^.+\\([^.]+)\.pdf$
This works for any extension, not just pdf:
^.+\\([^.]+)\.[^\.]+$
([^.]+) is the $1 capture group, which extracts the filename without the extension. For example,
\\my-local-server\path\to\this_file may_contain-any&character.pdf
returns
this_file may_contain-any&character
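To check both patterns outside a regex tester, here is a minimal sketch using Python's re module as a stand-in engine (an assumption; the answer does not name a language). The patterns are copied unchanged and written as raw strings so every backslash reaches the regex engine as-is.

```python
import re

# The question's path, as a raw string so each backslash is literal.
path = r"\\my-local-server\path\to\this_file may_contain-any&character.pdf"

# PDF-only pattern: the greedy .+ backs off to the last backslash.
pdf_only = re.compile(r"^.+\\([^.]+)\.pdf$")

# Any-extension pattern: [^\.]+ matches the extension characters.
any_ext = re.compile(r"^.+\\([^.]+)\.[^\.]+$")

for pattern in (pdf_only, any_ext):
    match = pattern.match(path)
    # Group 1 ($1) holds the file name without the extension.
    print(match.group(1))
```

Both patterns print the same name here. The any-extension version needs a single-dot extension, because [^\.]+ cannot span a second dot.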
Here is a solution that extracts the file name without the extension's dot. I started from @Hammad Khan's answer and added the dot to the character class, so dots can be part of the file name (the hyphen is placed last in the class so it is read as a literal, not a range):
[ \w.-]+\.
Then use a regex lookahead, (?= ), for a dot, so the match stops at the last dot (the one before the extension) and the dot does not appear in the result:
[ \w.-]+(?=[.])
Reordering the class is not necessary, but it looks better:
[\w. -]+(?=[.])
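A minimal Python sketch of the final lookahead pattern, assuming Python's re module. It uses re.search because the pattern is not anchored. The sample file names are made up for illustration.

```python
import re

# Word chars, dot, space and a literal hyphen (placed last in the class).
# The lookahead (?=[.]) requires a dot after the match without consuming it.
stem = re.compile(r"[\w. -]+(?=[.])")

for name in ["report.v2.final.pdf", "my-notes 2024.txt"]:
    # The greedy + runs to the end of the name, then backs off to the last dot.
    print(stem.search(name).group())
```

The & character is not in the class. Run against the question's full path, this pattern therefore finds only "character". Add any other characters your file names use to the class.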
^\\(.+\\)*(.+)\.(.+)$
This regex has been tested on these two examples:
\var\www\www.example.com\index.php
\index.php
First block "(.+\\)*" matches the directory path.
Second block "(.+)" matches file name without extension.
Third block "(.+)$" matches extension.
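Here is a short Python sketch, assuming the re module, that runs this regex on the two tested examples and prints the three groups. The comments restate the block descriptions above.

```python
import re

# ^\\       leading backslash
# (.+\\)*   directory path; greedy, so it ends at the last backslash
# (.+)      file name; \.(.+)$ takes the extension after the last dot
split = re.compile(r"^\\(.+\\)*(.+)\.(.+)$")

for path in [r"\var\www\www.example.com\index.php", r"\index.php"]:
    # groups() returns (path, name, extension); path is None if there are no folders.
    print(split.match(path).groups())
```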
Direct approach:
To answer your question as it's written, this will provide the most exact match:
^\\\\my-local-server\\path\\to\\(.+)\.pdf$
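Here is a quick check of the exact-match pattern in Python, assuming the re module. The second path, with a different server name, is hypothetical and shows that the pattern only accepts that one folder.

```python
import re

# Fully anchored to the question's server and folder; \\\\ is two literal backslashes.
exact = re.compile(r"^\\\\my-local-server\\path\\to\\(.+)\.pdf$")

path = r"\\my-local-server\path\to\this_file may_contain-any&character.pdf"
print(exact.match(path).group(1))

# A different (hypothetical) server does not match at all.
print(exact.match(r"\\other-server\path\to\x.pdf"))
```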
General approach:
This regex is short and simple, and it matches any filename in any folder (with or without an extension) on both Windows and *nix:
.*[\\/]([^.]+)
If a file has multiple dots in its name, the above regex will capture the filename up to the first dot. This can easily be modified to match until the last dot if you know that you will not have files without extensions or that you will not have a path with dots in it.
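Here is a small Python sketch, assuming the re module and made-up paths, that shows the general pattern on both separator styles and its stop-at-first-dot behaviour.

```python
import re

# .* is greedy, so [\\/] lands on the last separator; [^.]+ stops at the first dot.
general = re.compile(r".*[\\/]([^.]+)")

for path in [r"C:\docs\archive.tar.gz",
             "/var/www/www.example.com/index.php",
             "/usr/bin/python3"]:
    print(general.match(path).group(1))
```

The first path yields "archive", not "archive.tar", because capture ends at the first dot. The last path has no extension, so the whole name is captured.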
If you know that the folder will only contain .pdf files or you are only interested in .pdf files and also know that the extension will never be misspelled, I would use this regex:
.*[\\/](.+)\.pdf$
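Here is a Python sketch of the pdf-only pattern, assuming the re module and hypothetical paths. The greedy (.+) keeps every dot except the one before pdf. The second call relates to the "never misspelled" caveat: the match is case-sensitive, so .PDF is rejected unless you pass re.IGNORECASE.

```python
import re

# (.+) is greedy, so it keeps every dot except the one in ".pdf".
pdf_name = re.compile(r".*[\\/](.+)\.pdf$")

print(pdf_name.match("/home/user/report.v2.final.pdf").group(1))

# Case-sensitive by default: an upper-case extension does not match.
print(pdf_name.match(r"C:\scans\invoice.PDF"))
```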
Explanation:
- . matches any character except line terminators.
- * repeats the previous match zero or more times, as many as possible.
- [\\/] matches a backslash or forward slash. Because .* is greedy, this is the last one (the previous ones are consumed by .*). You can omit either the backslash or the forward slash if you know that only one type of environment will be used. If you want to capture the path, surround .* or .*[\\/] in parentheses.
- [^.] matches anything that is not a literal dot.
- + repeats the previous match one or more times, as many as possible.
- \. matches a literal dot.
- pdf matches the string pdf.
- $ asserts the end of the string.
If you want to match files with zero, one or multiple dots in their names, placed in a variable path that may also contain dots, it starts to get ugly. I have not provided an answer for this scenario, as I think it is unlikely.
Edit: To also capture filenames without a path, replace the first part with (?:.*[\\/])?, which is an optional non-capturing group.
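Here is the edit applied to the general pattern in a short Python sketch, assuming the re module and made-up paths. When there is no separator, the optional group is skipped, so a bare name still matches.

```python
import re

# (?:.*[\\/])? makes the directory prefix optional, so bare names also match.
name_only = re.compile(r"(?:.*[\\/])?([^.]+)")

for path in ["index.php", "/var/www/index.php", r"C:\docs\notes.txt"]:
    print(name_only.match(path).group(1))
```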
Here is an alternative that works on Windows and Unix:
"^(([A-Z]:)?[\.]?[\\/]?.*[\\/])*(.+)\.(.+)"
First block: path
Second block: dummy (the optional drive letter, e.g. C:)
Third block: file name
Fourth block: extension
Tested on:
".\var\www\www.example.com\index.php"
"\var\www\www.example.com\index.php"
"/var/www/www.example.com/index.php"
"./var/www/www.example.com/index.php"
"C:/var/www/www.example.com/index.php"
"D:/var/www/www.example.com/index.php"
"D:\\var\\www\\www.example.com\\index.php"
"\index.php"
"./index.php"
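Here is a Python sketch, assuming the re module, that runs the alternative over the tested paths and prints the file name and extension groups. It writes the separators as [\\/]. Inside a character class, {1,2} is read as literal characters rather than a repeat count, so a class written as [\\{1,2}/] would also accept 1, 2, {, } and comma as separators. A name containing them, such as the hypothetical 12.txt, could then be split in the wrong place.

```python
import re

# Groups: 1 = path, 2 = drive letter (dummy), 3 = file name, 4 = extension.
# [\\/] matches one backslash or slash; the greedy .* absorbs doubled ones.
win_unix = re.compile(r"^(([A-Z]:)?[\.]?[\\/]?.*[\\/])*(.+)\.(.+)")

paths = [
    r".\var\www\www.example.com\index.php",
    r"\var\www\www.example.com\index.php",
    "/var/www/www.example.com/index.php",
    "./var/www/www.example.com/index.php",
    "C:/var/www/www.example.com/index.php",
    "D:/var/www/www.example.com/index.php",
    r"D:\\var\\www\\www.example.com\\index.php",
    r"\index.php",
    "./index.php",
]
for p in paths:
    m = win_unix.match(p)
    print(m.group(3), m.group(4))
```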
This regular expression extracts the file extension: if the second capturing group (the inner (.+)) is not null, it contains the extension.
.*\\(.*\.(.+)|.*$)
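Here is a Python sketch, assuming the re module and made-up Windows paths, that reads the extension from group 2. For a name without a dot, the second alternative (.*$) matches instead, so group 2 is None even when a folder name contains a dot.

```python
import re

# Group 1: everything after the last backslash; group 2: the extension, or None.
ext_re = re.compile(r".*\\(.*\.(.+)|.*$)")

print(ext_re.match(r"C:\data\file.tar.gz").group(2))  # text after the last dot
print(ext_re.match(r"C:\my.dir\README").group(2))     # no extension in the file name
```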