I want to get the filename (without extension) and the extension separately.
The best solution I found so far is:
NAME=`echo "$FILE" | cut -d'.'
Here is a solution using AWK. It can probably be done more simply, but I am not good at AWK.
filename$ ls
abc.a.txt a.b.c.txt pp-kk.txt
filename$ find . -type f | awk -F/ '{print $2}' | rev | awk -F"." '{$1="";print}' | rev | awk 'gsub(" ",".") ,sub(".$", "")'
abc.a
a.b.c
pp-kk
filename$ find . -type f | awk -F/ '{print $2}' | awk -F"." '{print $NF}'
txt
txt
txt
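The same split can probably be done in a single awk process. The following is a minimal sketch, not the code from the answer above: split_awk is a hypothetical helper name, and it assumes plain file names passed as arguments, with no directory part and no embedded newlines. match() locates the last period; everything before it is treated as the name and everything after it as the extension.

```bash
#!/usr/bin/env bash
# split_awk (hypothetical helper): print "name<TAB>extension" for each argument.
# Assumes plain file names without a directory part or embedded newlines.
split_awk() {
    printf '%s\n' "$@" | awk '{
        name = $0; ext = ""
        # RSTART is the position of the last period, if there is one
        if (match($0, /\.[^.]*$/)) {
            name = substr($0, 1, RSTART - 1)
            ext  = substr($0, RSTART + 1)
        }
        print name "\t" ext
    }'
}

split_awk abc.a.txt a.b.c.txt pp-kk.txt
```

A name without any period is printed unchanged, followed by an empty extension field.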
No need to bother with awk or sed or even perl for this simple task. There is a pure-Bash, os.path.splitext()-compatible solution which only uses parameter expansions.
Documentation of os.path.splitext(path):
Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').
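Python code:
root, ext = os.path.splitext(path)
Bash implementation, honoring leading periods (a leading period can start the extension):
root="${path%.*}"
ext="${path#"$root"}"
Bash implementation, ignoring leading periods:
root="${path#.}";root="${path%"$root"}${root%.*}"
ext="${path#"$root"}"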
Here are test cases for the "ignoring leading periods" implementation, which should match the Python reference implementation on every input.
|---------------|-----------|-------|
|path           |root       |ext    |
|---------------|-----------|-------|
|' .txt'        |' '        |'.txt' |
|' .txt.txt'    |' .txt'    |'.txt' |
|' txt'         |' txt'     |''     |
|'*.txt.txt'    |'*.txt'    |'.txt' |
|'.cshrc'       |'.cshrc'   |''     |
|'.txt'         |'.txt'     |''     |
|'?.txt.txt'    |'?.txt'    |'.txt' |
|'\n.txt.txt'   |'\n.txt'   |'.txt' |
|'\t.txt.txt'   |'\t.txt'   |'.txt' |
|'a b.txt.txt'  |'a b.txt'  |'.txt' |
|'a*b.txt.txt'  |'a*b.txt'  |'.txt' |
|'a?b.txt.txt'  |'a?b.txt'  |'.txt' |
|'a\nb.txt.txt' |'a\nb.txt' |'.txt' |
|'a\tb.txt.txt' |'a\tb.txt' |'.txt' |
|'txt'          |'txt'      |''     |
|'txt.pdf'      |'txt'      |'.pdf' |
|'txt.tar.gz'   |'txt.tar'  |'.gz'  |
|'txt.txt'      |'txt'      |'.txt' |
|---------------|-----------|-------|
All tests passed.
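To try this out, the variant that ignores a leading period can be wrapped in a function. This is only a sketch: splitext is a hypothetical wrapper name, it sets the global variables root and ext, and it assumes the argument is a bare file name with no '/' in it (the table above only covers such names).

```bash
#!/usr/bin/env bash
# splitext (hypothetical wrapper): sets the globals root and ext.
# Assumes the argument is a bare file name with no '/' in it.
splitext() {
    local path=$1
    root="${path#.}"                    # drop one leading period, if present
    root="${path%"$root"}${root%.*}"    # restore it, then cut the last ".ext"
    ext="${path#"$root"}"               # what is left of path is the extension
}

for f in txt.tar.gz .cshrc 'a b.txt.txt' txt; do
    splitext "$f"
    printf '%s -> root=%s ext=%s\n' "$f" "$root" "$ext"
done
```

The inner quotes in "${path%"$root"}" and "${path#"$root"}" matter: they make Bash treat $root as a literal string, so names containing * or ? are not interpreted as patterns.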
I use the following one-liner, which removes the last two dot-separated fields (here tar and gz):
$ echo "foo.tar.gz"|rev|cut -d"." -f3-|rev
foo
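The pipeline works by reversing the name, cutting from the third field onward, and reversing it back. As a sketch of how the field count could be made adjustable, here is a hypothetical helper, strip_fields, which is not part of the original answer. It assumes rev and cut are installed and that the name contains no newline.

```bash
#!/usr/bin/env bash
# strip_fields (hypothetical helper): remove the last N dot-separated fields.
# Assumes rev and cut are installed and the name has no embedded newline.
strip_fields() {
    local name=$1 n=$2
    printf '%s\n' "$name" | rev | cut -d. -f"$((n + 1))"- | rev
}

strip_fields foo.tar.gz 2   # prints foo
strip_fields foo.tar.gz 1   # prints foo.tar
```

One caveat of this approach: a name that contains a period but has fewer than N+1 fields, such as foo.gz with N=2, comes back empty.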
Building on Petesh's answer: if only the filename is needed, both the path and the extension can be stripped in a single line (the quotes keep paths with spaces intact):
filename=$(basename "${fullname%.*}")
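Stripping the extension before the directory can misfire when a directory name contains a period and the file has no extension: /tmp/my.dir/notes would be cut to /tmp/my. A possible alternative, sketched here with a hypothetical helper named stem, removes the directory part first and uses only parameter expansions. It assumes the path does not end in a slash.

```bash
#!/usr/bin/env bash
# stem (hypothetical helper): file name without directory and last extension.
stem() {
    local name=${1##*/}          # drop everything up to the last '/'
    printf '%s\n' "${name%.*}"   # then drop the last ".ext", if any
}

stem '/tmp/my dir/archive.tar.gz'   # prints archive.tar
stem '/tmp/my.dir/notes'            # prints notes
```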
If you also want to allow empty extensions, this is the shortest I could come up with:
echo 'hello.txt' | sed -r 's/.+\.(.+)|.*/\1/' # EXTENSION
echo 'hello.txt' | sed -r 's/(.+)\..+|(.*)/\1\2/' # FILENAME
The first line explained: it matches either PATH.EXT or ANYTHING and replaces the match with EXT. If only ANYTHING matched, the EXT group is not captured, so the output is empty.
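To see how both expressions behave when there is no extension, they can be wrapped in two small functions. This is a sketch: ext_of and name_of are hypothetical names, and -E is used in place of -r on the assumption that the installed sed accepts it (GNU sed and the BSD sed do).

```bash
#!/usr/bin/env bash
# ext_of and name_of are hypothetical wrappers around the two sed commands.
# -E is used in place of -r; both select extended regular expressions in GNU sed.
ext_of()  { printf '%s\n' "$1" | sed -E 's/.+\.(.+)|.*/\1/'; }
name_of() { printf '%s\n' "$1" | sed -E 's/(.+)\..+|(.*)/\1\2/'; }

for f in hello.txt hello .bashrc; do
    printf '%s: name=[%s] ext=[%s]\n' "$f" "$(name_of "$f")" "$(ext_of "$f")"
done
```

Because .+ needs at least one character before the period, a dotfile such as .bashrc falls through to the second alternative and is reported as a name with an empty extension.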
~% FILE="example.tar.gz"
~% echo "${FILE%%.*}"
example
~% echo "${FILE%.*}"
example.tar
~% echo "${FILE#*.}"
tar.gz
~% echo "${FILE##*.}"
gz
For more details, see shell parameter expansion in the Bash manual.
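One thing to watch with these expansions: when the name contains no period, the pattern does not match and "${FILE##*.}" returns the whole name rather than an empty string. A minimal sketch of a guard follows; split_name is a hypothetical helper that sets the globals name and ext, splitting at the last period only.

```bash
#!/usr/bin/env bash
# split_name (hypothetical helper): sets the globals name and ext.
# Splits at the last period; ext is empty when there is no period at all.
split_name() {
    local file=$1
    case $file in
        *.*) name=${file%.*}; ext=${file##*.} ;;
        *)   name=$file;      ext= ;;
    esac
}

split_name example.tar.gz
printf 'name=%s ext=%s\n' "$name" "$ext"   # prints name=example.tar ext=gz
split_name README
printf 'name=%s ext=%s\n' "$name" "$ext"   # prints name=README ext=
```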