I want to get the filename (without extension) and the extension separately.
The best solution I found so far is:
NAME=`echo \"$FILE\" | cut -d\'.\'
pax> echo a.b.js | sed 's/\.[^.]*$//'
a.b
pax> echo a.b.js | sed 's/^.*\.//'
js
works fine, so you can just use:
pax> FILE=a.b.js
pax> NAME=$(echo "$FILE" | sed 's/\.[^.]*$//')
pax> EXTENSION=$(echo "$FILE" | sed 's/^.*\.//')
pax> echo $NAME
a.b
pax> echo $EXTENSION
js
The commands, by the way, work as follows.
The command for NAME
substitutes a "."
character followed by any number of non-"."
characters up to the end of the line, with nothing (i.e., it removes everything from the final "."
to the end of the line, inclusive). This is basically a non-greedy substitution using regex trickery.
The command for EXTENSION
substitutes a any number of characters followed by a "."
character at the start of the line, with nothing (i.e., it removes everything from the start of the line to the final dot, inclusive). This is a greedy substitution which is the default action.
Here is the algorithm I used for finding the name and extension of a file when I wrote a Bash script to make names unique when names conflicted with respect to casing.
#! /bin/bash
#
# Finds
# -- name and extension pairs
# -- null extension when there isn't an extension.
# -- Finds name of a hidden file without an extension
#
declare -a fileNames=(
'.Montreal'
'.Rome.txt'
'Loundon.txt'
'Paris'
'San Diego.txt'
'San Francisco'
)
echo "Script ${0} finding name and extension pairs."
echo
for theFileName in "${fileNames[@]}"
do
echo "theFileName=${theFileName}"
# Get the proposed name by chopping off the extension
name="${theFileName%.*}"
# get extension. Set to null when there isn't an extension
# Thanks to mklement0 in a comment above.
extension=$([[ "$theFileName" == *.* ]] && echo ".${theFileName##*.}" || echo '')
# a hidden file without extenson?
if [ "${theFileName}" = "${extension}" ] ; then
# hidden file without extension. Fixup.
name=${theFileName}
extension=""
fi
echo " name=${name}"
echo " extension=${extension}"
done
The test run.
$ config/Name\&Extension.bash
Script config/Name&Extension.bash finding name and extension pairs.
theFileName=.Montreal
name=.Montreal
extension=
theFileName=.Rome.txt
name=.Rome
extension=.txt
theFileName=Loundon.txt
name=Loundon
extension=.txt
theFileName=Paris
name=Paris
extension=
theFileName=San Diego.txt
name=San Diego
extension=.txt
theFileName=San Francisco
name=San Francisco
extension=
$
FYI: The complete transliteration program and more test cases can be found here: https://www.dropbox.com/s/4c6m0f2e28a1vxf/avoid-clashes-code.zip?dl=0
You could use the cut command to remove the last two extensions (the ".tar.gz"
part):
$ echo "foo.tar.gz" | cut -d'.' --complement -f2-
foo
As noted by Clayton Hughes in a comment, this will not work for the actual example in the question. So as an alternative I propose using sed
with extended regular expressions, like this:
$ echo "mpc-1.0.1.tar.gz" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'
mpc-1.0.1
It works by removing the last two (alpha-numeric) extensions unconditionally.
[Updated again after comment from Anders Lindahl]
That doesn't seem to work if the file has no extension, or no filename. Here is what I'm using; it only uses builtins and handles more (but not all) pathological filenames.
#!/bin/bash
for fullpath in "$@"
do
filename="${fullpath##*/}" # Strip longest match of */ from start
dir="${fullpath:0:${#fullpath} - ${#filename}}" # Substring from 0 thru pos of filename
base="${filename%.[^.]*}" # Strip shortest match of . plus at least one non-dot char from end
ext="${filename:${#base} + 1}" # Substring from len of base thru end
if [[ -z "$base" && -n "$ext" ]]; then # If we have an extension and no base, it's really the base
base=".$ext"
ext=""
fi
echo -e "$fullpath:\n\tdir = \"$dir\"\n\tbase = \"$base\"\n\text = \"$ext\""
done
And here are some testcases:
$ basename-and-extension.sh / /home/me/ /home/me/file /home/me/file.tar /home/me/file.tar.gz /home/me/.hidden /home/me/.hidden.tar /home/me/.. . /: dir = "/" base = "" ext = "" /home/me/: dir = "/home/me/" base = "" ext = "" /home/me/file: dir = "/home/me/" base = "file" ext = "" /home/me/file.tar: dir = "/home/me/" base = "file" ext = "tar" /home/me/file.tar.gz: dir = "/home/me/" base = "file.tar" ext = "gz" /home/me/.hidden: dir = "/home/me/" base = ".hidden" ext = "" /home/me/.hidden.tar: dir = "/home/me/" base = ".hidden" ext = "tar" /home/me/..: dir = "/home/me/" base = ".." ext = "" .: dir = "" base = "." ext = ""
Ok so if I understand correctly, the problem here is how to get the name and the full extension of a file that has multiple extensions, e.g., stuff.tar.gz
.
This works for me:
fullfile="stuff.tar.gz"
fileExt=${fullfile#*.}
fileName=${fullfile%*.$fileExt}
This will give you stuff
as filename and .tar.gz
as extension. It works for any number of extensions, including 0. Hope this helps for anyone having the same problem =)
Simply use ${parameter%word}
In your case:
${FILE%.*}
If you want to test it, all following work, and just remove the extension:
FILE=abc.xyz; echo ${FILE%.*};
FILE=123.abc.xyz; echo ${FILE%.*};
FILE=abc; echo ${FILE%.*};