In my bash script I need to extract just the path from a given URL. For example, from a variable containing the string:
http://login:password@example.com/one/more/dir/fi
The Perl snippet is intriguing, and since Perl is present in most Linux distros, quite useful, but... it doesn't do the job completely. Specifically, it doesn't translate the percent-encoded URI into a plain file path: the UTF-8 bytes written as %XX escapes are left undecoded. Let me give an example of the problem. The original URI may be:
file:///home/username/Music/Jean-Michel%20Jarre/M%C3%A9tamorphoses/01%20-%20Je%20me%20souviens.mp3
The corresponding path would be:
/home/username/Music/Jean-Michel Jarre/Métamorphoses/01 - Je me souviens.mp3
Here %20 became a space, and %C3%A9 (the two UTF-8 bytes of 'é') became 'é'. Is there a Linux command, bash feature, or Perl script that can handle this transformation, or do I have to write a humongous series of sed substitutions? What about the reverse transformation, from path to URL/URI?
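For comparison, both directions can also be done in plain bash, without sed or Perl. This is a minimal sketch rather than a complete URI parser: it assumes bash 4 or later, a file URI with an empty authority (file:///...), and UTF-8 file names, and the function names urldecode, file_uri_to_path, urlencode_path and path_to_file_uri are made up for illustration. Decoding relies on printf's %b turning each \xHH escape into the raw byte, so the two bytes behind %C3%A9 come out as 'é'. Encoding goes the other way: od dumps every byte as hex, and everything outside the unreserved characters and "/" is written back as %HH.

```bash
#!/usr/bin/env bash
# Sketch only: percent-decoding and percent-encoding in plain bash (4+).
# Assumes file URIs of the form file:///abs/path (empty authority).

# Decode %HH escapes: rewrite each "%" as "\x" and let printf %b emit the byte.
# Caveat: a literal backslash in the input would also be interpreted by %b.
urldecode() {
  local s=$1
  printf '%b' "${s//%/\\x}"
}

# Hypothetical helper: file:///a/b%20c -> /a/b c
file_uri_to_path() {
  urldecode "${1#file://}"
}

# Encode every byte outside [A-Za-z0-9._~-] and "/" as %HH (uppercase hex).
# od prints each input byte as two hex digits, so a multibyte UTF-8
# character such as é (bytes C3 A9) becomes %C3%A9.
urlencode_path() {
  local hex byte out=
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    printf -v byte "\\x$hex"          # turn the hex pair back into one byte
    case $byte in
      [A-Za-z0-9._~/-]) out+=$byte ;; # safe characters are kept as-is
      *) out+=%${hex^^} ;;            # everything else is escaped
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: /a/b c -> file:///a/b%20c
path_to_file_uri() {
  printf 'file://%s' "$(urlencode_path "$1")"
}

uri='file:///home/username/Music/Jean-Michel%20Jarre/M%C3%A9tamorphoses/01%20-%20Je%20me%20souviens.mp3'
path=$(file_uri_to_path "$uri")
printf '%s\n' "$path"
printf '%s\n' "$(path_to_file_uri "$path")"
```

Running it prints the decoded path and then the URI rebuilt from that path, which should be identical to the original one.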
(Follow-up)
Looking at http://search.cpan.org/~gaas/URI-1.54/URI.pm, I first saw the as_iri method, but it was apparently missing from the URI module on my Linux system (or is not applicable here, somehow). It turns out the solution is to replace the "->path" part with "->file". You can then break the result down further using basename, dirname, etc. The solution is thus:
path=$( echo "$url" | perl -MURI -le 'chomp($url = <>); print URI->new($url)->file' )
Oddly, using "->dir" instead of "->file" does NOT extract the parent directory: rather, it returns the whole path formatted as a directory name, so it can be used as an argument to mkdir and the like.
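Since ->dir does not give the parent directory, the split into directory and file name has to happen on the decoded path itself. A small sketch, assuming $path already holds the output of the ->file one-liner (hard-coded here so the example runs on its own): dirname and basename do the job, and bash parameter expansion gives the same result without spawning processes. The expansions differ from dirname in edge cases such as "/file" or a trailing slash, so they are best kept for ordinary file paths.

```bash
#!/usr/bin/env bash
# Sketch: split a decoded local path into directory and file-name parts.
# $path stands in for the result of the ->file one-liner.
path='/home/username/Music/Jean-Michel Jarre/Métamorphoses/01 - Je me souviens.mp3'

# External tools; quoting matters because the path contains spaces.
dir=$(dirname -- "$path")
base=$(basename -- "$path")

# Pure-bash equivalents via parameter expansion.
dir2=${path%/*}     # drop the shortest "/..." suffix: parent directory
base2=${path##*/}   # drop the longest ".../" prefix: file name
stem=${base2%.*}    # file name without its extension
ext=${base2##*.}    # extension only

printf '%s\n' "$dir" "$base" "$stem" "$ext"
```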
(Further follow-up)
Any reason why the line cannot be shortened to this?
path=$( echo "$url" | perl -MURI -le 'print URI->new(<>)->file' )
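One detail relevant to the shortened line: inside an argument list, <> is evaluated in list context, so it reads every remaining input line, and a second line would be handed to URI->new as its optional second (scheme) argument. The stdin round trip can also be avoided altogether by passing the URL to Perl as a command-line argument: no echo, no trailing newline, nothing to chomp. A sketch, assuming the URI module is installed (as the one-liners above already require); uri_to_path is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: file:// URI -> local path using Perl's URI module.
# The URL arrives in @ARGV instead of on stdin, so there is no newline to chomp.
# -l makes print append "\n", which $( ) strips again.
uri_to_path() {
  perl -MURI -le 'print URI->new($ARGV[0])->file' "$1"
}

url='file:///home/username/Music/Jean-Michel%20Jarre/M%C3%A9tamorphoses/01%20-%20Je%20me%20souviens.mp3'
path=$(uri_to_path "$url")
printf '%s\n' "$path"
```

As in the original one-liner, ->file comes from the URI::file class, so this applies to file: URIs; for an http:// URL like the one at the top, ->path still returns the percent-encoded form.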