Extract filename and path from URL in bash script

别跟我提以往 2021-01-30 14:17

In my bash script I need to extract just the path from a given URL. For example, from a variable containing the string:

http://login:password@example.com/one/more/dir/fi

13 answers
  • 2021-01-30 14:33

    I wrote a function that will extract any part of the URL. I've only tested it in bash. Usage:

    url_parse <url> [url-part]
    

    example:

    $ url_parse "http://example.com:8080/home/index.html" path
    home/index.html
    

    code:

    url_parse() {
      local -r url=$1 url_part=$2
      # Define the URL tokens and the URL regular expression.
      local -r protocol='^[^:]+' user='[^:@]+' password='[^@]+' host='[^:/?#]+' \
        port='[0-9]+' path='\/([^?#]*)' query='\?([^#]+)' fragment='#(.*)'
      local -r auth="($user)(:($password))?@"
      local -r connection="($auth)?($host)(:($port))?"
      local -r url_regex="($protocol):\/\/($connection)?($path)?($query)?($fragment)?$"
      # Parse the URL into an array. The three-argument match() requires gawk.
      # Fields are comma-separated, so a URL containing commas will be split incorrectly.
      local -a url_arr
      IFS=',' read -r -a url_arr <<< "$(printf '%s\n' "$url" | awk -v OFS=, \
        "{match(\$0,/$url_regex/,a);print a[1],a[4],a[6],a[7],a[9],a[11],a[13],a[15]}")"
    
      [[ ${url_arr[0]} ]] || { echo "Invalid URL: $url" >&2 ; return 1 ; }
    
      case $url_part in
        protocol)  echo "${url_arr[0]}" ;;
        auth)      echo "${url_arr[1]}:${url_arr[2]}" ;; # ex: john.doe:1234
        user)      echo "${url_arr[1]}" ;;
        password)  echo "${url_arr[2]}" ;;
        host-port) echo "${url_arr[3]}:${url_arr[4]}" ;; # ex: example.com:8080
        host)      echo "${url_arr[3]}" ;;
        port)      echo "${url_arr[4]}" ;;
        path)      echo "${url_arr[5]}" ;;
        query)     echo "${url_arr[6]}" ;;
        fragment)  echo "${url_arr[7]}" ;;
        info)      echo -e "protocol:${url_arr[0]}\nuser:${url_arr[1]}\npassword:${url_arr[2]}\nhost:${url_arr[3]}\nport:${url_arr[4]}\npath:${url_arr[5]}\nquery:${url_arr[6]}\nfragment:${url_arr[7]}" ;;
        "")        ;; # used to validate url
        *)         echo "Invalid URL part: $url_part" >&2 ; return 1 ;;
      esac
    }
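
The three-argument `match()` used above is a gawk extension, so the function depends on gawk being the installed awk. Where that is a problem, the path alone can be sketched in pure bash with `[[ =~ ]]` and `BASH_REMATCH`. This is only a rough sketch, not the function above: the name `url_path_re` and the simplified regex are assumptions made for illustration, and unlike `url_parse` it keeps the leading slash.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of url_parse): print only the path of a URL.
url_path_re() {
  # scheme://authority, followed by an optional path that ends at '?' or '#'
  local re='^[^:/?#]+://[^/?#]*(/[^?#]*)?'
  [[ $1 =~ $re ]] || { echo "Invalid URL: $1" >&2; return 1; }
  # Capture group 1 is the path; it is empty when the URL has no path.
  printf '%s\n' "${BASH_REMATCH[1]}"
}

url_path_re 'http://example.com:8080/home/index.html'   # prints /home/index.html
```

The regex is kept in a variable because quoting it directly on the right-hand side of `=~` would make bash treat it as a literal string.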
    
  • 2021-01-30 14:33

    This Perl one-liner works for me on the command line, so it could be added to your script.

    echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth' | perl -n -e 'm{http://[^/]+(/[^?]+)};print $1'
    

    Note that the capture stops at the first '?'. If the URL has no query string, it runs to the end of the line instead, trailing newline included.

  • 2021-01-30 14:39

    Bash has built-in parameter expansion operators that handle this, namely the string pattern-matching operators:

    1. '#' remove minimal matching prefixes
    2. '##' remove maximal matching prefixes
    3. '%' remove minimal matching suffixes
    4. '%%' remove maximal matching suffixes

    For example:

    FILE=/home/user/src/prog.c
    echo ${FILE#/*/}  # ==> user/src/prog.c
    echo ${FILE##/*/} # ==> prog.c
    echo ${FILE%/*}   # ==> /home/user/src
    echo ${FILE%%/*}  # ==> (empty string)
    echo ${FILE%.c}   # ==> /home/user/src/prog
    

    All this is from the excellent book "A Practical Guide to Linux Commands, Editors, and Shell Programming" by Mark G. Sobell (http://www.sobell.com/).
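
Applied to the URL from the question, these operators can be chained to pull out the path, the directory and the file name. The following is a minimal sketch that assumes the URL starts with a `scheme://` prefix; the variable names (`rest`, `url_path`, `file_name`, `dir_name`) are made up for illustration.

```bash
url='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

rest=${url%%[?#]*}   # drop the query string and fragment (longest suffix from the first '?' or '#')
rest=${rest#*://}    # drop the scheme: login:password@example.com/one/more/dir/file.exe
case $rest in
  */*) url_path=/${rest#*/} ;;   # drop the authority, i.e. the shortest prefix up to the first '/'
  *)   url_path=/ ;;             # the URL has no path component
esac
file_name=${url_path##*/}        # file.exe
dir_name=${url_path%/*}          # /one/more/dir

printf '%s\n' "$url_path" "$file_name" "$dir_name"
```

No external process is started, which matters when the extraction runs inside a loop.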

  • 2021-01-30 14:40

    gawk

    echo "http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth" | awk -F"/" '
    {
     $1=$2=$3=""
     gsub(/\?.*/,"",$NF)
     print substr($0,3)
    }' OFS="/"
    

    output

    # ./test.sh
    /one/more/dir/file.exe
    
  • 2021-01-30 14:42
    url="http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth"
    

    GNU grep

    $ grep -Po '\w\K/\w+[^?]+' <<<$url
    /one/more/dir/file.exe
    

    BSD grep

    $ grep -o '\w/\w\+[^?]\+' <<<$url | tail -c+2
    /one/more/dir/file.exe
    

    ripgrep

    $ rg -o '\w(/\w+[^?]+)' -r '$1' <<<$url
    /one/more/dir/file.exe
    

    To get other parts of a URL, see: Getting parts of a URL (Regex).

  • 2021-01-30 14:43

    I agree that "cut" is a wonderful tool on the command line. However, a more purely bash solution is to use a powerful feature of variable expansion in bash. For example:

    pass_first_last='password,firstname,lastname'
    
    pass=${pass_first_last%%,*}
    
    first_last=${pass_first_last#*,}
    
    first=${first_last%,*}
    
    last=${first_last#*,}
    
    or, alternatively,
    
    last=${pass_first_last##*,}
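
The same expansions carry over to the URL from the question, for instance to split off the credentials and the host. This is a sketch under the assumption that the URL contains a `login:password@` part; without it, `userinfo` would simply hold the whole authority. The variable names are illustrative only.

```bash
url='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

authority=${url#*://}        # strip the scheme
authority=${authority%%/*}   # keep everything before the first '/': login:password@example.com
userinfo=${authority%@*}     # login:password
user=${userinfo%%:*}         # login
pass=${userinfo#*:}          # password
host=${authority##*@}        # example.com

printf '%s\n' "$user" "$pass" "$host"
```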
    