Parse URL in shell script

一生所求 2020-12-02 20:55

I have a URL like:

sftp://user@host.net/some/random/path

I want to extract the user, host and path from this string. Any part can be of arbitrary length.

  • 2020-12-02 21:41

    A simplistic approach to get just the domain from a full URL:

    echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f1-3
    
    # OUTPUT>>> https://stackoverflow.com
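
    Assuming the URL has the form scheme://[user@]host/path, the third slash-separated field (the one just before the path) is the user@host part, and Bash parameter expansion can split it on the "@". A minimal sketch; the variable names are only illustrative:

    ```shell
    url='sftp://user@host.net/some/random/path'

    # Fields split on "/" are: "sftp:", "", "user@host.net", "some", ...
    authority=$(printf '%s\n' "$url" | cut -d/ -f3)

    # Split on the last "@"; without an "@" there is no user part.
    case $authority in
        *@*) user=${authority%@*}; host=${authority##*@} ;;
        *)   user=;                host=$authority ;;
    esac

    echo "user=$user host=$host"
    ```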
    

    Get only the path:

    echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f4-
    
    # OUTPUT>>> questions/6174220/parse-url-in-shell-script
    

    This is not perfect: the second command strips the leading slash, so you'll need to prepend it by hand.
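
    One way to prepend the slash is to put it in front of the command substitution. A small sketch using the same example URL:

    ```shell
    url='https://stackoverflow.com/questions/6174220/parse-url-in-shell-script'

    # cut drops the leading "/" of field 4, so add it back explicitly.
    path="/$(printf '%s\n' "$url" | cut -d/ -f4-)"
    echo "$path"
    ```

    For a URL with no path at all (e.g. just https://stackoverflow.com), cut prints an empty field and this yields "/".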

    To get only the path, here is an awk-based version:

    echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script/59971653 | awk -F"/" '{ for (i=4; i<=NF; i++) printf"/%s", $i }'
    
    # OUTPUT>>> /questions/6174220/parse-url-in-shell-script/59971653
    
  • 2020-12-02 21:42

    You can use Bash string manipulation (parameter expansion). It is easy to learn and worth trying if you find regular expressions difficult. Since the URI comes from NAUTILUS_SCRIPT_CURRENT_URI, I guess it may contain a port, so I also made the port optional.

    #!/bin/bash
    
    # You can also use the environment variable $NAUTILUS_SCRIPT_CURRENT_URI
    X="sftp://user@host.net/some/random/path"
    
    tmp=${X#*//};usr=${tmp%@*}
    tmp=${X#*@};host=${tmp%%/*};[[ ${X#*://} == *":"* ]] && host=${host%:*}
    tmp=${X#*//};path=${tmp#*/}
    proto=${X%%:*}   # text before the first ":" (%% so a ":port" is not included)
    [[ ${X#*://} == *":"* ]] && tmp=${X##*:} && port=${tmp%%/*}
    
    echo "Protocol: $proto User: $usr Host: $host Port: $port Path: $path"
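
    The same kind of expansions can be wrapped in a function that keeps the leading slash on the path and treats the port as optional. This is a sketch, assuming the form proto://user@host[:port][/path] with a user part present and no password; parse_sftp_url is a hypothetical name, and it sets its results as global variables:

    ```shell
    # Hypothetical helper; sets proto, usr, host, port and path as globals.
    # Assumes proto://user@host[:port][/path] with a user part present.
    parse_sftp_url() {
        local rest hostport
        proto=${1%%://*}            # text before the first "://"
        rest=${1#*://}              # user@host[:port][/path]
        usr=${rest%%@*}             # text before the first "@"
        rest=${rest#*@}             # host[:port][/path]
        hostport=${rest%%/*}        # host[:port]
        host=${hostport%%:*}        # drop an optional ":port"
        port=
        if [[ $hostport == *:* ]]; then port=${hostport#*:}; fi
        path=
        if [[ $rest == */* ]]; then path=/${rest#*/}; fi
    }

    parse_sftp_url 'sftp://user@host.net:2222/some/random/path'
    echo "Protocol: $proto User: $usr Host: $host Port: $port Path: $path"
    ```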
    
  • 2020-12-02 21:44

    Here's my take, loosely based on some of the existing answers, but it can also cope with GitHub SSH clone URLs:

    #!/bin/bash
    
    PROJECT_URL="git@github.com:heremaps/here-aaa-java-sdk.git"
    
    # Extract the protocol (includes trailing "://").
    PARSED_PROTO="$(echo "$PROJECT_URL" | sed -nE 's,^(.*://).*,\1,p')"
    
    # Remove the protocol from the URL.
    PARSED_URL="${PROJECT_URL/$PARSED_PROTO/}"
    
    # Extract the user (includes trailing "@").
    PARSED_USER="$(echo "$PARSED_URL" | sed -nE 's,^(.*@).*,\1,p')"
    
    # Remove the user from the URL.
    PARSED_URL="${PARSED_URL/$PARSED_USER/}"
    
    # Extract the port (includes leading ":").
    PARSED_PORT="$(echo "$PARSED_URL" | sed -nE 's,.*(:[0-9]+).*,\1,p')"
    
    # Remove the port from the URL.
    PARSED_URL="${PARSED_URL/$PARSED_PORT/}"
    
    # Extract the path (includes leading "/" or ":").
    PARSED_PATH="$(echo "$PARSED_URL" | sed -nE 's,[^/:]*([/:].*),\1,p')"
    
    # Remove the path from the URL; what is left is the host.
    PARSED_HOST="${PARSED_URL/$PARSED_PATH/}"
    
    echo "proto: $PARSED_PROTO"
    echo "user: $PARSED_USER"
    echo "host: $PARSED_HOST"
    echo "port: $PARSED_PORT"
    echo "path: $PARSED_PATH"
    

    which gives

    proto:
    user: git@
    host: github.com
    port:
    path: :heremaps/here-aaa-java-sdk.git
    

    And for PROJECT_URL="ssh://sschuberth@git.eclipse.org:29418/jgit/jgit" you get

    proto: ssh://
    user: sschuberth@
    host: git.eclipse.org
    port: :29418
    path: /jgit/jgit
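
    The same split can also be sketched without sed, using Bash's own [[ =~ ]] operator and the BASH_REMATCH array (a swapped-in technique, not the code above). Unlike the sed version, the fields come out without the "://", "@" and port ":" delimiters, while the scp-like path keeps its leading ":". The regex is an assumption covering scheme://[user@]host[:port][/path] and [user@]host:path; an scp-like path that starts with a digit or "/" is not recognized. parse_url is a hypothetical name:

    ```shell
    # Groups: 2=scheme 4=user 5=host 7=port 8=path (empty when absent).
    # Host excludes "@" and the ":" path must not start with a digit or "/",
    # so each input has only one way to match.
    re='^(([a-z][a-z0-9+.-]*)://)?(([^@/]+)@)?([^@:/]+)(:([0-9]+))?(/.*|:[^0-9/].*)?$'

    parse_url() {
        [[ $1 =~ $re ]] || return 1
        proto=${BASH_REMATCH[2]}
        user=${BASH_REMATCH[4]}
        host=${BASH_REMATCH[5]}
        port=${BASH_REMATCH[7]}
        path=${BASH_REMATCH[8]}
    }

    for u in 'git@github.com:heremaps/here-aaa-java-sdk.git' \
             'ssh://sschuberth@git.eclipse.org:29418/jgit/jgit'; do
        parse_url "$u" &&
            printf 'proto=%s user=%s host=%s port=%s path=%s\n' \
                "$proto" "$user" "$host" "$port" "$path"
    done
    ```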
    
  • 2020-12-02 21:48

    I did not like the methods above, so I wrote my own. It is for ftp links; just replace ftp with http if you need it. The first line is a small validation of the link, which should look like ftp://user:pass@host.com/path/to/something.

    if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
    
    login=$(  echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
    pass=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
    host=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
    dir=$(    echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
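
    The four sed calls above run the same expression four times. As an alternative sketch (not the author's code), a single Bash [[ =~ ]] match can validate the URL and capture all four parts at once. It assumes no surrounding whitespace, a login without ":", "@" or "/", and a password without "@" or "/"; parse_ftp_url is a hypothetical name:

    ```shell
    # Hypothetical helper; sets login, pass, host and dir, or returns 1.
    parse_ftp_url() {
        local re='^ftp://([^:@/]+):([^@/]+)@([^/]+)(/.*)$'
        [[ $1 =~ $re ]] || return 1
        login=${BASH_REMATCH[1]}
        pass=${BASH_REMATCH[2]}
        host=${BASH_REMATCH[3]}
        dir=${BASH_REMATCH[4]}
    }

    parse_ftp_url 'ftp://fz223free:fz223free@ftp.zakupki.gov.ru/out/nsi/nsiProtocol/daily' &&
        echo "login=$login host=$host dir=$dir"
    ```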
    

    My actual goal was to check ftp access by url. Here is the full result:

    #!/bin/bash
    
    test_ftp_url()  # lftp may hang on some ftp problems, like no connection
        {
        local url="$1"
    
        if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
    
        local login=$(  echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
        local pass=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
        local host=$(   echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
        local dir=$(    echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
    
        exec 3>&2 2>/dev/null   # save stderr in fd 3, then silence errors
        exec 6<>"/dev/tcp/$host/21" || { exec 2>&3 3>&-; echo 'Cannot open /dev/tcp connection (Bash network support disabled or host unreachable). Skipping ftp check.'; return 0; }
    
        read -r -t 10 <&6   # 10 s timeout so a silent server cannot hang the check
        if ! echo "${REPLY//$'\r'}" | grep -q '^220'; then exec 2>&3 3>&- 6>&-; return 3; fi   # 220 vsFTPd 3.0.2+ (ext.1) ready...
    
        printf 'USER %s\r\n' "$login" >&6; read -r -t 10 <&6
        if ! echo "${REPLY//$'\r'}" | grep -q '^331'; then exec 2>&3 3>&- 6>&-; return 4; fi   # 331 Please specify the password.
    
        printf 'PASS %s\r\n' "$pass" >&6; read -r -t 10 <&6
        if ! echo "${REPLY//$'\r'}" | grep -q '^230'; then exec 2>&3 3>&- 6>&-; return 5; fi   # 230 Login successful.
    
        printf 'CWD %s\r\n' "$dir" >&6; read -r -t 10 <&6
        if ! echo "${REPLY//$'\r'}" | grep -q '^250'; then exec 2>&3 3>&- 6>&-; return 6; fi   # 250 Directory successfully changed.
    
        printf 'QUIT\r\n' >&6
    
        exec 2>&3 3>&- 6>&-   # restore stderr, close fds 3 and 6
        return 0
        }
    
    test_ftp_url 'ftp://fz223free:fz223free@ftp.zakupki.gov.ru/out/nsi/nsiProtocol/daily'
    echo "$?"
    
  • 2020-12-02 21:49

    Using Python (best tool for this job, IMHO):

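    #!/usr/bin/env python3
    
    import os
    from urllib.parse import urlparse  # Python 3 location of the old urlparse module
    
    uri = os.environ['NAUTILUS_SCRIPT_CURRENT_URI']
    result = urlparse(uri)
    # username/hostname are split out of netloc ("user@host.net") for you,
    # and are None when that part is missing (no ValueError as with split('@')).
    print('user =', result.username)
    print('host =', result.hostname)
    print('path =', result.path)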
    

    Further reading:

    • os.environ
    • urllib.parse.urlparse()
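
    Since the question is about a shell script, here is one way to call Python 3's urllib.parse.urlparse from Bash. It is a sketch that assumes python3 is on the PATH; each field is printed on its own line so spaces in the path survive, and a missing user or host becomes an empty string:

    ```shell
    url='sftp://user@host.net/some/random/path'

    # Read three lines (user, host, path) printed by urllib.parse.urlparse.
    { IFS= read -r user; IFS= read -r host; IFS= read -r path; } < <(
        python3 -c 'import sys; from urllib.parse import urlparse; u = urlparse(sys.argv[1]); print(u.username or "", u.hostname or "", u.path, sep="\n")' "$url"
    )

    echo "user=$user host=$host path=$path"
    ```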
  • 2020-12-02 21:51

    If you have access to Node.js:

    export MY_URI=sftp://user@host.net/some/random/path
    node -e "console.log(url.parse(process.env.MY_URI).auth)"
    node -e "console.log(url.parse(process.env.MY_URI).host)"
    node -e "console.log(url.parse(process.env.MY_URI).path)"
    

    This will output:

    user
    host.net
    /some/random/path
    