Getting the jpg images from an HTML file

情书的邮戳 2021-02-11 01:38

I'm trying to use grep to get the full URL addresses of jpg images in an HTML file. One problem is that there aren't many newlines in it, so when I use grep it gets the path,

1 Answer
  • 2021-02-11 02:05

    One single sed command

    sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' yourfile.html
    

    or, using ERE (extended regular expressions, enabled with -E) to avoid the backslashes in the expression above:

    sed -E -n '/<img/s/.*src="([^"]*)".*/\1/p' yourfile.html
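
Because `.*` is greedy and sed works line by line, the expression above prints at most one src per line (the last one), which matters when the file has few newlines, as in the question. A minimal sketch of one workaround, assuming no `>` appears inside an attribute value; the function name `img_srcs_per_tag` and the sample HTML are hypothetical:

```shell
# Read HTML on stdin and print the src of every <img>,
# even when several tags share one line.
img_srcs_per_tag() {
  # Turn every '>' into a newline so each tag ends up on its own line,
  # then apply the same sed expression as above.
  tr '>' '\n' | sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p'
}

# Hypothetical sample input with two images on a single line.
# Prints a.jpg and b.jpg on separate lines.
printf '%s\n' '<p><img src="a.jpg"><img src="b.jpg"></p>' | img_srcs_per_tag
```

The grep -o variants below do not need this step, since -o already prints each match on its own line.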
    

    One basic grep command

    grep -o '<img[^>]*src="[^"]*"' yourfile.html
    

    Two successive basic grep commands

    grep -o '<img[^>]*src="[^"]*"' yourfile.html | grep -o '"[^"]*"$'
    

    One single grep command using Perl-compatible regular expressions (PCRE, GNU grep -P)

    grep -Po '<img[^>]*src="\K[^"]*(?=")' yourfile.html
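
The question asks specifically for jpg images, while the commands above print every src value. A minimal sketch of a filter, assuming GNU grep with -P support; the function name `jpg_srcs` and the sample HTML are hypothetical:

```shell
# Read HTML on stdin and print only the src values that point to
# .jpg or .jpeg files (case-insensitive).
jpg_srcs() {
  # First grep: extract every src value of an <img> tag.
  # Second grep: keep values ending in .jpg/.jpeg, optionally
  # followed by a query string or fragment.
  grep -Po '<img[^>]*src="\K[^"]*(?=")' | grep -Ei '\.jpe?g([?#].*)?$'
}

# Hypothetical sample input: the .png entry is dropped.
printf '%s\n' '<img src="a.jpg"><img src="b.png"><img class="x" src="c.JPEG?w=10">' | jpg_srcs
```

The check is based on the file extension in the URL only, so images served without a .jpg or .jpeg suffix would be missed.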
    

    Using ack as a grep-like replacement

    sudo apt install ack
    ack -o '<img[^>]*src="\K[^"]*(?=")' yourfile.html
    

    Downloading the web page with curl and piping it to sed, as proposed by s-hunter

    curl -s example.com/a.html | sed -En '/<img/s/.*src="([^"]*)".*/\1/p'
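
The src values printed by the pipeline above may be relative paths, whereas the question asks for full URLs. A simplified sketch of turning them into absolute URLs in plain POSIX shell; the function name `absolutize` is hypothetical, and it assumes the page URL includes a scheme, ignores any `<base href>` tag, does not collapse `../` segments, and does not special-case `data:` URIs:

```shell
# Usage: ... | absolutize PAGE_URL
# Reads one src value per line on stdin and prints an absolute URL for each.
absolutize() {
  base=$1
  scheme=${base%%://*}              # e.g. https
  rest=${base#*://}                 # e.g. example.com/dir/a.html
  origin="$scheme://${rest%%/*}"    # e.g. https://example.com
  case $rest in
    */*) dir="$scheme://${rest%/*}/" ;;   # directory containing the page
    *)   dir="$origin/" ;;                # page URL has no path component
  esac
  while IFS= read -r src; do
    case $src in
      *://*) printf '%s\n' "$src" ;;          # already absolute
      //*)   printf '%s\n' "$scheme:$src" ;;  # protocol-relative
      /*)    printf '%s\n' "$origin$src" ;;   # relative to the site root
      *)     printf '%s\n' "$dir$src" ;;      # relative to the page's directory
    esac
  done
}

# Hypothetical sample: a relative path resolved against a page URL.
printf '%s\n' 'img/a.jpg' | absolutize https://example.com/dir/a.html
```

It could be appended to the curl pipeline, here with an assumed https scheme on the example URL:

    curl -s https://example.com/a.html | sed -En '/<img/s/.*src="([^"]*)".*/\1/p' | absolutize https://example.com/a.html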
    