I'm trying to use grep to get the full URL addresses of JPG images in an HTML file. One problem is that there aren't many newlines in it, so when I use grep it gets the path,
One single sed command
sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' yourfile.html
or, using ERE (extended regular expressions) to avoid the backslashes in the expression above:
sed -E -n '/<img/s/.*src="([^"]*)".*/\1/p' yourfile.html
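One caveat that matters for a file with few newlines: the leading `.*` in the sed expression is greedy, so when several `<img>` tags share a line, only the last `src` on that line is printed. A minimal sketch of one workaround is to put each tag on its own line first. The sample string and the function name `img_src_per_tag` are made up for illustration, and the approach assumes no attribute value contains a literal `>`.

```shell
# Hypothetical sample: a single line holding two <img> tags.
html='<p><img alt="logo" src="http://example.com/a.jpg"><img src="http://example.com/b.png"></p>'

# Turn every '>' into a newline so each tag sits on its own line,
# then apply the same sed expression as above to each line.
img_src_per_tag() {
  tr '>' '\n' | sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p'
}

printf '%s\n' "$html" | img_src_per_tag
```

Run on the sample, this should list both addresses, whereas the sed command alone would report only the second one.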
One basic grep command
grep -o '<img[^>]*src="[^"]*"' yourfile.html
Two successive basic grep commands
grep -o '<img[^>]*src="[^"]*"' yourfile.html | grep -o '"[^"]*"'
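Note that the second grep keeps the surrounding quotes and prints every quoted string in the match, so a tag with other attributes before `src` (such as `alt`) yields extra lines. A possible variant, sketched here under the assumption that only the `src` value is wanted, narrows the second stage to the `src` attribute and strips the wrapper with sed. The sample string and the function name `extract_src` are hypothetical.

```shell
# Hypothetical sample: the first tag carries an extra quoted attribute.
html='<p><img alt="logo" src="http://example.com/a.jpg"><img src="http://example.com/b.png"></p>'

extract_src() {
  # Stage 1: isolate each <img ... src="..." prefix.
  # Stage 2: keep only the src="..." attribute from that prefix.
  # Stage 3: strip the attribute name and the surrounding quotes.
  grep -o '<img[^>]*src="[^"]*"' | grep -o 'src="[^"]*"' | sed 's/^src="//; s/"$//'
}

printf '%s\n' "$html" | extract_src
```

Because `grep -o` prints each match on its own line, this also copes with several images on one input line.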
One single grep command using Perl-compatible regular expressions (PCRE)
grep -Po '<img[^>]*src="\K[^"]*(?=")' yourfile.html
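Since the question asks for JPG images only, the same pattern can probably be narrowed by requiring the value to end in `.jpg` or `.jpeg`. This is a sketch, not a definitive filter: it assumes a grep built with `-P` support (GNU grep), lowercase extensions, and no query string after the file name. The sample string and the function name `jpg_src` are made up for illustration.

```shell
# Hypothetical sample: three images on one line, only two of them JPEGs.
html='<img alt="logo" src="http://example.com/a.jpg"><img src="http://example.com/b.png"><img src="/img/c.jpeg">'

jpg_src() {
  # \K drops the '<img ... src="' prefix from the reported match;
  # the lookahead requires the closing quote right after the extension.
  grep -Po '<img[^>]*src="\K[^"]*\.jpe?g(?=")'
}

printf '%s\n' "$html" | jpg_src
```

Adding `-i` would also accept upper-case extensions such as `.JPG`.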
Using ack as a grep-like replacement
sudo apt install ack
ack -o '<img[^>]*src="\K[^"]*(?=")' yourfile.html
Downloading a web page as proposed by s-hunter
curl -s example.com/a.html | sed -En '/<img/s/.*src="([^"]*)".*/\1/p'