I have a bash script that iterates over a list of links, curls down an HTML page per link, greps for a particular string format (syntax is: CVE-####-####), removes the surrounding HTML tags, and then looks each match up in a local changelog.
HTML files can contain carriage returns at the ends of lines, so you need to filter those out:
curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read cve; do
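To see why the stray \r matters, here is a small demonstration with made-up data (the CVE number and changelog line are just examples):

# A trailing carriage return makes the later lookup fail.
cve=$'CVE-2021-1234\r'                 # what read captures without tr
printf 'Fixed CVE-2021-1234 in v2.0\n' > changelog.txt
grep -F "$cve" changelog.txt           # no match: the pattern ends in \r
grep -F "${cve%$'\r'}" changelog.txt   # matches once the \r is stripped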
Notice that there's no need to use grep; you can use a regular expression filter in the sed command. (You could also remove the \r within sed, but doing that portably is cumbersome, so I piped to tr instead.)
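To make the sed expression concrete, here is how it behaves on a hypothetical HTML line (the tags and CVE number are just examples):

echo '<td><a href="x">CVE-2021-44228</a></td>' | sed -n '/CVE-/s/<[^>]*>//gp'
# /CVE-/        -> apply the command only to lines containing "CVE-"
# s/<[^>]*>//g  -> strip every HTML tag from the line
# p (with -n)   -> print the line only if a substitution was made
# Output: CVE-2021-44228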
It should look like this:
# First: Care about quoting your variables!
# Use read to read the file line by line
while read -r link ; do
# No grep required. sed can do that.
curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read -r cve; do
echo "$cve"
# grep -F searches for fixed strings instead of patterns
grep -F "$cve" ./changelog.txt
done
done < links.txt
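A quick way to try it, assuming the script above is saved as find_cves.sh (a hypothetical name) and links.txt holds one URL per line:

printf '%s\n' 'https://example.com/advisories.html' > links.txt
bash ./find_cves.sh

The -r flag on read also matters here: it stops read from treating backslashes in the input as escape characters, so URLs and CVE lines pass through unmangled.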