I have a large.tar.gz file containing about 1 million files, about a quarter of which are HTML files, and I want to parse a few lines of each of the html
With GNU tar, this extracts the matching members of a tgz to stdout, writing the contents of all matched files as one concatenated stream:
tar -xOzf large.tar.gz --wildcards '*.html' | grep ...
-O, --to-stdout: extract files to standard output
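Because -O concatenates every matched file into a single stream, the boundaries between files are lost. If only the first few lines of each HTML file are needed, a possible alternative is GNU tar's --to-command option, which pipes each member to a separate command and sets TAR_FILENAME in that command's environment. The following is a minimal sketch, not a tested solution for the archive in the question: the function name html_heads, the variable HTML_HEAD_LINES, and the default of 3 lines are all made up for illustration.

```shell
# Hypothetical helper: print a header line plus the first N lines of every
# .html member of a gzipped tar archive. Requires GNU tar.
html_heads() {
    archive=$1
    # Number of lines to keep per file (illustrative default of 3),
    # exported so the per-member command started by tar can read it.
    HTML_HEAD_LINES=${2:-3}
    export HTML_HEAD_LINES
    # tar runs the command once per matched member via /bin/sh -c,
    # feeding the member's contents on stdin. awk reads the whole input,
    # so tar never writes into a pipe that was closed early.
    tar -xzf "$archive" --wildcards '*.html' \
        --to-command='printf "== %s\n" "$TAR_FILENAME"; awk -v n="$HTML_HEAD_LINES" "NR <= n"'
}
```

It could be used as html_heads large.tar.gz 5 | grep ..., with the "== name" header lines identifying which file each group of lines came from. Starting one process per member is slower than a single -O stream, so for about 250,000 HTML files the plain pipeline may still be preferable when per-file boundaries do not matter.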