How to use regular expressions in wget for rejecting files?

Submitted by 心已入冬 on 2020-12-27 17:11:44

Question


I am trying to download the contents of a website using wget. I used the -R option to reject some file types, but there are other files that I don't want to download. These files have no extension and are named as follows:

string-ID

for example:

newsbrief-02

How can I tell wget not to download these files (files whose names start with a specified string)?


Answer 1:


You cannot specify a regular expression with wget's -R option, but you can specify a pattern (like a file-name pattern in a shell).

The answer looks like:

$ wget -R 'newsbrief-*' ...

You can also use ? and character classes written with [].

For more information, see info wget.
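
To check a pattern before starting a long download, it can be tried against sample file names locally. The sketch below is only an approximation: it uses the shell's own case matching rather than wget itself, on the assumption that wget's handling of -R patterns is close to ordinary shell globbing. The function name would_reject is hypothetical and not part of wget.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): reports whether a file name
# matches a shell-style pattern, as a stand-in for wget's -R matching.
would_reject() {
    # $1 = pattern, $2 = file name
    case "$2" in
        $1) echo "reject" ;;   # unquoted $1 is interpreted as a pattern
        *)  echo "keep" ;;
    esac
}

would_reject 'newsbrief-*' 'newsbrief-02'        # prints "reject"
would_reject 'newsbrief-*' 'index.html'          # prints "keep"
would_reject 'newsbrief-[0-9]?' 'newsbrief-02'   # prints "reject"
```

Keep the pattern in single quotes on the wget command line as well, so that the shell does not expand it against files in the current directory.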




Answer 2:


Since (apparently) v1.14, wget accepts regular expressions through --reject-regex and --accept-regex (--regex-type is posix by default and can be set to pcre if wget was compiled with libpcre support).

Beware that it seems you can use --reject-regex only once per wget call. That is, to reject on several expressions you have to combine them with | in a single regex:

wget --reject-regex 'expr1|expr2|…' http://example.com
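
A combined expression can be previewed against a list of sample URLs before running wget. The sketch below rests on two assumptions: that grep -E (POSIX extended regular expressions) behaves closely enough to wget's posix regex type, and that the regex is applied to the complete URL, as the wget manual describes for these options. The function name filter_urls, the sample URLs, and the old- prefix are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): prints the URLs that would be
# kept, using grep -E as a stand-in for wget's posix regex matching.
filter_urls() {
    # $1 = reject regex; URLs are read from stdin, one per line
    grep -Ev -- "$1"
}

# Two alternatives joined with | in a single regex, as described above.
printf '%s\n' \
    'http://example.com/newsbrief-02' \
    'http://example.com/archive/old-17' \
    'http://example.com/index.html' |
    filter_urls '/newsbrief-[^/]*$|/old-[^/]*$'
# prints "http://example.com/index.html"
```

Because the whole URL is matched rather than just the file name, the expressions here start at the last / and end with $ so that they apply only to the final path component.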


Source: https://stackoverflow.com/questions/11231736/how-to-use-regular-expressions-in-wget-for-rejecting-files
