I'm writing a web crawler and want to ignore URLs which link to binary files:
$exclude = %w(flv swf png jpg gif asx zip rar tar 7z gz jar js css dtd xsd ico
You can extract the URL's file extension with a regular expression or split (I've shown the latter here, but beware that it will also match some malformed URLs, such as http://foo.exe), then use Array#include? to check for membership:
@url = URI.parse(url) unless $exclude.include?(url.split('.').last)
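
If the http://foo.exe false positive (or URLs with query strings, like movie.flv?id=1) matters to you, one possible variation is to look only at the path component of the parsed URL. This is just a sketch: the constant EXCLUDED_EXTENSIONS and the method excluded_url? are names I made up for illustration, and the list is a shortened stand-in for your own $exclude.

```ruby
require 'uri'

# Hypothetical, shortened exclusion list for illustration; use your own.
EXCLUDED_EXTENSIONS = %w(flv swf png jpg gif zip rar tar gz ico).freeze

# Returns true when the URL's path ends in an excluded extension.
# Only the path is inspected, so the host name and query string are ignored.
def excluded_url?(url)
  path = URI.parse(url).path.to_s               # path can be nil for opaque URIs
  ext  = File.extname(path).downcase.delete('.') # ".FLV" becomes "flv"
  EXCLUDED_EXTENSIONS.include?(ext)
rescue URI::InvalidURIError
  # Assumption: unparseable URLs are reported as "not excluded";
  # the caller decides what to do with them.
  false
end

url = 'http://example.com/videos/movie.flv?id=1'
@url = URI.parse(url) unless excluded_url?(url)
```

Because File.extname only returns the last extension, archive.tar.gz is matched through gz. Lowercasing the extension also catches names like PHOTO.JPG, which the plain split version would let through.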