To see what file to invoke the unrar command on, one needs to determine which file is the first in the file set.
Here are some sample file names, of which - naturall
Don't rely on the names of the files to determine which one is first. You're going to end up finding an edge case where you get the wrong file.
RAR's headers will tell you which file is the first one in the volume set, assuming the archive was created by a reasonably recent version of RAR.
HEAD_FLAGS (2 bytes), bit flags:
0x0100 - First volume (set only by RAR 3.0 and later)
So open up each file and examine the RAR headers, looking specifically for the flag that marks the first volume. This works as long as the archive isn't corrupt and was written by RAR 3.0 or later. I have done my own tests with spanning RAR archives, and their headers are correct according to the link above.
This is a much, much safer way of determining which file is first in a set like this.
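A minimal sketch of that header check, in case it helps. It assumes the older RAR 3.x/4.x layout (a 7-byte marker block followed by the main archive header: 2-byte CRC, 1-byte type 0x73, 2-byte little-endian HEAD_FLAGS); RAR 5 archives use a different format and are simply rejected here. It also assumes a modern Ruby (2.0 or later), and the method names are my own, not part of any library.

```ruby
# Assumption: RAR 3.x/4.x layout. RAR 5 uses a different marker and header format.
RAR_MARKER        = "Rar!\x1A\x07\x00".b  # 7-byte marker block
MAIN_HEAD_TYPE    = 0x73                  # type byte of the main archive header
FLAG_VOLUME       = 0x0001                # archive is part of a multi-volume set
FLAG_FIRST_VOLUME = 0x0100                # first volume (RAR 3.0 and later only)

# Takes the first 12 bytes of a file and reports whether they describe
# either a stand-alone archive or the first volume of a set.
def first_volume_header?(bytes)
  return false if bytes.nil? || bytes.bytesize < 12
  return false unless bytes.byteslice(0, 7) == RAR_MARKER

  # After the marker: HEAD_CRC (2 bytes), HEAD_TYPE (1 byte), HEAD_FLAGS (2 bytes).
  _crc, type, flags = bytes.byteslice(7, 5).unpack("vCv")
  return false unless type == MAIN_HEAD_TYPE

  # A non-volume archive is trivially the file to unpack.
  return true if (flags & FLAG_VOLUME).zero?

  (flags & FLAG_FIRST_VOLUME) != 0
end

# Convenience wrapper that reads the header bytes from disk.
def first_volume?(path)
  first_volume_header?(File.binread(path, 12))
end
```

With that in place, something like `Dir.glob("*.rar").select { |f| first_volume?(f) }` would list the candidates to pass to unrar.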
The short answer is that your regex can't work as written on Ruby 1.8. Its regex engine does not support lookbehind assertions (the (?<! stuff in your example regex), which is why your regex doesn't work. This leaves you with two options.
1) Use more than one regex to do it.
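def is_first_rar(filename)
  if filename =~ /part(\d+)\.rar$/
    # Multi-part name: only part 1 (part1, part01, part001, ...) is first.
    $1.to_i == 1
  else
    # No part number: any plain .rar file counts as the first volume.
    !(filename =~ /\.rar$/).nil?
  end
end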
2) Use Oniguruma, the regex engine of Ruby 1.9. It supports lookbehind assertions, and you can install it as a gem for Ruby 1.8. After that, you can do something like this:
require 'oniguruma' # on Ruby 1.8 you may need require 'rubygems' first

def is_first_rar(filename)
  reg = Oniguruma::ORegexp.new('.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)')
  match = reg.match(filename)
  return match != nil
end
I am no regex expert but here is my attempt
^(yes|no)\.(rar|part0*1\.rar)$
Replace "yes|no" with the actual file name. I matched it against your examples to check that it only matches the first file of each set, hence the "yes|no" in the regex.
UPDATE: Fixed as per the comment. I'm not sure why the user would not know the filename, so I did not fix that part.
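If the base name really is known up front, as this answer assumes, the pattern can be built from it rather than typed in by hand. A small sketch; `first_rar_pattern` and the file names in the usage line are hypothetical, and `Regexp.escape` is used so that dots or brackets in the base name are matched literally.

```ruby
# Builds a pattern that matches only "<basename>.rar" or "<basename>.part1.rar"
# (with any number of leading zeros in the part number).
def first_rar_pattern(basename)
  /\A#{Regexp.escape(basename)}\.(?:rar|part0*1\.rar)\z/
end

# Hypothetical file names, for illustration only.
files = ["movie.rar", "movie.part01.rar", "movie.part02.rar", "other.rar"]
pattern = first_rar_pattern("movie")
first_files = files.select { |f| f =~ pattern }
```

Note that this accepts both the plain `.rar` name and the `part1` name, so it still relies on the set using only one of the two naming schemes.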
Personally I wouldn't use (extended) regular expressions in this case (or at least not just one to do it all). What's wrong with coding this in, for example, a few ifs?
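For what it's worth, here is roughly what the "few ifs" version could look like, with no regular expressions at all. This is only a sketch: the method name is made up, it lowercases the name so `.RAR` is accepted too, and it treats anything that isn't a `.partN.rar` name as a plain first volume.

```ruby
# Name-based check using plain string operations instead of a regex.
def first_rar_by_name?(filename)
  name = filename.downcase
  return false unless name.end_with?(".rar")

  stem = name[0...-4]                 # drop the trailing ".rar"
  dot = stem.rindex(".")
  return true if dot.nil?             # "archive.rar": no part suffix at all

  suffix = stem[(dot + 1)..-1]        # e.g. "part01"
  return true unless suffix.start_with?("part")

  digits = suffix[4..-1]              # whatever follows "part"
  # Something like ".partial.rar" or ".part.rar" is not a part number.
  return true if digits.empty? || !digits.delete("0-9").empty?

  digits.to_i == 1                    # part1, part01, part001, ...
end
```

It is longer than a one-line regex, but each rule sits on its own line, which makes it easier to add the edge case you will inevitably discover later.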