Using Ruby, how can one parse the ID3 tags of remote mp3 files without downloading the entire file to disk?
This question has been asked in Java and Silverlight, but
Which Ruby version are you using?
Which ID3 tag version are you trying to read?
ID3v1 tags are at the end of a file, in the last 128 bytes. Note that a request with
headers = {"Range" => "bytes=128-"}
downloads nearly the complete file, because that range starts at offset 128 and runs to the end of the file (resp.body.size is close to the file size). To read only the last 128 bytes with Net::HTTP, send a suffix range instead:
headers = {"Range" => "bytes=-128"}
This only works if the server honors Range requests. In any case it is no big loss, because ID3 version 1 is pretty much outdated at this point because of its limitations (fixed-length format, ASCII-only text, ...). iTunes uses ID3 version 2.2.0.
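For completeness, here is a minimal sketch of reading an ID3v1 tag through a suffix range. The method names fetch_id3v1_block and parse_id3v1 are made up for this illustration, and the fetch assumes the server answers Range requests with 206 Partial Content. The field layout is the standard ID3v1 one: "TAG", then title, artist and album (30 bytes each), year (4), comment (30) and a genre byte.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch only the last 128 bytes of a remote file.
# Returns nil unless the server answers 206 Partial Content.
def fetch_id3v1_block(file_url)
  uri = URI(file_url)
  resp = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, { 'Range' => 'bytes=-128' })
  end
  resp.code == '206' ? resp.body : nil
end

# Hypothetical helper: parse a 128-byte ID3v1 block into a Hash.
# Returns nil if the block is missing, has the wrong size, or lacks the "TAG" marker.
def parse_id3v1(block)
  return nil unless block && block.bytesize == 128 && block.byteslice(0, 3) == 'TAG'

  # Layout: "TAG", title(30), artist(30), album(30), year(4), comment(30), genre(1)
  _marker, title, artist, album, year, comment, genre =
    block.unpack('a3 a30 a30 a30 a4 a30 C')
  # Fields are padded with NUL bytes or spaces; keep the text before the first NUL.
  clean = ->(s) { s.split("\0", 2).first.to_s.strip }
  { title: clean.(title), artist: clean.(artist), album: clean.(album),
    year: clean.(year), comment: clean.(comment), genre: genre }
end
```

You would call parse_id3v1(fetch_id3v1_block(file_url)) and treat a nil result as "no ID3v1 tag, or the server ignored the range".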
ID3v2 tags are at the beginning of a file, to support streaming, so you can download just the initial part of the MP3 file, which contains the ID3v2 header, with an HTTP/1.1 Range request.
The short answer:
require 'net/http'
require 'uri'
require 'id3' # id3 Ruby library
require 'hexdump'
file_url = 'http://example.com/filename.mp3'
uri = URI(file_url)
size = 1000 # bytes to fetch first; ID3v2 tags can be considerably larger, because of embedded album pictures
Net::HTTP.version_1_2 # selects the net/http 1.2 API (already the default); Range requests need an HTTP/1.1 server
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
resp = http.get( uri.request_uri, {'Range' => "bytes=0-#{size - 1}"} ) # byte ranges are inclusive
# should check the response status codes here (206 means the range was honored)..
if resp.body.start_with?('ID3') # we most likely only read a small portion of the ID3v2 tag..
  # file has ID3v2 tag
  puts resp.body.hexdump
  tag2 = ID3::Tag2.new
  tag2.read_from_buffer( resp.body )
  @id3_tag_size = tag2.ID3v2tag_size # that's the size of the whole ID3v2 tag
  # we should now re-fetch the tag with the correct / known size
  # ...
end
e.g.:
index 0 1 2 3 4 5 6 7 8 9 A B C D E F
00000000 ["49443302"] ["00000000"] ["11015454"] ["3200000d"] ID3.......TT2...
00000010 ["004b6167"] ["75796120"] ["48696d65"] ["00545031"] .Kaguya Hime.TP1
00000020 ["00000e00"] ["4a756e6f"] ["20726561"] ["63746f72"] ....Juno reactor
00000030 ["0054414c"] ["00001100"] ["4269626c"] ["65206f66"] .TAL....Bible of
00000040 ["20447265"] ["616d7300"] ["54524b00"] ["00050036"] Dreams.TRK....6
00000050 ["2f390054"] ["59450000"] ["06003139"] ["39370054"] /9.TYE....1997.T
00000060 ["434f0000"] ["1300456c"] ["65637472"] ["6f6e6963"] CO....Electronic
00000070 ["612f4461"] ["6e636500"] ["54454e00"] ["000d0069"] a/Dance.TEN....i
00000080 ["54756e65"] ["73207632"] ["2e300043"] ["4f4d0000"] Tunes v2.0.COM..
00000090 ["3e00656e"] ["67695475"] ["6e65735f"] ["43444442"] >.engiTunes_CDDB
000000a0 ["5f494473"] ["00392b36"] ["34374334"] ["36373436"] _IDs.9+647C46746
000000b0 ["38413234"] ["38313733"] ["41344132"] ["30334544"] 8A248173A4A203ED
000000c0 ["32323034"] ["4341422b"] ["31363333"] ["39390000"] 2204CAB+163399..
000000d0 ["00000000"] ["00000000"] ["00000000"] ["00000000"] ................
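The first ten bytes of the dump above are the ID3v2 header: "ID3", a major version byte, a revision byte, a flags byte, and four "syncsafe" size bytes that carry 7 bits each. As a rough sketch of how the tag size can be computed by hand (the method name id3v2_tag_size is made up here and is not the id3 library's API; this version counts the 10-byte header as part of the total):

```ruby
# Hypothetical helper: total size in bytes of an ID3v2 tag, header included.
# Returns 0 if the buffer does not start with a valid ID3v2 header.
def id3v2_tag_size(buffer)
  return 0 unless buffer && buffer.bytesize >= 10 && buffer.byteslice(0, 3) == 'ID3'

  major, _revision, flags, *size_bytes = buffer.byteslice(3, 7).unpack('C7')
  # Syncsafe integer: the top bit of every size byte must be clear.
  return 0 if size_bytes.any? { |b| b > 0x7f }

  body_size = size_bytes.reduce(0) { |acc, b| (acc << 7) | b }
  # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
  footer = (major == 4 && (flags & 0x10) != 0) ? 10 : 0
  10 + body_size + footer
end

# Header bytes taken from the dump above: 49 44 33 02 00 00 00 00 11 01
header = ['49443302000000001101'].pack('H*')
puts id3v2_tag_size(header) # prints 2187
```

Here the size bytes 00 00 11 01 decode to (0x11 << 7) | 0x01 = 2177 bytes of tag body, plus 10 bytes of header.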
The long answer looks something like this: (you'll need id3 library version 1.0.0_pre or newer)
require 'net/http'
require 'uri'
require 'id3' # id3 Ruby library
require 'hexdump'
file_url = 'http://example.com/filename.mp3'
def get_remote_id3v2_tag( file_url ) # you would call this..
  id3v2tag_size = get_remote_id3v2_tag_size( file_url )
  return nil unless id3v2tag_size > 0
  buffer = get_remote_bytes( file_url, id3v2tag_size )
  tag2 = ID3::Tag2.new
  tag2.read_from_buffer( buffer )
  tag2
end
private
def get_remote_id3v2_tag_size( file_url )
  buffer = get_remote_bytes( file_url, 100 ) # the ID3v2 header sits in the first bytes
  # ID3v2tag_size on a String is assumed to come from the id3 library (1.0.0_pre or newer)
  buffer.bytesize > 0 ? buffer.ID3v2tag_size : 0
end
def get_remote_bytes( file_url, n )
  uri = URI(file_url)
  Net::HTTP.version_1_2 # selects the net/http 1.2 API (already the default); Range requests need an HTTP/1.1 server
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  resp = http.get( uri.request_uri, {'Range' => "bytes=0-#{n - 1}"} ) # inclusive range: the first n bytes
  resp_code = resp.code.to_i
  if resp_code >= 200 && resp_code < 300
    # 206 means the range was honored; a plain 200 carries the whole file, so trim it
    resp.body.to_s.byteslice(0, n)
  else
    ''
  end
end
get_remote_id3v2_tag_size( file_url )
=> 2262
See:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
http://en.wikipedia.org/wiki/Byte_serving
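A server that honors a Range request replies with status 206 and a Content-Range header such as "bytes 0-99/5000000"; a server that ignores it replies 200 with the whole file. A small sketch of checking for that, assuming you pass in resp.code and resp['Content-Range'] from Net::HTTP (the helper names parse_content_range and range_honored? are made up for this example):

```ruby
# Hypothetical helper: parse a Content-Range value like "bytes 0-99/5000000".
# Returns [first, last, total]; total is nil when the server sends "*".
# Returns nil if the value is missing or malformed.
def parse_content_range(value)
  m = %r{\Abytes (\d+)-(\d+)/(\d+|\*)\z}.match(value.to_s.strip)
  return nil unless m

  total = m[3] == '*' ? nil : m[3].to_i
  [m[1].to_i, m[2].to_i, total]
end

# Hypothetical helper: true only for a 206 response whose range starts where
# we asked and does not run past the last byte we asked for.
def range_honored?(code, content_range, first, last)
  return false unless code.to_i == 206

  got_first, got_last, _total = parse_content_range(content_range)
  !got_first.nil? && got_first == first && got_last <= last
end
```

For the first request in the long answer this would be range_honored?(resp.code, resp['Content-Range'], 0, 99). The total in Content-Range also gives you the file size without a separate HEAD request.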
Some examples of how to download files in parts can be found here:
How do I download a binary file over HTTP?
Please note that starting a download "in the middle" only works if the server honors the HTTP Range header; otherwise you get the whole file.
http://unixgods.org/~tilo/Ruby/ID3/docs/index.html
You would at least have to download the last blocks of the file, which contain the ID3 tags -- see ID3 tag definitions...
If you have access to the files on the remote file system, you could read the tags there and then transfer back only the ID3 tags.
Edit:
I was thinking of ID3 v1 tags -- version 2 tags are in the front.