Have you looked at using Hadoop's streaming?
I use it in python all the time :-).
I'm starting to see that the heterogeneous approach is often the best and it looks like other folks are doing the same.
If you look at projects like protocol-buffers or facebook's thrift you see that sometimes it's just best to use an app written in another language and build the glue in the language of your preference.