My program takes an MP4 video as input, and I want it to output an MP3 (without any server-side processing). Since Android (and my app) needs to run on many different hardware config
MPEG-4 Part 14 (.mp4 file extension) is a container format. In other words, it specifies how multiple media streams are packaged together. Processing a container format is much less computationally expensive than, for example, compressing or decompressing video. I would be surprised if reading through an .mp4 file and extracting an audio stream turned out to be too expensive for a cell phone ARM processor.
I haven't seen any immediately suitable Java libraries either, but it probably wouldn't be too hard to build your own. Parsing a container format is much simpler than decompressing video, and you have the libavformat implementation in ffmpeg as a reference. The MPEG-4 Part 14 standard can be found here:
http://webstore.iec.ch/preview/info_isoiec14496-14%7Bed1.0%7Den.pdf
and here:
http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
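To give an idea of how little work the container layer is, here is a minimal sketch in plain Java that lists the top-level boxes of an MP4 file. It relies on the box header layout from the ISO base media file format that MP4 builds on: a 4-byte big-endian size followed by a 4-byte type, where a size of 1 means a 64-bit size follows and a size of 0 means the box runs to the end of the file. The class and method names (Mp4BoxLister, listBoxes) are made up for illustration, and it reads the whole file into memory for simplicity, which real code on a phone should not do. It does not descend into nested boxes or extract any audio; that part is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any existing library.
class Mp4BoxLister {

    /** One top-level box: its four-character type, byte offset, and total size. */
    static final class Box {
        final String type;
        final long offset;
        final long size;

        Box(String type, long offset, long size) {
            this.type = type;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Walks the top-level boxes in data; stops at the first malformed or truncated box. */
    static List<Box> listBoxes(byte[] data) {
        List<Box> boxes = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int pos = 0;
        while (data.length - pos >= 8) {
            long size = buf.getInt(pos) & 0xFFFFFFFFL; // unsigned 32-bit size
            String type = new String(data, pos + 4, 4, StandardCharsets.ISO_8859_1);
            int headerLen = 8;
            if (size == 1) {
                // A 64-bit "largesize" follows the type field.
                if (data.length - pos < 16) break;
                size = buf.getLong(pos + 8);
                headerLen = 16;
            } else if (size == 0) {
                // The box extends to the end of the file.
                size = data.length - pos;
            }
            if (size < headerLen || size > data.length - pos) break;
            boxes.add(new Box(type, pos, size));
            pos += (int) size; // safe: size <= remaining bytes, which fits in an int
        }
        return boxes;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Mp4BoxLister <file.mp4>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (Box b : listBoxes(data)) {
            System.out.println(b.type + " offset=" + b.offset + " size=" + b.size);
        }
    }
}
```

Run against a typical file, this should show boxes such as ftyp, moov and mdat. Finding the audio track means descending into moov and reading the sample tables, which is more code but the same kind of bookkeeping.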
I see little problem with FFmpeg, since it apparently runs on 11 of the architectures supported by Debian. The only architecture that is apparently not supported is m68k; the others are old versions in the ports to the FreeBSD kernel or the Hurd kernel. And from what I know of Android, the fact that it's based on ARM isn't going to change any time soon.
Of course, there could be some issues with Java wrappers around native code. Is that the issue? I'm neither an Android nor a Java programmer, but I'm sure you can detect the platform and dynamically load the appropriate native wrapper.
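Something along these lines is what I have in mind for detecting the platform, as a rough sketch in plain Java. It reads the standard os.arch system property, maps it to a short architecture tag, and tries System.loadLibrary on a matching library. The library base name (mp4demux), the "_arch" naming scheme, and the class and method names are all hypothetical, so substitute whatever your native build actually produces. As far as I understand, on Android itself the APK carries one copy of the library per ABI and System.loadLibrary picks the right one, so the explicit mapping may not even be needed there.

```java
import java.util.Locale;

// Hypothetical loader; the library name and naming scheme are assumptions.
class NativeLoader {

    // Assumed base name of the native wrapper library.
    static final String BASE_NAME = "mp4demux";

    /** Maps an os.arch value to a short architecture tag, or null if unrecognised. */
    static String abiFor(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        // Check 64-bit ARM first, because "arm64" also starts with "arm".
        if (arch.equals("aarch64") || arch.equals("arm64")) return "arm64";
        if (arch.startsWith("arm")) return "arm";
        if (arch.equals("x86_64") || arch.equals("amd64")) return "x86_64";
        if (arch.equals("x86") || arch.matches("i[3-6]86")) return "x86";
        return null;
    }

    /** Tries to load the wrapper for the current platform; returns false if unavailable. */
    static boolean tryLoad() {
        String abi = abiFor(System.getProperty("os.arch", ""));
        if (abi == null) {
            return false; // unknown architecture: caller should fall back to pure Java
        }
        try {
            System.loadLibrary(BASE_NAME + "_" + abi);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // no native build shipped for this platform
        }
    }
}
```

If tryLoad() returns false, the app can fall back to a slower pure Java path or simply report that the device isn't supported.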
I haven't used it, but I've downloaded the IBM Toolkit for MPEG-4 and am looking at its API. It looks a little light on data access features, but the implementation is pure Java. It looks like they've obfuscated their codec jars.