I have come across some sample codes where set of images are added to make a QTmovie.
I am targeting this for OS X platform without any QT frameworks. I have ague idea o
If you want to write a program to do this, you could use Xuggler in Java to do it. It will allow you to save your final video in a format playable by almost any media player.
Start out by gaining an understanding of how video files (e.g. MP4, Quicktime) actually represent audio and video with this Overly Simplistic Guide to Internet Video.
Then, play around with the MediaTool tutorials. You can write programs that make raw images into video files (see this sample code). Finally, to write a program that makes audio and video that are in sync, see this tutorial; it generates a set of images, and makes some audio noise that is timed to change when a ball hits the edge of a box.
Hope that helps.
Art
Without QuickTime (or an equivalent multimedia framework), what you describe is quite a lot of work. Ordinarily, you would use a video compression algorithm (such as H.264) to encode your images into video, and an audio compression algorithm (such as AAC) to encode your audio track. Then you would write these streams into a container file, such as an MPEG-4 file, which interleaves the streams for playback, contains metadata and indexes and so on. Then for playback, you parse the file, decode the video and audio data, and schedule them for playback, taking care to keep them in sync.
QuickTime does all this (and more) for you, and it would be an enormous undertaking to write it all yourself. Is there some reason why you are running on OS X but cannot use QuickTime?
Given the question is tagged with iPhone, why can't you just use QTKit?
If you had to do it from scratch, you could adopt a very simple solution whereby you store your image sequence as a set of JPEG files (but then you would require libjpeg
; use raw RGB or PPM if you must), the audio track as a raw WAV data, and then have another file (a text file you define) that stored timing information, so you would simply stream out the audio, and have the frame numbers of the images stored with their corresponding timecode/sample offset. That is a very simple solution that could be made to work without too much effort.
If you give us some more idea of what you are trying to achieve, we could offer some more specific suggestions.