Windows Audio and Video Capture Software Paradigm

Submitted by 时光毁灭记忆、已成空白 on 2019-12-13 06:16:00

Question


I am writing a program that reads from multiple audio and video devices and writes the data to suitable containers (such as MPEG). I wrote the code on Linux, but now I have to write another version for Windows as well. This is how I wrote it on Linux:

initialize the devices (audio: ALSA, video: V4L2)
get the file descriptors
mainloop
   select on file descriptors
   respond to the proper device

Unfortunately my expertise is limited to Linux and I have never used the Windows SDK, so I don't know what the right paradigm is. Do people do it the same way, with fds and select? In that case, is there a way to get an fd from DirectShow? Oh, and one last thing: I am bound to use only one thread for all of this, so a solution with multiple threads each handling one device is not admissible. The Linux code currently runs on one thread as well. It is also preferred that the code be written in C++. Thank you.

Second Thoughts: There is only one question asked here: how can one get the file descriptor of a video/audio device from the DirectShow library? People who have worked with V4L2 and ALSA, I am looking for the same thing in DirectShow.


Answer 1:


Roman is an expert in these topics and I don't think you'll find a better answer. To add to Roman's answer, you would do something like this in DirectShow:

Enumerate video/audio capture devices/select capture device
Construct DirectShow graph
  Add video and audio capture source filters to graph
  Add 2 sample grabbers to graph
  Configure sample grabber callbacks 
  Connect capture source filters to sample grabbers
  Add renderers to graph (This could be a null renderer or a video/audio renderer)
  Connect sample grabbers to renderers
Play graph
Run the event loop
DirectShow will invoke the callbacks per media sample traversing the graph.

Your graph would typically look like this:

                callback
                   |
Video capture -- sample grabber -- renderer

Audio capture -- sample grabber -- renderer
                   |
                 callback

As Roman said, there are many samples in the SDK showing how to

  • enumerate capture sources
  • use/configure the sample grabber
  • write an application in which you construct and play a graph

On the topic of threading: you will write the code for the main application thread, and DirectShow will handle the internal thread management. Note, however, that if your callback function does intensive processing, it may interfere with playback, since (from MSDN):

The data processing thread blocks until the callback method returns. If the callback does not return quickly, it can interfere with playback.

This may or may not matter depending on your application requirements. If it does, you could, for example, pass the data to another thread for processing. The consequence of blocking the data processing thread is that you will get lower frame rates in the case of video.




Answer 2:


Windows offers several APIs for video and audio. This happened because older APIs were replaced by [supposed] successors, while the older APIs remained operational to maintain compatibility with existing applications.

Audio APIs: waveInXxx family of functions, DirectSound, DirectShow, WASAPI

Video APIs: Video for Windows, DirectShow, Media Foundation

Video/audio APIs with support for video+audio streams and files: Video for Windows, DirectShow, Media Foundation

Each of the APIs mentioned above offers its own functions, interfaces, methods, extensibility, and compatibility options. File descriptors and select do not apply to any of them. For specific needs one might prefer a combination of APIs; for example, only WASAPI gives fine-grained control over audio capture, but it is an audio-only API. Audio compression and the production of media files, especially video-enabled ones, are typically handled by DirectShow or Media Foundation.

Video and audio devices do not have file descriptors. In DirectShow and Media Foundation you obtain interfaces/objects representing the capture device and then discover its capabilities, such as supported formats, in an API-specific way. You can then either obtain the captured data directly or connect the capture component to another object that encodes or presents the data. Since file descriptors are not part of the picture on Windows, your question as asked cannot be answered literally. You are essentially asking those familiar with both Linux and Windows development how to implement on Windows what you already do on Linux, but I am afraid you will have to do it the regular Windows way, as the Windows APIs suggest and demonstrate in their documentation and samples.

--

The DirectShow and Media Foundation APIs cover the entire media processing pipeline: capture, processing, and presentation. In DirectShow you build your pipeline from components ("filters") connected together (MF has a similar concept), and the higher-level application then controls their operation without touching the actual data. The filters exchange data without reporting to the application for every chunk streamed.

This is why you might have a hard time finding a way to "get a raw frame". DirectShow's design assumes that raw frames are passed between filters and are not sent to the calling application. Getting a raw frame is trivial for a connected filter; you are expected to express all your media data processing needs in terms of DirectShow filters, stock or custom.

Those who, for whatever reason, want to extract the media data stream from a DirectShow pipeline often use the so-called Sample Grabber filter (the subject of tens of questions on Stack Overflow and the MSDN forums), a stock filter that is easy to work with, accepts a callback function, and reports every piece of data streamed through it. This filter is the easiest way to extract frames from a capture device with access to the raw data.

The standard video capture capabilities of DirectShow and Media Foundation are built around analog video capture devices exposed to Windows through WDM drivers. For these, the APIs provide a respective component/filter that can be connected into the pipeline. Because DirectShow is relatively easy to extend, other devices can be wrapped in the same video-capture-filter form factor; this covers third-party capture devices available through SDKs, virtual cameras, and so on. Once wrapped in a DirectShow filter, they become available to other DirectShow-compatible applications, which essentially see no difference between an actual camera and a software-only source.



Source: https://stackoverflow.com/questions/26228674/windows-audio-and-video-capture-software-paradigm
