I'm looking for a way to extract video frames and the corresponding audio segments from a video file using Python. I know OpenCV well, but it only allows extracting video frames, not audio.
Finally, I found moviepy (https://pypi.python.org/pypi/moviepy), which is a light wrapper around ffmpeg and provides an interface for quickly obtaining video and audio frames at the same time positions. An example is below:
    # moviepy 1.x import path; moviepy 2.x removed moviepy.editor,
    # so there use "from moviepy import VideoFileClip" instead
    from moviepy.editor import VideoFileClip

    video = VideoFileClip('your video filename')
    audio = video.audio  # None if the file has no audio track
    duration = video.duration  # == audio.duration, in seconds (float)
    # note: video.fps != audio.fps
    step = 0.1  # seconds

    # walk through the clip, fetching audio/video frames by timestamp every 100 ms
    for i in range(int(duration / step)):
        t = i * step
        if t > audio.duration or t > video.duration:
            break
        audio_frame = audio.get_frame(t)  # numpy array with the mono/stereo sample values at t
        video_frame = video.get_frame(t)  # numpy array with the RGB/gray frame at t
Besides extracting audio/video frames, moviepy provides a wide range of functionality for modifying audio and video clips.
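Since video.fps and audio.fps differ, one video frame corresponds to a whole run of audio samples rather than a single one. Here is a minimal sketch of that mapping in plain Python. The helper names and the 25 fps / 44100 Hz rates are my own assumptions for illustration and are not part of moviepy.

```python
def frame_index_at(t, video_fps):
    """Index of the video frame being displayed at time t (seconds)."""
    return int(t * video_fps)


def audio_sample_range(frame_index, video_fps, audio_fps):
    """Half-open range [start, end) of audio samples spanning one video frame."""
    start = round(frame_index * audio_fps / video_fps)
    end = round((frame_index + 1) * audio_fps / video_fps)
    return start, end


# Example rates, assumed for illustration: 25 fps video, 44.1 kHz audio
video_fps, audio_fps = 25, 44100
i = frame_index_at(1.0, video_fps)
print(i, audio_sample_range(i, video_fps, audio_fps))
```

With moviepy you would plug in video.fps and audio.fps for the two rates.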
You are correct that you cannot get audio via OpenCV. Your best bet might be to extract the video frames and the audio separately and then work with them from there. Some tools that might help include:
ffmpy
ffmpeg (via sub-process)
You can learn more about calling ffmpeg through subprocess in this related Stack Overflow answer: https://stackoverflow.com/a/26741357/7604321
From there you can load the audio file and process it alongside your video frames.
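A rough sketch of that workflow, assuming ffmpeg is on your PATH and that a 16-bit PCM WAV file is an acceptable intermediate format. The function names are made up for illustration. Only standard ffmpeg options (-y, -i, -vn, -acodec, -ar) and the standard-library subprocess and wave modules are used.

```python
import subprocess
import wave


def build_ffmpeg_audio_cmd(video_path, wav_path, sample_rate=44100):
    """Build an ffmpeg command that drops the video stream and writes 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", video_path,
        "-vn",                    # no video in the output
        "-acodec", "pcm_s16le",   # uncompressed 16-bit little-endian PCM
        "-ar", str(sample_rate),  # resample to this rate
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=44100):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_audio_cmd(video_path, wav_path, sample_rate), check=True)


def read_wav_segment(wav_path, start_s, duration_s):
    """Return the raw PCM bytes for duration_s seconds of audio starting at start_s."""
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        wav.setpos(round(start_s * rate))
        return wav.readframes(round(duration_s * rate))
```

For example, after extract_audio('input.mp4', 'audio.wav') (hypothetical file names), calling read_wav_segment('audio.wav', t, 1 / fps) would give the audio that goes with the video frame at time t.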
Without more information in your question, I can't suggest much more.