I'm looking for a way to extract video frames and the corresponding audio segments from a video file using Python. I know OpenCV well, but it only allows me to extract
You are correct that you cannot get audio via OpenCV. Your best bet might be to extract the video frames and the audio separately and then work with them from there. Some tools which might help include:
ffmpy
ffmpeg (called via subprocess)
You can learn more about calling ffmpeg through a subprocess in this related Stack Overflow answer: https://stackoverflow.com/a/26741357/7604321
From there you can load the extracted audio file and process it alongside your video frames.
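As a rough sketch of the subprocess route (not taken from the linked answer), something like the following could pull the audio track out into a WAV file. It assumes the `ffmpeg` binary is installed and on your PATH; the function names, file names, and the choice of mono 16-bit PCM at 44100 Hz are my own placeholders, so adjust them to your data.

```python
import subprocess


def build_audio_extract_cmd(video_path, wav_path, sample_rate=44100):
    """Build an ffmpeg command that writes the audio track to a WAV file.

    Hypothetical helper: the output format (mono, 16-bit PCM) is an assumption.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM audio
        "-ar", str(sample_rate), # audio sample rate in Hz
        "-ac", "1",              # downmix to a single channel
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=44100):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    cmd = build_audio_extract_cmd(video_path, wav_path, sample_rate)
    subprocess.run(cmd, check=True)
```

You would call it as `extract_audio("input.mp4", "audio.wav")`, where both paths are placeholders. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.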
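One way to line the two up, sketched under the assumption that the video has a constant frame rate and the audio was saved as an uncompressed WAV file: map each frame index to the range of audio samples that covers the same time span. The helper names here are hypothetical, and the frame rate would come from your own video reader (with OpenCV, `cap.get(cv2.CAP_PROP_FPS)`).

```python
import wave


def audio_span_for_frame(frame_index, fps, sample_rate):
    """Return (start, end) audio sample indices covering one video frame.

    Assumes a constant frame rate; frame k spans [k / fps, (k + 1) / fps) seconds.
    """
    start = int(round(frame_index * sample_rate / fps))
    end = int(round((frame_index + 1) * sample_rate / fps))
    return start, end


def read_audio_for_frame(wav_path, frame_index, fps):
    """Read the raw PCM bytes that correspond to one video frame."""
    with wave.open(wav_path, "rb") as wav:
        start, end = audio_span_for_frame(frame_index, fps, wav.getframerate())
        # Clamp so that frames past the end of the audio return empty bytes.
        start = min(start, wav.getnframes())
        wav.setpos(start)
        return wav.readframes(max(end - start, 0))
```

For variable frame rate video this mapping will drift, and you would need per-frame timestamps instead of a single fps value.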
Without more details in your question, I can't suggest much else.