Question
I'm trying to convert single images into depth maps, but I can't find any useful tutorial or documentation.
I'd like to use OpenCV, but if you know a way to get the depth map using, for example, TensorFlow, I'd be glad to hear it.
There are numerous tutorials for stereo vision, but I want to keep the hardware cheaper because it's for a project to help blind people.
I'm currently using an ESP32-CAM to stream frame by frame, receiving the images in Python with OpenCV; a sketch of the receiving side is below.
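For reference, a minimal sketch of that receiving side, assuming the ESP32-CAM runs the stock CameraWebServer firmware, whose MJPEG stream OpenCV can open directly; the IP address is a placeholder:

```python
import cv2

# Placeholder address -- replace with your camera's IP. The stock
# CameraWebServer sketch serves an MJPEG stream on port 81.
STREAM_URL = "http://192.168.1.42:81/stream"

cap = cv2.VideoCapture(STREAM_URL)
if not cap.isOpened():
    raise RuntimeError("Could not open stream: " + STREAM_URL)

while True:
    ok, frame = cap.read()  # one BGR frame per iteration
    if not ok:
        break  # stream dropped or ended
    cv2.imshow("esp32-cam", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```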
Answer 1:
Usually we need photometric measurements taken from different positions in the world to form a geometric understanding of it (a.k.a. a depth map). From a single image it is not possible to measure geometry directly, but it is possible to infer depth from prior knowledge.
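For intuition, this is what a second viewpoint buys you: in a rectified stereo pair, depth follows from standard triangulation (textbook material, not specific to the papers below), where f is the focal length in pixels, B the baseline between the cameras, and d the disparity of a point between the two images:

```latex
Z = \frac{f \cdot B}{d}
```

With a single image there is no disparity to triangulate from, so either learned priors or camera motion over time has to stand in for the second view.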
One way to make a single image work is to use a deep-learning-based method that infers depth directly. Deep-learning approaches are usually Python-based, so if you are only familiar with Python, this is the approach you should go for. If the image is small enough, I think real-time performance is possible. There are many works of this kind built on Caffe, TensorFlow, PyTorch, etc.; you can search GitHub for more options. The one I posted here is what I used recently:
Reference: Godard, Clément, et al. "Digging into Self-Supervised Monocular Depth Estimation." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
Source code: https://github.com/nianticlabs/monodepth2
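A minimal inference sketch modeled on the repo's own test_simple.py, assuming you have cloned monodepth2 (so its networks package is importable) and downloaded a pretrained model such as mono_640x192; the file paths and the input frame are placeholders:

```python
import torch
import torch.nn.functional as F
from torchvision import transforms
import PIL.Image as pil

import networks  # provided by the cloned monodepth2 repo

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder paths into a downloaded pretrained model folder.
encoder_path = "models/mono_640x192/encoder.pth"
decoder_path = "models/mono_640x192/depth.pth"

# ResNet-18 encoder; the checkpoint also stores the training input size.
encoder = networks.ResnetEncoder(18, False)
loaded_dict_enc = torch.load(encoder_path, map_location=device)
feed_height, feed_width = loaded_dict_enc["height"], loaded_dict_enc["width"]
encoder.load_state_dict(
    {k: v for k, v in loaded_dict_enc.items() if k in encoder.state_dict()})
encoder.to(device).eval()

# Decoder turns encoder features into multi-scale disparity maps.
depth_decoder = networks.DepthDecoder(
    num_ch_enc=encoder.num_ch_enc, scales=range(4))
depth_decoder.load_state_dict(torch.load(decoder_path, map_location=device))
depth_decoder.to(device).eval()

# Load one frame and resize it to the network's expected resolution.
image = pil.open("frame.jpg").convert("RGB")
original_width, original_height = image.size
image = image.resize((feed_width, feed_height), pil.LANCZOS)
input_tensor = transforms.ToTensor()(image).unsqueeze(0).to(device)

with torch.no_grad():
    outputs = depth_decoder(encoder(input_tensor))

# Highest-resolution disparity, upsampled back to the original frame size.
disp = outputs[("disp", 0)]
disp_resized = F.interpolate(
    disp, (original_height, original_width),
    mode="bilinear", align_corners=False)
depth_map = disp_resized.squeeze().cpu().numpy()
```

Note that the network outputs relative (scale-ambiguous) disparity rather than metric depth; the repo provides a disp_to_depth helper for converting it to a scaled depth estimate.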
The other way is to run single-camera SLAM on large-FOV video. This approach has various constraints, such as needing good features, a large FOV, slow motion, etc. You can find many works of this kind, such as DTAM, LSD-SLAM, and DSO. There are also a couple of packages from HKUST and ETH that do the mapping given the camera position (e.g., if you have GPS/compass); some of the well-known names are REMODE+SVO, open_quadtree_mapping, etc.
One typical example of single-camera SLAM would be LSD-SLAM, which runs in real time.
It is implemented in C++ on ROS, and as I remember it publishes the depth image. You can write a Python node that subscribes to the depth directly, or to the globally optimized point cloud and projects it into a depth map from any viewing angle; a sketch of such a subscriber follows the source link below.
Reference: Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-Scale Direct Monocular SLAM." European Conference on Computer Vision. Springer, Cham, 2014.
Source code: https://github.com/tum-vision/lsd_slam
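A minimal subscriber sketch using rospy and cv_bridge. The topic name /lsd_slam/depth and the float-image encoding are assumptions for illustration; check rostopic list against your LSD-SLAM build for the actual topic:

```python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_depth(msg):
    # Convert the ROS image message to a NumPy array; "passthrough"
    # keeps whatever encoding the publisher used (assumed 32FC1 here).
    depth = bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
    rospy.loginfo("received depth frame %dx%d", depth.shape[1], depth.shape[0])

rospy.init_node("depth_listener")
# Hypothetical topic name -- verify with `rostopic list`.
rospy.Subscriber("/lsd_slam/depth", Image, on_depth)
rospy.spin()
```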
Source: https://stackoverflow.com/questions/64685185/is-there-a-way-to-generate-real-time-depthmap-from-single-camera-video-in-python