We need to capture a live video stream from WebRTC (or any other mechanism that captures the client webcam; it doesn't have to be supported on all browsers, as this is only a PoC).
Most IP cameras these days use H.264 or MJPEG encoding. You haven't said what sort of cameras are being used.
I think the real question is: what components are out there for authoring/editing video, and which video format do they require? Only once you know what format you need to be in can you transcode/transform your video as necessary and handle it on the server side.
There are any number of media servers that can transform/transcode, and something like FFmpeg or Unreal Media Server can transform, decode, etc. on the server side to get the stream into a format you can work with. Most of the IP cameras I have seen just use an H.264 web-based browser player.
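As a rough illustration of that server-side transcoding step, here is a minimal Node.js sketch that shells out to FFmpeg to pull a feed from an IP camera and repackage it for an RTMP ingest point; the camera and ingest URLs are placeholders, not real endpoints:

    // Minimal sketch: have FFmpeg pull an RTSP feed from an IP camera and
    // push it to an RTMP server. URLs below are hypothetical placeholders.
    const { spawn } = require('child_process');

    const ffmpeg = spawn('ffmpeg', [
      '-i', 'rtsp://camera.local/stream', // assumed camera source
      '-c:v', 'libx264',                  // re-encode to H.264 if needed
      '-preset', 'veryfast',
      '-tune', 'zerolatency',             // trade quality for lower delay
      '-c:a', 'aac',
      '-f', 'flv',                        // RTMP expects an FLV container
      'rtmp://localhost/live/cam1'        // assumed ingest point
    ]);

    ffmpeg.stderr.on('data', d => process.stderr.write(d)); // FFmpeg logs to stderr
    ffmpeg.on('close', code => console.log('ffmpeg exited with code ' + code));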
EDIT: Your biggest enemy is going to be your delay. 1-2 seconds of delay is going to be difficult to achieve.
Your question is too broad, and asking for off-site resources is considered off-topic on Stack Overflow. In order to avoid opinion-prone statements I will restrict the answer to general concepts.
Flash/RTMP
WebRTC is not yet available in all browsers, so the most widely used way of capturing webcam input from a browser is currently via a plugin. The most common solution uses the Adobe Flash Player, whether people like it or not. This is due to the H.264 encoding support in recent versions, along with AAC, MP3 etc. for audio.
The streaming is accomplished using the RTMP protocol, which was initially designed for Flash communication. The protocol works on TCP and has multiple flavors like RTMPS (RTMP over TLS/SSL for encryption) and RTMPT (RTMP encapsulated in HTTP for firewall traversal). The stream usually uses the FLV container format.
You can easily find open-source projects that use Flash to capture webcam input and stream it to an RTMP server.
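If you want to try the ingest side quickly, one open-source option (my suggestion, not something prescribed above) is the node-media-server package, which stands up a basic RTMP server in a few lines; the ports and tuning values here are illustrative:

    // Sketch: a bare-bones RTMP ingest server using node-media-server.
    const NodeMediaServer = require('node-media-server');

    const nms = new NodeMediaServer({
      rtmp: {
        port: 1935,        // standard RTMP port
        chunk_size: 60000,
        gop_cache: true,   // cache a GOP so new viewers get video immediately
        ping: 30,
        ping_timeout: 60
      }
    });

    nms.run(); // Flash/RTMP publishers can now push to rtmp://host/live/<key>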
On the server side you have two options:
- implement an RTMP server to talk directly to the sending library and read the stream
- use one of the existing RTMP servers and implement just a client in ASP (you can also transcode the incoming stream on the fly depending on what you're trying to do with your app).
WebRTC
With WebRTC you can either:
- record the stream on the client and upload it to the server (in whole or in chunks), or
- implement WebRTC with the server being one of the peers.
A possible solution for the second scenario, which I haven't personally tested yet, is offered by Adam Roach:
- Browser retrieves a webpage with javascript in it.
- Browser executes javascript, which:
  - Gets a handle to the camera using getUserMedia,
  - Creates an RTCPeerConnection,
  - Calls createOffer and setLocalDescription on the RTCPeerConnection,
  - Sends a request to the server containing the offer (in SDP format).
- The server processes the offer SDP and generates its own answer SDP, which it returns to the browser in its response.
- The JavaScript calls setRemoteDescription on the RTCPeerConnection to start the media flowing.
- The server starts receiving DTLS/SRTP packets from the browser, which it then does whatever it wants to, up to and including storing in an easily readable format on a local hard drive.
Source
This will use VP8 and Vorbis inside WebM over SRTP (UDP, can also use TCP).
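For illustration, here is a browser-side sketch of steps 2-4 of the quoted flow using the promise-based WebRTC APIs; the /offer endpoint is hypothetical, and ICE candidate exchange is omitted for brevity:

    // Sketch of the browser side of the quoted flow. The /offer endpoint
    // is a placeholder; the server must reply with its answer SDP.
    async function startStreaming() {
      const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
      const pc = new RTCPeerConnection();
      stream.getTracks().forEach(track => pc.addTrack(track, stream));

      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);

      // Send the offer SDP to the server and wait for its answer SDP.
      const res = await fetch('/offer', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(pc.localDescription)
      });
      await pc.setRemoteDescription(await res.json());
      // The server now receives DTLS/SRTP packets and can record them.
    }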
Unless you can implement RTCPeerConnection directly in ASP with a wrapper, you'll need a way to forward the stream to your server app.
The PeerConnection API is a powerful feature of WebRTC. It is currently used by the WebRTC version of Google Hangouts. You can read: How does Hangouts use WebRTC.
Agreed that this is an off-topic question, but I recently bumped into the same issue/requirement, and my solution was to use MultiStreamRecorder from WebRTCExperiments. This basically gives you a "blob" of the audio/video stream every X seconds, and you can upload it to your ASP.NET MVC or WebAPI controller as demonstrated here. You can either live-process the blobs on the server part by part, or concatenate them into a file and then process it once the stream stops. Note that the APIs used in this library are not fully supported in all browsers; for example, there is no iOS support yet.
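The same chunked-upload idea can be sketched with the standard MediaRecorder API (MultiStreamRecorder wraps similar functionality); the /api/video/chunk endpoint below is a placeholder for your MVC/WebAPI controller action:

    // Sketch: record the webcam and upload a blob every intervalMs.
    // The upload endpoint is hypothetical; adapt it to your controller.
    async function recordAndUpload(intervalMs = 5000) {
      const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
      const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });

      recorder.ondataavailable = async (event) => {
        if (event.data.size === 0) return;
        const form = new FormData();
        form.append('chunk', event.data, 'chunk.webm');
        await fetch('/api/video/chunk', { method: 'POST', body: form });
      };

      // Emit a blob every intervalMs; the server can process the chunks as
      // they arrive or concatenate them once the stream stops.
      recorder.start(intervalMs);
      return recorder; // call recorder.stop() to end the capture
    }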
My server-side analysis required the user to speak full sentences, so in addition I used PitchDetect.js to detect silences in the audio stream before sending the partial blob to the server. With this type of setup, you can configure your client to send partial blobs to the server after the user finishes talking, rather than every X seconds.
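If you don't want to pull in PitchDetect.js, a crude silence detector can be built on the same Web Audio primitives; the threshold and hold time below are illustrative guesses:

    // Sketch: fire a callback when the mic RMS level stays below a
    // threshold for holdMs, i.e. the user has probably stopped talking.
    async function onSilence(callback, threshold = 0.01, holdMs = 800) {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const ctx = new AudioContext();
      const analyser = ctx.createAnalyser();
      ctx.createMediaStreamSource(stream).connect(analyser);

      const buf = new Float32Array(analyser.fftSize);
      let quietSince = null;

      (function poll() {
        analyser.getFloatTimeDomainData(buf);
        const rms = Math.sqrt(buf.reduce((s, v) => s + v * v, 0) / buf.length);
        if (rms < threshold) {
          quietSince = quietSince ?? performance.now();
          if (performance.now() - quietSince > holdMs) {
            callback();          // e.g. flush the current blob to the server
            quietSince = null;
          }
        } else {
          quietSince = null;
        }
        requestAnimationFrame(poll);
      })();
    }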
As for achieving a 1-2 second delay, I would suggest looking into WebSockets for delivery rather than HTTP POST - but you should play with these options and choose the best channel for your requirements.
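A sketch of that WebSocket variant, reusing the recorder idea above; the ws://example.com/ingest endpoint is hypothetical, and the server must accept binary frames:

    // Sketch: push recorded chunks over a WebSocket instead of HTTP POST
    // to shave per-request overhead. Endpoint and chunk size are guesses.
    async function streamOverWebSocket() {
      const socket = new WebSocket('ws://example.com/ingest');

      const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
      const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });

      recorder.ondataavailable = (event) => {
        if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
          socket.send(event.data); // each Blob goes out as one binary frame
        }
      };

      socket.onopen = () => recorder.start(1000); // 1-second chunks keep delay low
    }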