I have already done proper research, but still lack information on the thing I would like to achieve.
So I would like to program an application where the user can record a video and instantly (live) upload the video to an RTP/RTSP server.
Is there a reason for using 3gp on the client side? With mp4 (with the MOOV atom at the start of the file) you can read the temp file in chunks and send them to the server. There will likely be a slight delay, depending on your connection speed. Your RTSP server should be able to re-encode the mp4 to 3gp for low-bandwidth viewing.
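A minimal sketch of the "read in chunks and send" idea, assuming the recorder writes to a temp file and the server is reachable through some `OutputStream` (for example a socket stream). `ChunkedFileSender`, `CHUNK_SIZE` and `forward` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: forwards everything currently readable in fixed-size chunks. */
final class ChunkedFileSender {
    static final int CHUNK_SIZE = 4096; // assumed chunk size, tune for your link

    /**
     * Reads from {@code in} until it reports end of stream and writes each
     * chunk to {@code out}. Returns the number of bytes forwarded.
     */
    static long forward(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[CHUNK_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            out.flush(); // push each chunk out promptly instead of batching
            total += n;
        }
        return total;
    }
}
```

For a file that is still being recorded, the read hits the current end of the file before recording is finished, so the caller would have to call `forward` again after a short pause until the recorder stops. That polling is where the delay mentioned above comes from.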
At this point, if I had to accept a camera (raw stream) and immediately make it available to a set of clients, I would go the Google Hangouts route and use WebRTC. See the ondello 'platform section' for the toolset/SDK. During your evaluation, you should look at the comparative merits of WebRTC vs. RTSP.
IMO, with its statefulness, RTSP will be a nightmare behind firewalls and NAT. AFAIK, on 3G/4G the use of RTP in third-party apps is a bit risky.
That said, I put up on git an old Android RTP/RTSP/SDP project using libs from netty and 'efflux'. I think this project was trying to retrieve and play just the audio track within the container (the video track was ignored and not pulled over the network) from YouTube videos, all of which were encoded for RTSP at the time. I think there were some packet and frame header issues; I got fed up with RTSP and dropped it.
If you must pursue RTP/RTSP, some of the packet and frame level stuff that other posters have mentioned is right there in the Android classes and in the test cases that come with efflux.
Check this answer: Video streaming over WIFI?
Then, if you want to see the live stream on an Android phone, include the VLC plugin in your application and connect through the Real Time Streaming Protocol (RTSP).
Intent i = new Intent("org.videolan.vlc.VLCApplication.gui.video.VideoPlayerActivity");
i.setAction(Intent.ACTION_VIEW);
i.setData(Uri.parse("rtsp://10.0.0.179:8086/"));
startActivity(i);
If you have VLC installed on your Android phone, you can stream using an intent and pass the IP address and port number as shown above.
And here is the RTSP session class. It uses an RTSP socket to talk to the media server. It also holds session parameters, such as which streams it can send (video and/or audio), the packet queues, and some audio/video sync code.
The interface it uses:
package com.example.android.streaming.streaming.rtsp;
public interface PacketListener {
public void onPacketReceived(Packet p);
}
The session itself:
package com.example.android.streaming.streaming;
import static java.util.EnumSet.of;
import java.io.IOException;
import java.util.EnumSet;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import android.app.Activity;
import android.content.SharedPreferences;
import android.hardware.Camera;
import android.hardware.Camera.CameraInfo;
import android.os.SystemClock;
import android.preference.PreferenceManager;
import android.util.Log;
import android.view.SurfaceHolder;
import com.example.android.streaming.BuildConfig;
import com.example.android.streaming.StreamingApp;
import com.example.android.streaming.streaming.audio.AACStream;
import com.example.android.streaming.streaming.rtsp.Packet;
import com.example.android.streaming.streaming.rtsp.Packet.PacketType;
import com.example.android.streaming.streaming.rtsp.PacketListener;
import com.example.android.streaming.streaming.rtsp.RtspSocket;
import com.example.android.streaming.streaming.video.H264Stream;
import com.example.android.streaming.streaming.video.VideoConfig;
import com.example.android.streaming.streaming.video.VideoStream;
public class Session implements PacketListener, Runnable {
public final static int MESSAGE_START = 0x03;
public final static int MESSAGE_STOP = 0x04;
public final static int VIDEO_H264 = 0x01;
public final static int AUDIO_AAC = 0x05;
public final static int VIDEO_TRACK = 1;
public final static int AUDIO_TRACK = 0;
private static VideoConfig defaultVideoQuality = VideoConfig.defaultVideoQualiy.clone();
private static int defaultVideoEncoder = VIDEO_H264, defaultAudioEncoder = AUDIO_AAC;
private static Session sessionUsingTheCamera = null;
private static Session sessionUsingTheCamcorder = null;
private static int startedStreamCount = 0;
private int sessionTrackCount = 0;
private static SurfaceHolder surfaceHolder;
private Stream[] streamList = new Stream[2];
protected RtspSocket socket = null;
private Activity context = null;
private String host = null;
private String path = null;
private String user = null;
private String pass = null;
private int port;
public interface SessionListener {
public void startSession(Session session);
public void stopSession(Session session);
};
public Session(Activity context, String host, int port, String path, String user, String pass) {
this.context = context;
this.host = host;
this.port = port;
this.path = path;
this.user = user;
this.pass = pass;
}
public boolean isConnected() {
return socket != null && socket.isConnected();
}
/**
* Connect to rtsp server and start new session. This should be called when
* all the streams are added so that proper sdp can be generated.
*/
public void connect() throws IOException {
try {
socket = new RtspSocket();
socket.connect(host, port, this);
} catch (IOException e) {
socket = null;
throw e;
}
}
public void close() throws IOException {
if (socket != null) {
socket.close();
socket = null;
}
}
public static void setDefaultVideoQuality(VideoConfig quality) {
defaultVideoQuality = quality;
}
public static void setDefaultAudioEncoder(int encoder) {
defaultAudioEncoder = encoder;
}
public static void setDefaultVideoEncoder(int encoder) {
defaultVideoEncoder = encoder;
}
public static void setSurfaceHolder(SurfaceHolder sh) {
surfaceHolder = sh;
}
public boolean hasVideoTrack() {
return getVideoTrack() != null;
}
public MediaStream getVideoTrack() {
return (MediaStream) streamList[VIDEO_TRACK];
}
public void addVideoTrack(Camera camera, CameraInfo info) throws IllegalStateException, IOException {
addVideoTrack(camera, info, defaultVideoEncoder, defaultVideoQuality, false);
}
public synchronized void addVideoTrack(Camera camera, CameraInfo info, int encoder, VideoConfig quality,
boolean flash) throws IllegalStateException, IOException {
if (isCameraInUse())
throw new IllegalStateException("Camera already in use by another client.");
Stream stream = null;
VideoConfig.merge(quality, defaultVideoQuality);
switch (encoder) {
case VIDEO_H264:
if (BuildConfig.DEBUG)
Log.d(StreamingApp.TAG, "Video streaming: H.264");
SharedPreferences prefs = PreferenceManager.getDefaultSharedPreferences(context.getApplicationContext());
stream = new H264Stream(camera, info, this, prefs);
break;
}
if (stream != null) {
if (BuildConfig.DEBUG)
Log.d(StreamingApp.TAG, "Quality is: " + quality.resX + "x" + quality.resY + "px " + quality.framerate
+ "fps, " + quality.bitrate + "bps");
((VideoStream) stream).setVideoQuality(quality);
((VideoStream) stream).setPreviewDisplay(surfaceHolder.getSurface());
streamList[VIDEO_TRACK] = stream;
sessionUsingTheCamera = this;
sessionTrackCount++;
}
}
public boolean hasAudioTrack() {
return getAudioTrack() != null;
}
public MediaStream getAudioTrack() {
return (MediaStream) streamList[AUDIO_TRACK];
}
public void addAudioTrack() throws IOException {
addAudioTrack(defaultAudioEncoder);
}
public synchronized void addAudioTrack(int encoder) throws IOException {
if (sessionUsingTheCamcorder != null)
throw new IllegalStateException("Audio device is already in use by another client.");
Stream stream = null;
switch (encoder) {
case AUDIO_AAC:
if (android.os.Build.VERSION.SDK_INT < 14)
throw new IllegalStateException("This device does not support AAC.");
if (BuildConfig.DEBUG)
Log.d(StreamingApp.TAG, "Audio streaming: AAC");
SharedPreferences prefs = PreferenceManager.getDefaultSharedPreferences(context.getApplicationContext());
stream = new AACStream(this, prefs);
break;
}
if (stream != null) {
streamList[AUDIO_TRACK] = stream;
sessionUsingTheCamcorder = this;
sessionTrackCount++;
}
}
public synchronized String getSDP() throws IllegalStateException, IOException {
StringBuilder sdp = new StringBuilder();
sdp.append("v=0\r\n");
/*
* RFC 4566 (5.2) suggests using an NTP timestamp for the session id and
* version. This code sends fixed zeros instead; the commented-out line
* shows the timestamp-based variant.
*/
//sdp.append("o=- " + timestamp + " " + timestamp + " IN IP4 127.0.0.1\r\n");
sdp.append("o=- 0 0 IN IP4 127.0.0.1\r\n");
sdp.append("s=Vedroid\r\n");
/* RFC 4566 (5) orders session-level fields as v, o, s, i, ..., c, ..., t. */
sdp.append("i=N/A\r\n");
sdp.append("c=IN IP4 " + host + "\r\n");
sdp.append("t=0 0\r\n");
sdp.append("a=tool:Vedroid RTP\r\n");
int payload = 96;
int trackId = 1;
for (int i = 0; i < streamList.length; i++) {
if (streamList[i] != null) {
streamList[i].setPayloadType(payload++);
sdp.append(streamList[i].generateSDP());
sdp.append("a=control:trackid=" + trackId++ + "\r\n");
}
}
return sdp.toString();
}
public String getDest() {
return host;
}
public int getTrackCount() {
return sessionTrackCount;
}
public static boolean isCameraInUse() {
return sessionUsingTheCamera != null;
}
/** Indicates whether or not the microphone is being used in a session. **/
public static boolean isMicrophoneInUse() {
return sessionUsingTheCamcorder != null;
}
private SessionListener listener = null;
public synchronized void prepare(int trackId) throws IllegalStateException, IOException {
Stream stream = streamList[trackId];
if (stream != null && !stream.isStreaming())
stream.prepare();
}
public synchronized void start(int trackId) throws IllegalStateException, IOException {
Stream stream = streamList[trackId];
if (stream != null && !stream.isStreaming()) {
stream.start();
if (BuildConfig.DEBUG)
Log.d(StreamingApp.TAG, "Started " + (trackId == VIDEO_TRACK ? "video" : "audio") + " channel.");
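/*
* Count started streams so that stopAll() can tell when the last one
* stops. The listener is notified later, once audio and video are
* aligned (see setPacketTimestamp).
*/
startedStreamCount++;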
}
}
public void startAll(SessionListener listener) throws IllegalStateException, IOException {
this.listener = listener;
startThread();
for (int i = 0; i < streamList.length; i++)
prepare(i);
/*
* Important to start video capture before audio capture. This makes
* audio/video de-sync smaller.
*/
for (int i = 0; i < streamList.length; i++)
start(streamList.length - i - 1);
}
public synchronized void stopAll() {
for (int i = 0; i < streamList.length; i++) {
if (streamList[i] != null && streamList[i].isStreaming()) {
streamList[i].stop();
if (BuildConfig.DEBUG)
Log.d(StreamingApp.TAG, "Stopped " + (i == VIDEO_TRACK ? "video" : "audio") + " channel.");
if (--startedStreamCount == 0 && listener != null)
listener.stopSession(this);
}
}
stopThread();
this.listener = null;
if (BuildConfig.DEBUG)
Log.d(StreamingApp.TAG, "Session stopped.");
}
public synchronized void flush() {
for (int i = 0; i < streamList.length; i++) {
if (streamList[i] != null) {
streamList[i].release();
if (i == VIDEO_TRACK)
sessionUsingTheCamera = null;
else
sessionUsingTheCamcorder = null;
streamList[i] = null;
}
}
}
public String getPath() {
return path;
}
public String getUser() {
return user;
}
public String getPass() {
return pass;
}
private final static int MAX_QUEUE_SIZE = 1000;
private BlockingDeque<Packet> audioQueue = new LinkedBlockingDeque<Packet>(MAX_QUEUE_SIZE);
private BlockingDeque<Packet> videoQueue = new LinkedBlockingDeque<Packet>(MAX_QUEUE_SIZE);
private void sendPacket(Packet p) {
try {
MediaStream channel = (p.type == PacketType.AudioPacketType ? getAudioTrack() : getVideoTrack());
p.packetizer.send(p, socket, channel.getPayloadType(), channel.getStreamId());
getPacketQueue(p.type).remove(p);
} catch (IOException e) {
Log.e(StreamingApp.TAG, "Failed to send packet: " + e.getMessage());
}
}
private final ReentrantLock queueLock = new ReentrantLock();
private final Condition morePackets = queueLock.newCondition();
private AtomicBoolean stopped = new AtomicBoolean(true);
private Thread t = null;
private final void wakeupThread() {
queueLock.lock();
try {
morePackets.signalAll();
} finally {
queueLock.unlock();
}
}
public void startThread() {
if (t == null) {
t = new Thread(this);
stopped.set(false);
t.start();
}
}
public void stopThread() {
stopped.set(true);
if (t != null) {
t.interrupt();
try {
wakeupThread();
t.join();
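} catch (InterruptedException e) {
/* Restore the interrupt flag so the caller can still observe it. */
Thread.currentThread().interrupt();
}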
t = null;
}
audioQueue.clear();
videoQueue.clear();
}
private long getStreamEndSampleTimestamp(BlockingDeque<Packet> queue) {
/* Read the tail once; an empty queue yields 0 instead of relying on an exception. */
Packet last = queue.peekLast();
return (last != null ? last.getSampleTimestamp() + last.getFrameLen() : 0);
}
private PacketType syncType = PacketType.AnyPacketType;
private boolean aligned = false;
private final BlockingDeque<Packet> getPacketQueue(PacketType type) {
return (type == PacketType.AudioPacketType ? audioQueue : videoQueue);
}
private void setPacketTimestamp(Packet p) {
/* Don't sync on SEI packet. */
if (!aligned && p.type != syncType) {
long shift = getStreamEndSampleTimestamp(getPacketQueue(syncType));
Log.w(StreamingApp.TAG, "Set shift +" + shift + "ms to "
+ (p.type == PacketType.VideoPacketType ? "video" : "audio") + " stream ("
+ (getPacketQueue(syncType).size() + 1) + ") packets.");
p.setTimestamp(p.getDuration(shift));
p.setSampleTimestamp(shift);
if (listener != null)
listener.startSession(this);
aligned = true;
} else {
p.setTimestamp(p.packetizer.getTimestamp());
p.setSampleTimestamp(p.packetizer.getSampleTimestamp());
}
p.packetizer.setSampleTimestamp(p.getSampleTimestamp() + p.getFrameLen());
p.packetizer.setTimestamp(p.getTimestamp() + p.getDuration());
// if (BuildConfig.DEBUG) {
// Log.d(StreamingApp.TAG, (p.type == PacketType.VideoPacketType ? "Video" : "Audio") + " packet timestamp: "
// + p.getTimestamp() + "; sampleTimestamp: " + p.getSampleTimestamp());
// }
}
/*
* Drop first frames if len is less than this. First sync frame will have
* frame len >= 10 ms.
*/
private final static int MinimalSyncFrameLength = 15;
@Override
public void onPacketReceived(Packet p) {
queueLock.lock();
try {
/*
* We always synchronize on video stream. Some devices have video
* coming faster than audio, this is ok. Audio stream time stamps
* will be adjusted. Other devices that have audio come first will
* see all audio packets dropped until first video packet comes.
* Then upon first video packet we again adjust the audio stream by
* time stamp of the last video packet in the queue.
*/
if (syncType == PacketType.AnyPacketType && p.type == PacketType.VideoPacketType
&& p.getFrameLen() >= MinimalSyncFrameLength)
syncType = p.type;
if (syncType == PacketType.VideoPacketType) {
setPacketTimestamp(p);
if (getPacketQueue(p.type).size() > MAX_QUEUE_SIZE - 1) {
Log.w(StreamingApp.TAG, "Queue (" + p.type + ") is full, dropping packet.");
} else {
/*
* Wakeup sending thread only if channels synchronization is
* already done.
*/
getPacketQueue(p.type).add(p);
if (aligned)
morePackets.signalAll();
}
}
} finally {
queueLock.unlock();
}
}
private boolean hasMorePackets(EnumSet<Packet.PacketType> mask) {
boolean gotPackets;
if (mask.contains(PacketType.AudioPacketType) && mask.contains(PacketType.VideoPacketType)) {
gotPackets = (audioQueue.size() > 0 && videoQueue.size() > 0) && aligned;
} else {
if (mask.contains(PacketType.AudioPacketType))
gotPackets = (audioQueue.size() > 0);
else if (mask.contains(PacketType.VideoPacketType))
gotPackets = (videoQueue.size() > 0);
else
gotPackets = (videoQueue.size() > 0 || audioQueue.size() > 0);
}
return gotPackets;
}
private void waitPackets(EnumSet<Packet.PacketType> mask) {
queueLock.lock();
try {
do {
if (!stopped.get() && !hasMorePackets(mask)) {
try {
morePackets.await();
} catch (InterruptedException e) {
}
}
} while (!stopped.get() && !hasMorePackets(mask));
} finally {
queueLock.unlock();
}
}
private void sendPackets() {
boolean send;
Packet a, v;
/*
* Wait for any type of packet and send asap. With time stamps correctly
* set, the real send moment is not important and may be quite
* different. Media server will only check for time stamps.
*/
waitPackets(of(PacketType.AnyPacketType));
v = videoQueue.peek();
if (v != null) {
sendPacket(v);
do {
a = audioQueue.peek();
if ((send = (a != null && a.getSampleTimestamp() <= v.getSampleTimestamp())))
sendPacket(a);
} while (!stopped.get() && send);
} else {
a = audioQueue.peek();
if (a != null)
sendPacket(a);
}
}
@Override
public void run() {
Log.w(StreamingApp.TAG, "Session thread started.");
/*
* Wait for both types of front packets to come and synchronize on each
* other.
*/
waitPackets(of(PacketType.AudioPacketType, PacketType.VideoPacketType));
while (!stopped.get())
sendPackets();
Log.w(StreamingApp.TAG, "Flushing session queues.");
Log.w(StreamingApp.TAG, " " + audioQueue.size() + " audio packets.");
Log.w(StreamingApp.TAG, " " + videoQueue.size() + " video packets.");
long start = SystemClock.elapsedRealtime();
while (audioQueue.size() > 0 || videoQueue.size() > 0)
sendPackets();
Log.w(StreamingApp.TAG, "Session thread stopped.");
Log.w(StreamingApp.TAG, "Queues flush took " + (SystemClock.elapsedRealtime() - start) + " ms.");
}
}
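The interleaving rule in sendPackets() is easy to lose inside the full class. The following standalone sketch reproduces only that rule, single-threaded, with a hypothetical `Pkt` type (a label plus a sample timestamp) standing in for the real `Packet` class: each video packet is sent first, followed by every queued audio packet whose sample timestamp is not later than it.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified, single-threaded illustration of the audio/video interleaving rule. */
final class InterleaveSketch {
    /** Hypothetical stand-in for Packet: a label plus a sample timestamp in ms. */
    static final class Pkt {
        final String label;
        final long sampleTimestamp;

        Pkt(String label, long sampleTimestamp) {
            this.label = label;
            this.sampleTimestamp = sampleTimestamp;
        }
    }

    /** Drains both queues and returns the labels in the order they would be sent. */
    static List<String> sendOrder(Deque<Pkt> video, Deque<Pkt> audio) {
        List<String> sent = new ArrayList<>();
        while (!video.isEmpty() || !audio.isEmpty()) {
            Pkt v = video.poll();
            if (v != null) {
                sent.add(v.label);
                // Send all audio that is due up to and including this video frame.
                while (!audio.isEmpty() && audio.peek().sampleTimestamp <= v.sampleTimestamp) {
                    sent.add(audio.poll().label);
                }
            } else {
                // No video queued: send a single audio packet, as the class does.
                sent.add(audio.poll().label);
            }
        }
        return sent;
    }
}
```

With video at 0 ms and 40 ms and audio at 0, 20 and 50 ms, the audio packet at 20 ms waits until the 40 ms video frame has gone out, and the one at 50 ms is only sent once no video is left.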
Your overall approach sounds correct, but there are a couple of things you need to consider.
So I would like to program an application where the user can record a video and instantly (live) upload the video to an RTP/RTSP server.
My research so far suggests that I have to write the video being recorded to a local socket rather than to a file, because a 3gp file written to disk cannot be accessed until it is finalized (when the recording is stopped and the header information, such as the length, has been written).
Sending frames in real-time over RTP/RTCP is the correct approach. As the capture device captures each frame, you need to encode/compress it and send it over the socket. 3gp, like mp4, is a container format used for file storage. For live capture there is no need to write to a file. The only time this makes sense is e.g. in HTTP Live Streaming or DASH approaches, where media is written to a transport stream or mp4 file, before being served over HTTP.
When the socket receives the continuous data, I will need to wrap it into a RTP packet and send it to the remote server. I possibly will also have to do basic encoding first (which is not so important yet).
I would disagree: encoding is very important. Without it you will likely never manage to send the video, and you will have to deal with issues such as cost (over mobile networks) and the sheer volume of media, depending on resolution and frame rate.
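To make the "wrap it into an RTP packet" step concrete, here is a minimal sketch of the 12-byte fixed RTP header from RFC 3550 (section 5.1). It assumes no CSRC list and no header extension; `RtpHeader`, `packet` and `videoTimestamp` are illustrative names, and the payload is assumed to be already encoded. For H.264 you would additionally need the payload format from RFC 6184 (for example fragmenting large NAL units), which this sketch does not cover.

```java
import java.nio.ByteBuffer;

/** Minimal RTP fixed-header writer (RFC 3550, 5.1): no CSRC list, no extension. */
final class RtpHeader {
    static final int HEADER_LENGTH = 12;

    /**
     * Builds one RTP packet: the 12-byte fixed header followed by the payload.
     *
     * @param payloadType 7-bit payload type, e.g. a dynamic type such as 96 from the SDP
     * @param marker      marker bit, typically set on the last packet of a video frame
     * @param sequence    16-bit sequence number, incremented by one per packet
     * @param timestamp   32-bit media timestamp in clock-rate units
     * @param ssrc        32-bit synchronization source identifier
     */
    static byte[] packet(int payloadType, boolean marker, int sequence,
                         long timestamp, long ssrc, byte[] payload) {
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH + payload.length);
        buf.put((byte) 0x80); // V=2, P=0, X=0, CC=0
        buf.put((byte) ((marker ? 0x80 : 0) | (payloadType & 0x7F)));
        buf.putShort((short) sequence);
        buf.putInt((int) timestamp);
        buf.putInt((int) ssrc);
        buf.put(payload);
        return buf.array();
    }

    /** Converts a capture time in milliseconds to units of the 90 kHz video clock. */
    static long videoTimestamp(long millis) {
        return millis * 90;
    }
}
```

The resulting byte array is what you would hand to a `DatagramSocket` (RTP over UDP) or interleave on the RTSP connection (RTP over TCP).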
Does anybody know if this theory is correct so far? I would also like to know if someone could point me to a few code snippets of similar approaches, especially for sending the video on the fly to the server. I am not sure yet how to do that.
Take a look at the spydroid open source project as a starting point. It contains many of the necessary steps, including how to configure the encoder, packetise to RTP, and send RTCP, as well as some RTSP server functionality. Spydroid sets up an RTSP server, so media is encoded and sent once an RTSP client such as VLC sets up an RTSP session. Since your application is driven by the phone user wanting to send media to a server, you may need a different way to start sending, for instance by having the phone send a message to the server that sets up an RTSP session, as in spydroid.
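If the phone is the side that pushes media, one message that could start the session is an RTSP ANNOUNCE carrying the SDP (RFC 2326, section 10.3), normally followed by SETUP and RECORD. The sketch below only formats that request. `RtspAnnounce` is a hypothetical helper, the URL, CSeq and SDP are placeholders, and whether your server accepts ANNOUNCE at all is an assumption you would need to check.

```java
import java.nio.charset.StandardCharsets;

/** Formats an RTSP/1.0 ANNOUNCE request (RFC 2326, 10.3) with an SDP body. */
final class RtspAnnounce {
    static String build(String url, int cseq, String sdp) {
        // Content-Length counts bytes of the body, not characters.
        int length = sdp.getBytes(StandardCharsets.UTF_8).length;
        return "ANNOUNCE " + url + " RTSP/1.0\r\n"
                + "CSeq: " + cseq + "\r\n"
                + "Content-Type: application/sdp\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n" // blank line separates headers from the body
                + sdp;
    }
}
```

The string would be written to the RTSP control connection (TCP, port 554 by default), and the server's reply read back before continuing with SETUP for each track.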
I tried to achieve the same result (but abandoned it due to lack of experience). My approach was to use ffmpeg and/or libav, because they already have a working RTMP stack. So in theory all you need to do is route the video stream to an ffmpeg process, which will stream it to the server.
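A rough sketch of that routing idea, assuming an ffmpeg binary is available to the app (which is not a given on Android) and that the recorded data is in a format ffmpeg can read from a non-seekable pipe. `FfmpegRelay` is a hypothetical helper; the binary path and RTMP URL are placeholders.

```java
import java.util.Arrays;
import java.util.List;

/** Builds (but does not start) an ffmpeg process that relays stdin to an RTMP URL. */
final class FfmpegRelay {
    static List<String> command(String ffmpegPath, String rtmpUrl) {
        return Arrays.asList(
                ffmpegPath,
                "-i", "pipe:0", // read the recorded stream from standard input
                "-c", "copy",   // pass the encoded streams through without re-encoding
                "-f", "flv",    // RTMP carries FLV-muxed data
                rtmpUrl);
    }

    static ProcessBuilder processBuilder(String ffmpegPath, String rtmpUrl) {
        // Merge stderr into stdout so ffmpeg's log can be read from one stream.
        return new ProcessBuilder(command(ffmpegPath, rtmpUrl)).redirectErrorStream(true);
    }
}
```

After `processBuilder(...).start()`, the recorder's output would be written to `process.getOutputStream()`, and ffmpeg's log read from `process.getInputStream()` so the pipe does not fill up.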