Synchronization has always fascinated me, or to be precise: why a .ts can be viewed in sync by media players, while the demuxed audio and video, once reassembled, are out of sync.
The concept of audio/video synchronization goes much deeper. The first reading I would recommend is the following paper.
http://downloads.bbc.co.uk/rd/pubs/reports/1996-02.pdf
I won't repeat everything here, but essentially every encoder records timestamps and stamps them on the respective audio and video. Later, when the decoder plays the stream, it does two things: first, it ensures that its own clock is slaved to the encoder's clock; second, it ensures that every picture is presented on the screen and every audio frame is presented to the speaker exactly when its stamped time occurs. This is the only, and the best, way to keep audio in synchronization with video. These timestamps are called PTS/DTS values, and they are expressed in ticks of a 90 kHz clock.
Understand that clocks drift over time, but since playout references only the exact timestamp, the decoder plays everything out in exactly the same time order.
The remaining concern is that the decoder's clock must stay synchronized to the encoder's clock. The first thing MPEG does is use a higher-precision clock at 27 MHz (300 times the 90 kHz timestamp clock). This clock reference, carried in the stream as the PCR, must also remain consistent across any transmission path in the middle. Keeping the decoder locked to it is called the clock recovery process.
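To make the 90 kHz timestamps concrete, here is a minimal Python sketch that unpacks a 33-bit PTS from the 5-byte field of a PES header and converts it to seconds. The function names (`decode_pts`, `pts_to_seconds`, `seconds_until_presentation`) are my own for illustration and not from any library; the sketch assumes you have already located the 5 PTS bytes, and it does not validate the marker bits.

```python
PTS_CLOCK_HZ = 90_000  # PTS/DTS tick rate


def decode_pts(field: bytes) -> int:
    """Unpack a 33-bit PTS (or DTS) from its 5-byte PES header field.

    Layout: 4 prefix bits, PTS[32..30], marker, PTS[29..15], marker,
    PTS[14..0], marker. Marker bits are skipped, not checked.
    """
    if len(field) != 5:
        raise ValueError("PTS/DTS field must be exactly 5 bytes")
    return (
        (((field[0] >> 1) & 0x07) << 30)  # PTS[32..30]
        | (field[1] << 22)                # PTS[29..22]
        | ((field[2] >> 1) << 15)         # PTS[21..15]
        | (field[3] << 7)                 # PTS[14..7]
        | (field[4] >> 1)                 # PTS[6..0]
    )


def pts_to_seconds(pts: int) -> float:
    """Convert 90 kHz ticks to seconds."""
    return pts / PTS_CLOCK_HZ


def seconds_until_presentation(pts: int, clock_90k: int) -> float:
    """How long the decoder should wait before presenting a frame,
    given its recovered clock expressed in 90 kHz ticks.
    (Hypothetical helper; ignores 33-bit wraparound.)
    """
    return (pts - clock_90k) / PTS_CLOCK_HZ


if __name__ == "__main__":
    pts = decode_pts(bytes.fromhex("21 00 05 bf 21"))
    print(pts, pts_to_seconds(pts))
```

Audio and video frames both go through the same comparison against the same recovered clock, which is why they line up on playout.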
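As a rough illustration of the 27 MHz reference, the sketch below reads a PCR out of a 188-byte transport stream packet. The PCR field is 6 bytes: a 33-bit base counting at 90 kHz, 6 reserved bits, and a 9-bit extension counting 0..299 at 27 MHz, so the full value is base * 300 + extension. The function names are hypothetical, and the sketch only reads the value; it does not attempt the actual clock recovery loop the papers describe.

```python
from typing import Optional

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
SYSTEM_CLOCK_HZ = 27_000_000


def decode_pcr(field: bytes) -> int:
    """Return the PCR in 27 MHz ticks from the 6-byte PCR field."""
    if len(field) != 6:
        raise ValueError("PCR field must be exactly 6 bytes")
    value = int.from_bytes(field, "big")
    base = value >> 15         # top 33 bits, 90 kHz units
    extension = value & 0x1FF  # low 9 bits, 27 MHz units (0..299)
    return base * 300 + extension


def pcr_from_packet(packet: bytes) -> Optional[int]:
    """Return the PCR carried by a TS packet, or None if it has none."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte TS packet")
    has_adaptation_field = bool(packet[3] & 0x20)
    if not has_adaptation_field or packet[4] < 7:
        return None  # no adaptation field, or too short to hold a PCR
    pcr_flag = bool(packet[5] & 0x10)
    if not pcr_flag:
        return None
    return decode_pcr(packet[6:12])


def pcr_to_seconds(pcr: int) -> float:
    """Convert 27 MHz ticks to seconds."""
    return pcr / SYSTEM_CLOCK_HZ
```

A decoder would compare successive PCR values against its local 27 MHz counter at their arrival times and steer its oscillator accordingly; dividing the recovered clock by 300 gives the 90 kHz time base that PTS/DTS values are compared against.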
Below are a couple more good papers that explain how the clock recovery/synchronization process works.
https://www.soe.ucsc.edu/sites/default/files/technical-reports/UCSC-CRL-98-04.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.1016&rep=rep1&type=pdf
This final paper puts everything together nicely.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.50.975&rep=rep1&type=pdf
Remember - PCR and PTS/DTS based audio/video synchronization is what makes digital TV broadcast so stringent, and it is far different from the other methods used in Internet streaming. It is crucial for 24x7 streaming to function.