I made a console application, using DirectShow, that records from a live source (currently a webcam, later a TV capture card), adds the current date and time as an overlay, and then saves aud
The solution is to write a custom DirectShow filter, in your case with two input pins: one for the audio stream and one for the video stream. That filter should create the ASF files (architecturally the writing does not have to live inside the filter; you can, for example, use callbacks and do the job somewhere else). While you switch files, incoming A/V data is held in a cache, e.g. a circular buffer big enough to cover the switch. You can also monitor and adjust A/V sync in that filter. For writing the ASF files I would recommend the Windows Media Format SDK.
You can also add output pins to pass the A/V data further downstream if necessary, e.g. for preview or parallel streaming.
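To make the caching idea concrete, here is a minimal sketch of the kind of circular buffer that could hold samples while the writer switches files. It is plain C++ with no DirectShow dependencies; the names `CachedSample` and `SampleCache` are made up for illustration, and the policy of overwriting the oldest sample when the buffer is full is an assumption, not something the SDK prescribes. In a real filter you would copy the payload and timestamps out of each `IMediaSample` received on the input pins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical record for one buffered sample (not a DirectShow type).
struct CachedSample {
    bool isVideo = false;          // true: came from the video pin, false: audio pin
    std::int64_t startTime = 0;    // 100 ns units, like DirectShow's REFERENCE_TIME
    std::int64_t stopTime = 0;
    std::vector<std::uint8_t> data;
};

// Fixed-capacity circular buffer, safe to use from several threads.
class SampleCache {
public:
    explicit SampleCache(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Called from the streaming threads of the input pins.
    // Assumed policy: when full, the oldest sample is overwritten and counted.
    void push(CachedSample sample) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == slots_.size()) {
            head_ = (head_ + 1) % slots_.size();
            --count_;
            ++dropped_;
        }
        slots_[(head_ + count_) % slots_.size()] = std::move(sample);
        ++count_;
    }

    // Called by the writer once the new file is open.
    // Returns false when the cache is empty.
    bool pop(CachedSample& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        out = std::move(slots_[head_]);
        head_ = (head_ + 1) % slots_.size();
        --count_;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

    // Number of samples lost because the buffer was too small.
    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<CachedSample> slots_;
    std::size_t head_ = 0;     // index of the oldest sample
    std::size_t count_ = 0;    // number of samples currently stored
    std::size_t dropped_ = 0;
};
```

The input pins would call `push` for every incoming sample, and the writer would drain the cache with `pop` after the new file is ready. If `dropped()` ever returns a non-zero value, the buffer was too small for the time the file switch took, so size it from your bitrate and the longest switch you expect.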