Google Cloud Platform's Cloud Speech-to-Text API converts multi-speaker audio to text. It returns JSON output that includes who said what and when. But the speake