Every time I transcribe a recording I just get a huge block of text. Is there a way to detect paragraphs?
Perhaps detect pauses 2-4 seconds long and insert a tag or something?
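
To make the idea concrete, here is a minimal sketch of pause-based paragraph splitting. It assumes the transcription tool can output word-level timestamps, which not every tool does. The `(text, start, end)` tuple format, the function name `split_on_pauses`, and the 2-second default are all hypothetical choices for illustration, not the output of any particular transcription product.

```python
from typing import List, Tuple

# Hypothetical input format: (word text, start time in seconds, end time in seconds).
Word = Tuple[str, float, float]


def split_on_pauses(words: List[Word], min_pause: float = 2.0, tag: str = "\n\n") -> str:
    """Join words into text, inserting `tag` wherever the silence between
    the end of one word and the start of the next is at least `min_pause` seconds."""
    paragraphs: List[str] = []
    current: List[str] = []
    prev_end = None  # end time of the previous word, None before the first word

    for text, start, end in words:
        # A long enough gap closes the current paragraph and starts a new one.
        if prev_end is not None and current and start - prev_end >= min_pause:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end

    # Flush whatever is left after the last word.
    if current:
        paragraphs.append(" ".join(current))

    return tag.join(paragraphs)


if __name__ == "__main__":
    sample = [
        ("Hello", 0.0, 0.4),
        ("there.", 0.5, 0.9),
        ("Next", 3.5, 3.8),   # 2.6 s of silence before this word
        ("topic.", 3.9, 4.3),
    ]
    print(split_on_pauses(sample))
```

Passing something like `tag=" <p> "` would insert a literal tag instead of a blank line. The threshold probably needs tuning per speaker, since a 2-4 second pause may also just be hesitation mid-sentence.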