I want to write a program that automatically syncs unsynced subtitles. One of the solutions I thought of is to somehow algorythmically find human speech and adjust the subtiles
The technical term for what you are trying to do is called Voice Activity Detection (VAD). There is a python library called SPEAR that does it (among other things).