I have a huge MS Word document and I need to extract its text into JSON. Each of the sections in the .docx looks like this (the underscores are actually spaces, just dem