Excluding superscript when extracting text from PDF
Question: I have extracted text from a PDF line by line using PDFBox so that my algorithm can process it sentence by sentence. I detect sentence boundaries as a period (.) followed by a word whose first letter is capitalized. The issue is that when a sentence ends with a word that carries a superscript, the extractor treats the superscript as a normal character and places it right after the period. For example, when the expression "2 to the power 22" appears as the last word of a sentence, i.e. followed by a period, it is extracted as 2.22, which makes it
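One possible way to think about the problem, sketched here under assumptions rather than as a confirmed PDFBox recipe: a superscript is usually drawn in a smaller font and on a raised baseline compared with the rest of the line, so glyphs matching both conditions can be dropped before sentence splitting. The `Glyph` class, the `stripSuperscripts` method, and the two threshold constants below are hypothetical names and values chosen for illustration. With PDFBox, the per-glyph values could plausibly be collected by subclassing `PDFTextStripper`, overriding `writeString(String, List<TextPosition>)`, and reading `getUnicode()`, `getYDirAdj()` and `getFontSizeInPt()` from each `TextPosition`; that wiring is left out so the sketch stays self-contained.

```java
import java.util.List;

/** Sketch: drop glyphs that look like superscripts (smaller font, raised baseline). */
final class SuperscriptFilter {

    /** Hypothetical glyph record; in PDFBox these values could come from TextPosition. */
    static final class Glyph {
        final String text;
        final float y;        // baseline position, growing downward on the page
        final float fontSize; // font size in points

        Glyph(String text, float y, float fontSize) {
            this.text = text;
            this.y = y;
            this.fontSize = fontSize;
        }
    }

    // Assumed thresholds; real documents may need tuning.
    static final float SIZE_RATIO = 0.8f;   // "smaller" = below 80% of the reference size
    static final float RAISE_RATIO = 0.15f; // "raised" = above baseline by 15% of that size

    /** Returns the text of one line with superscript-looking glyphs removed. */
    static String stripSuperscripts(List<Glyph> line) {
        if (line.isEmpty()) {
            return "";
        }
        // Use the first glyph with the largest font size as the reference for the line.
        Glyph ref = line.get(0);
        for (Glyph g : line) {
            if (g.fontSize > ref.fontSize) {
                ref = g;
            }
        }
        StringBuilder out = new StringBuilder();
        for (Glyph g : line) {
            boolean smaller = g.fontSize < SIZE_RATIO * ref.fontSize;
            // y grows downward, so a raised glyph has a smaller y than the reference.
            boolean raised = g.y < ref.y - RAISE_RATIO * ref.fontSize;
            if (!(smaller && raised)) {
                out.append(g.text);
            }
        }
        return out.toString();
    }
}
```

For the example from the question, a line made of "2" (size 12, baseline 100), "22" (size 8, baseline 95), "." and " Next" (both size 12, baseline 100) would come out as "2. Next", so the period is again followed directly by the capitalized word. Subscripts are kept because they sit below the baseline. If the exponent should be preserved rather than excluded, the same test could instead emit a marker such as "^" before the raised glyph.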