I have a file that contains "straight" (normal, ASCII) quotes, and I'm trying to convert them to real quotation mark glyphs (“curly” quotes, U+2018 to U+201D). Since the tran
The basic idea is to always look for matching pairs. Since every quote has a matching quote, your program only needs to ask for your help where it's unsure which quote is the matching one.
Opening quotes are always at the start of a line or have a space in front of them. Closing quotes always have a space (or the end of the line) after them. If you find a colon with a following quote it's probably a closing quote.
If the letter following the quote is upper case it's probably an opening quote.
If there's a punctuation mark in front of the quote it's probably a closing quote.
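As a rough sketch, these rules of thumb could be combined into a small voting function. The name `classify_quote`, the vote counting, and the punctuation set `.,;:!?` are my own assumptions for illustration; the heuristics themselves are only "probably" right, so ties are reported as `unsure` rather than guessed.

```python
def classify_quote(text: str, i: int) -> str:
    """Guess the role of the straight quote at text[i].

    Returns 'open', 'close' or 'unsure'. Hypothetical helper: each
    heuristic casts one vote, and a tie means the user should be asked.
    """
    # Treat the start and end of the text like line boundaries.
    prev = text[i - 1] if i > 0 else "\n"
    nxt = text[i + 1] if i + 1 < len(text) else "\n"

    votes_open = 0
    votes_close = 0
    if prev.isspace():      # space or line start in front: opening
        votes_open += 1
    if nxt.isspace():       # space or line end after: closing
        votes_close += 1
    if nxt.isupper():       # upper-case letter follows: opening
        votes_open += 1
    if prev in ".,;:!?":    # punctuation in front: closing (assumed set)
        votes_close += 1

    if votes_open > votes_close:
        return "open"
    if votes_close > votes_open:
        return "close"
    return "unsure"
```

For a sentence like `He said "Hello there." Then he left.` the first quote gets two opening votes and the second gets two closing votes, while the apostrophe in `don't` gets no votes at all and ends up as `unsure`.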
Try to do it iteratively. In the first round, the program should show you all the quotes whose function (opening, closing, or apostrophe) it can definitely determine. (Just to make sure it hasn't made any errors.)
In the second round, it should ask about all the quotes where it's unsure whether they are opening quotes or apostrophes. For every opening quote it then has to find the closing quote automatically.
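The pairing step might look something like the sketch below. It assumes the quotes have already been labelled `open`, `close` or `unsure` by some earlier round; the function name `pair_quotes` and the input format are hypothetical. Anything that cannot be paired is returned separately, so those are the only places where the user has to be asked.

```python
def pair_quotes(roles):
    """Pair opening and closing quotes.

    roles: list of (index, role) tuples in text order, where role is
    'open', 'close' or 'unsure' (an assumed format for this sketch).
    Returns (pairs, leftovers): matched (open_index, close_index)
    tuples, and the indices that still need a human decision.
    """
    pairs = []
    leftovers = []
    pending = []  # stack of opening quotes still waiting for a partner
    for index, role in roles:
        if role == "open":
            pending.append(index)
        elif role == "close" and pending:
            pairs.append((pending.pop(), index))
        else:
            # 'unsure' quotes and closing quotes without an opener
            leftovers.append(index)
    leftovers.extend(pending)  # openers that never got closed
    return pairs, sorted(leftovers)
```

A stack is used so that a nested quotation would still pair up sensibly; if your text never nests quotes, a single "last opener" variable would do the same job.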
Another, maybe less complex, idea could be:
Find all non-quotes (apostrophes, for example) first, by asking the user about each mark that could be either a quote or a non-quote.
All the remaining quotes should be fairly easy to convert. Opening quotes have a space or newline in front of them and closing quotes have one after them.
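A minimal sketch of this simpler approach, under a few assumptions of mine: the "ask the user" step is represented by a callback `is_non_quote(text, i)`, non-quotes are rendered as U+2019 (the usual choice for an apostrophe), and only the "space or line start in front" test is used to tell opening from closing. The helper `between_letters` is just a stand-in for a real prompt.

```python
def curl_quotes(text, is_non_quote):
    """Replace straight quotes with curly ones.

    is_non_quote(text, i) stands in for asking the user whether the
    single quote at text[i] is a non-quote such as an apostrophe.
    """
    out = []
    for i, ch in enumerate(text):
        if ch not in "\"'":
            out.append(ch)
            continue
        if ch == "'" and is_non_quote(text, i):
            out.append("\u2019")  # apostrophe (assumed rendering)
            continue
        # Opening quotes have a space/newline (or nothing) in front of them.
        opening = i == 0 or text[i - 1].isspace()
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
        else:
            out.append("\u2018" if opening else "\u2019")
    return "".join(out)


def between_letters(text, i):
    """Stand-in for the user: a quote between two letters is an apostrophe."""
    return 0 < i < len(text) - 1 and text[i - 1].isalpha() and text[i + 1].isalpha()
```

Calling `curl_quotes('She said "don\'t go" and left.', between_letters)` gives `She said “don’t go” and left.`. In a real tool the callback would show the surrounding text and wait for a yes/no answer instead of guessing.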
One last piece of thought:
You should break the process apart, for example by processing the text one paragraph at a time. If your program makes an error, which it probably will given the complexity of language, it's easier for you to correct it, and the program can start fresh with the next paragraph.
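To illustrate why paragraph-wise processing limits the damage, here is a small sketch. It assumes paragraphs are separated by blank lines, and it uses a deliberately naive converter (`toggle_double_quotes`, which just alternates opening and closing) so the effect of starting fresh is visible. Both function names are made up for this example.

```python
import re


def process_by_paragraph(text, convert):
    """Apply convert() to each paragraph separately.

    Paragraphs are assumed to be separated by blank lines. The
    separators are captured by the regex so the text can be rejoined
    without changing its layout.
    """
    parts = re.split(r"(\n\s*\n)", text)
    # Even positions are paragraphs, odd positions are separators.
    return "".join(
        part if i % 2 else convert(part) for i, part in enumerate(parts)
    )


def toggle_double_quotes(paragraph):
    """Naive converter: alternate opening and closing double quotes."""
    out = []
    opening = True  # state is local, so every paragraph starts fresh
    for ch in paragraph:
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)
```

If the first paragraph has an unbalanced quote, as in `"one" and "two`, only that paragraph is affected: the first quote of the next paragraph is treated as an opening quote again instead of being paired with the stray one.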