I have a file that contains "straight" (normal, ASCII) quotes, and I'm trying to convert them to real quotation mark glyphs (“curly” quotes, U+2018 to U+201D). Since the tran
Here is a regular expression that might help for double-quotes:
/([^\s\(]?)"(\s*)([^\\]*?(\\.[^\\]*)*)(\s*)("|\n\n)([^\s\)\.\,;]?)/gms
It restarts at each paragraph break (a double newline) and identifies pairs of quotes. The capture groups also let you check that the spacing before and after each quote is correct, if that's useful.
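Numbered capture groups:
1 character before the leading quote, if it is not whitespace or a left paren
2 whitespace after the leading quote
3 quoted text (may contain backslash escapes)
4 last backslash-escaped segment inside the quoted text, if any
5 whitespace before the trailing quote
6 trailing quote (or a double newline, i.e. a paragraph break)
7 character after the trailing quote, if it is not whitespace, a right paren, period, comma, or semicolon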
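As a rough illustration of the spacing check, the groups can be read off each match with `matchAll`. This is only a sketch: the helper name `findSpacingProblems` and the wording of the reported issues are my own assumptions, and the rules for what counts as a problem are just one plausible reading of the groups.

```javascript
// The regular expression from this answer, unchanged.
const QUOTE_PAIR = /([^\s\(]?)"(\s*)([^\\]*?(\\.[^\\]*)*)(\s*)("|\n\n)([^\s\)\.\,;]?)/gms;

// Hypothetical helper (not part of the original answer): lists suspicious
// spacing around each quote pair, using groups 1, 2, 5, 6 and 7.
function findSpacingProblems(text) {
  const problems = [];
  for (const m of text.matchAll(QUOTE_PAIR)) {
    // Skip group 0 (whole match) and groups 3 and 4 (the quoted text).
    const [, before, leadSpace, , , trailSpace, close, after] = m;
    if (before) problems.push({ index: m.index, issue: 'no space before opening quote' });
    if (leadSpace) problems.push({ index: m.index, issue: 'space after opening quote' });
    if (close !== '"') {
      // Group 6 matched the double newline, so the quote was never closed.
      problems.push({ index: m.index, issue: 'quote not closed before paragraph break' });
    } else {
      if (trailSpace) problems.push({ index: m.index, issue: 'space before closing quote' });
      if (after) problems.push({ index: m.index, issue: 'no space after closing quote' });
    }
  }
  return problems;
}
```

For example, `findSpacingProblems('He said" hello "now')` reports four issues for the single pair, while a cleanly spaced sentence yields an empty array.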
I think it would be reasonable to extend this for your other cases (I just haven't needed to yet).
It's JavaScript syntax. It's pretty fast, though I haven't optimized it beyond "good enough": it will process, say, a 400-page book in about a second. I think it would be hard to match that speed procedurally.
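To actually do the conversion, the same expression can be passed to `String.prototype.replace` with a callback that rebuilds each match around curly glyphs. Treat this as a minimal sketch rather than a finished tool: the function name `curlDoubleQuotes` is hypothetical, and leaving an unclosed quote with only its opening glyph is my assumption about the desired behavior.

```javascript
// The regular expression from this answer, unchanged.
const QUOTE_PAIR = /([^\s\(]?)"(\s*)([^\\]*?(\\.[^\\]*)*)(\s*)("|\n\n)([^\s\)\.\,;]?)/gms;

// Hypothetical helper (not part of the original answer): replaces each
// straight double-quote pair with U+201C ... U+201D.
function curlDoubleQuotes(text) {
  return text.replace(
    QUOTE_PAIR,
    (match, before, leadSpace, body, lastEscape, trailSpace, close, after) => {
      // If group 6 is the double newline, the quote was never closed:
      // keep the paragraph break and emit only the opening glyph.
      const closing = close === '"' ? '\u201D' : close;
      return before + '\u201C' + leadSpace + body + trailSpace + closing + after;
    }
  );
}

console.log(curlDoubleQuotes('He said "hello" to me.'));
// prints: He said “hello” to me.
```

Backslash-escaped quotes inside the quoted text are part of group 3, so this sketch leaves them as they are.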