Question
I am trying to extract the content of a PDF file using iTextSharp, as you can see:
using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static void Main(string[] args)
{
    StringBuilder text = new StringBuilder();
    using (PdfReader reader = new PdfReader(@"D:\a.pdf"))
    {
        // iTextSharp page numbers are 1-based.
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
        }
    }
    System.IO.File.WriteAllText(@"c:/a.txt", text.ToString());
    Console.ReadLine();
}
My PDF content is written in Persian, and after running the code above the extracted text is not correct. Should I set any option in iTextSharp?
Answer 1:
It is hard to say without the original file, but if characters or words come out in the wrong positions, try using LocationTextExtractionStrategy, like this:
text.Append(PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy()));
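For context, here is a rough sketch of how that call could slot into the loop from the question, assuming iTextSharp 5.x. The class name PdfTextDump and the helper ExtractText are made up for illustration, and the file paths are the ones from the question. Whether this fixes the Persian output depends on the PDF itself, so treat it as something to try rather than a confirmed fix.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class PdfTextDump // hypothetical class name, not part of iTextSharp
{
    // Hypothetical helper: returns the text of every page, one page after another.
    public static string ExtractText(string pdfPath)
    {
        var text = new StringBuilder();
        using (var reader = new PdfReader(pdfPath))
        {
            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A strategy object collects the text it is fed,
                // so create a fresh one for each page.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        // Write the result as UTF-8 so Persian characters survive the round trip.
        System.IO.File.WriteAllText(@"c:\a.txt", ExtractText(@"D:\a.pdf"), Encoding.UTF8);
    }
}
```

Creating the strategy inside the loop matters: a single instance reused across pages would keep returning the text of earlier pages along with the current one.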
Source: https://stackoverflow.com/questions/35436158/itextsharp-cant-extract-pdf-unicode-content-in-c-sharp