Convert a pdf file to text in C# [closed]

Submitted by 这一生的挚爱 on 2020-01-12 17:12:59

Question


I need to convert a .pdf file to a .txt file (or .doc, but I prefer .txt).

How can I do this in C#?


Answer 1:


Ghostscript can do what you need. Below is a command that extracts the text from a PDF file into a .txt file (run it from a command line first to check that it works for you):

gswin32c.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "test.pdf" -c quit >"test.txt"

For details on how to use Ghostscript with C#, see the CodeProject article "Convert PDF to Image Using Ghostscript API".
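If you only need the text, the command above can also be launched from C# with System.Diagnostics.Process instead of the Ghostscript API. The following is a minimal sketch, not a tested integration: the class and method names (GhostscriptText, BuildArguments, Convert) are made up for illustration, and it assumes gswin32c.exe is on the PATH and can find ps2ascii.ps. The shell redirect (>"test.txt") is replaced by capturing standard output and writing it to the file.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class GhostscriptText
{
    // Builds the same switches as the command line above, without the
    // shell redirect (stdout is captured by the process wrapper instead).
    public static string BuildArguments(string pdfPath)
    {
        return "-q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE " +
               "-c save -f ps2ascii.ps \"" + pdfPath + "\" -c quit";
    }

    // gsExe is an assumption: pass a full path if Ghostscript is not on the PATH.
    public static void Convert(string pdfPath, string txtPath, string gsExe = "gswin32c.exe")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExe,
            Arguments = BuildArguments(pdfPath),
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,  // stands in for >"test.txt"
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Read everything before waiting, so a full pipe cannot block Ghostscript.
            string text = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Ghostscript exited with code " + process.ExitCode);
            }

            File.WriteAllText(txtPath, text);
        }
    }
}
```

Usage would be GhostscriptText.Convert("test.pdf", "test.txt"); an exception from Process.Start usually means the executable could not be found.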




Answer 2:


I've had the need myself and I used this article to get me started: http://www.codeproject.com/KB/string/pdf2text.aspx




Answer 3:


Converting PDF to text is not really straightforward, and you won't see anyone post code here that does the conversion directly. Your best bet is to use a library that does the job for you. A good one is PDFBox (you can google it). It is written in Java, but fortunately you can use IKVM to convert it to .NET.
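For orientation, this is roughly what calling an IKVM-compiled PDFBox looks like from C#. Treat it as a sketch under assumptions: it targets the PDFBox 1.x API, where PDFTextStripper lives in org.apache.pdfbox.util (later versions moved it), and it needs references to the converted PDFBox assembly and the IKVM runtime assemblies. The wrapper class PdfBoxText is a made-up name.

```csharp
using System.IO;
using org.apache.pdfbox.pdmodel;  // PDDocument, from the IKVM-compiled PDFBox assembly
using org.apache.pdfbox.util;     // PDFTextStripper (PDFBox 1.x namespace, an assumption)

public static class PdfBoxText
{
    public static void Convert(string pdfPath, string txtPath)
    {
        PDDocument doc = null;
        try
        {
            // Java method names keep their lower-case spelling after conversion.
            doc = PDDocument.load(pdfPath);
            var stripper = new PDFTextStripper();
            File.WriteAllText(txtPath, stripper.getText(doc));
        }
        finally
        {
            // PDDocument holds file resources and is not IDisposable,
            // so close it explicitly.
            if (doc != null)
            {
                doc.close();
            }
        }
    }
}
```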




Answer 4:


As an alternative to Don's solution, I found the following:

Extract Text from PDF in C# (100% .NET)




Answer 5:


The Docotic.Pdf library can extract text from PDF files (formatted or not).

Here is sample code that shows how to extract formatted text from a PDF file and save it to another file.

public static void ExtractFormattedText(string pdfFile, string textFile)
{
    using (PdfDocument doc = new PdfDocument(pdfFile))
    {
        string text = doc.GetTextWithFormatting();
        File.WriteAllText(textFile, text);
    }
}

Also, there is a sample on our site that shows other options for extraction of text from PDF files.

Disclaimer: I work for Bit Miracle, vendor of the library.




Answer 6:


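    // These methods belong in a WinForms form that has a RichTextBox named
    // richTextBox1. They use iTextSharp and need these usings at the top of the file:
    //   using System; using System.IO; using System.Text; using System.Windows.Forms;
    //   using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;

    public void PDF_TEXT()
    {
        richTextBox1.Text = string.Empty;

        ReadPdfFile(@"C:\Myfile.pdf");  // read the PDF file from this location
    }


    public void ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();
        try
        {
            // Check that the file exists before opening a reader on it.
            if (File.Exists(fileName))
            {
                PdfReader pdfReader = new PdfReader(fileName);
                try
                {
                    // iTextSharp page numbers start at 1.
                    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                    {
                        // Use a fresh strategy for each page.
                        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                        text.Append(currentText);
                    }
                }
                finally
                {
                    // Release the file even if extraction throws.
                    pdfReader.Close();
                }
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        richTextBox1.Text = text.ToString();
    }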



    private void Save_TextFile_Click(object sender, EventArgs e)
    {
        DialogResult messageResult = MessageBox.Show("Save this file into Text?", "Text File", MessageBoxButtons.OKCancel);

        // Nothing to do if the user cancels the confirmation.
        if (messageResult == DialogResult.Cancel)
        {
            return;
        }

        using (SaveFileDialog sfd = new SaveFileDialog())
        {
            sfd.Title = "Save As Textfile";
            sfd.InitialDirectory = @"C:\";
            sfd.Filter = "Text Documents|*.txt";

            if (sfd.ShowDialog() == DialogResult.OK)
            {
                if (richTextBox1.Text != "")
                {
                    // Write the extracted text as plain text, then clear the box.
                    richTextBox1.SaveFile(sfd.FileName, RichTextBoxStreamType.PlainText);
                    richTextBox1.Text = "";
                    MessageBox.Show("Text Saved Successfully", "Text File");
                }
                else
                {
                    // No text has been extracted yet.
                    MessageBox.Show("Please Upload Your Pdf", "Text File",
                        MessageBoxButtons.OKCancel, MessageBoxIcon.Asterisk);
                }
            }
        }
    }


Source: https://stackoverflow.com/questions/1944576/convert-a-pdf-file-to-text-in-c-sharp
