Can't install textract on Windows

Submitted by 狂风中的少年 on 2019-11-28 00:07:46

Stolen from here:

I first needed to install swig with conda (Miniconda):

conda install swig

Then I downloaded the EbookLib 0.15 zip from the releases page:

https://github.com/aerkalov/ebooklib/releases

After unzipping it, I manually removed the Unicode character from the README.md file (I used Notepad++; the character is on line 44).
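If you would rather not hunt for the character by hand, the same cleanup can be scripted. This is only a minimal sketch: the function name strip_non_ascii is made up for illustration, and it assumes README.md is stored as UTF-8.

```python
from pathlib import Path


def strip_non_ascii(readme_path):
    """Rewrite the file at readme_path with every non-ASCII character removed.

    Returns the number of characters that were dropped.
    Assumption: the file is UTF-8 encoded.
    """
    path = Path(readme_path)
    # Read as UTF-8 explicitly so the Windows default code page is never used.
    text = path.read_text(encoding="utf-8")
    cleaned = "".join(ch for ch in text if ord(ch) < 128)
    path.write_text(cleaned, encoding="utf-8")
    return len(text) - len(cleaned)


# Example call, run from inside the unzipped EbookLib folder:
# strip_non_ascii("README.md")
```

Note that this drops every non-ASCII character in the file, not just the one on line 44, which should be harmless for a README that is only read during installation.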

Then I installed the module with pip:

cd to_unzipped_folder_path_here
pip install .

And finally

pip install textract

(Windows 10, Python 3.7) I ran into more issues than others did, but this builds on the previous answers:

  1. Make sure that Microsoft Visual Studio C++ Compiler for Python is installed

  2. python -m pip install --upgrade pip setuptools wheel

  3. pip install six --upgrade

  4. Download EbookLib version 0.15:

    • Unzip the .zip file
    • To avoid encoding errors, edit the "long_description" assignment (in setup.py) so that it reads: long_description = open('README.md', encoding='utf-8').read(),
  5. Download Swig:

    • http://www.swig.org/download.html
    • Unzip the .zip file
    • Copy the swig.exe file into the Python path : e.g. "C:\Users\username\AppData\Local\Programs\Python\Python37"
    • Copy the "typemaps" folder into the python "Lib" folder : e.g. "C:\Program Files\swigwin-4.0.0\Lib\typemaps" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
    • Copy the "*.swg" files to the python "Lib" folder : e.g. "C:\Program Files\swigwin-4.0.0\Lib\*.swg" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
    • Copy all the swig python files to the python "Lib" folder : e.g. "C:\Program Files\swigwin-4.0.0\Lib\python\*" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
  6. cd into the unzipped EbookLib folder from the prompt : e.g. C:\> cd "C:\Users\username\Desktop\ebooklib-0.15"

  7. Run the installation for EbookLib : pip install .

  8. Run the textract installation : pip install textract
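The four copy operations in step 5 are tedious to do by hand, so here is a rough sketch of how they could be scripted. The function name copy_swig_files and both directory arguments are hypothetical, so adjust them to your own paths. It assumes the SWIG "Lib\typemaps" and "Lib\python" folders contain only plain files, and it copies files one by one so that it also works on Python 3.7.

```python
import shutil
from pathlib import Path


def copy_swig_files(swig_dir, python_dir):
    """Copy the SWIG files listed in step 5 into a Python installation.

    swig_dir   -- unzipped swigwin folder (contains swig.exe and Lib)
    python_dir -- Python installation folder (contains Lib)
    Returns the list of destination paths that were written.
    Assumption: subfolders inside Lib\\typemaps and Lib\\python are ignored.
    """
    swig_dir, python_dir = Path(swig_dir), Path(python_dir)
    swig_lib, python_lib = swig_dir / "Lib", python_dir / "Lib"
    copied = []

    def copy_file(src, dest_dir):
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(str(src), str(dest_dir))))

    # swig.exe goes next to python.exe
    copy_file(swig_dir / "swig.exe", python_dir)
    # the "typemaps" folder is recreated inside Python's Lib folder
    for src in sorted((swig_lib / "typemaps").iterdir()):
        if src.is_file():
            copy_file(src, python_lib / "typemaps")
    # top-level *.swg files (swig.swg among them) go straight into Lib
    for src in sorted(swig_lib.glob("*.swg")):
        copy_file(src, python_lib)
    # SWIG's Python-specific files (python.swg among them) also go into Lib
    for src in sorted((swig_lib / "python").iterdir()):
        if src.is_file():
            copy_file(src, python_lib)
    return copied


# Example call with the paths used in the steps above:
# copy_swig_files(r"C:\Program Files\swigwin-4.0.0",
#                 r"C:\Users\username\AppData\Local\Programs\Python\Python37")
```

Writing into the Python installation folder may need an elevated prompt, depending on where Python is installed.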

The output should be:

C:\Users\username\Desktop\ebooklib-0.15>pip install textract
Collecting textract
Requirement already satisfied: docx2txt==0.6 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.6)
Requirement already satisfied: beautifulsoup4==4.5.3 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (4.5.3)
Requirement already satisfied: EbookLib==0.15 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.15)
Requirement already satisfied: xlrd==1.0.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.0.0)
Requirement already satisfied: SpeechRecognition==3.6.3 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (3.6.3)
Requirement already satisfied: six==1.10.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.10.0)
Collecting pocketsphinx==0.1.3 (from textract)
  Using cached https://files.pythonhosted.org/packages/93/5f/a968e5d53d25e32deb78c3e169fd8612ecf53cc76e32cb40e19be35696af/pocketsphinx-0.1.3.tar.bz2
Requirement already satisfied: chardet==2.3.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (2.3.0)
Requirement already satisfied: argcomplete==1.8.2 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.8.2)
Requirement already satisfied: python-pptx==0.6.5 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.6.5)
Requirement already satisfied: lxml in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from EbookLib==0.15->textract) (4.3.3)
Requirement already satisfied: XlsxWriter>=0.5.7 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from python-pptx==0.6.5->textract) (1.1.8)
Requirement already satisfied: Pillow>=2.6.1 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from python-pptx==0.6.5->textract) (6.0.0)
Building wheels for collected packages: pocketsphinx
  Building wheel for pocketsphinx (setup.py) ... done
  Stored in directory: C:\Users\username\AppData\Local\pip\Cache\wheels\38\80\4f\ddc3e8c2b788f2c7f1d625ae870f6bafd3038ff04a3445a2f8
Successfully built pocketsphinx
Installing collected packages: pocketsphinx, textract
Successfully installed pocketsphinx-0.1.3 textract-1.6.1

C:\Users\username\Desktop\ebooklib-0.15>
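To double-check the result without reading through the pip log, you can ask Python whether the modules are importable. A small sketch follows; is_installed is just an illustrative helper name, and the three module names are the import names of the packages installed above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("ebooklib", "pocketsphinx", "textract"):
    print(name, "found" if is_installed(name) else "MISSING")
```

If all three are reported as found, the installation steps above went through.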

At the time of writing, jsonschema has dependencies that conflict with textract's. The following errors also came up while I was working out the proper installation:

ERROR: requests 2.22.0 has requirement chardet<3.1.0,>=3.0.2, but you'll have chardet 2.3.0 which is incompatible.
ERROR: camelot-py 0.7.2 has requirement chardet>=3.0.4, but you'll have chardet 2.3.0 which is incompatible.

ERROR: Command "python setup.py egg_info" failed with error code 1 in C:\Users\username\AppData\Local\Temp\pip-install-msmb9od3\EbookLib\
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1671: character maps to <undefined>
error: command 'C:\\Users\\username\\AppData\\Local\\Programs\\Python\\Python37\\swig.exe' failed with exit status 1

ERROR: Failed building wheel for pocketsphinx
error: command 'swig.exe' failed: No such file or directory
  (1) : Error: Unable to find 'swig.swg'
  (3) : Error: Unable to find 'python.swg'
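For anyone wondering where the 'charmap' error comes from: on Windows, open() without an encoding argument falls back to the system code page (commonly cp1252), and that code page has no character for byte 0x8d. The sketch below reproduces the effect. The sample character is an assumption, picked only because its UTF-8 encoding contains the byte 0x8d; I am not claiming it is the actual character in the EbookLib README.

```python
def try_decode(data, encoding):
    """Decode bytes with the given encoding, or report why decoding failed."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: {}".format(exc.reason)


# U+010D encodes to the two bytes 0xC4 0x8D in UTF-8 (illustrative sample only)
sample = "\u010d".encode("utf-8")

print(try_decode(sample, "cp1252"))      # the byte 0x8d has no mapping in cp1252
print(ascii(try_decode(sample, "utf-8")))  # decodes cleanly when the encoding is right
```

This is why either removing the character from README.md or passing encoding="utf-8" to open() makes the EbookLib error go away.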

The solution is much simpler now that the project appears to have been taken over by another maintainer, who started updating it again about 3 months before I wrote this answer.

You can now go to https://github.com/deanmalmgren/textract/releases and download either v1.6.2, which only updates the requirements relative to v1.6.1 (fixing the Unicode decode error), or v1.6.3, which is the latest release as of writing.

Once it has downloaded, extract the archive, cd into the folder you extracted it to, and run: pip install .
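The extract-and-install step can also be scripted. This is a sketch only: the helper names and the example file name are hypothetical, and it assumes the release zip contains a single top-level folder.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def extract_release(zip_path, dest_dir):
    """Unzip a release archive and return the folder to install from.

    If the archive has exactly one top-level entry, that folder is returned;
    otherwise dest_dir itself is returned.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(str(zip_path)) as archive:
        archive.extractall(str(dest))
        top_level = {name.split("/")[0] for name in archive.namelist()}
    if len(top_level) == 1:
        return dest / top_level.pop()
    return dest


def pip_install_command():
    """Build 'pip install .' for the interpreter that is running this script."""
    return [sys.executable, "-m", "pip", "install", "."]


def install_from_folder(folder):
    """Run 'pip install .' inside the extracted folder."""
    subprocess.run(pip_install_command(), cwd=str(folder), check=True)


# Example (hypothetical file name for a downloaded release):
# install_from_folder(extract_release("textract-1.6.3.zip", "."))
```

Using sys.executable with -m pip makes sure the package lands in the same Python installation that runs the script.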

Keep in mind that whenever requirements are updated, there is a risk of malicious code being inserted into dependencies, so update at your own risk.

Not the most elegant solution but it works!

pip install git+https://github.com/jpweytjens/textract

Thanks to jpweytjens
