I've tried many things, but installing the textract package on Windows with pip still fails.
I'm getting the following error:
I have no idea what to do, so I'll be really grateful for any advice. Thank you
Stolen from here:
I first needed to install swig with conda (Miniconda):
conda install swig
Then I downloaded the EbookLib 0.15 zip from the releases page:
https://github.com/aerkalov/ebooklib/releases
After unzipping it, I manually removed the non-ASCII Unicode character on line 44 of README.md (I used Notepad++).
Then I installed the module with pip:
cd to_unzipped_folder_path_here
pip install .
And finally
pip install textract
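The manual README edit above could also be scripted. This is a minimal sketch, assuming README.md is UTF-8 encoded and that dropping every non-ASCII character is acceptable; `strip_non_ascii` is a hypothetical helper name, not part of EbookLib or textract.

```python
from pathlib import Path


def strip_non_ascii(path):
    """Rewrite the file at `path` with every non-ASCII character removed.

    Returns the number of characters that were dropped.
    """
    file_path = Path(path)
    # Read as UTF-8 explicitly so the Windows default code page is not used.
    text = file_path.read_text(encoding="utf-8")
    # Keep only characters in the 7-bit ASCII range.
    cleaned = "".join(ch for ch in text if ord(ch) < 128)
    file_path.write_text(cleaned, encoding="ascii")
    return len(text) - len(cleaned)
```

Calling `strip_non_ascii("README.md")` from inside the unzipped folder, before running `pip install .`, would replace the Notepad++ step.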
(Windows 10, Python 3.7) I ran into more issues than others did, but this builds on the previous answers:
Make sure that Microsoft Visual Studio C++ Compiler for Python is installed
- For Visual Studio C++ 14.0 (also required by Scrapy as of June 2019), follow these links in order:
  https://wiki.python.org/moin/WindowsCompilers -->
  https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2017 -->
  https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&rel=16
  Note: this may take a very long time to install, so be patient.
python -m pip install --upgrade pip setuptools wheel
pip install six --upgrade
Download EbookLib version 0.15:
- Unzip the .zip file
- To avoid encoding errors, edit the "long_description" variable assignment in setup.py to be "long_description = open('README.md',encoding="utf-8").read(),"
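To see why passing `encoding="utf-8"` helps, here is a small sketch. It assumes the failing read used a Windows code page such as cp1252 (the 'charmap' message in the errors listed further down is consistent with that, but the exact code page is an assumption). The character "č" is only an illustration, chosen because its UTF-8 encoding contains the byte 0x8d; `decodes_with` is a hypothetical helper.

```python
# "č" (U+010D) is an illustrative character: its UTF-8 encoding is the
# two bytes C4 8D, and byte 0x8D has no mapping in cp1252.
sample_bytes = "č".encode("utf-8")


def decodes_with(data, encoding):
    """Return True if `data` decodes cleanly with `encoding`, else False."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 reads the bytes back; cp1252 stops at the undefined byte 0x8D.
utf8_ok = decodes_with(sample_bytes, "utf-8")
cp1252_ok = decodes_with(sample_bytes, "cp1252")
```

With these inputs `utf8_ok` is true and `cp1252_ok` is false, which mirrors the difference between opening README.md with and without an explicit UTF-8 encoding.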
Download Swig:
- http://www.swig.org/download.html
- Unzip the .zip file
- Copy the swig.exe file into the Python path : e.g. "C:\Users\username\AppData\Local\Programs\Python\Python37"
- Copy the "typemaps" folder into the python "Lib" folder : e.g. "C:\Program Files\swigwin-4.0.0\Lib\typemaps" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
- Copy the "*.swg" files into the Python "Lib" folder : e.g. "C:\Program Files\swigwin-4.0.0\Lib\*.swg" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
- Copy all the SWIG Python files into the Python "Lib" folder : e.g. "C:\Program Files\swigwin-4.0.0\Lib\python*" --> "C:\Users\username\AppData\Local\Programs\Python\Python37\Lib\"
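Before building, it may help to check that the copies above landed where the build expects them. The sketch below is an assumption-laden sanity check, not an official SWIG tool: it looks only for `swig.swg` and `python.swg` (the two files named in the errors listed further down) inside the Python "Lib" folder, and for a `swig` executable on PATH. The function names are hypothetical.

```python
import shutil
from pathlib import Path

# The two library files the build complained about when they were missing.
REQUIRED_SWG_FILES = ("swig.swg", "python.swg")


def missing_swig_files(python_dir):
    """Return the required .swg files that are absent from <python_dir>/Lib."""
    lib_dir = Path(python_dir) / "Lib"
    return [name for name in REQUIRED_SWG_FILES if not (lib_dir / name).is_file()]


def swig_on_path():
    """Return True if a swig executable can be found on PATH."""
    return shutil.which("swig") is not None
```

For example, `missing_swig_files(r"C:\Users\username\AppData\Local\Programs\Python\Python37")` should return an empty list once the files are copied.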
cd into the unzipped EbookLib folder from the prompt : e.g. C:\> cd "C:\Users\username\Desktop\ebooklib-0.15"
Run the installation for EbookLib : pip install .
Run the textract installation : pip install textract
The output should be :
C:\Users\username\Desktop\ebooklib-0.15>pip install textract
Collecting textract
Requirement already satisfied: docx2txt==0.6 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.6)
Requirement already satisfied: beautifulsoup4==4.5.3 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (4.5.3)
Requirement already satisfied: EbookLib==0.15 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.15)
Requirement already satisfied: xlrd==1.0.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.0.0)
Requirement already satisfied: SpeechRecognition==3.6.3 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (3.6.3)
Requirement already satisfied: six==1.10.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.10.0)
Collecting pocketsphinx==0.1.3 (from textract)
Using cached https://files.pythonhosted.org/packages/93/5f/a968e5d53d25e32deb78c3e169fd8612ecf53cc76e32cb40e19be35696af/pocketsphinx-0.1.3.tar.bz2
Requirement already satisfied: chardet==2.3.0 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (2.3.0)
Requirement already satisfied: argcomplete==1.8.2 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (1.8.2)
Requirement already satisfied: python-pptx==0.6.5 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from textract) (0.6.5)
Requirement already satisfied: lxml in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from EbookLib==0.15->textract) (4.3.3)
Requirement already satisfied: XlsxWriter>=0.5.7 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from python-pptx==0.6.5->textract) (1.1.8)
Requirement already satisfied: Pillow>=2.6.1 in c:\users\username\appdata\local\programs\python\python37\lib\site-packages (from python-pptx==0.6.5->textract) (6.0.0)
Building wheels for collected packages: pocketsphinx
Building wheel for pocketsphinx (setup.py) ... done
Stored in directory: C:\Users\username\AppData\Local\pip\Cache\wheels\38\80\4f\ddc3e8c2b788f2c7f1d625ae870f6bafd3038ff04a3445a2f8
Successfully built pocketsphinx
Installing collected packages: pocketsphinx, textract
Successfully installed pocketsphinx-0.1.3 textract-1.6.1
C:\Users\username\Desktop\ebooklib-0.15>
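After pip reports success, a quick way to confirm that the interpreter can actually locate the package is to ask the import system for it. This is a minimal sketch using the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None
```

Running `is_installed("textract")` with the same Python that pip installed into should return True if the installation worked.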
At the time of writing, jsonschema has dependencies that conflict with textract's. The following errors also came up while I was working out the proper installation:
ERROR: requests 2.22.0 has requirement chardet<3.1.0,>=3.0.2, but you'll have chardet 2.3.0 which is incompatible.
ERROR: camelot-py 0.7.2 has requirement chardet>=3.0.4, but you'll have chardet 2.3.0 which is incompatible.
ERROR: Command "python setup.py egg_info" failed with error code 1 in C:\Users\username\AppData\Local\Temp\pip-install-msmb9od3\EbookLib\
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1671: character maps to <undefined>
error: command 'C:\\Users\\username\\AppData\\Local\\Programs\\Python\\Python37\\swig.exe' failed with exit status 1
ERROR: Failed building wheel for pocketsphinx
error: command 'swig.exe' failed: No such file or directory
(1) : Error: Unable to find 'swig.swg'
(3) : Error: Unable to find 'python.swg'
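The two chardet errors above come from version pins that cannot both be met: textract pins chardet 2.3.0, while requests wants at least 3.0.2 and below 3.1.0. The sketch below illustrates that range check with plain tuples. It is a simplification for purely numeric versions, not how pip compares versions (pip follows PEP 440), and both function names are hypothetical.

```python
def parse_version(text):
    """Turn a purely numeric version such as '2.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, minimum, upper_bound):
    """Check minimum <= installed < upper_bound using tuple comparison."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(upper_bound)


# chardet 2.3.0 (pinned by textract) against the range requests 2.22.0 asks for.
pinned_is_compatible = satisfies("2.3.0", "3.0.2", "3.1.0")
```

Here `pinned_is_compatible` is false, which is the situation pip reports in the first error line.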
The solution is much simpler now that the project appears to have been taken over by another maintainer, who started updating it again about 3 months before I wrote this answer.
You can now go to https://github.com/deanmalmgren/textract/releases and download either v1.6.2,
which only updates requirements relative to v1.6.1 (fixing the Unicode decode error), or v1.6.3,
which is the latest release as of this writing.
Once downloaded, extract the archive, cd [folder extracted to]
and pip install .
Keep in mind that whenever requirements are updated, malicious code could be introduced through the dependencies, so update at your own risk.
Not the most elegant solution but it works!
pip install git+https://github.com/jpweytjens/textract
Thanks to jpweytjens
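If several Python installations are present, the bare `pip` command may belong to a different interpreter than the one you intend to use. One hedged way around that is to invoke pip as a module of the current interpreter. The sketch below builds and runs such a command; `pip_install_command` and `pip_install` are hypothetical helper names, and installing from a git URL assumes git is installed and on PATH.

```python
import subprocess
import sys


def pip_install_command(target):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", target]


def pip_install(target):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.run(pip_install_command(target), check=False).returncode
```

For instance, `pip_install("git+https://github.com/jpweytjens/textract")` runs the same install as the command above, and `pip_install(".")` covers the local-folder installs from the earlier answers.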
Source: https://stackoverflow.com/questions/50743723/cant-install-textract-on-windows