How do I configure the pom.xml of Tika to stop getting all the license dependency warnings?

Submitted by 你说的曾经没有我的故事 on 2020-03-18 11:44:29

Question


I am getting all these warnings from Tika when I try to use it:

    Feb 24, 2018 9:24:35 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
    WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
    See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
    TIFFImageWriter not loaded. tiff files will not be processed
    See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
    J2KImageReader not loaded. JPEG2000 files will not be processed.
    See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.

    Feb 24, 2018 9:24:35 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
    WARNING: org.xerial's sqlite-jdbc is not loaded.
    Please provide the jar on your classpath to parse sqlite files.
    See tika-parsers/pom.xml for the correct version.

I tried adding this (in Tika pom.xml):

            <dependency>
                <groupId>org.bouncycastle</groupId>
                <artifactId>bcprov-jdk15on</artifactId>
                <version>1.57</version>
            </dependency>
            <dependency>
                <groupId>org.bouncycastle</groupId>
                <artifactId>bcmail-jdk15on</artifactId>
                <version>1.57</version>
            </dependency>
            <dependency>
                <groupId>org.bouncycastle</groupId>
                <artifactId>bcpkix-jdk15on</artifactId>
                <version>1.57</version>
            </dependency>
            <dependency>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
                <version>1.2.17</version>
            </dependency>

            <dependency>
                <groupId>com.levigo.jbig2</groupId>
                <artifactId>levigo-jbig2-imageio</artifactId>
                <version>2.0</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>com.github.jai-imageio</groupId>
                <artifactId>jai-imageio-core</artifactId>
                <version>1.3.1</version>
                <scope>test</scope>
            </dependency>    
            <dependency>
                <groupId>com.github.jai-imageio</groupId>
                <artifactId>jai-imageio-jpeg2000</artifactId>
                <version>1.3.0</version>
                <scope>test</scope>
            </dependency>

            <dependency>
                <groupId>org.xerial</groupId>
                <artifactId>sqlite-jdbc</artifactId>
                <version>3.20.1</version>
            </dependency>

But I still get the same warnings.

How do I resolve this?

UPDATE 1

My dependencies were added here: https://github.com/apache/tika/blob/1.17/pom.xml#L164-L170

I also tried without the scope set to test. It made no difference.

The dependencies that I added appear to be for PDFBox, which is a Tika dependency.


Answer 1:


I added the following dependencies and no longer got any warnings:

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.18</version>
    </dependency>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.18</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jbig2-imageio</artifactId>
        <version>3.0.1</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-jpeg2000</artifactId>
        <version>1.3.0</version>
    </dependency>



Answer 2:


It's hard to see exactly what is happening because you did not include the entire <dependencies>...</dependencies> section of your pom.xml, but I suspect it is due to optional Maven dependencies. According to the Maven docs, optional dependencies are not pulled in transitively, so you need to declare them in your own pom or they will not be loaded.

Additionally, all of your imageio dependencies have <scope>test</scope>, which makes them available only during unit testing.
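
To illustrate both points, here is a minimal sketch of how the declarations might look. It assumes they go in the pom.xml of the application that uses Tika, rather than in Tika's own pom. The coordinates and versions are copied from the question and have not been checked against what your Tika release expects. Leaving out <scope> gives the default compile scope, so the jars are on the runtime classpath.

```xml
<dependencies>
    <!-- Optional image readers that PDFBox looks for at runtime.
         No <scope> element, so Maven applies the default (compile). -->
    <dependency>
        <groupId>com.levigo.jbig2</groupId>
        <artifactId>levigo-jbig2-imageio</artifactId>
        <version>2.0</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-core</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-jpeg2000</artifactId>
        <version>1.3.0</version>
    </dependency>

    <!-- Needed only if you want Tika to parse sqlite files -->
    <dependency>
        <groupId>org.xerial</groupId>
        <artifactId>sqlite-jdbc</artifactId>
        <version>3.20.1</version>
    </dependency>
</dependencies>
```

Running mvn dependency:tree afterwards should show whether these artifacts actually end up in the compile or runtime scope of your build.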




Answer 3:


For Clojure visitors: I fixed it with:

(System/setProperty "tika.config" "tika-config.xml")

in my config.clj file. The XML file is just:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
   <service-loader initializableProblemHandler="ignore"/>
</properties>

This XML file is in the "resources" dir, and that dir must be on your classpath.




Answer 4:


This is now documented in the error log:

    Feb 19, 2019 3:18:44 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
    WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
    See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.

However, I'd prefer to have a version of Tika (e.g., with a classifier) that does not include OCR/image processing when I only want to parse text, or an option to turn off this logging so that an error is logged only when Tika actually tries to load an unsupported format.



Source: https://stackoverflow.com/questions/48970160/how-do-i-configure-the-pom-xml-of-tika-to-stop-getting-all-the-license-dependenc
