Gensim mallet CalledProcessError: returned non-zero exit status

忘掉有多难 2021-01-21 01:53

I'm getting an error while trying to access Gensim's MALLET wrapper in a Jupyter notebook. I have the specified file 'mallet' in the same folder as my notebook, but can't seem to acces

5 Answers
  • 2021-01-21 02:21

    Update the path to:

    mallet_path = 'C:/mallet/mallet-2.0.8/bin/mallet.bat'

    and edit mallet.bat (for example in Notepad) inside the mallet-2.0.8 folder so that it reads:

    @echo off
    
    rem This batch file serves as a wrapper for several
    rem  MALLET command line tools.
    
    if not "%MALLET_HOME%" == "" goto gotMalletHome
    
    echo MALLET requires an environment variable MALLET_HOME.
    goto :eof
    
    :gotMalletHome
    
    set MALLET_CLASSPATH=C:\mallet\mallet-2.0.8\class;C:\mallet\mallet-2.0.8\lib\mallet-deps.jar
    set MALLET_MEMORY=1G
    set MALLET_ENCODING=UTF-8
    
    set CMD=%1
    shift
    
    set CLASS=
    if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
    if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
    if "%CMD%"=="import-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
    if "%CMD%"=="info" set CLASS=cc.mallet.classify.tui.Vectors2Info
    if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
    if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
    if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
    if "%CMD%"=="classify-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Classify
    if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.TopicTrainer
    if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
    if "%CMD%"=="evaluate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
    if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
    if "%CMD%"=="run" set CLASS=%1 & shift
    
    if not "%CLASS%" == "" goto gotClass
    
    echo Mallet 2.0 commands: 
    echo   import-dir        load the contents of a directory into mallet instances (one per file)
    echo   import-file       load a single file into mallet instances (one per line)
    echo   import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
    echo   info              get information about Mallet instances
    echo   train-classifier  train a classifier from Mallet data files
    echo   classify-dir      classify the contents of a directory with a saved classifier
    echo   classify-file     classify data from a single file with a saved classifier
    echo   classify-svmlight classify data from a single file in SVMLight format
    echo   train-topics      train a topic model from Mallet data files
    echo   infer-topics      use a trained topic model to infer topics for new documents
    echo   evaluate-topics   estimate the probability of new documents given a trained model
    echo   prune             remove features based on frequency or information gain
    echo   split             divide data into testing, training, and validation portions
    echo   bulk-load         for big input files, efficiently prune vocabulary and import docs
    echo Include --help with any option for more information
    
    
    goto :eof
    
    :gotClass
    
    set MALLET_ARGS=
    
    :getArg
    
    if "%1"=="" goto run
    set MALLET_ARGS=%MALLET_ARGS% %1
    shift
    goto getArg
    
    :run
    
    "C:\Program Files\Java\jdk-12\bin\java" -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
    
    :eof

    On the command line, these commands were helpful for figuring out what was going on:

    notepad mallet.bat
    java
    "C:\Program Files\Java\jdk-12\bin\java"
    dir /OD
    cd %userdir%
    cd %userpath%
    cd\
    cd users
    cd your_username
    cd appdata\local\temp\2
    dir /OD

    The problem is that Java is not installed correctly, that the PATH does not include Java, or that the MALLET classpath is not defined correctly. More info here: https://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.html . This solved my error; hopefully it helps someone else :)
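
    The same checks can be run from the notebook itself. This is only a minimal sketch: `check_mallet_setup` and `run_and_show` are hypothetical helper names, not part of Gensim or MALLET. A CalledProcessError reports only the exit status, so capturing stderr from a direct call to the launcher is one way to see the underlying Java or classpath message.

```python
import os
import shutil
import subprocess
import sys


def check_mallet_setup(mallet_path, java_cmd="java"):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # The launcher (mallet or mallet.bat) must exist at the path passed to Gensim.
    if not os.path.isfile(mallet_path):
        problems.append(f"mallet launcher not found: {mallet_path}")
    # Java must be resolvable, either via PATH or as a full path to the executable.
    if shutil.which(java_cmd) is None:
        problems.append(f"'{java_cmd}' is not on PATH")
    # mallet.bat refuses to run when MALLET_HOME is empty.
    if not os.environ.get("MALLET_HOME"):
        problems.append("MALLET_HOME is not set")
    return problems


def run_and_show(cmd):
    """Run a command and return (exit status, stderr text) instead of raising."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr
```

    For example, `check_mallet_setup(mallet_path)` lists what is missing, and `run_and_show([mallet_path, "train-topics", "--help"])` returns whatever the launcher wrote to stderr alongside its exit status.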

  • 2021-01-21 02:23

    For me, this was not an import or a path problem.

    I spent hours trying to solve it. I tried this solution and nothing worked.

    Looking at a previous successful call I had made to LDA Mallet, I noticed that some parameters were not being set, so I changed the call to this:

    gensim.models.wrappers.LdaMallet(mallet_path=mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word, prefix='temp_file_', workers=4)

    I really hope it helps you. Finding a solution to this problem was a pain.

  • 2021-01-21 02:26

    Working in a Jupyter Notebook with Python, I ran

    conda uninstall gensim
    conda install gensim
    

    in cmd as an administrator and restarted my kernel. It worked like a charm after I had spent hours searching online.

  • 2021-01-21 02:33

    I had the same problem. I moved the mallet folder to C:/new_mallet, and after that it worked:

        import os
        import gensim
        os.environ.update({'MALLET_HOME': r'C:/new_mallet/mallet-2.0.8/'})
        mallet_path = 'C:/new_mallet/mallet-2.0.8/bin/mallet'  # update this path
        ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)
    
  • 2021-01-21 02:39

    Make sure you have installed the Java Development Kit (JDK).

    Credit goes to another answer.

    After installing the JDK, the following code for LDA Mallet worked like a charm!

    import os
    from gensim.models.wrappers import LdaMallet
    
    os.environ.update({'MALLET_HOME':r'C:/mallet/mallet-2.0.8/'})
    mallet_path = r'C:/mallet/mallet-2.0.8/bin/mallet.bat'
    
    lda_mallet = LdaMallet(
        mallet_path,
        corpus=corpus_bow,
        num_topics=n_topics,
        id2word=dct,
    )
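
    Since MALLET_HOME and mallet_path have to point into the same install, one option is to derive the former from the latter. This is a rough sketch, assuming the launcher sits in `<MALLET_HOME>/bin/` as in the paths above; `set_mallet_home` is a hypothetical helper, not a Gensim function.

```python
import os
from pathlib import PureWindowsPath


def set_mallet_home(mallet_path, environ=os.environ):
    """Set MALLET_HOME to the folder two levels above bin/mallet(.bat) and return it."""
    # PureWindowsPath parses both forward and back slashes, on any operating system.
    home = PureWindowsPath(mallet_path).parent.parent.as_posix()
    environ["MALLET_HOME"] = home
    return home
```

    Calling `set_mallet_home(mallet_path)` before constructing `LdaMallet` keeps the two settings from drifting apart when the folder is moved.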
    