vowpalwabbit

How to perform logistic regression using Vowpal Wabbit on a very imbalanced dataset

夙愿已清 submitted on 2019-12-17 17:25:27
Question: I am trying to use Vowpal Wabbit for logistic regression, and I am not sure whether this is the right syntax. For training, I do:

./vw -d ~/Desktop/new_data.txt --passes 20 --binary --cache_file cache.txt -f lr.vw --loss_function logistic --l1 0.05

For testing, I do:

./vw -d ~/libsvm-3.18_test/matlab/new_data_test.txt --binary -t -i lr.vw -p predictions.txt -r raw_score.txt

Here is a snippet from my training data:

-1:1.00038 | 110:0.30103 262:0.90309 689:1.20412 1103:0.477121 1286:1.5563 2663:0.30103
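For the imbalance itself, note that VW's simple-label input format accepts a per-example importance weight after the label, so one common approach is to up-weight the rare class, for instance with inverse class frequencies, when generating the training file. A minimal sketch, with the helper name and weighting scheme being illustrative (the weight here is written space-separated after the label):

```python
from collections import Counter

def vw_lines_with_class_weights(examples):
    """Write VW lines of the form '<label> <weight> | <features>',
    weighting each class inversely to its frequency so rare classes
    contribute as much total importance as common ones."""
    counts = Counter(label for label, _ in examples)
    total = len(examples)
    lines = []
    for label, features in examples:
        weight = total / (len(counts) * counts[label])  # inverse-frequency weight
        lines.append(f"{label} {weight:.4f} | {features}")
    return lines

# tiny illustrative dataset: one positive, three negatives
data = [(1, "110:0.3 262:0.9"), (-1, "689:1.2"), (-1, "1103:0.47"), (-1, "1286:1.55")]
for line in vw_lines_with_class_weights(data):
    print(line)
```

The rare positive class ends up with weight 2.0 and each negative with roughly 0.67, so both classes carry the same total weight during training.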

How to retrain the model for a sequence of files in Vowpal Wabbit

[亡魂溺海] submitted on 2019-12-13 03:42:44
Question: I am trying to run Vowpal Wabbit on a set of files (approximately 10 as of now). My experiment is as follows:

1. Convert the first training file to VW format.
2. Train the VW model with this first training file and store the model.
3. Validate the accuracy on the test file with the stored model.
4. Take the second file, convert it to VW format, retrain the model stored in step 2 with this second file, and store the updated model.
5. Validate the test file on the retrained model and report the accuracy.
6. Repeat…
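The retraining loop above maps onto vw's -i (load initial model) and -f (write final model) options, with --save_resume so that training state carries over between runs. A sketch that just assembles the command lines (the file and model names are illustrative):

```python
def incremental_vw_commands(train_files, model="model.vw"):
    """Build vw invocations for step-wise retraining: the first file
    trains from scratch, each later file warm-starts from the previous
    model via -i and overwrites it via -f."""
    cmds = []
    for i, path in enumerate(train_files):
        cmd = ["vw", "-d", path, "-f", model, "--save_resume"]
        if i > 0:
            cmd += ["-i", model]  # continue training from the stored model
        cmds.append(" ".join(cmd))
    return cmds

for c in incremental_vw_commands(["part1.vw", "part2.vw", "part3.vw"]):
    print(c)
```

Between each training step, the stored model can be evaluated on the test file in the usual way (vw -t -i model.vw -d test.vw).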

Vowpal Wabbit multiple class classification predict probabilities

狂风中的少年 submitted on 2019-12-13 02:13:44
Question: I am trying to do a multiclass classification problem with Vowpal Wabbit. I have a train file that looks like this:

1 |feature_space
2 |feature_space
3 |feature_space

As output I want the probability of a test item belonging to each class, like this:

1:0.13 2:0.57 3:0.30

(think of the predict_proba method of sklearn classifiers, for example). I've tried the following:

1) vw -oaa 3 train.file -f model.file --loss_function logistic --link logistic
   vw -p predict.file -t test.file -i model.file -raw…
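If the VW build at hand lacks a direct option for this (recent versions have a --probabilities option that works with --oaa and logistic loss), the per-class scores written by --raw_predictions can be post-processed: squash each score with a sigmoid and renormalize so the classes sum to one. A sketch of that post-processing, with the input line being illustrative:

```python
import math

def raw_to_probabilities(raw_line):
    """Parse one VW --raw_predictions line ('class:score class:score ...'),
    map each score through a sigmoid, and renormalize so the class
    probabilities sum to 1."""
    scores = {}
    for token in raw_line.split():
        cls, score = token.split(":")
        scores[int(cls)] = 1.0 / (1.0 + math.exp(-float(score)))
    total = sum(scores.values())
    return {cls: p / total for cls, p in scores.items()}

probs = raw_to_probabilities("1:0.5 2:-1.2 3:0.1")
print(probs)
```

This mirrors what predict_proba would give for a one-against-all model: per-class sigmoid scores, normalized.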

Read data from memory in Vowpal Wabbit?

一曲冷凌霜 submitted on 2019-12-12 07:38:31
Question: Is there a way to send data to train a model in Vowpal Wabbit without writing it to disk? Here is what I'm trying to do. I have a relatively large dataset in CSV (around 2 GB) which fits in memory with no problem. I load it into a data frame in R, and I have a function to convert the data in that data frame into VW format. Now, in order to train a model, I have to write the converted data to a file first and then feed that file to VW, and the writing-to-disk part takes far too long, especially…
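Since vw reads examples from stdin when no -d file is given, one way to avoid the disk round-trip is to stream the converted rows directly into a vw subprocess. A sketch in Python (the conversion helper and feature names are illustrative; the same idea applies from R by writing to a pipe() connection):

```python
def row_to_vw(label, features):
    """Format one observation (label plus a name->value mapping)
    as a single VW input line."""
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    return f"{label} | {feats}"

lines = [row_to_vw(1, {"age": 0.3, "income": 1.2}),
         row_to_vw(-1, {"age": 0.9})]

# Instead of writing a file, stream the lines into vw's stdin, e.g.:
#   import subprocess
#   vw = subprocess.Popen(["vw", "-f", "model.vw"], stdin=subprocess.PIPE, text=True)
#   vw.communicate("\n".join(lines))
print(lines[0])
```

The conversion then happens row by row in memory, and vw consumes the stream as it arrives rather than after a full file is written.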

Vowpal Wabbit: Cannot retrieve latent factors with gd_mf_weights from a trained --rank model

≡放荡痞女 submitted on 2019-12-11 15:23:21
Question: I trained a rank-40 model on the MovieLens data, but I cannot retrieve the weights from the trained model with gd_mf_weights. I am following the syntax from the VW matrix-factorization example, but it gives me errors. Please advise.

Model training call:
vw --rank 40 -q ui --l2 0.1 --learning_rate 0.015 --decay_learning_rate 0.97 --power_t 0 --passes 50 --cache_file movielens.cache -f movielens.reg -d train.vw

Weights generating call:
library/gd_mf_weights -I train.vw -O '/data/home/mlteam…

Can't install Vowpalwabbit using pip on Windows 10

笑着哭i submitted on 2019-12-11 05:24:38
Question: I have Python 3.7.0 installed on Windows 10 and I can't install vowpalwabbit. When I use the command pip install vowpalwabbit I get:

Building wheels for collected packages: vowpalwabbit
  Building wheel for vowpalwabbit (setup.py) ... error
ERROR: Complete output from command 'c:\users\user\appdata\local\programs\python\python37-32\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\\Users\\User\\AppData\\Local\\Temp\\pip-install-0tp3npd1\\vowpalwabbit\\setup.py'"'"';f=getattr(tokenize…

Why doesn't normalizing feature values change the training output much?

好久不见. submitted on 2019-12-10 18:25:01
Question: I have 3113 training examples over a dense feature vector of size 78. The magnitudes of the features differ: some are around 20, some around 200K. For example, here is one of the training examples in vowpal-wabbit input format:

0.050000 1 '2006-07-10_00:00:00_0.050000| F0:9.670000 F1:0.130000 F2:0.320000 F3:0.570000 F4:9.837000 F5:9.593000 F6:9.238150 F7:9.646667 F8:9.631333 F9:8.338904 F10:9.748000 F11:10.227667 F12:10.253667 F13:9.800000 F14:0.010000 F15:0.030000 F16:-0.270000 F17:10.015000 F18:9…
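A likely part of the answer is that VW's default update is already normalized per feature (the --normalized adaptive update), which makes the learned predictions largely insensitive to feature scale, so manual pre-scaling has little visible effect. The toy example below is a sketch in the spirit of that idea, not VW's actual update rule: dividing each gradient step by the squared largest magnitude seen for that feature makes the final predictions agree even when one feature is rescaled by a factor of 100,000.

```python
def normalized_sgd(examples, lr=0.5, epochs=10):
    """Toy per-feature normalized SGD on squared loss: each update is
    divided by the square of the largest magnitude seen so far for
    that feature, so the update is insensitive to feature scale."""
    n = len(examples[0][0])
    w = [0.0] * n
    scale = [1e-12] * n          # running per-feature max |x_i|
    for _ in range(epochs):
        for x, y in examples:
            for i, xi in enumerate(x):
                scale[i] = max(scale[i], abs(xi))
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi / (scale[i] ** 2)
    return w

data   = [([1.0, 200000.0], 1.0), ([2.0, 100000.0], 0.0)]
scaled = [([1.0, 2.0], 1.0), ([2.0, 1.0], 0.0)]  # second feature divided by 1e5
w_raw = normalized_sgd(data)
w_scl = normalized_sgd(scaled)
p_raw = sum(wi * xi for wi, xi in zip(w_raw, data[0][0]))
p_scl = sum(wi * xi for wi, xi in zip(w_scl, scaled[0][0]))
print(round(p_raw, 6), round(p_scl, 6))
```

The two runs produce the same predictions (up to floating-point noise), which is the scale invariance at work.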

Interpreting Vowpal Wabbit results: Why are some lines appended by “h”?

不问归期 submitted on 2019-12-10 14:05:55
Question: Below is part of the log from training my VW model. Why are some of these lines followed by an h? You'll notice that is true of the "average loss" line in the summary at the end. I'm not sure what this means, or whether I should care.

...
average    since      example  example  current  current  current
loss       last       counter  weight   label    predict  features
1.000000   1.000000         1      1.0  -1.0000   0.0000       15
0.500000   0.000000         2      2.0   1.0000   1.0000       15
1.250000   2.000000         4      4.0  -1.0000   1.0000        9
1.167489   1.084979         8      8.0  -1.0000   1.0000       29
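The h marks holdout loss: when VW trains with multiple passes over a cache, it holds out a fraction of the examples (by default every 10th, controlled by --holdout_period) and uses them only for validation, so h-marked progress lines and the final "average loss ... h" report loss on held-out examples rather than training examples. A tiny sketch of that default schedule (illustrative, assuming the documented default period of 10):

```python
def is_holdout(example_index, holdout_period=10):
    """VW-style holdout schedule for multi-pass training: every
    holdout_period-th example (1-based) is excluded from training and
    used only for the 'h'-marked validation loss."""
    return example_index % holdout_period == 0

held_out = [i for i in range(1, 31) if is_holdout(i)]
print(held_out)  # -> [10, 20, 30]
```

So the h-suffixed average loss is usually the number to care about: it estimates generalization rather than fit to the training data.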

Interpreting basic output from Vowpal Wabbit

亡梦爱人 submitted on 2019-12-10 13:55:13
Question: I have a couple of questions about the output from a simple run of VW. I have read around the internet and the wiki sites, but I am still unsure about a couple of basic things. I ran the following on the Boston housing data:

vw -d housing.vm --progress 1

where the housing.vm file is set up as (partially): and the output is (partially):

Question 1: Is it correct to think about the average loss column as the following steps: a) predict zero, so the first average loss is the squared error of the first…
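On Question 1: VW reports progressive validation loss. Each example is first scored with the current model (which starts out predicting 0), the loss of that pre-training prediction is recorded, and only then is the model updated on the example; the "average loss" column is the running mean of those losses. A sketch, assuming squared loss (VW's default):

```python
def progressive_average_loss(labels, predictions):
    """Running 'average loss' the way VW's progress lines report it:
    each prediction is made *before* training on that example, and the
    squared errors are averaged progressively."""
    losses, total = [], 0.0
    for i, (y, p) in enumerate(zip(labels, predictions), start=1):
        total += (y - p) ** 2
        losses.append(total / i)
    return losses

# untrained model predicts 0.0 on the first example, matching the
# intuition in the question
print(progressive_average_loss([-1.0, 1.0], [0.0, 1.0]))  # -> [1.0, 0.5]
```

Those values match the first two progress lines of a typical log: squared error 1.0 on the first example, then an average of 0.5 once the second example is predicted correctly.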

How to use vowpal wabbit for online prediction (streaming mode)

冷暖自知 submitted on 2019-12-08 00:48:32
I am trying to use Vowpal Wabbit for a multiclass classification task with 154 different class labels, as follows:

1. Trained a VW model with a large amount of data.
2. Tested the model with a dedicated test set.

In this scenario I was able to hit >80%, which is good. But the problem I am currently working on is that I have to replicate a real-time prediction scenario: I have to pass one data point (i.e. a text line) at a time so that the model can predict and output the value. I have tried all the options I knew of, but failed. Can any of you let me know how to create a real…
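VW supports this streaming scenario through daemon mode: start the trained model as a server (something like vw --daemon --port 26542 -i model.vw -t), then send one VW-formatted line per prediction over TCP and read back one prediction line per example. A sketch of a client; the host, port, and model name are illustrative, and the daemon must already be running for predict_one to work:

```python
import socket

def format_request(features, tag=""):
    """Build the one-line request a vw daemon expects: an optional tag,
    then '| features', newline-terminated."""
    return f"{tag}| {features}\n"

def predict_one(features, host="localhost", port=26542):
    """Send a single example to a running 'vw --daemon' instance and
    return the numeric prediction from its one-line reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(format_request(features).encode())
        with sock.makefile() as reply:
            return float(reply.readline().split()[0])

# e.g. predict_one("110:0.30103 262:0.90309")
print(format_request("110:0.30103 262:0.90309"), end="")
```

Because the daemon keeps the model loaded in memory, each prediction costs only a round-trip over the socket, which is what makes per-example real-time prediction practical.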