Question
I am trying to run Vowpal Wabbit on a set of files (approximately 10 for now). My experiment is as follows:
1. Convert the first training file to VW format.
2. Train the VW model on this first training file and store the model.
3. Validate the accuracy on the test file with the stored model.
4. Take the second file, convert it to VW format, retrain the model stored in step 2 on this second file, and store the updated model.
5. Validate the test file on the retrained model and report the accuracy.
6. Repeat steps 4-5 for the remaining files using a for loop (the test file is the same in each iteration).
When I ran this experiment I got an error. Below are the train, retrain, and validation commands, as well as the error.
Can anyone please help me run this scenario without getting the error?
Commands:
Here 'i' ranges from 1 to 10.
$idec = i - 1 (the index of the previous model)
vw -d ${i}_processed_binary_compressed.vw --loss_function logistic -i ${idec}_processed_binary_compressed.model.vw --quiet --save_resume -f ${i}_processed_binary_compressed.model.vw
echo
echo "Model training completed for day_$i"
echo "${i}_day model validation is under progress..."
echo
vw 10_processed_binary_compressed_test.vw -t -i ${i}_processed_binary_compressed.model.vw --quiet --hash strings -p 10_processed_binary_compressed_test_${i}_day_result.csv -r 10_processed_binary_compressed_test_${i}_day_raw.txt
Error:
vw: option '--data' cannot be specified more than once
Answer 1:
I cannot replicate the problem (but TOC_cmi asked me to paste the commands I used):
git clone https://github.com/JohnLangford/vowpal_wabbit.git
cd vowpal_wabbit
make
cd test/train-sets
vw -d rcv1_smaller.dat --loss_function=logistic --save_resume -f day1.model
vw -d rcv1_small.dat --loss_function=logistic --save_resume -i day1.model -f day2.model
vw -t -d rcv1_smaller.dat --loss_function=logistic -i day2.model -p day2.predictions -r day2.raw
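The same train / retrain / validate pattern can be extended to a loop over all the daily files. The following is only a minimal sketch, not a tested fix for the error in the question. It assumes the file naming from the question (`${i}_processed_binary_compressed.vw` and a single test file), and the `run_cmd` helper, the `train_all` function, and the `DRY_RUN`, `N`, and `TEST_FILE` variables are hypothetical names introduced here for illustration. Note that on the first day there is no previous model (index 0), so the sketch omits `-i` for that iteration.

```shell
#!/bin/sh
# Sketch: incremental VW training over day files 1..N, validating after each day.
# Assumed (not from the original post): DRY_RUN, N, TEST_FILE, run_cmd, train_all.

N=${N:-10}
TEST_FILE=${TEST_FILE:-10_processed_binary_compressed_test.vw}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the vw commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

train_all() {
    i=1
    while [ "$i" -le "$N" ]; do
        data="${i}_processed_binary_compressed.vw"
        model="${i}_processed_binary_compressed.model.vw"
        if [ "$i" -eq 1 ]; then
            # First day: no previous model exists, so do not pass -i.
            run_cmd vw -d "$data" --loss_function logistic --save_resume -f "$model" --quiet
        else
            # Later days: resume from the model saved on the previous day.
            prev="$((i - 1))_processed_binary_compressed.model.vw"
            run_cmd vw -d "$data" --loss_function logistic --save_resume -i "$prev" -f "$model" --quiet
        fi
        # Validate on the fixed test file with the model just saved (-t = test only).
        base="${TEST_FILE%.vw}"
        run_cmd vw -t -d "$TEST_FILE" --loss_function logistic -i "$model" --quiet \
            -p "${base}_${i}_day_result.csv" -r "${base}_${i}_day_raw.txt"
        i=$((i + 1))
    done
}

train_all
```

With the default `DRY_RUN=1` the script only prints the `vw` command lines, which makes it easy to check that each one contains exactly one data file. Setting `DRY_RUN=0` runs them for real. Passing the data file with `-d` everywhere, as in the commands above, keeps the invocations consistent.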
Source: https://stackoverflow.com/questions/27186351/how-to-retrain-the-model-for-sequence-of-files-in-vowpal-wabbit