Question
Below is part of the log from training my VW model.
Why are some of these lines followed by h? You'll notice that's true of the "average loss" line in the summary at the end. I'm not sure what this means, or if I should care.
...
average since example example current current current
loss last counter weight label predict features
1.000000 1.000000 1 1.0 -1.0000 0.0000 15
0.500000 0.000000 2 2.0 1.0000 1.0000 15
1.250000 2.000000 4 4.0 -1.0000 1.0000 9
1.167489 1.084979 8 8.0 -1.0000 1.0000 29
1.291439 1.415389 16 16.0 1.0000 1.0000 45
1.096302 0.901166 32 32.0 -1.0000 -1.0000 21
1.299807 1.503312 64 64.0 -1.0000 1.0000 7
1.413753 1.527699 128 128.0 -1.0000 1.0000 11
1.459430 1.505107 256 256.0 -1.0000 1.0000 47
1.322658 1.185886 512 512.0 -1.0000 -1.0000 59
1.193357 1.064056 1024 1024.0 -1.0000 1.0000 69
1.145822 1.098288 2048 2048.0 -1.0000 -1.0000 5
1.187072 1.228322 4096 4096.0 -1.0000 -1.0000 9
1.093551 1.000031 8192 8192.0 -1.0000 -1.0000 67
1.041445 0.989338 16384 16384.0 -1.0000 -0.6838 29
1.107593 1.173741 32768 32768.0 1.0000 -1.0000 5
1.147313 1.187034 65536 65536.0 -1.0000 1.0000 7
1.078471 1.009628 131072 131072.0 -1.0000 -1.0000 73
1.004700 1.004700 262144 262144.0 -1.0000 1.0000 41 h
0.918594 0.832488 524288 524288.0 -1.0000 -1.0000 7 h
0.868978 0.819363 1048576 1048576.0 -1.0000 -1.0000 21 h
finished run
number of examples per pass = 152064
passes used = 10
weighted example sum = 1.52064e+06
weighted label sum = -854360
average loss = 0.809741 h
...
Thanks
Answer 1:
This h is printed when (!all.holdout_set_off && all.current_pass >= 1) is true (see the output of grep -nH -e '\<h\\n' vowpalwabbit/*.cc and view the code).
Search for --holdout_off in the Command line arguments documentation:
--holdout_off disables holdout validation for multiple pass learning. By default, VW holds out a (controllable default = 1/10th) subset of examples whenever --passes > 1 and reports the test loss on the print out. This is used to prevent overfitting in multiple pass learning. An extra h is printed at the end of the line to specify the reported losses are holdout validation loss, instead of progressive validation loss.
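For a concrete illustration, here is a minimal pair of invocations showing how holdout reporting is triggered and how to disable it (the file names train.vw and train.cache are hypothetical; note that VW requires a cache file whenever --passes > 1):

# Multi-pass training: by default VW holds out every 10th example,
# so progress lines from the second pass onward end in 'h'
# (holdout validation loss).
vw --passes 10 --cache_file train.cache train.vw

# Same run with the holdout disabled: every example is used for
# training, and the reported losses are progressive validation
# losses (no 'h' suffix).
vw --passes 10 --cache_file train.cache --holdout_off train.vw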
Answer 2:
VW trains the model on examples from the input file and, while doing so, prints the average training loss (without the 'h' suffix). When several passes over the file are needed (specified with --passes n), it holds out every k-th example (configurable with --holdout_period k) as a test set and does not use those examples for training. On the second and later passes it estimates the loss on these held-out test examples rather than on the training examples, and prints that loss value with an 'h'. If you see very small loss values without 'h' but much larger values with 'h' later, your model may be overfitting. If you have already made sure your model does not overfit and you want to use multiple passes over the whole dataset for training, specify --holdout_off. Otherwise you lose 10% of your data to the holdout set (--holdout_period is 10 by default).
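As a sanity check against the log above: it reports number of examples per pass = 152064. Assuming the default --holdout_period of 10 and no other filtering, that is consistent with an input file of about 168,960 examples, since 9 out of every 10 (152,064) would be used for training and the remaining 16,896 held out for validation.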
Source: https://stackoverflow.com/questions/26661227/interpreting-vowpal-wabbit-results-why-are-some-lines-appended-by-h