Hidden Markov Model Training for Dynamic Gestures?

Question

I know there is a lot of material on hidden Markov models, and I have read the questions and answers related to this topic. I understand how an HMM works and how it can be trained, but I cannot solve the following problem, which I hit when trying to train one for a simple dynamic gesture.

I am using an HMM implementation for OpenCV. I have looked into previously asked questions and answers here, which really helped me understand and use Markov models.

I have a total of two dynamic gestures, which are symmetric (swipe left and swipe right). There are 5 observations in total: 4 are the different stages of the gesture, and the 5th is the observation emitted when none of these stages is occurring.

The swipe left gesture consists of the following observations: 1->2->3->4 (which should trigger a swipe left state). Likewise, the swipe right gesture consists of the following observations: 4->3->2->1.

I have 25 sequences. I take 20 observations for each sequence, and these are used to train the hidden Markov model with the Baum-Welch algorithm.

The following is the input sequence:

1 0 1 1 0 2 2 2 2 0 0 2 3 3 3 0 0 4 4 4 
4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 0 1 
4 4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 0 
4 4 4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 
1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 4 4 
1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 4 
0 1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 
0 0 1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 
4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 1 1 
4 4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 1 
4 4 4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 
1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 4 4 
1 1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 4 
1 1 1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 
1 3 4 4 4 0 3 0 0 0 0 0 3 2 0 0 1 1 1 1

In these sequences you can see the patterns for the swipe left and swipe right gestures.
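To make the gesture pattern in a row easier to see, one option is to drop the 0 observations and merge consecutive repeats. The sketch below is only an illustration: the helper name collapseStages is hypothetical, and treating 0 as the "no stage" symbol is an assumption based on the data above.

```cpp
#include <cassert>
#include <vector>

// Drop the "no stage" symbol and merge consecutive repeats, so that
// 1 0 1 1 0 2 2 ... becomes 1 2 ...
// nullSymbol = 0 is an assumption based on the data shown above.
std::vector<int> collapseStages(const std::vector<int>& obs, int nullSymbol = 0)
{
    std::vector<int> out;
    for (int o : obs) {
        if (o == nullSymbol) continue;        // skip "nothing detected"
        if (out.empty() || out.back() != o)   // keep only stage changes
            out.push_back(o);
    }
    return out;
}
```

Applied to the first row above this gives 1 2 3 4, while the second row gives 4 3 1: stage 2 never appears in that row, so some of the recorded swipes do not contain all four stages.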

To train the hidden Markov model, I initialize it with the following values and then call the train function to get the output:

TRANS:
0.7 0.15 0.15
0.3 0.4 0.3
0.3 0.4 0.3

EMIS:
0.3 0.1 0.1 0.1 0.1
0.2 0.1 0.2 0.2 0.3
0.2 0.3 0.2 0.2 0.1

INIT:
0.6 0.2 0.2
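
A quick sanity check on these guesses: every row of TRANS, EMIS and INIT should sum to 1. The first EMIS row above sums to 0.7 (0.3 + 4 x 0.1), so it is not a valid distribution. Whether the HMM library renormalizes its inputs is not something this post establishes, so a check like the following minimal sketch may be worth running before training. The helper name firstBadRow and the plain std::vector representation are assumptions made for illustration; they are not part of the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns the index of the first row whose entries do not sum to 1
// (within tol), or -1 if every row is a valid probability distribution.
int firstBadRow(const Matrix& m, double tol = 1e-9)
{
    for (std::size_t r = 0; r < m.size(); ++r) {
        double sum = 0.0;
        for (double p : m[r]) {
            assert(p >= 0.0);   // probabilities must be non-negative
            sum += p;
        }
        if (std::fabs(sum - 1.0) > tol)
            return static_cast<int>(r);
    }
    return -1;
}
```

With the values above, firstBadRow returns -1 for TRANS and INIT and 0 for EMIS.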

After training the output is:

TRANS:
0.81611 0.0847926 0.0990979
0.398458 0.346433 0.255109
0.371391 0.35587 0.272739

EMIS:
0.534127 0.125568 0.0824495 0.200169 0.0576869
0.294653 0.0250053 0.0500311 0.200616 0.429694
0.238808 0.075001 0.0500019 0.130455 0.505733

INIT:
0.443984 0.391323 0.164693

When I use this model in my recognition program, I do not get any results. I want the system to remain in a NULL state unless one of the gestures is detected. In the transition and emission matrices I gave my guess values for both gestures.

What do you think I might be doing wrong? Any pointers or help?

Lastly, here is the code I am using (if anyone wants to have a look):

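    // Needs <fstream>, <iostream> and <opencv2/core/core.hpp>.
    // Initial guesses for the transition, emission and initial-state matrices.
    double TRGUESSdata[] = {0.7, 0.15, 0.15,
                            0.3, 0.4,  0.3,
                            0.3, 0.4,  0.3};
    cv::Mat TRGUESS = cv::Mat(3, 3, CV_64F, TRGUESSdata).clone();
    double EMITGUESSdata[] = {0.3, 0.1, 0.1, 0.1, 0.1,
                              0.2, 0.1, 0.2, 0.2, 0.3,
                              0.2, 0.3, 0.2, 0.2, 0.1};
    cv::Mat EMITGUESS = cv::Mat(3, 5, CV_64F, EMITGUESSdata).clone();
    double INITGUESSdata[] = {0.6, 0.2, 0.2};
    cv::Mat INITGUESS = cv::Mat(1, 3, CV_64F, INITGUESSdata).clone();

    // seq and hmm are declared earlier in the program (not shown here).
    // seq holds one observation sequence per row and is accessed as int,
    // so it has to be a CV_32S matrix.
    std::cout << seq.rows << " " << seq.cols << std::endl;

    int a = 0;
    std::ifstream fin("observations.txt");
    if (!fin)
        std::cerr << "Could not open observations.txt" << std::endl;

    // Fill seq row by row from the file and echo what was read.
    for (int y = 0; y < seq.rows; y++)
    {
        for (int x = 0; x < seq.cols; x++)
        {
            fin >> a;
            seq.at<int>(y, x) = a;
            std::cout << a;
        }
        std::cout << std::endl;
    }

    // Print the model before and after training (at most 1000 iterations).
    hmm.printModel(TRGUESS, EMITGUESS, INITGUESS);
    hmm.train(seq, 1000, TRGUESS, EMITGUESS, INITGUESS);
    hmm.printModel(TRGUESS, EMITGUESS, INITGUESS);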

Here fin is used to read the observations produced by my other code.
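
One way to inspect what a trained model actually prefers is to score observation sequences with the scaled forward algorithm and compare the log-likelihoods. The following is a minimal sketch, not the library's implementation. The function name logLikelihood is hypothetical, and the conventions (rows of TRANS and EMIS indexed by state, EMIS columns indexed by observation symbol 0..4, INIT giving the state distribution at the first observation) are assumptions that should be checked against the HMM code in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Scaled forward algorithm: returns log P(obs | model).
// Assumed conventions: trans[i][j] = P(state j at t+1 | state i at t),
// emis[i][k] = P(symbol k | state i), init[i] = P(state i at t = 0).
double logLikelihood(const Matrix& trans, const Matrix& emis,
                     const std::vector<double>& init,
                     const std::vector<int>& obs)
{
    assert(!obs.empty());
    const std::size_t n = init.size();
    std::vector<double> alpha(n), next(n);
    double logLik = 0.0;

    for (std::size_t t = 0; t < obs.size(); ++t) {
        double scale = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            // Probability of being in state j before seeing obs[t].
            double prior = init[j];
            if (t > 0) {
                prior = 0.0;
                for (std::size_t i = 0; i < n; ++i)
                    prior += alpha[i] * trans[i][j];
            }
            next[j] = prior * emis[j][obs[t]];
            scale += next[j];
        }
        if (scale == 0.0)   // sequence is impossible under the model
            return -std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < n; ++j)
            alpha[j] = next[j] / scale;   // normalize to avoid underflow
        logLik += std::log(scale);        // accumulate log of scale factors
    }
    return logLik;
}
```

Scoring a clean 1 2 3 4 sequence, a clean 4 3 2 1 sequence and an all-zero sequence with the trained matrices shows whether the model separates them at all.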

Answer 1:

What does the 0 mean in your model? It seems to me that your data contains hardly any direct transitions between stages for either gesture; the sequence keeps going back to 0. Try something like the following in your data as state transition sequences:

1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 4
1 2 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 3 3 4 4 0 0 0 0 0
4 4 3 3 2 2 1 1 0 0 0 0 0 0 0 0 0
4 4 4 3 3 3 2 2 2 2 2 1 1 1 1 1 1
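
The difference between the two data sets can be checked by counting how often each observation symbol is immediately followed by another. This is a small sketch with a hypothetical helper name, countTransitions; note that it counts transitions between observation symbols, not between hidden states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[a][b] = number of times symbol a is immediately followed by b.
std::vector<std::vector<int>> countTransitions(const std::vector<int>& obs,
                                               int numSymbols)
{
    std::vector<std::vector<int>> counts(numSymbols,
                                         std::vector<int>(numSymbols, 0));
    for (std::size_t t = 1; t < obs.size(); ++t) {
        // Every symbol must be a valid index in [0, numSymbols).
        assert(obs[t - 1] >= 0 && obs[t - 1] < numSymbols);
        assert(obs[t] >= 0 && obs[t] < numSymbols);
        ++counts[obs[t - 1]][obs[t]];
    }
    return counts;
}
```

For the first sequence in the question, the 1->2 and 3->4 counts are both 0 and 2->3 occurs once. For the first sequence suggested here, 1->2, 2->3 and 3->4 each occur once.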

As a general rule:

I would recommend working with OpenCV only after you have a proof of concept in Matlab/Octave. There are two reasons for this. First, you will know exactly what you want to do and how it works, and you won't waste time implementing and debugging your theory in a 'low'-level language (compared to Matlab). Debugging algorithms in OpenCV is really time-consuming.

Second, once you know your approach works as expected, if you implement it and hit a bug (in OpenCV, C++ or Python), you know the problem is not your theory but the implementation or the framework. Twice already I have seen employed computer scientists implement directly from a paper (after being told not to), then spend 80% of the remaining time debugging the algorithm without ANY success, only to find out that either they didn't really understand the theory or some submodule of OpenCV had a slight bug that degraded their results.

The link you mentioned uses an HMM toolbox in Matlab. Try to implement and understand your problem there; it's really worth the time. Not only can you verify each step for correctness, you can also use the intermediate matrices with your OpenCV code once you have a working model.

Source: https://stackoverflow.com/questions/12015009/hidden-markov-model-training-for-dynamic-gestures

Tags

c++

OpenCV

gesture-recognition

hidden-markov-models