facial expression classification in real time using SVM

-上瘾入骨i 2021-02-01 10:41

I am currently working on a project where I have to extract the facial expression of a user, such as sad or happy (only one user at a time, from a webcam).

My method for clas

3 answers
  •  无人及你
    2021-02-01 11:30

    If you are already using OpenCV, I suggest you use its built-in SVM implementation. Training, saving, and loading in Python are shown below; C++ has a corresponding API that does the same in about the same amount of code. It also has 'train_auto' to find the best parameters.

    import numpy as np
    import cv2
    
    # OpenCV 2.4 API (cv2.SVM); random toy data: 4 samples with 5 features each
    samples = np.array(np.random.random((4,5)), dtype = np.float32)
    labels = np.array(np.random.randint(0,2,4), dtype = np.float32)
    
    svm = cv2.SVM()
    svmparams = dict( kernel_type = cv2.SVM_LINEAR, 
                      svm_type = cv2.SVM_C_SVC,
                      C = 1 )
    
    svm.train(samples, labels, params = svmparams)
    
    # predict each training sample to check the fit
    testresult = np.float32( [svm.predict(s) for s in samples])
    
    print(samples)
    print(labels)
    print(testresult)
    
    # save the model, then load it into a fresh SVM object (load() works in place)
    svm.save('model.xml')
    loaded = cv2.SVM()
    loaded.load('model.xml')
    

    and the output:

    #print samples
    [[ 0.24686454  0.07454421  0.90043277  0.37529686  0.34437731]
     [ 0.41088378  0.79261768  0.46119651  0.50203663  0.64999193]
     [ 0.11879266  0.6869216   0.4808321   0.6477254   0.16334397]
     [ 0.02145131  0.51843268  0.74307418  0.90667248  0.07163303]]
    #print labels
    [ 0.  1.  1.  0.]
    #print testresult
    [ 0.  1.  1.  0.]    
    

    So you provide the n flattened shape models as samples, along with n labels, and you are good to go. You probably don't even need the ASM part: just apply some orientation-sensitive filters such as Sobel or Gabor, concatenate the resulting matrices, flatten them, and feed them directly to the SVM. You can probably get around 70-90% accuracy.

    As someone said, CNNs are an alternative to SVMs. Here are some links that implement LeNet-5. So far, I find SVMs much simpler to get started with.

    https://github.com/lisa-lab/DeepLearningTutorials/

    http://www.codeproject.com/Articles/16650/Neural-Network-for-Recognition-of-Handwritten-Digi

    -edit-

    Landmarks are just n (x,y) vectors, right? So why don't you try putting them into an array of size 2n and feeding them directly to the code above?

    For example, 3 training samples of 4 landmarks (0,0), (10,10), (50,50), (70,70):

    samples = [[0,0,10,10,50,50,70,70],
    [0,0,10,10,50,50,70,70],
    [0,0,10,10,50,50,70,70]]
    
    labels=[0.,1.,2.]
    

    0=happy

    1=angry

    2=disgust
