Question
I am working on a project based on facial recognition and verification. I am using a Siamese network to get the 128-dimensional vector of the face (embeddings).
I am storing the encodings/embeddings of a person's face in a database and then matching the incoming face's encodings against the previously stored encodings to recognize the person.
To make a robust system, I have to store more than one encoding of the same person. So far I have used only a single encoding vector and matched it with:
From the face_recognition library (to get the distance):
face_recognition.compare_faces(stored_list_of_encodings, checking_image_encodings)
That doesn't work all the time because I have only compared against a single encoding. To make the system sufficient for most cases, I want to store at least 3 encodings of the same person and then compare against the new data.
Now the question: how do I store multiple embeddings of the same person and then compare the distances?
I am using face_recognition as the library and a Siamese network for feature extraction.
Answer 1:
Have you considered using an SVM classifier to classify the faces? The input to the SVM classifier would be the vector of size 128. You can then collect a few of the vectors belonging to a single person's face (3 in your case) and fit them to the SVM as one class. You can then do the same for different faces (classes).
Then, when predicting a face, simply feed in the new vector and run
svm.predict([..])
I had a similar use-case for my project, but I was using Facenet instead as the feature extractor. Works perfectly.
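A minimal sketch of this idea with scikit-learn's SVC; the person names and random vectors below are placeholders standing in for real 128-d embeddings from the Siamese network:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder data: in practice each row would be a 128-d embedding produced by
# the Siamese network, a few rows per person (3 per person in the question).
rng = np.random.default_rng(0)
embeddings_per_person = {
    "alice": rng.normal(size=(3, 128)),
    "bob": rng.normal(size=(3, 128)),
}

X = np.vstack(list(embeddings_per_person.values()))
y = [name for name, vecs in embeddings_per_person.items() for _ in vecs]

clf = SVC(kernel="linear")
clf.fit(X, y)

# At query time, embed the incoming face and classify it.
new_face_embedding = rng.normal(size=128)
print(clf.predict([new_face_embedding])[0])
```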
Answer 2:
You can store all the face embeddings in a database/data structure that supports nearest-neighbor queries; then, for any given face you want to match, get its nearest-neighbor embeddings from the database. With the k nearest neighbors and their distances to the query item, you can decide which person the new face belongs to (if it belongs to a known person at all).
You can take a look at Approximate Nearest Neighbor Benchmark for available options.
Just remember, they are called approximate, so you won't get exact results, but it is the best option you have if you are dealing with a large number of entities. If that's not the case for you, you can just use the brute-force nearest-neighbor solutions already provided in sklearn to get exact matches.
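For the exact (brute-force) case, a small sketch with sklearn's NearestNeighbors; the stored embeddings and labels here are placeholder data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder store: rows are 128-d embeddings, labels[i] names the person for row i.
rng = np.random.default_rng(0)
stored_embeddings = rng.normal(size=(6, 128))   # e.g. 3 embeddings each for 2 people
labels = ["alice", "alice", "alice", "bob", "bob", "bob"]

# Brute-force (exact) nearest neighbours; fine for small databases.
nn = NearestNeighbors(n_neighbors=3, algorithm="brute", metric="euclidean")
nn.fit(stored_embeddings)

# Query with a new face embedding and look at the k nearest stored embeddings.
query = rng.normal(size=(1, 128))
distances, indices = nn.kneighbors(query)
neighbour_labels = [labels[i] for i in indices[0]]
print(neighbour_labels, distances[0])
# A distance threshold on distances[0][0] can be used to reject unknown faces.
```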
Answer 3:
There are multiple approaches to this. I have worked on face recognition quite extensively, and there are a few things that I tried. You can do some of the following.
Create a KNN classifier
The way to do this is to create a DB of sorts, where each feature has a person's name associated with it (in this case a feature represents one face image of a person). Then at comparison time, you compute the distance of your query feature to each stored representation, take the comparisons with the N smallest distances, go through those N and see which class each belongs to, and use the most frequently occurring label as your target class. In my experience, though, this isn't extremely robust (although this is entirely dependent on the type of your test data; mine involved a lot of in-the-wild images, so this wasn't robust enough).
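A rough sketch of that lookup with plain numpy; the stored features and names are placeholders (face_recognition.face_distance computes the same kind of Euclidean distances):

```python
import numpy as np
from collections import Counter

# Placeholder DB: each stored 128-d feature is tagged with the person it came from.
rng = np.random.default_rng(0)
db_features = rng.normal(size=(6, 128))
db_names = ["alice", "alice", "alice", "bob", "bob", "bob"]

def knn_label(query, features, names, n=3):
    # Distance of the query feature to every stored feature.
    distances = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(distances)[:n]            # indices of the N smallest distances
    votes = Counter(names[i] for i in nearest)     # majority vote over their labels
    return votes.most_common(1)[0][0]

print(knn_label(rng.normal(size=128), db_features, db_names))
```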
Averaged Representations
Another approach that I used was averaging the representations for each person. If I had 5 images, I would take the mean or the median of the 5 representations extracted from those images. In my experience, the median worked better than the mean. You will now have an average representation associated with each person; you can just take the distance to each average representation, and the one with the smallest distance will be your target class.
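A minimal sketch of the median-representation idea, again with placeholder embeddings standing in for the real ones:

```python
import numpy as np

# Placeholder: a handful of 128-d embeddings per person (e.g. 5 images each).
rng = np.random.default_rng(0)
people = {
    "alice": rng.normal(size=(5, 128)),
    "bob": rng.normal(size=(5, 128)),
}

# One representative vector per person: element-wise median of that person's embeddings.
representatives = {name: np.median(vecs, axis=0) for name, vecs in people.items()}

def closest_person(query):
    # Compare the query against each person's single averaged representation.
    return min(representatives,
               key=lambda name: np.linalg.norm(representatives[name] - query))

print(closest_person(rng.normal(size=128)))
```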
Cluster Representations
Another approach is to cluster the representations into clusters using DBSCAN, and then at runtime assign the query representation to a cluster and take the majority class in that cluster as the label.
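A rough sketch with sklearn's DBSCAN; since DBSCAN has no predict method, this sketch assigns the query to the cluster of its nearest non-noise stored embedding. The data and eps value are placeholders and would need tuning on real embeddings:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import DBSCAN

# Placeholder data: two well-separated groups of 128-d embeddings, with the
# person each embedding came from.
rng = np.random.default_rng(0)
alice = rng.normal(size=(3, 128))
bob = rng.normal(size=(3, 128)) + 5.0          # offset so the toy groups separate
embeddings = np.vstack([alice, bob])
names = ["alice", "alice", "alice", "bob", "bob", "bob"]

clustering = DBSCAN(eps=20.0, min_samples=2).fit(embeddings)

def classify(query):
    # Assign the query to the cluster of its nearest non-noise stored embedding,
    # then return the majority label inside that cluster.
    distances = np.linalg.norm(embeddings - query, axis=1)
    for i in np.argsort(distances):
        cluster = clustering.labels_[i]
        if cluster != -1:
            members = [names[j] for j, c in enumerate(clustering.labels_) if c == cluster]
            return Counter(members).most_common(1)[0][0]
    return None  # every stored embedding was classed as noise

print(classify(rng.normal(size=128) + 5.0))    # a query near the "bob" group
```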
In my experience, the averaged representation works best, but you do end up needing multiple images, at least 5 I think. In my case I needed at least 5 since I was catering to multiple angles and so on.
NOTE: SVM is a BAD approach; you limit your DB size, and every time you need to add a new person to the DB you would need to train a new SVM for the extra class that has just popped up.
Also, for storage purposes you could always store it in a JSON file.
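For instance, one way to dump and reload per-person embeddings as JSON (numpy arrays have to be converted to plain lists first; the file name and layout here are arbitrary):

```python
import json
import numpy as np

# Placeholder embeddings keyed by person name.
rng = np.random.default_rng(0)
db = {"alice": rng.normal(size=(3, 128)), "bob": rng.normal(size=(3, 128))}

# Write: numpy arrays are not JSON-serializable, so convert them to nested lists.
with open("embeddings.json", "w") as f:
    json.dump({name: vecs.tolist() for name, vecs in db.items()}, f)

# Read back and restore the numpy arrays.
with open("embeddings.json") as f:
    restored = {name: np.array(vecs) for name, vecs in json.load(f).items()}
```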
Source: https://stackoverflow.com/questions/59616113/how-to-store-multiple-features-for-face-and-find-distance