How to detect similar images in PHP?

小鲜肉 2021-02-11 01:19

I have many files of the same picture in various resolutions, suitable for every device (mobile, PC, PSP, etc.). Now I am trying to display only unique pictures on the page, but

3 Answers
  • 2021-02-11 01:30

    Firstly, your problem has hardly anything to do with PHP, so I have removed that tag and added more relevant tags.


    Done smartly, this will not require NxN comparisons. You can use lots of heuristics, but first I would like to ask you:

    1. Are all the copies of one image exact resizes of each other, or has some cropping been done? (Matching cropped images to the original could be more difficult and time-consuming.)

    2. Are all images generated (resized) using the same tool?

    3. What about the parameters you used for resizing? For example, are all pictures for display on the PSP in the same resolution?

    4. What is your estimate of how many unique images you have (i.e., how many copies of each picture there might be, on average)?

    5. Have you already done any kind of categorization? For example, are all mobile images in a separate folder (or of a different resolution than the PC images)? This alone could reduce the number of comparisons a lot, even if you otherwise use brute force.
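
    Questions 1 and 5 point at a cheap first cut. If the copies are plain resizes with no cropping, all copies of one picture share roughly the same aspect ratio, so only images inside the same aspect-ratio group need to be compared. A minimal sketch, assuming no cropping; `readSizes()` and `groupByAspectRatio()` are hypothetical helper names, the `images/*` path is hypothetical, and `getimagesize()` is PHP's built-in (it does not need GD).

```php
<?php
/**
 * Read [width, height] for each file with getimagesize() (part of PHP's
 * standard extension, no GD needed). Unreadable files are skipped.
 */
function readSizes(array $files): array
{
    $sizes = [];
    foreach ($files as $file) {
        $info = @getimagesize($file);
        if ($info !== false) {
            $sizes[$file] = [$info[0], $info[1]];
        }
    }
    return $sizes;
}

/**
 * Group files by aspect ratio rounded to $precision decimals. Assumes copies
 * are plain resizes (no cropping), so copies of one picture share a group and
 * only files inside the same group need to be compared with each other.
 */
function groupByAspectRatio(array $sizes, int $precision = 1): array
{
    $groups = [];
    foreach ($sizes as $file => [$width, $height]) {
        if ($height <= 0) {
            continue; // degenerate entry, cannot compute a ratio
        }
        $key = number_format($width / $height, $precision, '.', '');
        $groups[$key][] = $file;
    }
    return $groups;
}

// Usage (hypothetical directory):
// $groups = groupByAspectRatio(readSizes(glob('images/*') ?: []));
```

    Two ratios that straddle a rounding boundary (e.g. 1.749 and 1.751) land in different groups, so a more careful version might also compare against the neighbouring group.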

    A very high-level hint on why you don't need NxN comparisons: you can devise many different approximate hashes (for example, the distribution of high/low-frequency JPEG coefficients) and group "potentially" similar images together. This can reduce the number of comparisons required by 10-100 times or even more, depending on the quality of the heuristic used and the data set. The hashing can even be done on parts of images. 30000 is not a very large number if you use the right techniques.
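
    As a concrete and much simpler stand-in for the JPEG-coefficient hashes mentioned above, here is a minimal PHP sketch of an average hash ("aHash"): each image is shrunk to 8x8, converted to grayscale, and each pixel becomes one bit depending on whether it is at least as bright as the mean. The 64-bit hashes are then split into 4 bands, and only images that share an identical band are compared bit by bit; by the pigeonhole principle this still finds every pair whose hashes differ in at most 3 bits. The helper names, the 8x8 size and the 3-bit cut-off are assumptions for illustration, not tuned values; `averageHash()` needs the GD extension.

```php
<?php
/** 64-bit average hash of an image file, as a 16-char hex string (needs GD). */
function averageHash(string $file): string
{
    $data = file_get_contents($file);
    $src = $data === false ? false : imagecreatefromstring($data);
    if ($src === false) {
        throw new RuntimeException("Cannot decode $file");
    }
    // Shrink to 8x8 so that resolution differences mostly disappear.
    $small = imagecreatetruecolor(8, 8);
    imagecopyresampled($small, $src, 0, 0, 0, 0, 8, 8, imagesx($src), imagesy($src));

    $gray = [];
    for ($y = 0; $y < 8; $y++) {
        for ($x = 0; $x < 8; $x++) {
            $rgb = imagecolorat($small, $x, $y);
            // Luma from the R, G and B channels of the truecolor pixel.
            $gray[] = 0.299 * (($rgb >> 16) & 0xFF) + 0.587 * (($rgb >> 8) & 0xFF) + 0.114 * ($rgb & 0xFF);
        }
    }
    $mean = array_sum($gray) / count($gray);
    // One bit per pixel: 1 if at least as bright as the mean.
    return bitsToHex(array_map(fn ($v) => $v >= $mean ? 1 : 0, $gray));
}

/** Pack an array of 0/1 values (length a multiple of 4) into a hex string. */
function bitsToHex(array $bits): string
{
    $hex = '';
    foreach (array_chunk($bits, 4) as $nibble) {
        $hex .= dechex(bindec(implode('', $nibble)));
    }
    return $hex;
}

/** Number of differing bits between two equal-length hex strings. */
function hammingDistance(string $a, string $b): int
{
    $d = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        $d += substr_count(decbin(hexdec($a[$i]) ^ hexdec($b[$i])), '1');
    }
    return $d;
}

/**
 * Pairs of files whose hashes differ in at most $maxBits bits.
 * Each hash is cut into $bands equal slices and only files sharing a slice
 * are compared; with $maxBits < $bands, no such pair can be missed.
 */
function similarPairs(array $hashes, int $maxBits = 3, int $bands = 4): array
{
    $buckets = [];
    foreach ($hashes as $file => $hash) {
        $len = intdiv(strlen($hash), $bands);
        for ($i = 0; $i < $bands; $i++) {
            $buckets[$i . ':' . substr($hash, $i * $len, $len)][] = $file;
        }
    }
    $pairs = [];
    foreach ($buckets as $files) {
        for ($i = 0, $n = count($files); $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                [$a, $b] = [$files[$i], $files[$j]];
                $key = $a . "\0" . $b; // same pair may appear in several buckets
                if (!isset($pairs[$key]) && hammingDistance($hashes[$a], $hashes[$b]) <= $maxBits) {
                    $pairs[$key] = [$a, $b];
                }
            }
        }
    }
    return array_values($pairs);
}
```

    Typical use would be to fill `$hashes[$file] = averageHash($file)` for every file and pass the array to `similarPairs()`. Because this hash is computed over the whole frame, it suits plain resizes and will usually miss cropped copies (question 1).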

  • 2021-02-11 01:31

    Install GD2 and libpuzzle on your server.

    libpuzzle is astonishing and easy to play with. Check this snippet:

    <?php
    # Compute signatures for two images
    $cvec1 = puzzle_fill_cvec_from_file('img1.jpg');
    $cvec2 = puzzle_fill_cvec_from_file('img2.jpg');
    
    # Compute the distance between both signatures
    $d = puzzle_vector_normalized_distance($cvec1, $cvec2);
    
    # Are pictures similar?
    if ($d < PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD) {
      echo "Pictures are looking similar\n";
    } else {
      echo "Pictures are different, distance=$d\n";
    }
    
    # Compress the signatures for database storage
    $compress_cvec1 = puzzle_compress_cvec($cvec1);
    $compress_cvec2 = puzzle_compress_cvec($cvec2);
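
    The snippet compares only two images; to display only unique pictures you have to filter a whole set. A minimal sketch is a greedy pass that keeps an image only if it is not similar to any image kept so far. `uniqueBySignature()` is a hypothetical helper that takes the distance function as a callable, so the logic can be tested without libpuzzle, and the `images/*.jpg` path is hypothetical; the libpuzzle calls and threshold constant are the ones used in the snippet above.

```php
<?php
/**
 * Greedy de-duplication: walk the signatures in order and keep a file only if
 * it is not similar to any file kept so far. $distance returns a distance for
 * two signatures; a distance below $threshold counts as "similar".
 */
function uniqueBySignature(array $signatures, callable $distance, float $threshold): array
{
    $kept = [];
    foreach ($signatures as $file => $signature) {
        foreach ($kept as $keptFile) {
            if ($distance($signature, $signatures[$keptFile]) < $threshold) {
                continue 2; // looks like a copy of an already-kept image
            }
        }
        $kept[] = $file;
    }
    return $kept;
}

// Wire it to libpuzzle only when the extension's functions are available.
if (function_exists('puzzle_fill_cvec_from_file')) {
    $files = glob('images/*.jpg') ?: []; // hypothetical location
    $signatures = [];
    foreach ($files as $file) {
        $cvec = puzzle_fill_cvec_from_file($file);
        if ($cvec !== false) {
            $signatures[$file] = $cvec;
        }
    }
    $unique = uniqueBySignature(
        $signatures,
        'puzzle_vector_normalized_distance',
        PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD
    );
    print_r($unique);
}
```

    This still performs up to N x U distance computations (U being the number of unique images kept), so for tens of thousands of files it is worth splitting the set into groups first (by folder, resolution or a cheap hash) and running the pass per group.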
    
  • 2021-02-11 01:36

    Well, even though there are quite a few algorithms to do that, I believe it would still be faster to do it manually. Download all the images and feed them into something like Windows Live Photo Gallery or any other software that can match similar images. This will take you a few hours, but implementing an image-matching algorithm could take far longer. After that, you could spend the extra time amending your current system to store everything in a DB. Fix the cause of the problem, not its symptoms.
