Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

前端 未结 23 1215
后悔当初
后悔当初 2020-11-22 09:26

One of the most interesting projects I\'ve worked on in the past couple of years was a project about image processing. The goal was to develop a system to be able to recogni

相关标签:
23条回答
  • 2020-11-22 10:04

    To speed things up, I would take advantage of the fact that you are not asked to find an arbitrary image/object, but specifically one with the Coca-Cola logo. This is significant because this logo is very distinctive, and it should have a characteristic, scale-invariant signature in the frequency domain, particularly in the red channel of RGB. That is to say, the alternating pattern of red-to-white-to-red encountered by a horizontal scan line (trained on a horizontally aligned logo) will have a distinctive "rhythm" as it passes through the central axis of the logo. That rhythm will "speed up" or "slow down" at different scales and orientations, but will remain proportionally equivalent. You could identify/define a few dozen such scanlines, both horizontally and vertically through the logo and several more diagonally, in a starburst pattern. Call these the "signature scan lines."

    Signature scan line

    Searching for this signature in the target image is a simple matter of scanning the image in horizontal strips. Look for a high-frequency in the red-channel (indicating moving from a red region to a white one), and once found, see if it is followed by one of the frequency rhythms identified in the training session. Once a match is found, you will instantly know the scan-line's orientation and location in the logo (if you keep track of those things during training), so identifying the boundaries of the logo from there is trivial.

    I would be surprised if this weren't a linearly-efficient algorithm, or nearly so. It obviously doesn't address your can-bottle discrimination, but at least you'll have your logos.

    (Update: for bottle recognition I would look for coke (the brown liquid) adjacent to the logo -- that is, inside the bottle. Or, in the case of an empty bottle, I would look for a cap which will always have the same basic shape, size, and distance from the logo and will typically be all white or red. Search for a solid color eliptical shape where a cap should be, relative to the logo. Not foolproof of course, but your goal here should be to find the easy ones fast.)

    (It's been a few years since my image processing days, so I kept this suggestion high-level and conceptual. I think it might slightly approximate how a human eye might operate -- or at least how my brain does!)

    0 讨论(0)
  • 2020-11-22 10:05

    I'm not aware of OpenCV but looking at the problem logically I think you could differentiate between bottle and can by changing the image which you are looking for i.e. Coca Cola. You should incorporate till top portion of can as in case of can there is silver lining at top of coca cola and in case of bottle there will be no such silver lining.

    But obviously this algorithm will fail in cases where top of can is hidden, but in such case even human will not be able to differentiate between the two (if only coca cola portion of bottle/can is visible)

    0 讨论(0)
  • 2020-11-22 10:06

    I really like Darren Cook's and stacker's answers to this problem. I was in the midst of throwing my thoughts into a comment on those, but I believe my approach is too answer-shaped to not leave here.

    In short summary, you've identified an algorithm to determine that a Coca-Cola logo is present at a particular location in space. You're now trying to determine, for arbitrary orientations and arbitrary scaling factors, a heuristic suitable for distinguishing Coca-Cola cans from other objects, inclusive of: bottles, billboards, advertisements, and Coca-Cola paraphernalia all associated with this iconic logo. You didn't call out many of these additional cases in your problem statement, but I feel they're vital to the success of your algorithm.

    The secret here is determining what visual features a can contains or, through the negative space, what features are present for other Coke products that are not present for cans. To that end, the current top answer sketches out a basic approach for selecting "can" if and only if "bottle" is not identified, either by the presence of a bottle cap, liquid, or other similar visual heuristics.

    The problem is this breaks down. A bottle could, for example, be empty and lack the presence of a cap, leading to a false positive. Or, it could be a partial bottle with additional features mangled, leading again to false detection. Needless to say, this isn't elegant, nor is it effective for our purposes.

    To this end, the most correct selection criteria for cans appear to be the following:

    • Is the shape of the object silhouette, as you sketched out in your question, correct? If so, +1.
    • If we assume the presence of natural or artificial light, do we detect a chrome outline to the bottle that signifies whether this is made of aluminum? If so, +1.
    • Do we determine that the specular properties of the object are correct, relative to our light sources (illustrative video link on light source detection)? If so, +1.
    • Can we determine any other properties about the object that identify it as a can, including, but not limited to, the topological image skew of the logo, the orientation of the object, the juxtaposition of the object (for example, on a planar surface like a table or in the context of other cans), and the presence of a pull tab? If so, for each, +1.

    Your classification might then look like the following:

    • For each candidate match, if the presence of a Coca Cola logo was detected, draw a gray border.
    • For each match over +2, draw a red border.

    This visually highlights to the user what was detected, emphasizing weak positives that may, correctly, be detected as mangled cans.

    The detection of each property carries a very different time and space complexity, and for each approach, a quick pass through http://dsp.stackexchange.com is more than reasonable for determining the most correct and most efficient algorithm for your purposes. My intent here is, purely and simply, to emphasize that detecting if something is a can by invalidating a small portion of the candidate detection space isn't the most robust or effective solution to this problem, and ideally, you should take the appropriate actions accordingly.

    And hey, congrats on the Hacker News posting! On the whole, this is a pretty terrific question worthy of the publicity it received. :)

    0 讨论(0)
  • 2020-11-22 10:10

    If you are not limited to just a camera which wasn't in one of your constraints perhaps you can move to using a range sensor like the Xbox Kinect. With this you can perform depth and colour based matched segmentation of the image. This allows for faster separation of objects in the image. You can then use ICP matching or similar techniques to even match the shape of the can rather then just its outline or colour and given that it is cylindrical this may be a valid option for any orientation if you have a previous 3D scan of the target. These techniques are often quite quick especially when used for such a specific purpose which should solve your speed problem.

    Also I could suggest, not necessarily for accuracy or speed but for fun you could use a trained neural network on your hue segmented image to identify the shape of the can. These are very fast and can often be up to 80/90% accurate. Training would be a little bit of a long process though as you would have to manually identify the can in each image.

    0 讨论(0)
  • 2020-11-22 10:12

    You need a program that learns and improves classification accuracy organically from experience.

    I'll suggest deep learning, with deep learning this becomes a trivial problem.

    You can retrain the inception v3 model on Tensorflow:

    How to Retrain Inception's Final Layer for New Categories.

    In this case, you will be training a convolutional neural network to classify an object as either a coca-cola can or not.

    0 讨论(0)
  • 2020-11-22 10:14

    An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

    You can find a nice OpenCV code example in Java, C++, and Python on this page: Features2D + Homography to find a known object

    Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).

    Enter image description here

    Image source: tutorial example

    The processing takes a few hundred ms for SIFT, SURF is bit faster, but it not suitable for real-time applications. ORB uses FAST which is weaker regarding rotation invariance.

    The original papers

    • SURF: Speeded Up Robust Features
    • Distinctive Image Features from Scale-Invariant Keypoints
    • ORB: an efficient alternative to SIFT or SURF
    0 讨论(0)
提交回复
热议问题