Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

前端 未结 23 1248
后悔当初
后悔当初 2020-11-22 09:26

One of the most interesting projects I\'ve worked on in the past couple of years was a project about image processing. The goal was to develop a system to be able to recogni

相关标签:
23条回答
  • 2020-11-22 10:14

    There are a bunch of color descriptors used to recognise objects, the paper below compares a lot of them. They are specially powerful when combined with SIFT or SURF. SURF or SIFT alone are not very useful in a coca cola can image because they don't recognise a lot of interest points, you need the color information to help. I use BIC (Border/Interior Pixel Classification) with SURF in a project and it worked great to recognise objects.

    Color descriptors for Web image retrieval: a comparative study

    0 讨论(0)
  • 2020-11-22 10:16

    The first things I would look for are color - like RED , when doing Red eye detection in an image - there is a certain color range to detect , some characteristics about it considering the surrounding area and such as distance apart from the other eye if it is indeed visible in the image.

    1: First characteristic is color and Red is very dominant. After detecting the Coca Cola Red there are several items of interest 1A: How big is this red area (is it of sufficient quantity to make a determination of a true can or not - 10 pixels is probably not enough), 1B: Does it contain the color of the Label - "Coca-Cola" or wave. 1B1: Is there enough to consider a high probability that it is a label.

    Item 1 is kind of a short cut - pre-process if that doe snot exist in the image - move on.

    So if that is the case I can then utilize that segment of my image and start looking more zoom out of the area in question a little bit - basically look at the surrounding region / edges...

    2: Given the above image area ID'd in 1 - verify the surrounding points [edges] of the item in question. A: Is there what appears to be a can top or bottom - silver? B: A bottle might appear transparent , but so might a glass table - so is there a glass table/shelf or a transparent area - if so there are multiple possible out comes. A Bottle MIGHT have a red cap, it might not, but it should have either the shape of the bottle top / thread screws, or a cap. C: Even if this fails A and B it still can be a can - partial.. This is more complex when it is partial because a partial bottle / partial can might look the same , so some more processing of measurement of the Red region edge to edge.. small bottle might be similar in size ..

    3: After the above analysis that is when I would look at the lettering and the wave logo - because I can orient my search for some of the letters in the words As you might not have all of the text due to not having all of the can, the wave would align at certain points to the text (distance wise) so I could search for that probability and know which letters should exist at that point of the wave at distance x.

    0 讨论(0)
  • 2020-11-22 10:18

    Hmm, I actually think I'm onto something (this is like the most interesting question ever - so it'd be a shame not to continue trying to find the "perfect" answer, even though an acceptable one has been found)...

    Once you find the logo, your troubles are half done. Then you only have to figure out the differences between what's around the logo. Additionally, we want to do as little extra as possible. I think this is actually this easy part...

    What is around the logo? For a can, we can see metal, which despite the effects of lighting, does not change whatsoever in its basic colour. As long as we know the angle of the label, we can tell what's directly above it, so we're looking at the difference between these:

    Here, what's above and below the logo is completely dark, consistent in colour. Relatively easy in that respect.

    Here, what's above and below is light, but still consistent in colour. It's all-silver, and all-silver metal actually seems pretty rare, as well as silver colours in general. Additionally, it's in a thin slither and close enough to the red that has already been identified so you could trace its shape for its entire length to calculate a percentage of what can be considered the metal ring of the can. Really, you only need a small fraction of that anywhere along the can to tell it is part of it, but you still need to find a balance that ensures it's not just an empty bottle with something metal behind it.

    And finally, the tricky one. But not so tricky, once we're only going by what we can see directly above (and below) the red wrapper. Its transparent, which means it will show whatever is behind it. That's good, because things that are behind it aren't likely to be as consistent in colour as the silver circular metal of the can. There could be many different things behind it, which would tell us that it's an empty (or filled with clear liquid) bottle, or a consistent colour, which could either mean that it's filled with liquid or that the bottle is simply in front of a solid colour. We're working with what's closest to the top and bottom, and the chances of the right colours being in the right place are relatively slim. We know it's a bottle, because it hasn't got that key visual element of the can, which is relatively simplistic compared to what could be behind a bottle.

    (that last one was the best I could find of an empty large coca cola bottle - interestingly the cap AND ring are yellow, indicating that the redness of the cap probably shouldn't be relied upon)

    In the rare circumstance that a similar shade of silver is behind the bottle, even after the abstraction of the plastic, or the bottle is somehow filled with the same shade of silver liquid, we can fall back on what we can roughly estimate as being the shape of the silver - which as I mentioned, is circular and follows the shape of the can. But even though I lack any certain knowledge in image processing, that sounds slow. Better yet, why not deduce this by for once checking around the sides of the logo to ensure there is nothing of the same silver colour there? Ah, but what if there's the same shade of silver behind a can? Then, we do indeed have to pay more attention to shapes, looking at the top and bottom of the can again.

    Depending on how flawless this all needs to be, it could be very slow, but I guess my basic concept is to check the easiest and closest things first. Go by colour differences around the already matched shape (which seems the most trivial part of this anyway) before going to the effort of working out the shape of the other elements. To list it, it goes:

    • Find the main attraction (red logo background, and possibly the logo itself for orientation, though in case the can is turned away, you need to concentrate on the red alone)
    • Verify the shape and orientation, yet again via the very distinctive redness
    • Check colours around the shape (since it's quick and painless)
    • Finally, if needed, verify the shape of those colours around the main attraction for the right roundness.

    In the event you can't do this, it probably means the top and bottom of the can are covered, and the only possible things that a human could have used to reliably make a distinction between the can and the bottle is the occlusion and reflection of the can, which would be a much harder battle to process. However, to go even further, you could follow the angle of the can/bottle to check for more bottle-like traits, using the semi-transparent scanning techniques mentioned in the other answers.

    Interesting additional nightmares might include a can conveniently sitting behind the bottle at such a distance that the metal of it just so happens to show above and below the label, which would still fail as long as you're scanning along the entire length of the red label - which is actually more of a problem because you're not detecting a can where you could have, as opposed to considering that you're actually detecting a bottle, including the can by accident. The glass is half empty, in that case!


    As a disclaimer, I have no experience in nor have ever thought about image processing outside of this question, but it is so interesting that it got me thinking pretty deeply about it, and after reading all the other answers, I consider this to possibly be the easiest and most efficient way to get it done. Personally, I'm just glad I don't actually have to think about programming this!

    EDIT

    bad drawing of a can in MS paint Additionally, look at this drawing I did in MS Paint... It's absolutely awful and quite incomplete, but based on the shape and colours alone, you can guess what it's probably going to be. In essence, these are the only things that one needs to bother scanning for. When you look at that very distinctive shape and combination of colours so close, what else could it possibly be? The bit I didn't paint, the white background, should be considered "anything inconsistent". If it had a transparent background, it could go over almost any other image and you could still see it.

    0 讨论(0)
  • 2020-11-22 10:18

    As alternative to all these nice solutions, you can train your own classifier and make your application robust to errors. As example, you can use Haar Training, providing a good number of positive and negative images of your target.

    It can be useful to extract only cans and can be combined with the detection of transparent objects.

    0 讨论(0)
  • 2020-11-22 10:19

    I like your question, regardless of whether it's off topic or not :P

    An interesting aside; I've just completed a subject in my degree where we covered robotics and computer vision. Our project for the semester was incredibly similar to the one you describe.

    We had to develop a robot that used an Xbox Kinect to detect coke bottles and cans on any orientation in a variety of lighting and environmental conditions. Our solution involved using a band pass filter on the Hue channel in combination with the hough circle transform. We were able to constrain the environment a bit (we could chose where and how to position the robot and Kinect sensor), otherwise we were going to use the SIFT or SURF transforms.

    You can read about our approach on my blog post on the topic :)

    0 讨论(0)
提交回复
热议问题