Locate an ASCII art image inside a body of text with a certain toleration for error

早过忘川 提交于 2019-12-20 21:21:13

问题


Are there any algorithms that would find the following ASCII-art image?

     +
     +
    +++
  +++++++
  ++   ++
 ++  +  ++
 ++ +++ ++
 ++  +  ++
  ++   ++
  +++++++
    +++

Inside the following body of text?

complete_file_here

              + +    +              ++           +       +++    +     +
 +  ++     +   + ++++    + +       +         +          +  +   +++     +++ +
     +            + +   ++      ++  ++    + ++       +     +      +  +   +
+   ++      +  ++       +          + +       ++        ++  +           +
 ++++++ + +    +   ++  +  +   +   +  ++      +         +                     +
  + +   +      +               +      ++     +  ++            +   +    + +
+++   + ++   +  +            +  +++       + +       ++                     +
  +++++  +      +                            +  + +            +   +  +
 +   +   +              +    +      +            +  +   +      +    +     +
 ++    +              +     +       ++   +          +       +           ++

I must highlight the ASCII art image in yellow which corresponds to the complete shape. See the picture attached:

I have to search a file which contains the rough shape, but not completely, a number of + can be missing). The tolerance for a missing + in the shape should be set by hand.

Now, I have two 2D arrays data array: [100][100] and the SlimeTorpedo array: [13][11].

Code for how to make the detection as stated by @kjartan (3-4 bullet):

     int match = 0;
     for (int i = 0; i < 100; i++) {
         for (int j = 0; j < 100; j++) {
             //Compare DataArr[i][j] with SlimeTorpedoArr[i][j]
             //Look for "checked" position in the picture ("+"), 
             //which corresponds to a checked position in the 
             //slime torpedo array.
             //match++;
        }
    }

What would be general guidance for how to approach this problem?


回答1:


Try brute-force with a match-score:

  • Define a "square" around your "slime torpedo"; that is a 2D array with width and height a little wider and higher than your torpedo.
  • In that 2D array, mark cells as either checked or unchecked, as necessary to create the image you want.
  • Now loop through each character (let's call this an "index" position) in your full image, and for each, compare the positions near it with that of the corresponding character in the 2D array.
  • Look for "checked" (or unchecked) position in the picture, which corresponds to a checked (or unchecked) position in the slime torpedo array (for instance, a char X above and Y left of the current index position in the picture, that matches the status X above and Y left of the center point of the slime torpedo array). For each such "match", add one "point" to that index position in the picture.

Now here's the trick: To make this a lot more effective, only check some of the positions in the slime-torpedo - for instance, every 10'th position or even less. That should reduce the running time by a factor of 10, roughly speaking.

That would mean you would have to check (1/10) * the number of characters in the 2D array for each character in the whole picture.

Now keep track of the highest scoring positions in the full picture. The highest scoring position should be the best match.

If you want, you could run this multiple times, with a varying degree of detail, for instance checking only 1/20 of the positions the first time, then 1/2, next, but this time focusing only on for instance the highest 20 (or 50? 100?) scoring positions from the first round.

(Alternatively, you could do a more detailed scan of all positions scoring higher than some threshold S).

Hope you'll let us know how it goes whatever you decide, interesting question! :)

Update in response to the comments below:

Maybe my explanation was a little unclear. In short / pseudo-code, you would need to do something like this to find the score for each cell:

foreach(DataArraRow dataRow in dataArray){
    foreach(IndexCell index in dataRow){        

        // initialy, no score for this cell in the data array:
        indexCell.score = 0;

        // Now iterate through all SlimeTorpedo cells, and compare the 
        // symbol in it to the corresponding symbol in te data array:
        foreach(SlimeArrayRow slimeRow in slimeTorpedoArray){
            foreach(SlimeTorpedoCell slimeCell in slimeRow){
                if(IsMatchingSymbol(slimeCell.xPosition, 
                                    slimeCell.yPosition, 
                                    slimeCell.symbol, 
                                    indexCell){
                    indexCell.score += 1;
                }else{
                    indexCell.score -= 1;
                }
            }              
        }

    }
}


Function IsMatchingSymbol(x, y, slimeSymbol, indexCell){
   // Find the cell in the data array corresponding to the 
   // "slimeCell" currently being checked:
   var cellToCheck = getCell(indexCell.xPosition + x, 
                             indexCell.yPosition + y);

   if(cellToCheck.symbol == slimeSymbol){
       return true;
   }else{
       return false;
   }

}

This is obviously a little messy, and I'm not sure about all the details, but I hope it shows a general idea which should work. When you're done iterating through this, iterate over all cells again, and pick up the highest scoring cells (or build a separate high-score list along the way - that would probably be quicker).

You will have to do a few changes, such as replacing the ForEach loops with a regular For(int i=0; i < someArrayLength; i = i + levelOfDetail){ ... } or something similar, where levelOfDetail is an integer with which you adjust the level of detail (i.e. how many cells in the SlimeTorpedoArray to check). I'm sure you can work it out... ;)




回答2:


Let's say width and height parameters (in terms of number of characters) are known for your first shape. Let them be width and height.

  • Encode your input into a 2D array of bits (or + signs). So you have int[][] inputBits = new int[height][width]; and you should populate it correctly. (It's your task, dude.)
  • Then apply a simple search on the larger shape (assuming it is encoded into another 2D array as well). Shift the pivot area to right by one each time (the pivot area is equivalent to the area of first shape) and check whether pivot area (2D array) has all its elements equal to the first shape. That's a brute-force algorithm =)



回答3:


For those interested, I solved this problem using XOR mapping in Java:

https://bitbucket.org/bluegod1/blifoscope-java/

It also takes into account that there might be false positives or duplicates, it has an option for specifying the minimum threshold for a good match, add a custom data image file, etc...



来源:https://stackoverflow.com/questions/14246120/locate-an-ascii-art-image-inside-a-body-of-text-with-a-certain-toleration-for-er

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!