Indices of a substring in Smalltalk

我与影子孤独终老i 提交于 2019-12-10 14:43:21

问题


It seems Smalltalk implementations misses an algorithm which return all the indices of a substring in a String. The most similar ones returns only one index of an element, for example : firstIndexesOf:in: , findSubstring:, findAnySubstring: variants.

There are implementations in Ruby but the first one relies on a Ruby hack, the second one does not work ignoring overlapping Strings and the last one uses an Enumerator class which I don't know how to translate to Smalltalk. I wonder if this Python implementation is the best path to start since considers both cases, overlapping or not and does not uses regular expressions.

My goal is to find a package or method which provides the following behavior:

'ABDCDEFBDAC' indicesOf: 'BD'. "#(2 8)"

When overlapping is considered:

'nnnn' indicesOf: 'nn' overlapping: true. "#(0 2)"

When overlapping is not considered:

'nnnn' indicesOf 'nn' overlapping: false. "#(0 1 2)"

In Pharo, when a text is selected in a Playground, a scanner detects the substring and highlights matches. However I couldn't find a String implementation of this.

My best effort so far results in this implementation in String (Pharo 6):

indicesOfSubstring: subString
  | indices i |

  indices := OrderedCollection new: self size.
  i := 0.
  [ (i := self findString: subString startingAt: i + 1) > 0 ] whileTrue: [
    indices addLast: i ].
  ^ indices

回答1:


Let me firstly clarify that Smalltalk collections are 1-based, not 0-based. Therefore your examples should read

'nnnn' indexesOf: 'nn' overlapping: false. "#(1 3)"
'nnnn' indexesOf: 'nn' overlapping: true. "#(1 2 3)"

Note that I've also taken notice of @lurker's observation (and have tweaked the selector too).

Now, starting from your code I would change it as follows:

indexesOfSubstring: subString overlapping: aBoolean
  | n indexes i |
  n := subString size.
  indexes := OrderedCollection new.                            "removed the size"
  i := 1.                                                      "1-based"
  [
    i := self findString: subString startingAt: i.             "split condition"
    i > 0]
  whileTrue: [
    indexes add: i.                                            "add: = addLast:"
    i := aBoolean ifTrue: [i + 1] ifFalse: [i + n]].           "new!"
  ^indexes

Make sure you write some few unit tests (and don't forget to exercise the border cases!)




回答2:


Edited

It would also be nice if you would tell us what you need to achieve in the "greater picture". Sometimes Smalltalk offers different approaches.

Leandro beat me to the the code (and his code is more efficient), but I have already written it so I'll share it too. Heed his advice on Smalltalk being 1-based => rewritten example.

I have used Smalltalk/X and Pharo 6.1 for the example.

The code would be:

indexesOfSubstring: substringToFind overlapping: aBoolean

    | substringPositions aPosition currentPosition |

    substringPositions := OrderedSet new. "with overlap on you could get multiple same 
              positions in the result when there is more to find in the source string"

    substringToFindSize := substringToFind size. "speed up for large strings"
    aPosition := 1.

    [ self size > aPosition ] whileTrue: [
        currentPosition := self findString: substringToFind startingAt: aPosition.
        (currentPosition = 0) ifTrue: [ aPosition := self size + 1 ] "ends the loop substringToFind is not found"
                             ifFalse: [
                                 substringPositions add: currentPosition.
                                 aBoolean ifTrue: [ aPosition := aPosition + 1 ] "overlapping is on"
                                         ifFalse: [ aPosition := currentPosition + substringToFindSize ] "overlapping is off"
                             ]
    ].

    ^ substringPositions

I have fixed some issues that occured to me. Don't forget to test it as much as you can!



来源:https://stackoverflow.com/questions/51178749/indices-of-a-substring-in-smalltalk

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!