Finding all distinct substrings of a string

前端未结


Hello guys, I was given a homework problem that asks me to find all distinct substrings of a string. I have implemented a method which will tell you all the substrings of s

6 answers

滥情空心

2021-01-14 21:52
Before inserting a new substring into an array, check whether it is already there. If it is, don't add it; otherwise, add it. When done, loop through the array and print out the distinct substrings.

To check whether an element exists in an array, create a function that takes a value and an array as parameters. It loops through the array looking for the value and returns true as soon as it is found. If the loop finishes without a match, it returns false.

e.g.
```
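// Returns true if target is already present in arr.
public static boolean contains(String target, String[] arr)
{
    for (int i = 0; i < arr.length; i++) {
        // target.equals(...) is also safe for unused (null) slots.
        if (target.equals(arr[i]))
            return true;
    }
    return false;
}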
```
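The two steps above (check first, then add) can be put together roughly as follows. This is only a sketch: the class name `DistinctSubstringsList` and the method `distinctSubstrings` are made up for illustration, and a `List` stands in for the array because the number of distinct substrings is not known in advance.

```java
import java.util.ArrayList;
import java.util.List;

class DistinctSubstringsList {
    // Linear search: true if target is already stored in seen.
    static boolean contains(String target, List<String> seen) {
        for (String s : seen) {
            if (s.equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Collects every distinct substring of text in first-seen order.
    static List<String> distinctSubstrings(String text) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                String sub = text.substring(start, end);
                // Only add the substring if it has not been stored yet.
                if (!contains(sub, result)) {
                    result.add(sub);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each distinct substring on its own line.
        for (String sub : distinctSubstrings("aba")) {
            System.out.println(sub);
        }
    }
}
```

The linear search makes every insertion cost time proportional to the number of substrings stored so far, so this is fine for homework-sized inputs but slow for long strings.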

太阳男子

2021-01-14 21:57

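Here is an example with a Set. It prints the distinct substrings of one given length and returns how many there are:

```
// Assumes `text` is a String field of the enclosing class, and that
// java.util.Set and java.util.HashSet are imported.
public int printSubstrings1(int length) {
    Set<String> set = new HashSet<>();
    for (int i = 0; i < text.length() - length + 1; i++) {
        String sub = text.substring(i, length + i);
        set.add(sub); // duplicates are ignored by the Set
    }
    for (String str : set) {
        System.out.println(str);
    }
    return set.size();
}
```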


情书的邮戳

2021-01-14 22:00

There are two ways you could do this. I'm not sure whether your teacher permits it, but I am going to use a HashSet for uniqueness.

Without using 'substring()':

```
// Requires java.util.Set and java.util.LinkedHashSet.
void uniqueSubStrings(String test) {
    // LinkedHashSet drops duplicates and keeps insertion order.
    Set<String> substrings = new LinkedHashSet<>();
    char[] a = test.toCharArray();
    for (int i = 0; i < a.length; i++) {
        // Grow the substring that starts at i one character at a time.
        StringBuilder sb = new StringBuilder();
        for (int j = i; j < a.length; j++) {
            sb.append(a[j]);
            substrings.add(sb.toString());
        }
    }
    System.out.println(substrings);
}
```

Using 'substring':

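```
// Requires java.util.Set and java.util.LinkedHashSet.
void uniqueSubStringsWithBuiltIn(String test) {
    Set<String> substrings = new LinkedHashSet<>();
    for (int i = 0; i < test.length(); i++) {
        // substring(i, j) excludes index j, so j runs up to test.length().
        for (int j = i + 1; j <= test.length(); j++) {
            substrings.add(test.substring(i, j));
        }
    }
    System.out.println(substrings);
}
```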


不要未来只要你来

2021-01-14 22:02

```
// Requires java.util.ArrayList.
public ArrayList<String> getAllUniqueSubset(String str) {
    ArrayList<String> set = new ArrayList<String>();
    // i + 1 is the substring length, j is the start index.
    for (int i = 0; i < str.length(); i++) {
        for (int j = 0; j < str.length() - i; j++) {
            String elem = str.substring(j, j + (i + 1));
            if (!set.contains(elem)) {
                set.add(elem);
            }
        }
    }
    return set;
}
```


日久生厌

2021-01-14 22:08
This algorithm uses just the Z-function / Z algorithm.

For each prefix of the word (the prefix ending at index i), reverse it and compute the Z-function over it. The number of new distinct substrings that end at i is (the length of the prefix) minus (the maximum value in the Z-function array, with z[0] taken as 0). The pseudocode looks like this:
```
string s; cin >> s;
int sol = 0;
for i = 0 to s.size()-1
    string x = s.substr( 0 , i+1 );
    reverse( x.begin() , x.end() );
    vector<int> z = z_function( x );   // z[0] is assumed to be 0
    // this works too:
    // vector<int> z = prefix_function( x );
    int mx = 0;
    for j = 0 to x.size()-1
        mx = max( mx , z[j] );
    sol += (i+1) - mx;

cout << sol;
```
The time complexity of this algorithm is O(n^2). The maximum can be returned from the z_function as well.
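
For readers who want something they can run, here is a rough Java sketch of the same idea. The class name `DistinctSubstringsZ` and the method names are assumptions made for this example, and `zFunction` is the usual textbook Z-function with z[0] left at 0, not code taken from the linked source.

```java
class DistinctSubstringsZ {
    // z[i] = length of the longest common prefix of s and s.substring(i);
    // z[0] is left at 0 so it never dominates the maximum.
    static int[] zFunction(String s) {
        int n = s.length();
        int[] z = new int[n];
        int l = 0, r = 0; // [l, r) is the rightmost match found so far
        for (int i = 1; i < n; i++) {
            if (i < r) {
                z[i] = Math.min(r - i, z[i - l]);
            }
            while (i + z[i] < n && s.charAt(z[i]) == s.charAt(i + z[i])) {
                z[i]++;
            }
            if (i + z[i] > r) {
                l = i;
                r = i + z[i];
            }
        }
        return z;
    }

    // Counts distinct substrings by adding, for each prefix, the number of
    // substrings ending at its last character that have not occurred before.
    static long countDistinct(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            String reversedPrefix =
                new StringBuilder(s.substring(0, i + 1)).reverse().toString();
            int max = 0;
            for (int value : zFunction(reversedPrefix)) {
                max = Math.max(max, value);
            }
            total += (i + 1) - max;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("BANANA")); // prints 15
    }
}
```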

Source.

This is not my original answer. I am merely linking to it and pasting it in case the link goes down.
日久生厌

2021-01-14 22:19

I followed this link. Credit for the content goes to a similar answer on Quora.

The solution consists of constructing the suffix array and then finding the number of distinct substrings based on the Longest Common Prefixes.

One key observation here is that:

If you look through the prefixes of each suffix of a string, you have covered all substrings of that string.

Let us take an example: BANANA

Suffixes are: 0) BANANA 1) ANANA 2) NANA 3) ANA 4) NA 5) A

It would be a lot easier to go through the prefixes if we sort the above set of suffixes, as we can skip the repeated prefixes easily.

Sorted set of suffixes: 5) A 3) ANA 1) ANANA 0) BANANA 4) NA 2) NANA

From now on,

LCP = Longest Common Prefix of 2 strings.

Initialize

ans = length(first suffix) = length("A") = 1.

Now consider the consecutive pairs of suffixes, i.e., [A, ANA], [ANA, ANANA], [ANANA, BANANA], etc., from the above set of sorted suffixes.

We can see that, LCP("A", "ANA") = "A".

All characters that are not part of the common prefix contribute to a distinct substring. In the above case, they are 'N' and 'A'. So they should be added to ans.

So we have:

ans += length("ANA") - length(LCP("A", "ANA"))

ans = ans + 3 - 1 = ans + 2 = 3

Do the same for the next pair of consecutive suffixes, ["ANA", "ANANA"]:

LCP("ANA", "ANANA") = "ANA"

ans += length("ANANA") - length(LCP) => ans = ans + 5 - 3 => ans = 3 + 2 = 5

Similarly, we have:

length(LCP("ANANA", "BANANA")) = 0, so ans = ans + length("BANANA") - 0 = 11

length(LCP("BANANA", "NA")) = 0, so ans = ans + length("NA") - 0 = 13

length(LCP("NA", "NANA")) = 2, so ans = ans + length("NANA") - 2 = 15

Hence the number of distinct substrings for the string "BANANA" = 15.
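
The walkthrough above can be sketched in Java as follows. This is a simplified illustration, not a real suffix array construction: it sorts the suffixes as plain strings, which is slow for long inputs. The class name `DistinctSubstringsSuffix` and the method names are assumptions made for this example.

```java
import java.util.Arrays;

class DistinctSubstringsSuffix {
    // Length of the longest common prefix of a and b.
    static int lcpLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int k = 0;
        while (k < limit && a.charAt(k) == b.charAt(k)) {
            k++;
        }
        return k;
    }

    static long countDistinct(String s) {
        if (s.isEmpty()) {
            return 0;
        }
        // Build and sort all suffixes (naive stand-in for a suffix array).
        String[] suffixes = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            suffixes[i] = s.substring(i);
        }
        Arrays.sort(suffixes);

        // Every prefix of the first suffix is a new substring.
        long ans = suffixes[0].length();
        // Each later suffix adds only the prefixes longer than its LCP
        // with the previous suffix in sorted order.
        for (int i = 1; i < suffixes.length; i++) {
            ans += suffixes[i].length() - lcpLength(suffixes[i - 1], suffixes[i]);
        }
        return ans;
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("BANANA")); // prints 15
    }
}
```

A proper suffix array with an LCP array computes the same sum much faster; the naive sort is used here only to keep the sketch short.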
