Hello, I was given a homework problem that asks me to find all distinct substrings of a string. I have implemented a method that returns all the substrings of s.
Insert each new substring into an array, but first check whether it is already there: if it is, don't add it; otherwise do. When done, loop through the array and print out the distinct substrings.
To check whether an element exists in an array, create a function that takes an array and a value as parameters. It loops through the array looking for the value and returns true as soon as it is found. After the loop, it returns false.
e.g.
public static boolean contains(String target, String[] arr)
{
    for (int i = 0; i < arr.length; i++) {
        // target.equals(...) also copes with unfilled (null) slots in arr
        if (target.equals(arr[i]))
            return true;
    }
    return false;
}
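Putting the two steps together, a minimal sketch of the array-based approach could look like the following. The class name `ArrayDistinctSubstrings`, the method `distinctSubstrings`, and the extra `count` parameter (so the helper only scans slots that have been filled) are my own assumptions for illustration, not part of the answer above.

```java
import java.util.Arrays;

class ArrayDistinctSubstrings {

    // Linear scan over the first `count` filled slots of arr.
    static boolean contains(String target, String[] arr, int count) {
        for (int i = 0; i < count; i++) {
            if (arr[i].equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Returns every distinct non-empty substring of s, in order of first appearance.
    static String[] distinctSubstrings(String s) {
        int n = s.length();
        // A string of length n has at most n * (n + 1) / 2 non-empty substrings.
        String[] found = new String[n * (n + 1) / 2];
        int count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j <= n; j++) {
                String sub = s.substring(i, j);
                if (!contains(sub, found, count)) {
                    found[count++] = sub;
                }
            }
        }
        // Trim the unused tail of the array.
        return Arrays.copyOf(found, count);
    }
}
```

For example, `ArrayDistinctSubstrings.distinctSubstrings("aab")` returns `a, aa, aab, ab, b`. Every lookup is a linear scan, so this is only practical for short strings.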
Here is an example with a Set:
// text is assumed to be a String field of the enclosing class
public int printSubstrings1(int length) {
    Set<String> set = new HashSet<String>();
    for (int i = 0; i < text.length() - length + 1; i++) {
        String sub = text.substring(i, length + i);
        set.add(sub);
    }
    for (String str : set) {
        System.out.println(str);
    }
    return set.size();
}
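The method above only collects substrings of one fixed length and relies on a `text` field that is not shown. Here is a minimal sketch of how it might be wrapped to cover every length, assuming a hypothetical holder class `SubstringCounter` (the class name, the constructor, and both method names are mine, not the answer's):

```java
import java.util.HashSet;
import java.util.Set;

class SubstringCounter {
    // Assumed field; the original snippet does not show its declaration.
    private final String text;

    SubstringCounter(String text) {
        this.text = text;
    }

    // Distinct substrings of exactly the given length.
    Set<String> substringsOfLength(int length) {
        Set<String> set = new HashSet<>();
        for (int i = 0; i + length <= text.length(); i++) {
            set.add(text.substring(i, i + length));
        }
        return set;
    }

    // Total number of distinct non-empty substrings, summed over every length.
    // Substrings of different lengths can never be equal, so the per-length
    // counts can simply be added up.
    int countAllDistinct() {
        int total = 0;
        for (int length = 1; length <= text.length(); length++) {
            total += substringsOfLength(length).size();
        }
        return total;
    }
}
```

With this sketch, `new SubstringCounter("BANANA").countAllDistinct()` gives 15.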
There are two ways you could do this. I'm not sure whether your teacher permits it, but I am going to use a HashSet for uniqueness.
Without using 'substring()':
void uniqueSubStrings(String test) {
    // LinkedHashSet keeps the substrings in insertion order
    HashSet<String> substrings = new LinkedHashSet<>();
    char[] a = test.toCharArray();
    for (int i = 0; i < test.length(); i++) {
        substrings.add(a[i] + "");
        for (int j = i + 1; j < test.length(); j++) {
            // build the substring a[i..j] character by character
            StringBuilder sb = new StringBuilder();
            for (int k = i; k <= j; k++) {
                sb.append(a[k]);
            }
            substrings.add(sb.toString());
        }
    }
    System.out.println(substrings);
}
Using 'substring':
void uniqueSubStringsWithBuiltIn(String test) {
    HashSet<String> substrings = new LinkedHashSet<>();
    for (int i = 0; i < test.length(); i++) {
        // j is the exclusive end index, so it runs up to test.length()
        for (int j = i + 1; j < test.length() + 1; j++) {
            substrings.add(test.substring(i, j));
        }
    }
    System.out.println(substrings);
}
public ArrayList<String> getAllUniqueSubset(String str) {
    ArrayList<String> set = new ArrayList<String>();
    for (int i = 0; i < str.length(); i++) {
        for (int j = 0; j < str.length() - i; j++) {
            String elem = str.substring(j, j + (i+1));
            if (!set.contains(elem)) {
                set.add(elem);
            }
        }
    }
    return set;
}
This algorithm uses just the Z-function / Z algorithm.
For each prefix of the word ending at index i, reverse it and run z_function over it.
The number of new distinct substrings that end at i is (the length of the prefix) - (the maximum value in the z_function array).
The pseudo code looks like this:
string s; cin >> s;
int sol = 0;
for (int i = 0; i < (int)s.size(); i++) {
    string x = s.substr(0, i + 1);
    reverse(x.begin(), x.end());
    vector<int> z = z_function(x);   // assumes z_function leaves z[0] == 0
    // this works too
    // vector<int> z = prefix_function(x);
    int mx = 0;
    for (int j = 0; j < (int)x.size(); j++)
        mx = max(mx, z[j]);
    sol += (i + 1) - mx;
}
cout << sol;
The time complexity of this algorithm is O(n^2). The maximum can be returned from the z_function as well.
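For anyone following along in Java, here is a minimal sketch of the same idea. The class name `ZFunctionCount` and the method names are assumptions of mine, and `zFunction` is the usual textbook Z-function with `z[0]` left at 0, which may differ from the `z_function` the pseudocode had in mind.

```java
class ZFunctionCount {

    // Standard Z-function: z[i] is the length of the longest common prefix of
    // s and s.substring(i). z[0] is left at 0 by convention here.
    static int[] zFunction(String s) {
        int n = s.length();
        int[] z = new int[n];
        int l = 0, r = 0; // [l, r) is the rightmost matched segment found so far
        for (int i = 1; i < n; i++) {
            if (i < r) {
                z[i] = Math.min(r - i, z[i - l]);
            }
            while (i + z[i] < n && s.charAt(z[i]) == s.charAt(i + z[i])) {
                z[i]++;
            }
            if (i + z[i] > r) {
                l = i;
                r = i + z[i];
            }
        }
        return z;
    }

    // Counts distinct non-empty substrings in O(n^2) time.
    static long countDistinct(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            // Reverse the prefix s[0..i] so that substrings ending at i become prefixes.
            String x = new StringBuilder(s.substring(0, i + 1)).reverse().toString();
            int mx = 0;
            for (int v : zFunction(x)) {
                mx = Math.max(mx, v);
            }
            // Only the substrings ending at i that are longer than mx are new.
            total += (i + 1) - mx;
        }
        return total;
    }
}
```

With this sketch, `ZFunctionCount.countDistinct("BANANA")` gives 15.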
Source.
This is not my original answer. I am merely linking to it and pasting it in case the link goes down.
I followed this link, and I acknowledge the content from a similar answer on Quora.
The solution consists of constructing the suffix array and then finding the number of distinct substrings based on the Longest Common Prefixes.
One key observation here is that:
If you look through the prefixes of each suffix of a string, you have covered all substrings of that string.
Let us take an example: BANANA
Suffixes are:
0) BANANA
1) ANANA
2) NANA
3) ANA
4) NA
5) A
It would be a lot easier to go through the prefixes if we sort the above set of suffixes, as we can skip the repeated prefixes easily.
Sorted set of suffixes:
5) A
3) ANA
1) ANANA
0) BANANA
4) NA
2) NANA
From now on,
LCP = Longest Common Prefix of 2 strings.
Initialize
ans = length(first suffix) = length("A") = 1.
Now consider the consecutive pairs of suffixes, i.e., [A, ANA], [ANA, ANANA], [ANANA, BANANA], etc., from the above set of sorted suffixes.
We can see that, LCP("A", "ANA") = "A".
All characters that are not part of the common prefix contribute to a distinct substring. In the above case, they are 'N' and 'A'. So they should be added to ans.
So we have:
ans += length("ANA") - length(LCP("A", "ANA"))
ans = ans + 3 - 1 = ans + 2 = 3
Do the same for the next pair of consecutive suffixes: ["ANA", "ANANA"]
LCP("ANA", "ANANA") = "ANA".
ans += length("ANANA") - length(LCP)
=> ans = ans + 5 - 3
=> ans = 3 + 2 = 5.
Similarly, we have:
LCP("ANANA", "BANANA") = 0
ans = ans + length("BANANA") - 0 = 11

LCP("BANANA", "NA") = 0
ans = ans + length("NA") - 0 = 13

LCP("NA", "NANA") = 2
ans = ans + length("NANA") - 2 = 15
Hence the number of distinct substrings for the string "BANANA" = 15.
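The walkthrough above can be sketched in Java as follows. This is only an illustration under my own assumptions: the class name `SuffixLcpCount` and both method names are made up, and the suffixes are sorted naively as strings with `Arrays.sort` instead of building a real suffix array, so it is fine for short inputs but not efficient for long ones.

```java
import java.util.Arrays;

class SuffixLcpCount {

    // Length of the longest common prefix of a and b.
    static int lcp(String a, String b) {
        int k = 0;
        while (k < a.length() && k < b.length() && a.charAt(k) == b.charAt(k)) {
            k++;
        }
        return k;
    }

    // Counts distinct non-empty substrings from the sorted suffixes.
    static long countDistinct(String s) {
        int n = s.length();
        String[] suffixes = new String[n];
        for (int i = 0; i < n; i++) {
            suffixes[i] = s.substring(i);
        }
        // Naive stand-in for a suffix array: sort the suffixes lexicographically.
        Arrays.sort(suffixes);
        long ans = 0;
        for (int i = 0; i < n; i++) {
            // The first suffix contributes all of its prefixes; every later one
            // contributes only the prefixes longer than the LCP with its predecessor.
            int shared = (i == 0) ? 0 : lcp(suffixes[i - 1], suffixes[i]);
            ans += suffixes[i].length() - shared;
        }
        return ans;
    }
}
```

With this sketch, `SuffixLcpCount.countDistinct("BANANA")` gives 15, matching the hand calculation.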