I\'m trying to create a data structure that holds all the possible substring combinations that add up to the original string. For example, if the string is \"java\"
It seems like this is the problem of finding the compositions of the length of the string, and using those compositions to make substrings. So there are 2^n-1 compositions of a number n, which could make it a bit time-consuming for long strings...
Here's one approach:
static List<List<String>> substrings(String input) {
// Base case: There's only one way to split up a single character
// string, and that is ["x"] where x is the character.
if (input.length() == 1)
return Collections.singletonList(Collections.singletonList(input));
// To hold the result
List<List<String>> result = new ArrayList<>();
// Recurse (since you tagged the question with recursion ;)
for (List<String> subresult : substrings(input.substring(1))) {
// Case: Don't split
List<String> l2 = new ArrayList<>(subresult);
l2.set(0, input.charAt(0) + l2.get(0));
result.add(l2);
// Case: Split
List<String> l = new ArrayList<>(subresult);
l.add(0, input.substring(0, 1));
result.add(l);
}
return result;
}
Output:
[java]
[j, ava]
[ja, va]
[j, a, va]
[jav, a]
[j, av, a]
[ja, v, a]
[j, a, v, a]
A different recursive solution that just adds to the list results
static List<List<String>> substrings(String input) {
List<List<String>> result = new ArrayList<>();
if (input.length() == 1) {
result.add(Arrays.asList(new String[]{input}));
}
else {
//iterate j, ja, jav, jav
for (int i = 0; i < input.length()-1; i++ ) {
String root = input.substring(0,i+1);
String leaf = input.substring(i+1);
for( List<String> strings: substrings(leaf) ) {
ArrayList<String> current = new ArrayList<String>();
current.add(root);
current.addAll(strings);
result.add(current);
}
}
//adds the whole string as one of the leaves (ie. java, ava, va, a)
result.add(Arrays.asList(new String[]{input}));
}
return result;
}
Probably somebody would like another solution which is non-recursive and takes no memory to hold a list:
public static List<List<String>> substrings(final String input) {
if(input.isEmpty())
return Collections.emptyList();
final int size = 1 << (input.length()-1);
return new AbstractList<List<String>>() {
@Override
public List<String> get(int index) {
List<String> entry = new ArrayList<>();
int last = 0;
while(true) {
int next = Integer.numberOfTrailingZeros(index >> last)+last+1;
if(next == last+33)
break;
entry.add(input.substring(last, next));
last = next;
}
entry.add(input.substring(last));
return entry;
}
@Override
public int size() {
return size;
}
};
}
public static void main(String[] args) {
System.out.println(substrings("java"));
}
Output:
[[java], [j, ava], [ja, va], [j, a, va], [jav, a], [j, av, a], [ja, v, a], [j, a, v, a]]
It just calculates the next combination based on its index.
Just in case someone will look for the same algorithm in python, here is implementation in Python:
from itertools import combinations
def compositions(s):
n = len(s)
for k in range(n):
for c in combinations(range(1, n), k):
yield tuple(s[i:j] for i, j in zip((0,) + c, c + (n,)))
Example how it works:
>>> for x in compositions('abcd'):
... print(x)
('abcd',)
('a', 'bcd')
('ab', 'cd')
('abc', 'd')
('a', 'b', 'cd')
('a', 'bc', 'd')
('ab', 'c', 'd')
('a', 'b', 'c', 'd')
With a small modification you can generate compositions in different order:
def compositions(s):
n = len(s)
for k in range(n):
for c in itertools.combinations(range(n - 1, 0, -1), k):
yield tuple(s[i:j] for i, j in zip((0,) + c[::-1], c[::-1] + (n,)))
It will give you this:
>>> for x in compositions('abcd'):
... print(x)
('abcd',)
('abc', 'd')
('ab', 'cd')
('a', 'bcd')
('ab', 'c', 'd')
('a', 'bc', 'd')
('a', 'b', 'cd')
('a', 'b', 'c', 'd')
And with another small addition, you can generate only specified number of splits:
def compositions(s, r=None):
n = len(s)
r = range(n) if r is None else [r - 1]
for k in r:
for c in itertools.combinations(range(n - 1, 0, -1), k):
yield tuple(s[i:j] for i, j in zip((0,) + c[::-1], c[::-1] + (n,)))
>>> for x in compositions('abcd', 3):
... print(x)
('ab', 'c', 'd')
('a', 'bc', 'd')
('a', 'b', 'cd')