Imagine an alphabet of words.
Example:
a ==> 1
b ==> 2
c ==> 3
z ==> 26
ab ==> 27
ac ==> 28
az ==> 51
bc ==> 52
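The numbering in these examples can be reproduced by brute force. The sketch below is only an illustration and not part of the question: the name `brute_force_index` and the `max_len` cutoff are assumptions, and it relies on `itertools.combinations` yielding the combinations of a sorted input in lexicographic order.

```python
from itertools import combinations
from string import ascii_lowercase


def brute_force_index(word, max_len=3):
    """Return the 1-based position of `word` by enumerating every ascending word.

    Words are ordered by length first, then alphabetically.
    Returns 0 if `word` is not an ascending word of length <= max_len.
    """
    position = 0
    for length in range(1, max_len + 1):
        # combinations() of a sorted alphabet yields strictly ascending
        # tuples in lexicographic order.
        for letters in combinations(ascii_lowercase, length):
            position += 1
            if "".join(letters) == word:
                return position
    return 0


for w in ("a", "z", "ab", "az", "bc"):
    print(w, brute_force_index(w))
```

This is far too slow for long words, but it gives a reference to test any formula against.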
This is a combination of two problems: parsing a number in a base that isn't 10 and determining if input is sorted.
Note that, since this is probably homework, you probably can't just use existing methods to do the hard work.
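Now for the magic formula (written out here as it is implemented in solve below, where n is the length of the word and x_i is the value, a = 1 through z = 26, of the i-th letter counted from the end of the word):
$$\text{index} = x_1 + \sum_{i=2}^{n}\left[\binom{26}{i} - \binom{26 - x_i + 1}{i}\left(1 - \frac{i}{26 - x_i + 1}\right)\right]$$
Coded with examples and a demonstration of speed even in the worst-case scenario: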
def comb(n,k): #returns combinations
    p = 1 #product
    for i in range(k):
        p *= (n-i)/(i+1)
    return p

def solve(string):
    x = []
    for letter in string:
        x.append(ord(letter)-96) #convert string to list of integers
    x = list(reversed(x)) #reverse the order of string
    #Next, the magic formula
    return x[0]+sum(comb(26,i)-comb(26-x[i-1]+1,i)*(1-i/(26-x[i-1]+1)) for i in range(2,len(x)+1))
>>> solve('bhp')
764.0
>>> solve('afkp')
3996.0
>>> solve('abcdefghijklmnopqrstuvwxyz')
67108863.0
>>> solve('hpz')
2090.0
>>> solve('aez')
441.0
>>> import time
>>> if 1:
        s = ''
        for a in range(97,97+26):
            s += chr(a)
        t = time.time()
        for v in range(1000):
            temp = solve(s)
        print (time.time()-t)

0.1650087833404541
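Because `comb` above divides with `/`, Python 3 returns floats (hence `764.0`). A possible integer-only variant is sketched below. It is not the code from this answer: the name `solve_int` and the `alphabet_size` parameter are hypothetical, it assumes Python 3.8+ for `math.comb`, and it uses the identity C(n, k) * (1 - k/n) = C(n - 1, k) to fold the correction factor into the binomial coefficient.

```python
from math import comb


def solve_int(word, alphabet_size=26):
    """Integer-only position (1-based) of a strictly ascending lowercase word."""
    # Letter values a=1 .. z=26, last letter first (same convention as solve()).
    x = [ord(ch) - 96 for ch in reversed(word)]
    total = x[0]
    for i in range(2, len(x) + 1):
        # comb(n, i) * (1 - i/n) == comb(n - 1, i), with n = alphabet_size - x[i-1] + 1
        total += comb(alphabet_size, i) - comb(alphabet_size - x[i - 1], i)
    return total


print(solve_int("bhp"))  # 764
```

Staying in integers avoids any rounding concerns for long words, at the cost of requiring a recent Python.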
In order to understand my explanation of this formula, I need to go over a pattern in Pascal's triangle and the binomial theorem:
Here is Pascal's triangle:
                  1
                1   1
              1   2   1
            1   3   3   1
          1   4   6   4   1
        1   5  10  10   5   1
      1   6  15  20  15   6   1
Going from top right to bottom left, first there is a sequence of 1s. Then a sequence of the counting numbers. The next sequence is the sum of the counting numbers. These are known as the triangular numbers. The next sequence is the sum of the triangular numbers, known as the tetrahedral numbers and this pattern goes on and on.
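Now for the binomial theorem:
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k}$$
By combining the binomial theorem and Pascal's triangle, it can be seen that the nth triangular number is:
$$\binom{n+1}{2} = \frac{n(n+1)}{2}$$
and the nth tetrahedral number is:
$$\binom{n+2}{3} = \frac{n(n+1)(n+2)}{6}$$
the sum of the first n tetrahedral numbers is:
$$\binom{n+3}{4}$$
and so on ...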
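The diagonal pattern can be checked numerically. The following sketch builds each sequence as the running sum of the previous one and compares it with the corresponding binomial coefficient; the variable names and the cutoff `N` are arbitrary choices, and `math.comb` assumes Python 3.8+.

```python
from itertools import accumulate
from math import comb

N = 8  # how many terms of each diagonal to generate (arbitrary)

ones = [1] * N
counting = list(accumulate(ones))           # 1, 2, 3, ...
triangular = list(accumulate(counting))     # 1, 3, 6, 10, ...
tetrahedral = list(accumulate(triangular))  # 1, 4, 10, 20, ...
tetra_sums = list(accumulate(tetrahedral))  # sums of tetrahedral numbers

for n in range(1, N + 1):
    assert triangular[n - 1] == comb(n + 1, 2)
    assert tetrahedral[n - 1] == comb(n + 2, 3)
    assert tetra_sums[n - 1] == comb(n + 3, 4)

print(triangular)
print(tetrahedral)
```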
Now for the explanation. For this explanation, I will only use 6 letters, a-f, and will replace these with the numbers 1-6. The procedure is the same with more letters.
If the length is 1, then the possible sequences are:
1
2
3
4
5
6
In this case the answer is simply the value.
Now for a length of 2:
12 13 14 15 16
23 24 25 26
34 35 36
45 46
56
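The listing above can be generated mechanically, which also makes the row structure visible: the row starting with digit d has 6 - d entries, so the row lengths are 5, 4, 3, 2, 1 and the total is the triangular number 15. This is a small sketch; the grouping into a `rows` dictionary is just one convenient way to display it.

```python
from itertools import combinations

digits = range(1, 7)  # stand-ins for the letters a-f
pairs = list(combinations(digits, 2))

# Group the pairs by their first digit, one row per leading digit.
rows = {}
for first, second in pairs:
    rows.setdefault(first, []).append(f"{first}{second}")

for first in sorted(rows):
    print(" ".join(rows[first]))

row_lengths = [len(rows[d]) for d in sorted(rows)]
print(row_lengths, sum(row_lengths))  # [5, 4, 3, 2, 1] 15
```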
To solve this we split it into 3 parts:
Next we will repeat for sequences of length 3:
123 124 125 126
134 135 136
145 146
156
234 235 236
245 246
256
345 346
356
456
Once again we split this problem into steps:
This
234 235 236
245 246
256
becomes
12 13 14
23 24
34
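The reduction shown here (drop the leading digit, then shift the remaining digits down by that digit) can be sketched as follows. The variable names are illustrative; the check at the end confirms that the block starting with 2 maps exactly onto the length-2 sequences over 1..4.

```python
from itertools import combinations

# All ascending length-3 sequences over 1..6 that start with 2.
block = [seq for seq in combinations(range(1, 7), 3) if seq[0] == 2]

# Drop the leading digit and shift the rest down by that digit.
reduced = [tuple(d - seq[0] for d in seq[1:]) for seq in block]

# The result is every ascending length-2 sequence over 1..4, in the same order.
assert reduced == list(combinations(range(1, 5), 2))
print(reduced)
```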
Combining these our total formula for length 3 becomes:
We can follow this pattern of reduction for higher length sequences
Now we will write out our formulas to look for patterns:
Length 1: y1
Length 2:
Length 3:
Note: I also used length 4 to make sure the patterns held
With a bit of math, grouping of terms, and the change from 6 to 26 our formula becomes:
In order to simplify this further, more math must be done.
This identity holds true for all a and b. For a quick fun exercise, prove it (not really difficult):
This identity allows us to further group and negate terms to reach our much oversimplified formula:
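One way to sanity-check the final formula is to compare it against brute-force enumeration on the 6-letter alphabet used in this explanation. The sketch below assumes the form position = x_1 + sum over i >= 2 of [C(m, i) - C(m - x_i, i)], which is what the code near the top of this answer computes once C(n, k) * (1 - k/n) is rewritten as C(n - 1, k). The helper name `rank` and the parameter `m` (alphabet size) are hypothetical.

```python
from itertools import combinations
from math import comb


def rank(seq, m):
    """1-based position of an ascending sequence over 1..m (hypothetical helper)."""
    x = list(reversed(seq))  # x[0] is the last element of the sequence
    return x[0] + sum(comb(m, i) - comb(m - x[i - 1], i)
                      for i in range(2, len(x) + 1))


m = 6
expected = 0
for length in range(1, m + 1):
    for seq in combinations(range(1, m + 1), length):
        expected += 1  # brute-force position: by length, then lexicographic
        assert rank(seq, m) == expected

print(expected)  # 63 sequences checked
```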
This is a base-26 numbering system. I would suggest you look at the octal, decimal and hexadecimal numbering systems; once you understand how to convert any of them to decimal, you will know how to handle this one too.
This really ought to be a comment, but I can't put code in a comment.
I wrote a brute force program to calculate the number of one, two, three, four, and five letter words, based on the criteria the original poster provided.
Imagine an alphabet of words such that the sequence of characters in a word has to be in ascending order only.
Here are the results of my program.
One letter words - 26
Two letter words - 325
Three letter words - 2600
Four letter words - 14950
Five letter words - 65780
Total words - 83681
My "solution" would be to generate a dictionary of all the words from a to abcdefghijklmnopqrstuvwxyz.
Here's the code I used. Maybe someone can look at the nested loops and come up with a formula. I can't.
public class WordSequence implements Runnable {

    private int wordCount = 0;

    @Override
    public void run() {
        int count = createOneLetterWords();
        System.out.println("One letter words - " + count);
        count = createTwoLetterWords();
        System.out.println("Two letter words - " + count);
        count = createThreeLetterWords();
        System.out.println("Three letter words - " + count);
        count = createFourLetterWords();
        System.out.println("Four letter words - " + count);
        count = createFiveLetterWords();
        System.out.println("Five letter words - " + count);
        System.out.println("\nTotal words - " + wordCount);
    }

    private int createOneLetterWords() {
        int count = 0;
        for (int i = 0; i < 26; i++) {
            createWord(i);
            wordCount++;
            count++;
        }
        return count;
    }

    private int createTwoLetterWords() {
        int count = 0;
        for (int i = 0; i < 25; i++) {
            for (int j = i + 1; j < 26; j++) {
                createWord(i, j);
                wordCount++;
                count++;
            }
        }
        return count;
    }

    private int createThreeLetterWords() {
        int count = 0;
        for (int i = 0; i < 24; i++) {
            for (int j = i + 1; j < 25; j++) {
                for (int k = j + 1; k < 26; k++) {
                    createWord(i, j, k);
                    wordCount++;
                    count++;
                }
            }
        }
        return count;
    }

    private int createFourLetterWords() {
        int count = 0;
        for (int i = 0; i < 23; i++) {
            for (int j = i + 1; j < 24; j++) {
                for (int k = j + 1; k < 25; k++) {
                    for (int m = k + 1; m < 26; m++) {
                        createWord(i, j, k, m);
                        wordCount++;
                        count++;
                    }
                }
            }
        }
        return count;
    }

    private int createFiveLetterWords() {
        int count = 0;
        for (int i = 0; i < 22; i++) {
            for (int j = i + 1; j < 23; j++) {
                for (int k = j + 1; k < 24; k++) {
                    for (int m = k + 1; m < 25; m++) {
                        for (int n = m + 1; n < 26; n++) {
                            createWord(i, j, k, m, n);
                            wordCount++;
                            count++;
                        }
                    }
                }
            }
        }
        return count;
    }

    private String createWord(int... letter) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < letter.length; i++) {
            builder.append((char) (letter[i] + 'a'));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        new WordSequence().run();
    }
}
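On the question of a formula for the nested loops: each k-letter ascending word corresponds to a choice of k distinct letters out of 26, so the counts reported above should equal the binomial coefficients C(26, k). A quick check, assuming Python 3.8+ for `math.comb`:

```python
from math import comb

# Number of ascending words of each length from 1 to 5.
counts = [comb(26, k) for k in range(1, 6)]

print(counts)       # [26, 325, 2600, 14950, 65780]
print(sum(counts))  # 83681
```

The values agree with the program's output for one through five letters and with its total.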
For letters of this imaginary alphabet that are more than one character long, we may use the recursion:
XnXn-1..X1 =
max(n-1)
+ (max(n-1) - last (n-1)-character letter before
the first (n-1)-character letter after a)
... + (max(n-1) - last (n-1)-character letter before the
first (n-1)-character letter after the-letter-before-Xn)
+ 1 + ((Xn-1..X1) - first (n-1)-character letter after Xn)
where max(1) = z, max(2) = yz...
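To make the max(n) terms concrete: max(n) is the last n-character word, so its position is simply the number of all ascending words of length at most n. The sketch below illustrates only that one piece of the recursion, not the whole of it; the helper names `max_word` and `index_of_max` are hypothetical.

```python
from math import comb
from string import ascii_lowercase


def max_word(n):
    """Last n-letter ascending word: the final n letters of the alphabet."""
    return ascii_lowercase[-n:]


def index_of_max(n):
    """Position of max_word(n): the count of all ascending words of length <= n."""
    return sum(comb(26, k) for k in range(1, n + 1))


for n in (1, 2, 3):
    print(max_word(n), index_of_max(n))
```

As a cross-check, `index_of_max(4) + 1` is 17902, the position of "abcde", the first five-letter word.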
Haskell code:
import Data.List (sort)
import qualified Data.MemoCombinators as M

firstAfter letter numChars = take numChars $ tail [letter..]

lastBefore letter numChars = [toEnum (fromEnum letter - 1) :: Char]
                             ++ reverse (take (numChars - 1) ['z','y'..])

max' numChars = reverse (take numChars ['z','y'..])

loop letter numChars =
  foldr (\a b -> b
                 + index (max' numChars)
                 - index (lastBefore (head $ firstAfter a numChars) numChars)
        ) 0 ['a'..letter]

index = M.list M.char index' where
  index' letter
    | null (drop 1 letter)  = fromEnum (head letter) - 96
    | letter /= sort letter = 0
    | otherwise = index (max' (len - 1))
                  + loop (head $ lastBefore xn 1) (len - 1)
                  + 1
                  + index (tail letter) - index (firstAfter xn (len - 1))
    where len = length letter
          xn = head letter
Output:
*Main> index "abcde"
17902
*Main> index "abcdefghijklmnopqrstuvwxyz"
67108863
(0.39 secs, 77666880 bytes)