Given a word and a text, return the count of the occurrences of anagrams of the word in the text. For eg. word is “for” and the text is “forxxorfxdofr”, anagrams of “for” wi
Essentially you can slide a window of the length of your word over your input and keep a count of how many of each letter are in the window. When the letter counts in your sliding window match the letter counts of your word, you have a match.
Let your word length be n
, and your current position be curr
. Create an array, or vector
, windCounts
of length 26. The entry windCounts[i]
stores the number of occurrences of the ith letter of the alphabet seen from position curr - n - 1
to curr
.
What you do is you advance curr
, and keep your array windCounts
up to date, by decrementing the letter that has dropped out of the back of the sliding window, and incrementing the letter count that has appeared in the front of the sliding window. (Obviously until curr
> n
, you only increment, you just build up your sliding window to the length of your word.)
In C++, you can use a vector
for the counts of letters in your word, and for the counts of letters in your sliding window and simply use vector::operator==
to do the equality.
Edit: the algorithm is O(N)
, where N
is the length of the text to search. This can be seen from the code below where the loop body is executed for each letter that you slide the window.
#include <string>
#include <vector>
#include <algorithm> // for_each
using std::string;
using std::vector;
#include <iostream>
int main(int argc, char* argv[])
{
const string text = "forxxorfxdofr";
const string word = "for";
// Counts of letters in word
vector<int> wordCounts(256); // optimization: cut down from 256 to 26
std::for_each(word.begin(), word.end(),
[&] (char c) { wordCounts[c]++; } );
// Current position of end of sliding window
string::const_iterator curr = text.begin() + word.size();
// Initial sliding window counts
vector<int> windCounts(256);
std::for_each(text.begin(), curr,
[&] (char c) { windCounts[c]++; } );
// Run sliding window over text
int numMatches = 0;
while (1) {
numMatches += wordCounts == windCounts;
if (curr == text.end()) {
break;
}
windCounts[*(curr - word.size())]--;
windCounts[*curr]++;
++curr;
}
std::cout << numMatches << "\n";
return 0;
}
I have taken two string namely str and occ. Str is the original strin and occ is the sting for which we have to find out the count. Using strncpy function I have copied the length of occ i.e. n chars into a temp array and then checked whether it is a permutation of the occ string or not.
#include<iostream.h>
#include<conio.h>
#include<string.h>
int permutate(char str1[],char str2[]);
int permutate(char str1[],char str2[]) {
int c[256]={0},i,j;
for(i=0;i<strlen(str1);i++)
c[str1[i]]++;
for(i=0;i<strlen(str2);i++) {
c[str2[i]]--;
if(c[str2[i]]<0)
return 1; //not a permutation
}
return 0; //permutation
}
int main() {
//enter code here
char str[]="forxxorfxdofr",occ[]="for",temp[10];
int n,i,x,t=0;
n=strlen(occ);
for(i=0;i<strlen(str);i++) {
strncpy(temp,str+i,n); //copy the n char from str to temp
x=permutate(temp,occ);
if(x==0) //if string is a permutation
t++;
}
cout<<"Count = " << t;
return 0;
}
o(n) solution in Python
def check(s1,s2):
function takes in s1 as the text and s2 as the text to be checked from here for
c=0
n=len(s2)
ck=sorted(s2)
mk=''.join(ck)
this loop will pick from s till the length of s2 that is 'for'
for i,item in enumerate(s1):
if s1[i] in mk:
p=s1[i:i+n]
jk=sorted(p)
er=''.join(jk)
now just comparing the both sorted strings if they are equal then they were anagram
if er == mk:
c+=1
return c
You can simply look for the character count.
Say for example that you're looking for anagramms of look
. So, you're looking for:
Simply process the first 4 letters, store the counts. Check whether you have a match. Add the next character (increment), remove the old character (decrement). Check again. And so on...
TooTone's O(n) solution suffers from having to compare two 256-element vectors for each character of the input text. This can be avoided by tracking the number of positions at which the two vectors differ, and registering a match when this number goes to zero. In fact, we don't even need to store two different vectors at all, since we can just store one vector containing their difference.
Here's my version implementing these optimizations. It's written in plain old C, but should work under C++ with appropriate adjustments:
#include <stdio.h>
#include <limits.h> /* for UCHAR_MAX (usually 255) */
int find_anagrams (char *word, char *text) {
int len = 0; /* length of search word */
int bin[UCHAR_MAX+1]; /* excess count of each char in last len chars of text */
int mismatch = 0; /* count of nonzero values in bins[] */
int found = 0; /* number of anagrams found */
int i; /* generic loop counter */
/* initialize bins */
for (i = 0; i <= UCHAR_MAX; i++) bin[i] = 0;
for (i = 0; word[i] != '\0'; i++) {
unsigned char c = (unsigned char) word[i];
if (bin[c] == 0) mismatch++;
bin[c]--;
len++; /* who needs strlen()? */
}
/* iterate through text */
for (i = 0; text[i] != '\0'; i++) {
/* add next char in text to bins, keep track of mismatch count */
unsigned char c = (unsigned char) text[i];
if (bin[c] == 0) mismatch++;
if (bin[c] == -1) mismatch--;
bin[c]++;
/* remove len-th previous char from bins, keep track of mismatch count */
if (i >= len) {
unsigned char d = (unsigned char) text[i - len];
if (bin[d] == 0) mismatch++;
if (bin[d] == 1) mismatch--;
bin[d]--;
}
/* if mismatch count is zero, we've found an anagram */
if (mismatch == 0) {
found++;
#ifdef DEBUG
/* optional: print each anagram found */
printf("Anagram found at position %d: \"", i-len+1);
fwrite(text+i-len+1, 1, len, stdout);
printf("\"\n");
#endif
}
}
return found;
}
int main (int argc, char *argv[]) {
if (argc == 3) {
int n = find_anagrams(argv[1], argv[2]);
printf("Found %d anagrams of \"%s\" in \"%s\".\n", n, argv[1], argv[2]);
return 0;
} else {
fprintf(stderr, "Usage: %s <word> <text>\n", (argc ? argv[0] : "countanagrams"));
return 1;
}
}