Using Python, find anagrams for a list of words

独自空忆成欢 提交于 2019-11-26 22:40:29

问题


If I have a list of strings for example:

["car", "tree", "boy", "girl", "arc"...]

What should I do in order to find anagrams in that list? For example (car, arc). I tried using for loop for each string and I used if in order to ignore strings in different lengths but I can't get the right result.

How can I go over each letter in the string and compare it to others in the list in different order?

I have read several similar questions, but the answers were too advanced. I can't import anything and I can only use basic functions.


回答1:


In order to do this for 2 strings you can do this:

def isAnagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()

    return (str1_list == str2_list)

As for the iteration on the list, it is pretty straight forward




回答2:


Create a dictionary of (sorted word, list of word). All the words that are in the same list are anagrams of each other.

from collections import defaultdict

def load_words(filename='/usr/share/dict/american-english'):
    with open(filename) as f:
        for word in f:
            yield word.rstrip()

def get_anagrams(source):
    d = defaultdict(list)
    for word in source:
        key = "".join(sorted(word))
        d[key].append(word)
    return d

def print_anagrams(word_source):
    d = get_anagrams(word_source)
    for key, anagrams in d.iteritems():
        if len(anagrams) > 1:
            print(key, anagrams)

word_source = load_words()
print_anagrams(word_source)

Or:

word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)



回答3:


One solution is to sort the word you're searching anagrams for (for example using sorted), sort the alternative and compare those.

So if you would be searching for anagrams of 'rac' in the list ['car', 'girl', 'tofu', 'rca'], your code could look like this:

word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']

for alt in alternatives:
    if word == sorted(alt):
        print alt



回答4:


Sort each element then look for duplicates. There's a built-in function for sorting so you do not need to import anything




回答5:


There are multiple solutions to this problem:

  1. Classic approach

    First, let's consider what defines an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number or time in both words. This is basically a histogram of letters count of each word. This is a perfect use case for collections.Counter data structure (see docs). The algorithms is as follows:

    • Build a dictionary where keys would be histograms and values would be lists of words that have this histogram.
    • For each word build it's histogram and add it to the list that corresponds to this histogram.
    • Output list of dictionary values.

    Here is the code:

    from collections import Counter, defaultdict
    
    def anagram(words):
        anagrams = defaultdict(list)
        for word in words:
            histogram = tuple(Counter(word).items()) # build a hashable histogram
            anagrams[histogram].append(word)
        return list(anagrams.values())
    
    keywords = ("hi", "hello", "bye", "helol", "abc", "cab", 
                    "bac", "silenced", "licensed", "declines")
    
    print(anagram(keywords))
    

    Note that constructing Counter is O(l), while sorting each word is O(n*log(l)) where l is the length of the word.

  2. Solving anagrams using prime numbers

    This is a more advanced solution, that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample python implementation.




回答6:


def findanagranfromlistofwords(li):
    dict = {}
    index=0
    for i in range(0,len(li)):
        originalfirst = li[index]
        sortedfirst = ''.join(sorted(str(li[index])))
        for j in range(index+1,len(li)):
            next = ''.join(sorted(str(li[j])))
            print next
            if sortedfirst == next:
                dict.update({originalfirst:li[j]})
                print "dict = ",dict
        index+=1

    print dict

findanagranfromlistofwords(["car", "tree", "boy", "girl", "arc"])



回答7:


Most of previous answers are correct, here is another way to compare two strings. The main benefit of using this strategy versus sort is space/time complexity which is n log of n.

1.Check the length of string

2.Build frequency Dictionary and compare if they both match then we have successfully identified anagram words

def char_frequency(word):
    frequency  = {}
    for char in word:
        #if character  is in frequency then increment the value
        if char in frequency:
            frequency[char] += 1
        #else add character and set it to 1
        else:
            frequency[char] = 1
    return frequency 


a_word ='google'
b_word ='ooggle'
#check length of the words 
if (len(a_word) != len(b_word)):
   print ("not anagram")
else:
    #here we check the frequecy to see if we get the same
    if ( char_frequency(a_word) == char_frequency(b_word)):
        print("found anagram")
    else:
        print("no anagram")



回答8:


Since you can't import anything, here are two different approaches including the for loop you asked for.

Approach 1: For Loops and Inbuilt Sorted Function

word_list = ["percussion", "supersonic", "car", "tree", "boy", "girl", "arc"]

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (sorted(word_1)==sorted(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

Approach 2: Dictionaries

def freq(word):
    freq_dict = {}
    for char in word:
        freq_dict[char] = freq_dict.get(char, 0) + 1
    return freq_dict

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (freq(word_1) == freq(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

If you want these approaches explained in more detail, here is an article.




回答9:


Solution in python can be as below:

class Word:
    def __init__(self, data, index):
        self.data = data
        self.index = index

def printAnagrams(arr):
    dupArray = []
    size = len(arr)

    for i in range(size):
        dupArray.append(Word(arr[i], i))

    for i in range(size):
        dupArray[i].data = ''.join(sorted(dupArray[i].data))

    dupArray = sorted(dupArray, key=lambda x: x.data)

    for i in range(size):
        print arr[dupArray[i].index]

def main():
    arr = ["dog", "act", "cat", "god", "tac"]

    printAnagrams(arr)

if __name__== '__main__':
    main()
  1. First create a duplicate list of same words with indexes representing their position indexes.
  2. Then sort the individual strings of the duplicate list
  3. Then sort the duplicate list itself based on strings.
  4. Finally print the original list with indexes used from duplicate array.

The time complexity of above is O(NMLogN + NMLogM) = O(NMlogN)




回答10:


I'm using a dictionary to store each character of string one by one. Then iterate through second string and find the character in the dictionary, if it's present decrease the count of the corresponding key from dictionary.

class Anagram:

    dict = {}

    def __init__(self):
        Anagram.dict = {}

    def is_anagram(self,s1, s2):
        print '***** starting *****'

        print '***** convert input strings to lowercase'
        s1 = s1.lower()
        s2 = s2.lower()

        for i in s1:
           if i not in Anagram.dict:
              Anagram.dict[i] = 1
           else:
              Anagram.dict[i] += 1

        print Anagram.dict

        for i in s2:
           if i not in Anagram.dict:
              return false
           else:
              Anagram.dict[i] -= 1

        print Anagram.dict

       for i in Anagram.dict.keys():
          if Anagram.dict.get(i) == 0:
              del Anagram.dict[i]

       if len(Anagram.dict) == 0:
         print Anagram.dict
         return True
       else:
         return False



回答11:


import collections

def find_anagrams(x):
    anagrams = [''.join(sorted(list(i))) for i in x]
    anagrams_counts = [item for item, count in collections.Counter(anagrams).items() if count > 1]
    return [i for i in x if ''.join(sorted(list(i))) in anagrams_counts]



回答12:


Simple Solution in Python:

def anagram(s1,s2):

    # Remove spaces and lowercase letters
    s1 = s1.replace(' ','').lower()
    s2 = s2.replace(' ','').lower()

    # Return sorted match.
    return sorted(s1) == sorted(s2)



回答13:


This one is gonna help you:

Assuming input is given as comma separated strings

console input: abc,bac,car,rac,pqr,acb,acr,abc

in_list = list()
in_list = map(str, raw_input("Enter strings seperated by comma").split(','))
list_anagram = list()

for i in range(0, len(in_list) - 1):
    if sorted(in_list[i]) not in list_anagram:
        for j in range(i + 1, len(in_list)):
            isanagram = (sorted(in_list[i]) == sorted(in_list[j]))
            if isanagram:
                list_anagram.append(sorted(in_list[i]))
                print in_list[i], 'isanagram'
                break



回答14:


This works fine:


def find_ana(l):
    a=[]
    for i in range(len(l)):
        for j in range(len(l)): 
            if (l[i]!=l[j]) and (sorted(l[i])==sorted(l[j])):
                a.append(l[i])
                a.append(l[j])

    return list(set(a))




回答15:


Simply use the Counter method available in Python3 collections package.

str1="abc"
str2="cab"

Counter(str1)==Counter(str2)
# returns True i.e both Strings are anagrams of each other.



回答16:


A set is an appropriate data structure for the output, since you presumably don't want redundancy in the output. A dictionary is ideal for looking up if a particular sequence of letters has been previously observed, and what word it originally came from. Taking advantage of the fact that we can add the same item to a set multiple times without expanding the set lets us get away with one for loop.

def return_anagrams(word_list):
    d = {}
    out = set()
    for word in word_list:
        s = ''.join(sorted(word))
        try:
            out.add(d[s])
            out.add(word)
        except:
            d[s] = word
    return out

A faster way of doing it takes advantage of the commutative property of addition:

import numpy as np

def vector_anagram(l):
    d, out = dict(), set()
    for word in l:
        s = np.zeros(26, dtype=int)
        for c in word:
            s[ord(c)-97] += 1
        s = tuple(s)
        try:
            out.add(d[s])
            out.add(word)
        except:
            d[s] = word
    return out



回答17:


  1. Calculate each word length.
  2. Calculate each word ascii character sum.
  3. Sort each word characters by their ascii values and set ordered word.
  4. Group words according to their lengths.
  5. For each group regroup list according to their ascii character sum.
  6. For each small list check only words ordered. If ordered words same these words anagram.

Here we have 1000.000 words list. 1000.000 words

    namespace WindowsFormsApplication2
    {
        public class WordDef
        {
            public string Word { get; set; }
            public int WordSum { get; set; }
            public int Length { get; set; }       
            public string AnagramWord { get; set; }
            public string Ordered { get; set; }
            public int GetAsciiSum(string word)
            {
                int sum = 0;
                foreach (char c in word)
                {
                    sum += (int)c;
                }
                return sum;
            }
        }
    }

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Net;
    using System.Text;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    namespace WindowsFormsApplication2
    {
        public partial class AngramTestForm : Form
        {
            private ConcurrentBag<string> m_Words;

            private ConcurrentBag<string> m_CacheWords;

            private ConcurrentBag<WordDef> m_Anagramlist;
            public AngramTestForm()
            {
                InitializeComponent();
                m_CacheWords = new ConcurrentBag<string>();
            }

            private void button1_Click(object sender, EventArgs e)
            {
                m_Words = null;
                m_Anagramlist = null;

                m_Words = new ConcurrentBag<string>();
                m_Anagramlist = new ConcurrentBag<WordDef>();

                if (m_CacheWords.Count == 0)
                {
                    foreach (var s in GetWords())
                    {
                        m_CacheWords.Add(s);
                    }
                }

                m_Words = m_CacheWords;

                Stopwatch sw = new Stopwatch();

                sw.Start();

                //DirectCalculation();

                AsciiCalculation();

                sw.Stop();

                Console.WriteLine("The End! {0}", sw.ElapsedMilliseconds);

                this.Invoke((MethodInvoker)delegate
                {
                    lbResult.Text = string.Concat(sw.ElapsedMilliseconds.ToString(), " Miliseconds");
                });

                StringBuilder sb = new StringBuilder();
                foreach (var w in m_Anagramlist)
                {
                    if (w != null)
                    {
                        sb.Append(string.Concat(w.Word, " - ", w.AnagramWord, Environment.NewLine));
                    }
                }

                txResult.Text = sb.ToString();
            }

            private void DirectCalculation()
            {
                List<WordDef> wordDef = new List<WordDef>();

                foreach (var w in m_Words)
                {
                    WordDef wd = new WordDef();
                    wd.Word = w;
                    wd.WordSum = wd.GetAsciiSum(w);
                    wd.Length = w.Length;
                    wd.Ordered = String.Concat(w.OrderBy(c => c));

                    wordDef.Add(wd);
                }

                foreach (var w in wordDef)
                {
                    foreach (var t in wordDef)
                    {
                        if (w.Word != t.Word)
                        {
                            if (w.Ordered == t.Ordered)
                            {
                                t.AnagramWord = w.Word;
                                m_Anagramlist.Add(new WordDef() { Word = w.Word, AnagramWord = t.Word });
                            }
                        }
                    }
                }
            }

            private void AsciiCalculation()
            {
                ConcurrentBag<WordDef> wordDef = new ConcurrentBag<WordDef>();

                Parallel.ForEach(m_Words, w =>
                    {
                        WordDef wd = new WordDef();
                        wd.Word = w;
                        wd.WordSum = wd.GetAsciiSum(w);
                        wd.Length = w.Length;
                        wd.Ordered = String.Concat(w.OrderBy(c => c));

                        wordDef.Add(wd);                    
                    });

                var tempWordByLength = from w in wordDef
                                       group w by w.Length into newGroup
                                       orderby newGroup.Key
                                       select newGroup;

                foreach (var wList in tempWordByLength)
                {
                    List<WordDef> wd = wList.ToList<WordDef>();

                    var tempWordsBySum = from w in wd
                                         group w by w.WordSum into newGroup
                                         orderby newGroup.Key
                                         select newGroup;

                    Parallel.ForEach(tempWordsBySum, ws =>
                        {
                            List<WordDef> we = ws.ToList<WordDef>();

                            if (we.Count > 1)
                            {
                                CheckCandidates(we);
                            }
                        });
                }
            }

            private void CheckCandidates(List<WordDef> we)
            {
                for (int i = 0; i < we.Count; i++)
                {
                    for (int j = i + 1; j < we.Count; j++)
                    {
                        if (we[i].Word != we[j].Word)
                        {
                            if (we[i].Ordered == we[j].Ordered)
                            {
                                we[j].AnagramWord = we[i].Word;
                                m_Anagramlist.Add(new WordDef() { Word = we[i].Word, AnagramWord = we[j].Word });
                            }
                        }
                    }
                }
            }

            private static string[] GetWords()
            {
                string htmlCode = string.Empty;

                using (WebClient client = new WebClient())
                {
                    htmlCode = client.DownloadString("https://raw.githubusercontent.com/danielmiessler/SecLists/master/Passwords/10_million_password_list_top_1000000.txt");
                }

                string[] words = htmlCode.Split(new string[] { "\n" }, StringSplitOptions.RemoveEmptyEntries);

                return words;
            }
        }
    }



回答18:


here is the impressive solution.

funct alphabet_count_mapper:

for each word in the file/list

1.create a dictionary of alphabets/characters with initial count as 0.

2.keep count of all the alphabets in the word and increment the count in the above alphabet dict.

3.create alphabet count dict and return the tuple of the values of alphabet dict.

funct anagram_counter:

1.create a dictionary with alphabet count tuple as key and the count of the number of occurences against it.

2.iterate over the above dict and if the value > 1, add the value to the anagram count.

    import sys
    words_count_map_dict = {}
    fobj = open(sys.argv[1],"r")
    words = fobj.read().split('\n')[:-1]

    def alphabet_count_mapper(word):
        alpha_count_dict = dict(zip('abcdefghijklmnopqrstuvwxyz',[0]*26))
        for alpha in word:
            if alpha in alpha_count_dict.keys():
                alpha_count_dict[alpha] += 1
            else:
                alpha_count_dict.update(dict(alpha=0))
        return tuple(alpha_count_dict.values())

    def anagram_counter(words):
        anagram_count = 0
        for word in words:
            temp_mapper = alphabet_count_mapper(word)
            if temp_mapper in words_count_map_dict.keys():
                words_count_map_dict[temp_mapper] += 1
            else:
                words_count_map_dict.update({temp_mapper:1})
        for val in words_count_map_dict.values():
            if val > 1:
                anagram_count += val
        return anagram_count


    print anagram_counter(words)

run it with file path as command line argument




回答19:


You convert each of the character in a word into a number (by ord() function), add them up for the word. If two words have the same sum, then they are anagrams. Then filter for the sums that occur more than twice in the list.

def sumLet(w):
    return sum([ord(c) for c in w])

def find_anagrams(l):
    num_l = map(sumLet,l)
    return [l[i] for i,num in enumerate(num_l) if num_l.count(num) > 1]



回答20:


>>> words = ["car", "race", "rac", "ecar", "me", "em"]
>>> anagrams = {}
... for word in words:
...     reverse_word=word[::-1]
...     if reverse_word in words:
...         anagrams[word] = (words.pop(words.index(reverse_word)))
>>> anagrams
20: {'car': 'rac', 'me': 'em', 'race': 'ecar'}

Logic:

  1. Start from first word and reverse the word.
  2. Check the reversed word is present in the list.
  3. If present, find the index and pop the item and store it in the dictionary, word as key and reversed word as value.



回答21:


If you want a solution in java,

public List<String> findAnagrams(List<String> dictionary) {

    // TODO do null check and other basic validations.
    Map<String, List<String>> wordMap = new HashMap<String, List<String>>();

    for(String word : dictionary) {

        // ignore if word is null
        char[] tempWord = word.tocharArray();
        Arrays.sort(tempWord);
        String newWord = new String(tempWord);

        if(wordMap.containsKey(newWord)) {
            wordMap.put(newWord, wordMap.get(word).add(word));
        } else {
            wordMap.put(newWord, new ArrayList<>() {word});
        }

    }

    List<String> anagrams = new ArrayList<>();

    for(String key : wordMap.keySet()) {

        if(wordMap.get(key).size() > 1) {
            anagrams.addAll(wordMap.get(key));
        }

    }

    return anagrams;
}


来源:https://stackoverflow.com/questions/8286554/using-python-find-anagrams-for-a-list-of-words

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!