Question
A word is an anagram if the letters in that word can be re-arranged to form a different word.
Task:
- The shortest source code by character count to find all sets of anagrams given a word list.
- Spaces and new lines should be counted as characters
Use the code ruler
---------10--------20--------30--------40--------50--------60--------70--------80--------90--------100-------110-------120
Input:
a list of words from stdin with each word separated by a new line.
e.g.
A
A's
AOL
AOL's
Aachen
Aachen's
Aaliyah
Aaliyah's
Aaron
Aaron's
Abbas
Abbasid
Abbasid's
Output:
All sets of anagrams, with each set on a separate line.
Example run:
./anagram < words
marcos caroms macros
lump's plum's
dewar's wader's
postman tampons
dent tend
macho mocha
stoker's stroke's
hops posh shop
chasity scythia
...
I have a 149-char Perl solution which I'll post as soon as a few more people post :)
Have fun!
EDIT: Clarifications
- Assume anagrams are case insensitive (i.e. upper and lower case letters are equivalent)
- Only sets with more than 1 item should be printed
- Each set of anagrams should only be printed once
- Each word in an anagram set should only occur once
EDIT2: More Clarifications
- If two words differ only in capitalization, they should be collapsed into the same word; it's up to you to decide which capitalization to use for the collapsed word
- Each set of words only has to end in a new line; within a set, the words may be separated in any way, e.g. comma-separated or space-separated. Some languages have quick built-in array printing, and this lets you take advantage of it even if it doesn't output space-separated arrays.
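For reference, the clarified rules can be sketched in ungolfed Python. This is only an illustration and not one of the submitted entries; the names anagram_sets and main, and the choice to keep the first capitalization seen for a collapsed word, are assumptions made here.

```python
import sys
from collections import defaultdict


def anagram_sets(words):
    """Group words into anagram sets, case-insensitively (hypothetical helper)."""
    # key (sorted lowercase letters) -> {lowercased word: first spelling seen}
    groups = defaultdict(dict)
    for word in words:
        word = word.strip()
        if not word:
            continue
        folded = word.lower()
        key = "".join(sorted(folded))
        # Words differing only in capitalization collapse into one entry;
        # setdefault keeps whichever spelling appeared first (an assumption).
        groups[key].setdefault(folded, word)
    # Only sets with more than one distinct word are reported.
    return [list(g.values()) for g in groups.values() if len(g) > 1]


def main():
    # Reads one word per line from stdin and prints one set per line.
    for group in anagram_sets(sys.stdin):
        print(" ".join(group))


print(anagram_sets(["dent", "A", "tend", "Dent", "a", "macho", "mocha"]))
# [['dent', 'tend'], ['macho', 'mocha']]
```

Calling main() instead of the sample print would process a word list piped in on stdin, as in the example run.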
Answer 1:
PowerShell, 104 97 91 86 83 chars
$k=@{};$input|%{$k["$([char[]]$_|%{$_+0}|sort)"]+=@($_)}
$k.Values|?{$_[1]}|%{"$_"}
Update for the new requirement (+8 chars):
To exclude words that differ only in capitalization, we can simply remove the duplicates (case-insensitively) from the input list with $input|sort -u, where -u stands for -unique. sort is case-insensitive by default:
$k=@{};$input|sort -u|%{$k["$([char[]]$_|%{$_+0}|sort)"]+=@($_)}
$k.Values|?{$_[1]}|%{"$_"}
Explanation of the [char[]]$_|%{$_+0}|sort part
It's the key for the hashtable entry under which the anagrams of a word are stored. My initial solution was $_.ToLower().ToCharArray()|sort. Then I discovered I didn't need ToLower() for the key, as hashtable lookups are case-insensitive.
[char[]]$_|sort would be ideal, but the sorting of the chars for the key needs to be case-insensitive (otherwise Cab and abc would be stored under different keys). Unfortunately, sort is not case-insensitive for chars (only for strings).
What we need is [string[]][char[]]$_|sort, but I found a shorter way of converting each char to a string, which is to concatenate something else to it, in this case the integer 0, hence [char[]]$_|%{$_+0}|sort. This doesn't affect the sorting order, and the actual key ends up being something like d0 o0 r0 w0. It's not pretty, but it does the job :)
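The Cab / abc problem described above is not specific to PowerShell. The following Python sketch (the helper names naive_key and folded_key are made up for illustration) shows why the characters have to be case-folded before sorting: comparing the keys case-insensitively afterwards is not enough.

```python
def naive_key(word):
    # Case-sensitive sort: uppercase letters order before lowercase ones.
    return "".join(sorted(word))


def folded_key(word):
    # Fold case first, then sort, so the letter order no longer depends on case.
    return "".join(sorted(word.lower()))


print(naive_key("Cab"), naive_key("abc"))    # Cab abc
# Even a case-insensitive comparison of the naive keys fails: "cab" != "abc".
print(naive_key("Cab").lower() == naive_key("abc").lower())  # False
print(folded_key("Cab") == folded_key("abc"))                # True
```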
Answer 2:
Perl, 59 characters
chop,$_{join'',sort split//,lc}.="$_ "for<>;/ ./&&say for%_
Note that this requires Perl 5.10 (for the say function).
Answer 3:
Haskell, 147 chars
prior sizes: 150 159 chars
import Char
import List
x=sort.map toLower
g&a=g(x a).x
main=interact$unlines.map unwords.filter((>1).length).groupBy((==)&).sortBy(compare&).lines
This version, at 165 chars, satisfies the new, clarified rules:
import Char
import List
y=map toLower
x=sort.y
g&f=(.f).g.f
w[_]="";w a=show a++"\n"
main=interact$concatMap(w.nubBy((==)&y)).groupBy((==)&x).sortBy(compare&x).lines
This version handles:
- Words in the input that differ only by case should only count as one word
- The output needs to be one anagram set per line, but extra punctuation is acceptable
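The Haskell entries sort the words by their anagram key and then group adjacent equal keys, rather than filling a hash table. A rough Python analogue of that sort-then-group idea, using itertools.groupby, might look like the sketch below; the function names are hypothetical, and the de-duplication loop only approximates what nubBy does in the 165-char version.

```python
from itertools import groupby


def key(word):
    # Anagram key: the word's letters, lowercased and sorted.
    return sorted(word.lower())


def anagram_groups(words):
    result = []
    # groupby only merges adjacent items, so the words are sorted by key first.
    for _, run in groupby(sorted(words, key=key), key=key):
        seen = set()
        unique = []
        for w in run:
            # Drop words that differ from an earlier one only by case.
            if w.lower() not in seen:
                seen.add(w.lower())
                unique.append(w)
        # Keep only groups that still hold more than one word.
        if len(unique) > 1:
            result.append(unique)
    return result


print(anagram_groups(["tend", "Abbas", "dent", "Dent", "posh", "shop", "hops"]))
# [['tend', 'dent'], ['posh', 'shop', 'hops']]
```

Sorting costs O(n log n) comparisons of keys, whereas the hash-table entries in the other answers make a single pass over the input.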
Answer 4:
Ruby, 94 characters
h={};(h[$_.upcase.bytes.sort]||=[])<<$_ while gets&&chomp;h.each{|k,v|puts v.join' 'if v.at 1}
Answer 5:
Python, 167 characters, includes I/O
import sys
d={}
for l in sys.stdin.readlines():
 l=l[:-1]
 k=''.join(sorted(l.lower()))
 d[k]=d.pop(k,[])+[l]
for k in d:
 if len(d[k])>1: print(' '.join(d[k]))
Without the input code (i.e. if we assume the word list is already in a list w), it's only 134 characters:
d={}
for l in w:
 l=l[:-1]
 k=''.join(sorted(l.lower()))
 d[k]=d.pop(k,[])+[l]
for k in d:
 if len(d[k])>1: print(' '.join(d[k]))
Answer 6:
AWK - 119
{split(toupper($1),a,"");asort(a);s="";for(i=1;a[i];)s=a[i++]s;x[s]=x[s]$1" "}
END{for(i in x)if(x[i]~/ .* /)print x[i]}
AWK does not have a join function like Python, or it could have been shorter...
It treats words that differ only in capitalization as different words.
Answer 7:
C++, 542 chars
#include <iostream>
#include <map>
#include <vector>
#include <boost/algorithm/string.hpp>
#define ci const_iterator
int main(){using namespace std;typedef string s;typedef vector<s> vs;vs l;
copy(istream_iterator<s>(cin),istream_iterator<s>(),back_inserter(l));map<s, vs> r;
for (vs::ci i=l.begin(),e=l.end();i!=e;++i){s a=boost::to_lower_copy(*i);
sort(a.begin(),a.end());r[a].push_back(*i);}for (map<s,vs>::ci i=r.begin(),e=r.end();
i!=e;++i)if(i->second.size()>1)*copy(i->second.begin(),i->second.end(),
ostream_iterator<s>(cout," "))="\n";}
Answer 8:
Python, O(n^2)
import sys;
words=sys.stdin.readlines()
def s(x):return sorted(x.lower());
print '\n'.join([''.join([a.replace('\n',' ') for a in words if(s(a)==s(w))]) for w in words])
Source: https://stackoverflow.com/questions/2565912/code-golf-find-all-anagrams