问题
This is a hackerrank exercise, and although the problem itself is solved, my solution is apparently not efficient enough, so on most test cases I'm getting timeouts. Here's the problem:
We're going to make our own Contacts application! The application must perform two types of operations:
add name
, wherename
is a string denoting a contact name. This must store as a new contact in the application.find partial
, wherepartial
is a string denoting a partial name to search the application for. It must count the number of contacts starting withpartial
and print the count on a new line. Givenn
sequential add and find operations, perform each operation in order.
I'm using Tries to make it work, here's the code:
import re
def add_contact(dictionary, contact):
_end = '_end_'
current_dict = dictionary
for letter in contact:
current_dict = current_dict.setdefault(letter, {})
current_dict[_end] = _end
return(dictionary)
def find_contact(dictionary, contact):
p = re.compile('_end_')
current_dict = dictionary
for letter in contact:
if letter in current_dict:
current_dict = current_dict[letter]
else:
return(0)
count = int(len(p.findall(str(current_dict))) / 2)
re.purge()
return(count)
n = int(input().strip())
contacts = {}
for a0 in range(n):
op, contact = input().strip().split(' ')
if op == "add":
contacts = add_contact(contacts, contact)
if op == "find":
print(find_contact(contacts, contact))
Because the problem requires not returning whether partial
is a match or not, but instead counting all of the entries that match it, I couldn't find any other way but cast the nested dictionaries to a string and then count all of the _end_
s, which I'm using to denote stored strings. This, it would seem, is the culprit, but I cannot find any better way to do the searching. How do I make this work faster? Thanks in advance.
UPD: I have added a results counter that actually parses the tree, but the code is still too slow for the online checker. Any thoughts?
def find_contact(dictionary, contact):
current_dict = dictionary
count = 0
for letter in contact:
if letter in current_dict:
current_dict = current_dict[letter]
else:
return(0)
else:
return(words_counter(count, current_dict))
def words_counter(count, node):
live_count = count
live_node = node
for value in live_node.values():
if value == '_end_':
live_count += 1
if type(value) == type(dict()):
live_count = words_counter(live_count, value)
return(live_count)
回答1:
Ok, so, as it turns out, using nested dicts is not a good idea in general, because hackerrank will shove 100k strings into your program and then everything will slow to a crawl. So the problem wasn't in the parsing, it was in the storing before the parsing. Eventually I found this blogpost, their solution passes the challenge 100%. Here's the code in full:
class Node:
def __init__(self):
self.count = 1
self.children = {}
trie = Node()
def add(node, name):
for letter in name:
sub = node.children.get(letter)
if sub:
sub.count += 1
else:
sub = node.children[letter] = Node()
node = sub
def find(node, data):
for letter in data:
sub = node.children.get(letter)
if not sub:
return 0
node = sub
return node.count
if __name__ == '__main__':
n = int(input().strip())
for _ in range(n):
op, param = input().split()
if op == 'add':
add(trie, param)
else:
print(find(trie, param))
来源:https://stackoverflow.com/questions/46961262/efficient-partial-search-of-a-trie-in-python