suffix-tree

Match and replace emoticons in string - what is the most efficient way?

不羁的心 submitted on 2019-11-28 14:00:32
Wikipedia defines a lot of possible emoticons people can use. I want to match this list to words in a string. I now have this:

    $string = "Lorem ipsum :-) dolor :-| samet";
    $emoticons = array(
        '[HAPPY]' => array(' :-) ', ' :) ', ' :o) '), //etc...
        '[SAD]' => array(' :-( ', ' :( ', ' :-| ')
    );
    foreach ($emoticons as $emotion => $icons) {
        $string = str_replace($icons, " $emotion ", $string);
    }
    echo $string;

Output:

    Lorem ipsum [HAPPY] dolor [SAD] samet

So in principle this works. However, I have two questions: As you can see, I'm putting spaces around each emoticon in the array, such as ' :-) '
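The padding spaces in the array can be avoided by matching on whitespace boundaries instead. The following is a minimal sketch of that idea in Python rather than the question's PHP; the names `EMOTICONS` and `build_replacer` are hypothetical, and the emoticon list is just the one from the excerpt. It compiles all emoticons into one regular expression so the string is scanned once.

```python
import re

# Labels and emoticons taken from the excerpt; no padding spaces needed.
EMOTICONS = {
    "[HAPPY]": [":-)", ":)", ":o)"],
    "[SAD]": [":-(", ":(", ":-|"],
}


def build_replacer(emoticons):
    """Return a function that replaces whole-token emoticons by their label."""
    lookup = {icon: label for label, icons in emoticons.items() for icon in icons}
    # Longest alternatives first, so ":-)" is tried before ":)".
    alternatives = sorted(lookup, key=len, reverse=True)
    # (?<!\S) and (?!\S) require whitespace or a string boundary on each side,
    # which plays the role of the spaces in the original array.
    pattern = re.compile(
        r"(?<!\S)(?:" + "|".join(re.escape(a) for a in alternatives) + r")(?!\S)"
    )
    return lambda text: pattern.sub(lambda m: lookup[m.group(0)], text)


if __name__ == "__main__":
    replace = build_replacer(EMOTICONS)
    print(replace("Lorem ipsum :-) dolor :-| samet"))
    # prints: Lorem ipsum [HAPPY] dolor [SAD] samet
```

Because the lookarounds do not consume the surrounding whitespace, emoticons at the start or end of the string and adjacent emoticons separated by a single space are matched as well.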

Longest palindrome in a string using suffix tree

我怕爱的太早我们不能终老 submitted on 2019-11-28 02:49:11
I was trying to find the longest palindrome in a string. The brute-force solution takes O(n^3) time. I read that there is a linear-time algorithm for it using suffix trees. I am familiar with suffix trees and am comfortable building them. How do you use the built suffix tree to find the longest palindrome?

I believe you need to proceed this way: Let y_1 y_2 ... y_n be your string (where the y_i are letters). Create the generalized suffix tree of S_f = y_1 y_2 ... y_n $ and S_r = y_n y_{n-1} ... y_1 # (reverse the letters and choose different ending characters for S_f ($) and S_r (#))... where S_f
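The suffix-tree procedure is cut off in this excerpt, so it is not reconstructed here. As a baseline to test any faster implementation against, here is a sketch of a different, simpler technique: expanding around each possible center, which runs in O(n^2) time rather than the linear time the question asks about. The function name `longest_palindrome` is hypothetical.

```python
def longest_palindrome(text):
    """Return the first longest palindromic substring of text.

    Expand-around-center, O(n^2) time and O(1) extra space.
    This is NOT the linear-time suffix-tree method.
    """
    n = len(text)
    best_start, best_len = 0, 0
    # 2n - 1 centers: n on characters (odd length), n - 1 between them (even).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        # Grow outwards while the two ends match.
        while left >= 0 and right < n and text[left] == text[right]:
            left -= 1
            right += 1
        # The loop overshoots by one on each side.
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return text[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_palindrome("banana"))  # prints: anana
```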

Finding the longest repeated substring

余生颓废 submitted on 2019-11-27 23:07:08
What would be the best approach (performance-wise) in solving this problem? I was recommended to use suffix trees. Is this the best approach?

Have a look at http://en.wikipedia.org/wiki/Suffix_array as well. Suffix arrays are quite space-efficient, and there are some reasonably programmable algorithms to produce them, such as "Simple Linear Work Suffix Array Construction" by Karkkainen and Sanders.

user1071840

Check out this link: http://introcs.cs.princeton.edu/java/42sort/LRS.java.html

    /*************************************************************************
     * Compilation: javac LRS.java
     * Execution: java
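To make the suffix-array suggestion concrete, here is a short sketch of suffix array construction by prefix doubling. Note the assumption: this is the simpler O(n log^2 n) doubling scheme, not the linear-time Karkkainen and Sanders algorithm named in the answer, and `build_suffix_array` is a hypothetical name.

```python
def build_suffix_array(text):
    """Return the start indices of the suffixes of text in sorted order.

    Prefix doubling: suffixes are ranked by their first k characters,
    then by their first 2k characters, until all ranks are distinct.
    """
    n = len(text)
    if n == 0:
        return []
    rank = [ord(ch) for ch in text]  # rank by first character
    order = list(range(n))
    k = 1
    while True:
        # Sort key: rank of the first k chars, then rank of the next k chars
        # (-1 when the suffix is shorter than that, so it sorts first).
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        order.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            bump = 1 if key(order[j]) != key(order[j - 1]) else 0
            new_rank[order[j]] = new_rank[order[j - 1]] + bump
        rank = new_rank
        if rank[order[-1]] == n - 1:  # all suffixes distinguished
            return order
        k *= 2


if __name__ == "__main__":
    print(build_suffix_array("banana"))  # prints: [5, 3, 1, 0, 4, 2]
```

Only integer ranks are compared, so no suffix is ever copied; memory stays linear in the length of the text.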

Efficient way to find longest duplicate string for Python (From Programming Pearls)

偶尔善良 submitted on 2019-11-27 22:22:23
From Section 15.2 of Programming Pearls. The C code can be viewed here: http://www.cs.bell-labs.com/cm/cs/pearls/longdup.c

When I implement it in Python using a suffix array:

    example = open("iliad10.txt").read()

    def comlen(p, q):
        i = 0
        for x in zip(p, q):
            if x[0] == x[1]:
                i += 1
            else:
                break
        return i

    suffix_list = []
    example_len = len(example)
    idx = list(range(example_len))
    idx.sort(cmp = lambda a, b: cmp(example[a:], example[b:])) #VERY VERY SLOW
    max_len = -1
    for i in range(example_len - 1):
        this_len = comlen(example[idx[i]:], example[idx[i+1]:])
        print this_len
        if this_len > max_len:
            max_len =
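The excerpt is Python 2 (`sort(cmp=...)`, `print` statement) and is cut off before the end. Below is a hedged Python 3 sketch of the same idea: sort the suffix start positions, then compare each suffix only with its neighbour in sorted order. The names `common_prefix_len` and `longest_repeated_substring` are hypothetical. The `key` function still slices the text, so the sort remains expensive on large inputs; the sketch shows the structure of the approach and is not a tuned solution.

```python
def common_prefix_len(text, i, j):
    """Length of the common prefix of text[i:] and text[j:], without slicing."""
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def longest_repeated_substring(text):
    """Return a longest substring that occurs at least twice (overlaps allowed)."""
    # Python 3 has no cmp= argument; a key function replaces the comparator.
    order = sorted(range(len(text)), key=lambda i: text[i:])
    best_start, best_len = 0, 0
    # The longest repeat is always shared by two neighbours in sorted order.
    for a, b in zip(order, order[1:]):
        length = common_prefix_len(text, a, b)
        if length > best_len:
            best_start, best_len = a, length
    return text[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_repeated_substring("banana"))  # prints: ana
```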

Efficient way to find longest duplicate string for Python (From Programming Pearls)

孤者浪人 submitted on 2019-11-27 19:10:30
Question: From Section 15.2 of Programming Pearls. The C code can be viewed here: http://www.cs.bell-labs.com/cm/cs/pearls/longdup.c

When I implement it in Python using a suffix array:

    example = open("iliad10.txt").read()

    def comlen(p, q):
        i = 0
        for x in zip(p, q):
            if x[0] == x[1]:
                i += 1
            else:
                break
        return i

    suffix_list = []
    example_len = len(example)
    idx = list(range(example_len))
    idx.sort(cmp = lambda a, b: cmp(example[a:], example[b:])) #VERY VERY SLOW
    max_len = -1
    for i in range(example_len - 1):

suffix tree implementation in python [closed]

冷暖自知 submitted on 2019-11-27 13:17:29
Question: Just wondering if you are aware of any C-based extension for Python that can help me construct suffix trees/arrays in linear time?

Answer 1: You can check out the following implementations:

http://www.daimi.au.dk/~mailund/suffix_tree.html
https://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/
https://github.com/kvh/Python-Suffix-Tree

Someone improved the first one and posted it here: http://researchonsearch.blogspot.com/2010/05/suffix-tree-implementation-with-unicode.html

All are C implementations.

Source:

Generalized Suffix Tree Java Implementation [closed]

一曲冷凌霜 submitted on 2019-11-27 12:53:22
I am looking for a Java implementation of the Generalized Suffix Tree (GST) with the following features: after creating the GST from, say, 1000 strings, I would like to find out how many of these 1000 strings contain some other string 's'. The search must be quite fast, as I need to apply it to about 100'000 candidate strings of average length 10.

Try the Semantic Discovery Toolkit. It has an implementation in text/src/java/org/sd/text/radixtree

A Java implementation of a non-generalized suffix tree is available at: http://illya-keeplearning.blogspot.com/2009/04/suffix-trees
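The query described in the question ("how many of the indexed strings contain s?") can be illustrated with a much simpler structure than a real generalized suffix tree. The sketch below, in Python rather than Java, builds an uncompressed suffix trie in which every node records the ids of the strings passing through it. This is an assumption-laden illustration: it uses memory quadratic in the string lengths, so it only suits short strings, and the class name `SuffixTrieIndex` and its methods are hypothetical, not taken from any library mentioned above.

```python
class SuffixTrieIndex:
    """Naive generalized suffix trie: each node knows which strings reach it."""

    def __init__(self, strings):
        self.size = len(strings)
        self.root = self._new_node()
        for string_id, text in enumerate(strings):
            # Insert every suffix of every string, tagging nodes with the id.
            for start in range(len(text)):
                node = self.root
                for ch in text[start:]:
                    node = node["children"].setdefault(ch, self._new_node())
                    node["ids"].add(string_id)

    @staticmethod
    def _new_node():
        return {"children": {}, "ids": set()}

    def count_containing(self, pattern):
        """Number of indexed strings that contain pattern as a substring."""
        if not pattern:
            return self.size  # every string contains the empty string
        node = self.root
        for ch in pattern:
            node = node["children"].get(ch)
            if node is None:
                return 0
        return len(node["ids"])


if __name__ == "__main__":
    index = SuffixTrieIndex(["banana", "bandana", "cab"])
    print(index.count_containing("ana"))  # prints: 2
```

A lookup costs time proportional to the pattern length, independent of how many strings were indexed, which is the property the question is after; a compressed generalized suffix tree gives the same query time with linear space.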

Finding the longest repeated substring

半城伤御伤魂 submitted on 2019-11-27 04:37:39
Question: What would be the best approach (performance-wise) in solving this problem? I was recommended to use suffix trees. Is this the best approach?

Answer 1: Have a look at http://en.wikipedia.org/wiki/Suffix_array as well. Suffix arrays are quite space-efficient, and there are some reasonably programmable algorithms to produce them, such as "Simple Linear Work Suffix Array Construction" by Karkkainen and Sanders.

Answer 2: Check out this link: http://introcs.cs.princeton.edu/java/42sort/LRS.java.html

    /********************

python: library for generalized suffix trees [closed]

南笙酒味 submitted on 2019-11-27 04:30:49
I need a Python library that can construct suffix trees, and especially generalised suffix trees. Could you suggest some libraries? Thanks.

See the following libraries:

suffixtree
Python-Suffix-Tree
SuffixTree
SuffixTree (same name, different project; supports generalized suffix trees)
pysuffix (this one implements suffix arrays)

Source: https://stackoverflow.com/questions/9347078/python-library-for-generalized-suffix-trees

Longest palindrome in a string using suffix tree

我只是一个虾纸丫 submitted on 2019-11-26 22:29:50
Question: I was trying to find the longest palindrome in a string. The brute-force solution takes O(n^3) time. I read that there is a linear-time algorithm for it using suffix trees. I am familiar with suffix trees and am comfortable building them. How do you use the built suffix tree to find the longest palindrome?

Answer 1: I believe you need to proceed this way: Let y_1 y_2 ... y_n be your string (where the y_i are letters). Create the generalized suffix tree of S_f = y_1 y_2 ... y_n $ and S_r = y_n y_{n-1}