string-algorithm

Find all concatenations of two strings in a huge set

我们两清 submitted on 2019-12-13 13:28:02
Question: Given a set of 50k strings, I need to find all pairs (s, t) such that s, t and s + t are all contained in this set. What I've tried: there's an additional constraint, s.length() >= 4 && t.length() >= 4. This makes it possible to group the strings by their length-4 prefixes and, separately, by their length-4 suffixes. Then, for every string composed of length at least 8, I look up the set of candidates for s using the first four characters of composed and the set of candidates for t using its last four characters
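The prefix/suffix grouping described in the question could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `find_concatenations` and the `min_len` parameter are hypothetical, and the asker's actual code (which appears to be Java) is not shown in the excerpt.

```python
from collections import defaultdict


def find_concatenations(strings, min_len=4):
    """Return sorted (s, t) pairs where s, t and s + t are all in `strings`.

    Hypothetical helper: only pairs with len(s) >= min_len and
    len(t) >= min_len are reported, matching the stated constraint.
    """
    words = set(strings)
    by_prefix = defaultdict(list)  # first min_len chars -> candidate s values
    by_suffix = defaultdict(set)   # last min_len chars  -> candidate t values
    for w in words:
        if len(w) >= min_len:
            by_prefix[w[:min_len]].append(w)
            by_suffix[w[-min_len:]].add(w)

    pairs = []
    for composed in words:
        if len(composed) < 2 * min_len:
            continue
        t_candidates = by_suffix.get(composed[-min_len:], ())
        for s in by_prefix.get(composed[:min_len], ()):
            # s must be a proper prefix that leaves at least min_len chars for t
            if len(composed) - len(s) >= min_len and composed.startswith(s):
                t = composed[len(s):]
                if t in t_candidates:
                    pairs.append((s, t))
    return sorted(pairs)
```

For example, `find_concatenations(["abcd", "efgh", "abcdefgh", "abcde", "fgh"])` reports only `("abcd", "efgh")`, because `"fgh"` is shorter than four characters.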

Inplace string replacement in C

核能气质少年 submitted on 2019-12-12 03:17:54
Question: Write a function void inplace(char *str, const char pattern, const char* replacement, size_t mlen). Input: str: a string ending with \0; the input indicates that we need an in-place algorithm. pattern: a letter. replacement: a string. mlen: the size of the memory that holds the string; str starts at the beginning of that memory, and mlen should be larger than strlen(str). The final result is still pointed to by str. Note that all occurrences of pattern should be replaced. For example, helelo\0.
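One common way to approach this is a two-pass scheme: count the occurrences to find the final length, then copy from the end of the buffer backwards so that no unread character is overwritten. The sketch below illustrates that idea in Python, with a `bytearray` standing in for the C buffer. It is an assumption-laden illustration, not the asker's solution: the signature is adapted (`len(buf)` plays the role of `mlen`), and raising `ValueError` when the result does not fit is a design choice made here.

```python
def inplace(buf, pattern, replacement):
    """Replace every byte `pattern` in the NUL-terminated `buf` in place.

    buf:         bytearray whose length plays the role of mlen
    pattern:     a single byte value (int), e.g. ord("l")
    replacement: bytes to substitute for each occurrence
    """
    n = buf.index(0)                      # strlen(str)
    count = buf.count(pattern, 0, n)      # pass 1: count occurrences
    new_len = n + count * (len(replacement) - 1)
    if new_len + 1 > len(buf):
        raise ValueError("result does not fit in the buffer")

    if not replacement:
        # Shrinking case: compact forwards, the write index never passes the read index.
        w = 0
        for i in range(n):
            if buf[i] != pattern:
                buf[w] = buf[i]
                w += 1
        buf[w] = 0
        return

    # Growing (or equal-size) case: fill backwards from the new terminator.
    w = new_len
    buf[w] = 0
    for i in range(n - 1, -1, -1):
        c = buf[i]
        if c == pattern:
            w -= len(replacement)
            buf[w:w + len(replacement)] = replacement  # same-length slice, no resize
        else:
            w -= 1
            buf[w] = c
```

A possible usage: with `buf = bytearray(b"hello\0" + bytes(10))`, calling `inplace(buf, ord("l"), b"xy")` leaves `hexyxyo` followed by a NUL at the start of `buf`.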

Python coding relating to function any and “more than once” keyword

荒凉一梦 submitted on 2019-12-08 13:39:56
Question: I have this simple piece of code that tells me if a word in a given list appears in an article: if not any(word in article.text for word in keywords): print("Skipping article as there is no matching keyword\n") What I need is to check whether at least 3 words in the "keywords" list appear in the article; if they don't, it should skip the article. Is there an easy way to do this? I can't seem to find anything. Answer 1: If the set of keywords is large enough and the string being searched is long enough that
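A minimal sketch of one way to express "at least 3 keywords match", keeping the same substring test as the question's `any(...)` call. The helper name `has_at_least` and its `minimum` parameter are hypothetical; `itertools.islice` is used so that scanning stops as soon as enough matches are found. This is not necessarily what the truncated answer goes on to propose.

```python
from itertools import islice


def has_at_least(text, keywords, minimum=3):
    """Return True if at least `minimum` distinct keywords occur in `text`."""
    # Lazy generator: each keyword is tested only when islice asks for it.
    hits = (word for word in set(keywords) if word in text)
    return len(list(islice(hits, minimum))) >= minimum
```

In the question's setting this would be used as `if not has_at_least(article.text, keywords): print("Skipping article ...")`. Note that `word in text` is a substring test, so `"cat"` also matches inside `"concatenate"`.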

Faster Aho-Corasick PHP implementation

纵然是瞬间 submitted on 2019-12-04 23:15:46
Question: Is there a working implementation of Aho–Corasick in PHP? There is one, "Aho-Corasick string matching in PHP", mentioned in the Wikipedia article: <?php /* This class performs a multiple pattern matching by using the Aho-Corasick algorithm, which scans text and matches all patterns "at once". This class can: - find if any of the patterns occurs inside the text - find all occurrences of the patterns inside the text - substitute all occurrences of the patterns with a specified string (empty as

Find the words in a long stream of characters. Auto-tokenize

岁酱吖の submitted on 2019-12-04 08:28:39
Question: How would you find the correct words in a long stream of characters? Input: "The revised report onthesyntactictheoriesofsequentialcontrolandstate" Google's output: "The revised report on syntactic theories sequential controlandstate" (which is close enough considering how quickly they produced the output). How do you think Google does it? How would you increase the accuracy? Answer 1: I would try a recursive algorithm like this: Try inserting a space at each position. If the left part is a word,
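The recursive idea in the answer (try a split at each position and, if the left part is a word, recurse on the right part) might look like the sketch below. The function name `segment` and the tiny word list in the usage note are hypothetical; a real system would need a large dictionary, and this version simply returns the first split in which every piece is a known word rather than scoring alternatives.

```python
from functools import lru_cache


def segment(text, dictionary):
    """Split `text` into dictionary words, or return None if impossible."""
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def solve(start):
        # Returns a tuple of words covering text[start:], or None.
        if start == len(text):
            return ()
        for end in range(start + 1, len(text) + 1):
            left = text[start:end]
            if left in words:
                rest = solve(end)      # recurse on the right part
                if rest is not None:
                    return (left,) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)
```

With the assumed dictionary `{"on", "the", "syntactic", "theories"}`, `segment("onthesyntactictheories", ...)` yields `["on", "the", "syntactic", "theories"]`. Memoizing on the start index keeps the recursion from re-solving the same suffix repeatedly.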

Find the words in a long stream of characters. Auto-tokenize

五迷三道 submitted on 2019-12-02 23:28:19
How would you find the correct words in a long stream of characters? Input: "The revised report onthesyntactictheoriesofsequentialcontrolandstate" Google's output: "The revised report on syntactic theories sequential controlandstate" (which is close enough considering how quickly they produced the output). How do you think Google does it? How would you increase the accuracy? I would try a recursive algorithm like this: Try inserting a space at each position. If the left part is a word, then recur on the right part. Count the number of valid words / number of total words in all the final