string-search

what's the fastest way to scan a very large file in java?

此生再无相见时 submitted on 2019-12-03 14:59:34
Imagine I have a very large text file. Performance really matters. All I want to do is scan it for a certain string. Maybe I want to count how many occurrences there are, but that is not really the point. The point is: what's the fastest way? I don't care about maintainability; it needs to be fast. Fast is key. For a one-off search, use a Scanner, as suggested here: A simple technique that could well be considerably faster than indexOf() is to use a Scanner, with the method findWithinHorizon(). If you use a constructor that takes a File object, Scanner will internally make a FileChannel to read
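As a language-neutral illustration of scanning a large file without loading it into memory, here is a minimal Python sketch. It is not the Scanner/findWithinHorizon() approach from the excerpt: it swaps in a memory-mapped file, which is a different technique. The function name `count_occurrences` is hypothetical, and no performance claim is made for it.

```python
import mmap


def count_occurrences(path, needle):
    """Count non-overlapping occurrences of the bytes `needle` in the file at `path`."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    with open(path, "rb") as f:
        try:
            # Length 0 maps the whole file; the OS pages data in on demand.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file cannot be memory-mapped, so it contains no matches.
            return 0
        with mm:
            pos = mm.find(needle)
            while pos != -1:
                count += 1
                # Resume just past the match so occurrences do not overlap.
                pos = mm.find(needle, pos + len(needle))
    return count
```

The needle is passed as bytes, for example `count_occurrences("big.log", b"ERROR")`, where the file name is only a placeholder.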

String searching algorithms

倾然丶 夕夏残阳落幕 submitted on 2019-12-03 07:29:14
For the two string searching algorithms, KMP and suffix tree, which is preferred in which cases? Give some practical examples. A suffix tree is better if you have to answer many queries such as "is the needle present in the haystack?". KMP is better if you only have to search for one string in another single string and do not have to do it many times. A suffix tree is a much more general data structure, so you can do a lot more with it. See what you can do with it here. KMP is useful for finding whether a string is a substring of another string. You might also want to check out other
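To make the "one pattern, one text" case for KMP concrete, here is a minimal Python sketch of the algorithm. The function names `build_failure_table` and `kmp_find_all` are hypothetical, chosen for illustration only. The pattern is preprocessed once in O(m), and the text is then scanned in O(n) without ever moving backwards in it.

```python
def build_failure_table(pattern):
    """fail[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    fail = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return matches
```

For many different queries against the same haystack, this per-query scan is the cost that an index such as a suffix tree is meant to avoid.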

What are the shift rules for Boyer–Moore string search algorithm?

雨燕双飞 submitted on 2019-12-03 07:02:11
I have been trying to understand the shift rules in the Boyer–Moore string search algorithm but haven't understood them. I read about them on Wikipedia, but that is too complex! It would be a great help if someone listed the rules in a simple manner. In the Boyer-Moore algorithm, you start comparing pattern characters to text characters from the end of the pattern. If you find a mismatch, you have a configuration of the type

```
....xyzabc....      <- text
  ....uabc          <- pattern
      ^
      mismatch
```

Now the bad character shift means to shift the pattern so that the text character of the mismatch is aligned to the last occurrence of
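The right-to-left comparison and the bad character shift can be sketched in Python as follows. This is a simplified variant that uses only the bad character rule, without the good suffix rule of the full Boyer-Moore algorithm, and the function names are hypothetical. On a mismatch, the pattern is moved so that the mismatched text character lines up with its last occurrence in the pattern, or past that character entirely if the pattern does not contain it.

```python
def last_occurrence_table(pattern):
    """Map each character to the index of its last occurrence in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bad_character_search(text, pattern):
    """Find all matches using right-to-left comparison and the bad character rule only."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = last_occurrence_table(pattern)
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare from the end of the pattern towards its start.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1
        else:
            # Align text[s + j] with its last occurrence in the pattern
            # (-1 if absent); never shift by less than one position.
            shift = j - last.get(text[s + j], -1)
            s += max(1, shift)
    return matches
```

For example, `bad_character_search("xyzabcuabc", "uabc")` reaches the configuration from the diagram (`u` against `z` after `abc` has matched) on its way to the match at index 6.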

What are the main differences between the Knuth-Morris-Pratt and Boyer-Moore search algorithms?

雨燕双飞 submitted on 2019-12-02 14:03:20
What are the main differences between the Knuth-Morris-Pratt search algorithm and the Boyer-Moore search algorithm? I know KMP searches for Y in X, trying to define a pattern in Y, and saves the pattern in a vector. I also know that BM works better for small words, like DNA (ACTG). What are the main differences in how they work? Which one is faster? Which one is less computer-greedy? In which cases? Moore's UTexas webpage walks through both algorithms in a step-by-step fashion (he also provides various technical sources): Knuth-Morris-Pratt, Boyer-Moore. According to the man himself, The classic

php - Is strpos the fastest way to search for a string in a large body of text?

岁酱吖の submitted on 2019-11-30 17:23:09
if (strpos(htmlentities($storage->getMessage($i)),'chocolate'))
Hi, I'm using Gmail OAuth access to find specific text strings in email addresses. Is there a way to find text instances more quickly and efficiently than using strpos in the above code? Should I be using a hash technique? According to the PHP manual, yes: strpos() is the quickest way to determine if one string contains another. Note: If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead. This is quoted time and again in any php.net article

What is the Time Complexity, Space complexity and Algorithm for strstr() function in C++?

旧街凉风 submitted on 2019-11-30 16:58:21
Question: I was curious about the cost of using the default, old-fashioned strstr() function in C++. What is its time and space complexity? Which algorithm does it use? We have other algorithms with the worst-case time and space complexities below. Let n = length of string, m = length of pattern.
Knuth-Morris-Pratt Algorithm: Time = O(n+m), Space = O(m)
Rabin-Karp Algorithm: Time = O(n*m), Space = O(p) (p = p patterns of combined length m)
Boyer-Moore Algorithm: Time = O(n*m), Space = O(S) (S = size of
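To show where the O(n*m) worst case quoted for Rabin-Karp comes from, here is a minimal single-pattern sketch in Python. The function name and the `base` and `mod` defaults are assumptions chosen for illustration. Updating the rolling hash costs O(1) per shift, but every hash hit is verified with a direct comparison of m characters, so an input that produces a hit at every position degrades to O(n*m).

```python
def rabin_karp_find_all(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every match, using a rolling polynomial hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = 0
    w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; verify to rule out collisions.
        if p_hash == w_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Drop text[s] from the window and append text[s + m].
            w_hash = ((w_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches
```

With text `"aaaa"` and pattern `"aa"`, every window is a hash hit and every hit is verified, which is the behaviour behind the worst-case bound.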

String searching algorithms in Java

假如想象 submitted on 2019-11-30 10:35:03
I am doing string matching with a large amount of data. EDIT: I am matching words contained in a big list against some ontology text files. I take each file from the ontology and search for a match between the third String of each file line and any word from the list. I made a mistake in overlooking the fact that what I need is not pure matching (the results are poor); I need a looser matching function that also returns results when the string is contained inside another string. I did this with a Radix Trie; it was very fast and works nicely, but now I guess my work is useless because a trie

For string, find and replace

∥☆過路亽.° submitted on 2019-11-30 07:19:36
Question: Finding some text and replacing it with new text within a C string can be a little trickier than expected. I am searching for an algorithm that is fast and has a small time complexity. What should I use?
Answer 1: I can't help but wonder what algorithm strstr() implements. Given that these are fairly standard algorithms, it's entirely possible that a good implementation of strstr() uses one of them. However there's no guarantee that strstr() implements an optimised algorithm or that the

Fastest way to search in a string collection

笑着哭i submitted on 2019-11-30 06:12:00
Question:
Problem: I have a text file of around 120,000 users (strings) which I would like to store in a collection and later search. The search will run every time the user changes the text of a TextBox, and the result should be the strings that contain the text in the TextBox. I don't have to change the list, just pull the results and put them in a ListBox.
What I've tried so far: I tried with two different collections/containers, which I'm dumping the string

Case insensitive string search in golang

僤鯓⒐⒋嵵緔 submitted on 2019-11-30 03:00:26
How do I search through a file for a word in a case-insensitive manner? For example, if I'm searching for UpdaTe in the file and the file contains update, the search should pick it up and count it as a match. strings.EqualFold() can check if two strings are equal, while ignoring case. It even works with Unicode. See http://golang.org/pkg/strings/#EqualFold for more info. http://play.golang.org/p/KDdIi8c3Ar

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.EqualFold("HELLO", "hello"))
	fmt.Println(strings.EqualFold("ÑOÑO", "ñoño"))
}
```

Both return true. Presumably the important