word-count

How do I count repeated words?

送分小仙女□ submitted on 2019-12-13 02:09:57
Question: Given a very large (1 GB) file containing words, some of them repeated, we need to read the file and output how many times each word occurs. Please let me know whether my solution performs well. (For simplicity, let's assume we have already captured the words in an ArrayList<String>.) I think the time complexity is O(n), where n is the number of words. Am I correct? public static void main(String[] args) { ArrayList al = new ArrayList(); al.add("math1"); al.add("raj1"); al.add("raj2"); al.add("math"); al.add("rj2"); al.add("math"
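A single pass with a hash map keeps the expected cost linear in the number of words, which is the O(n) the question asks about (assuming average O(1) hash operations and treating word length as bounded). The sketch below assumes the words are already in a `List<String>`, as the question does; `WordFrequency` and `count` are hypothetical names. For a real 1 GB file you would probably stream words from the file instead of holding them all in a list, which is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts how often each word occurs.
class WordFrequency {
    // One pass over the list; each merge() is an expected O(1) hash-map update,
    // so the whole count is expected O(n) in the number of words.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        return counts;
    }
}
```

With the question's sample words, `"math"` maps to 2 and every other word maps to 1.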

Hadoop wordcount unable to run: need help decoding the Hadoop error message

拈花ヽ惹草 submitted on 2019-12-12 10:09:28
Question: I need some help figuring out why my job failed. I built a single-node cluster just to try it out, following the example here. Everything seems to be working correctly: I formatted the namenode and can connect to the jobtracker, datanode, and namenode via the web interface, and I can start and stop all the Hadoop services. However, when I try to run the wordcount example, I get this: Error initializing attempt_201105161023_0002_m_000011_0: java.io.IOException: Exception reading

Character/byte count vs. file size in Windows Properties

风格不统一 submitted on 2019-12-11 07:56:50
Question: I have a .txt file generated by a PHP script. When I check it from my script, the character count is shown as 3999 bytes/characters. When I copy and paste the same content into MS Word, it also shows 3999 characters (with spaces). However, the Windows Properties dialog for the same .txt file shows the size as 4.17 KB (4,278 bytes). I am wondering what could cause such a large difference. If someone
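A character count and a file size in bytes only agree when every character is stored as exactly one byte. Two common reasons they diverge are multi-byte characters (for example, accented letters under UTF-8) and Windows-style CRLF line endings, which store each line break as two bytes instead of one. The question does not say which of these, if either, accounts for the 279-byte gap (4,278 - 3,999). The sketch below, with the hypothetical class name `SizeCheck`, only shows how to compare the two counts in Java.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing a string's character count with its encoded size.
class SizeCheck {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int charCount(String text) {
        return text.length();
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Turns each LF into CRLF, as a Windows-style writer might; assumes the input uses LF only.
    static String toCrlf(String text) {
        return text.replace("\n", "\r\n");
    }
}
```

For `"h\u00e9llo\nworld"`, `charCount` gives 11, the UTF-8 size is 12 (the "é" takes two bytes), and after `toCrlf` the UTF-8 size is 13.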

MapReduce - WritableComparables

余生颓废 submitted on 2019-12-11 07:56:24
Question: I'm new to both Java and Hadoop. I'm trying a very simple program to find frequent pairs. For example: Input: My name is Foo. Foo is student. Intermediate output: Map: (my, name): 1 (name, is): 1 (is, Foo): 2 // (is, Foo) = (Foo, is) (is, student) So the final frequent pair should be (is, Foo). The pseudocode looks like this: Map(Key: line_num, value: line) words = split_words(line) for each w in words: for each neighbor x: emit((w, x), 1) Here my key is not a single word; it's a pair. While going through
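One way to make (is, Foo) and (Foo, is) the same key is to store the two words in a canonical (sorted) order, so that `equals`, `hashCode`, and `compareTo` all ignore orientation. The plain-Java sketch below assumes the words are already lowercased and split per sentence, and treats a word's "neighbor" as the next word in the same sentence; `WordPair` and `addAdjacentPairs` are hypothetical names. A real Hadoop key would additionally implement `WritableComparable`, with `write(DataOutput)`, `readFields(DataInput)`, and a no-argument constructor; that is left out here so the code runs without Hadoop.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical unordered pair key: (a, b) and (b, a) compare and hash as equal.
final class WordPair implements Comparable<WordPair> {
    final String first;  // lexicographically smaller word
    final String second; // lexicographically larger (or equal) word

    WordPair(String a, String b) {
        // Store the words in sorted order so the pair's orientation does not matter.
        if (a.compareTo(b) <= 0) {
            first = a;
            second = b;
        } else {
            first = b;
            second = a;
        }
    }

    @Override
    public int compareTo(WordPair other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : second.compareTo(other.second);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof WordPair)) return false;
        WordPair p = (WordPair) o;
        return first.equals(p.first) && second.equals(p.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }

    @Override
    public String toString() {
        return "(" + first + ", " + second + ")";
    }

    // Counts each pair of adjacent words in one sentence, adding into counts.
    static void addAdjacentPairs(List<String> words, Map<WordPair, Integer> counts) {
        for (int i = 0; i + 1 < words.size(); i++) {
            counts.merge(new WordPair(words.get(i), words.get(i + 1)), 1, Integer::sum);
        }
    }
}
```

Feeding the two sentences from the example as `["my", "name", "is", "foo"]` and `["foo", "is", "student"]` gives (foo, is) a count of 2, matching the expected intermediate output.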

Counting Word Frequency (most significant words) in a String, excluding keywords

不问归期 submitted on 2019-12-11 06:16:10
Question: I would like to count the frequency of words (excluding some keywords) in a string and sort them in descending order. How can I do it? In the following string... This is stackoverflow. I repeat stackoverflow. where the excluded keywords are ExKeywords() = {"i", "is"}, the output should be: stackoverflow repeat this P.S. No, I am not re-designing Google! :) Answer 1: string input = "This is stackoverflow. I repeat stackoverflow."; string[] keywords = new[] {"i", "is"}; Regex regex = new Regex("\\w+");
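The answer's approach (match words with the regex `\w+`, skip the keywords, count, then sort) can be sketched in Java as follows. Lowercasing each word and breaking ties alphabetically are assumptions: the question does not say how ties should be ordered, but alphabetical order happens to reproduce "repeat" before "this". `KeywordFilteredCount` and `wordsByFrequency` are hypothetical names.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: word frequencies minus excluded keywords, most frequent first.
class KeywordFilteredCount {
    private static final Pattern WORD = Pattern.compile("\\w+");

    static List<String> wordsByFrequency(String input, Set<String> excluded) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT); // case-insensitive counting
            if (!excluded.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        // Sort by count descending, then alphabetically so ties have a stable order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

For the question's input with the excluded set `{"i", "is"}`, this returns `[stackoverflow, repeat, this]`.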

Accessing a mapper's counter from a reducer in Hadoop MapReduce

限于喜欢 submitted on 2019-12-10 23:09:57
Question: I need to access the mapper's counters from the reducer. I tried to apply this solution. My WordCount code is below. import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Cluster; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import

EXCEL VBA: Counting word occurrences while creating a list of words

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-10 10:49:19
Question: I need to create a list of the words used in all cells in column A, with an occurrence count for each word on the list. So far, I've been able to create the list of words (by searching the forum). The list of words is generated in column B. Can anyone help me with the code so that it also generates the occurrence count in column C? Thank you! Sub Sample() Dim varValues As Variant Dim strAllValues As String Dim i As Long Dim d As Object 'Create empty Dictionary Set d = CreateObject("Scripting
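The code in the question already creates an empty dictionary, and a common pattern is to use each word as a key and its running count as the value, incrementing the value whenever the word appears again; the keys then become the column B list and the values the column C counts. To illustrate the idea in Java (the language of the other examples here, not VBA), the sketch below treats each cell as a string and uses a `LinkedHashMap`, which keeps words in first-seen order like a list built top-down. Splitting on whitespace is an assumption, since the question does not say how words are delimited; `CellWordCounter` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: word list (keys, in first-seen order) with occurrence counts (values).
class CellWordCounter {
    static Map<String, Integer> countWords(List<String> cells) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // preserves first-seen order
        for (String cell : cells) {
            for (String word : cell.trim().split("\\s+")) {
                if (!word.isEmpty()) { // an empty cell splits to [""]
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

For cells `"red apple"`, `"green apple"`, `""`, and `"red red"`, the keys come out as red, apple, green with counts 3, 2, 1.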

Word occurrence in a String (word count)

只愿长相守 submitted on 2019-12-08 12:18:36
Question: I'm stuck on counting word occurrences in a string. I got a tip (in the task notes) to use compareToIgnoreCase, so I tried something like this: splitwords = StringCont.split("\\s"); for(int i=0; i<splitwords.length; i++) { if(splitwords[1].compareToIgnoreCase(splitwords[i]) == 0) splitcount++; } It is the best I can do and probably a bad approach. When I run the code, I sometimes get an array index out of bounds exception and sometimes it runs. What is missing is: go through all words and check them and
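The exception comes from `splitwords[1]`: if the string contains a single word (or is empty), `split` returns an array of length 1, so index 1 does not exist. The loop also always compares against the second word instead of each word in turn. A minimal sketch that keeps the `compareToIgnoreCase` hint but adds an outer loop over every word follows. The class name, the `LinkedHashMap` result, and splitting on the regex `\s+` (runs of whitespace, rather than the single `\s` in the question) are assumptions. It is O(n²), which should be acceptable for a small exercise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper using only compareToIgnoreCase, as the task notes suggest.
class OccurrenceCounter {
    // Returns each distinct word (first spelling seen) with its case-insensitive count.
    static Map<String, Integer> count(String text) {
        String[] words = text.trim().split("\\s+");
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) continue; // empty input splits to [""]
            boolean seenBefore = false;
            for (int j = 0; j < i; j++) { // was this word already counted?
                if (words[i].compareToIgnoreCase(words[j]) == 0) {
                    seenBefore = true;
                    break;
                }
            }
            if (seenBefore) continue;
            int count = 0;
            for (int j = 0; j < words.length; j++) { // compare words[i], not words[1]
                if (words[i].compareToIgnoreCase(words[j]) == 0) count++;
            }
            result.put(words[i], count);
        }
        return result;
    }
}
```

For `"The cat and the hat"`, this yields The=2, cat=1, and=1, hat=1, and a one-word string no longer throws.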

Wordcount C++ Hadoop pipes does not work

我们两清 submitted on 2019-12-08 06:46:09
Question: I am trying to run the WordCount example in C++ as described in this link: Running the WordCount program in C++. The compilation works fine, but when I try to run my program, an error appears: bin/hadoop pipes -conf ../dev/word.xml -input testtile.txt -output wordcount-out 11/06/06 14:23:40 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 11/06/06 14:23:40 INFO mapred.FileInputFormat: Total input

Unique word count in C++ help?

风流意气都作罢 submitted on 2019-12-08 03:57:45
Question: I would like to write a function that counts unique words. For example: "I like to program something useful. And I like to eat. Eat ice-cream now." In this case, for each unique word: I occurs 2, like occurs 2, ... I will handle case-insensitivity later on. Please help. EDIT: I have finished writing the functions. It works perfectly. Thanks for all the help. Very much appreciated. Answer 1: It sounds like you want to use an std::map with a string key and int data. If an item doesn't exist in the map already
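A map from each word to its count, as the answer suggests, looks like this in Java, where `TreeMap` plays the role of `std::map` by keeping keys sorted. Lowercasing and treating anything other than letters, digits, hyphens, and apostrophes as a separator are assumptions, chosen so that "Eat" and "eat" match and "ice-cream" stays one word; `UniqueWords` is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: case-insensitive unique-word counts, keys kept in sorted order.
class UniqueWords {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like std::map
        // Split on any run of characters that are not letters, digits, apostrophes or hyphens.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}'-]+")) {
            if (!token.isEmpty()) { // a leading separator yields ""
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

On the question's sentence this finds 10 unique words, with "i", "like", "to", and "eat" each occurring twice.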