WordCount Example in Scala

```scala
object WCDemo {
  def main(args: Array[String]): Unit = {
    val lineList = List("hello tom hello jerry", "hello jerry", "hello kitty")

    val wordList = lineList.map(_.split(" "))
    // wordList: List[Array[String]] = List(Array(hello, tom, hello, jerry), Array(hello, jerry), Array(hello, kitty))

    val wordList1 = wordList.flatten
    // wordList1: List[String] = List(hello, tom, hello, jerry, hello, jerry, hello, kitty)

    val wordMap = wordList1.map(_ -> 1)
    // wordMap: List[(String, Int)] = List((hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (jerry,1), (hello,1), (kitty,1))

    val wordMap1 = wordMap.groupBy(_._1)
    // wordMap1: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(tom -> List((tom,1)), kitty -> List((kitty,1)), jerry -> List((jerry,1), (jerry,1)), hello -> List((hello,1), (hello,1), (hello,1), (hello,1)))

    val result = wordMap1.map(t => (t._1, t._2.size))
    // result: scala.collection.immutable.Map[String,Int] = Map(tom -> 1, kitty -> 1, jerry -> 2, hello -> 4)
    // (the iteration order of a Map is not guaranteed)

    /**
     * Notes:
     * 1. lineList.map(_.split(" "))
     *    map: applies a function to every element of the list and returns a new list
     *    with the same number of elements.
     *    Here every line is split on spaces, so the result is a list of three Array[String],
     *    one per input line.
     *
     * 2. wordList.flatten
     *    flatten: unpacks a nested structure, e.g. turns a two-dimensional list into a
     *    one-dimensional one.
     *    Here the Arrays inside the List are unpacked into one flat list of words. If the
     *    outer collection is a Set, duplicates are dropped, as Set semantics require:
     *      val ys = Set(List(1, 2, 3), List(3, 2, 1)).flatten
     *      // ys == Set(1, 2, 3)
     *      val xs = List(Set(1, 2, 3), Set(1, 2, 3)).flatten
     *      // xs == List(1, 2, 3, 1, 2, 3)
     *
     * 3. wordList1.map(_ -> 1)
     *    Another map, as in step 1. The -> operator turns each word into the tuple (word, 1)
     *    so that the words can be grouped in the next step.
     *    Steps 1 and 2 can be merged into a single flatMap call: flatMap combines map and
     *    flatten, applying a function that returns a collection to every element and
     *    concatenating the results (see the one-line version below).
     *
     * 4. wordMap.groupBy(_._1)
     *    groupBy: groups the elements of a collection by a key function and returns a Map
     *    from each key to the elements that share it.
     *    Here the tuples are grouped by their first component, i.e. the word.
     *
     * 5. wordMap1.map(t => (t._1, t._2.size))
     *    From the Map built in step 4, a word's frequency is simply the size of the List
     *    stored under its key.
     *    Placeholder syntax does not work here: each _ stands for a separate parameter, so
     *    map(_._1, _._2.size) would pass two separate functions to map instead of reading
     *    both parts of one tuple. An explicit anonymous function t => ... gives access to both.
     */
    println("result : " + result)

    // The same pipeline written on one line
    val result2 = lineList.flatMap(_.split(" ")).map(_ -> 1).groupBy(_._1).map(t => (t._1, t._2.size))
    println("wc result2 : " + result2)
  }
}
```
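The one-line version at the end of WCDemo replaces the first two steps (map(_.split(" ")) followed by flatten) with a single flatMap. A minimal check of that equivalence on the same sample lines might look like this; the object name FlatMapCheck is only illustrative.

```scala
// Hypothetical object name, used only to compare the two forms
object FlatMapCheck {
  val lineList = List("hello tom hello jerry", "hello jerry", "hello kitty")

  // Steps 1 and 2 of WCDemo: map each line to an Array of words, then flatten
  val twoSteps: List[String] = lineList.map(_.split(" ")).flatten

  // The same result in one call: flatMap is map followed by flatten
  val oneStep: List[String] = lineList.flatMap(_.split(" "))

  def main(args: Array[String]): Unit = {
    println(twoSteps)            // List(hello, tom, hello, jerry, hello, jerry, hello, kitty)
    println(twoSteps == oneStep) // true
  }
}
```

Both forms rely on the standard implicit conversion from Array[String] to a collection, which is what lets an Array be flattened into a List.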
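The fifth note explains why the counting step uses an explicit anonymous function rather than placeholder syntax. Besides t => (t._1, t._2.size), a pattern-matching anonymous function can name both parts of each tuple directly. The sketch below compares the two; the object name Step5Variants and the names word and pairs are hypothetical.

```scala
// Hypothetical object; rebuilds the grouped Map from step 4 for the sample words
object Step5Variants {
  val grouped: Map[String, List[(String, Int)]] =
    List("hello", "tom", "hello", "jerry", "hello", "jerry", "hello", "kitty")
      .map(_ -> 1)
      .groupBy(_._1)

  // Explicit lambda, as in WCDemo: t is one (key, value) tuple
  val viaLambda: Map[String, Int] = grouped.map(t => (t._1, t._2.size))

  // Pattern-matching anonymous function: destructures each tuple into named parts
  val viaPattern: Map[String, Int] = grouped.map { case (word, pairs) => word -> pairs.size }

  // grouped.map(_._1, _._2.size) does not compile: it is read as two separate
  // function arguments (_._1 and _._2.size), but map takes only one.

  def main(args: Array[String]): Unit = {
    println(viaLambda == viaPattern) // true
    println(viaPattern("hello"))     // 4
  }
}
```

The pattern-matching form is a common idiom when working with Map entries because the names make the roles of key and value explicit.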
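For comparison, and assuming Scala 2.13 or later (groupMapReduce does not exist in 2.12), the standard collections can fold the "pair with 1, group, count" steps into one call: group by the word itself, map each word to 1, and combine the ones by addition. This is a hedged sketch, not the approach used in the original example; the object name WCGroupMapReduce is hypothetical.

```scala
// Hypothetical object name; requires Scala 2.13+ for groupMapReduce
object WCGroupMapReduce {
  val lineList = List("hello tom hello jerry", "hello jerry", "hello kitty")

  val counts: Map[String, Int] =
    lineList
      .flatMap(_.split(" "))                   // lines -> words (steps 1 and 2)
      .groupMapReduce(identity)(_ => 1)(_ + _) // key = word, value = 1, summed per key

  def main(args: Array[String]): Unit = {
    println(counts("hello")) // 4
    println(counts("jerry")) // 2
  }
}
```

Unlike groupBy followed by size, this version never builds the intermediate List of (word, 1) tuples for each key.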
Source: WordCount Example in Scala