uniq

Replacing an SQL query with unix sort, uniq and awk

非 Y 不嫁゛ submitted on 2020-01-14 03:57:22
Question: We currently have some data on an HDFS cluster on which we generate reports using Hive. The infrastructure is in the process of being decommissioned, and we are left with the task of coming up with an alternative for generating the report on the data (which we imported as tab-separated files into our new environment). Assume we have a table with the following fields: Query, IPAddress, LocationCode. Our original SQL query that we used to run on Hive was (well, not exactly, but something similar): select
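The excerpt cuts off before the query, but the general pattern for replacing a Hive GROUP BY/COUNT report with sort/uniq/awk can be sketched as follows. The column names come from the question; the sample values, file name, and the exact aggregate being computed are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample of the tab-separated export: Query, IPAddress, LocationCode
printf 'q1\t10.0.0.1\tUS\nq2\t10.0.0.2\tUS\nq3\t10.0.0.3\tDE\n' > data.tsv

# Rough equivalent of: SELECT locationcode, COUNT(*) FROM t GROUP BY locationcode
# cut extracts column 3, sort groups identical codes, uniq -c counts each group,
# and awk reorders each line to "code count".
counts=$(cut -f3 data.tsv | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
```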

How to print only the unique lines in BASH?

给你一囗甜甜゛ submitted on 2020-01-11 04:26:42
Question: How can I print only those lines that appear exactly once in a file? E.g., given this file:

mountain
forest
mountain
eagle

The output would be this, because the line mountain appears twice:

forest
eagle

The lines can be sorted, if necessary.

Answer 1: Using awk:

awk '{!seen[$0]++};END{for(i in seen) if(seen[i]==1)print i}' file
eagle
forest

Answer 2: Use sort and uniq:

sort inputfile | uniq -u

The -u option causes uniq to print only unique lines. Quoting from man uniq: -u, --unique only print
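The second answer's pipeline can be run end to end on the question's own data; the sort step matters because uniq only compares adjacent lines:

```shell
#!/bin/sh
# The file from the question
printf 'mountain\nforest\nmountain\neagle\n' > f.txt

# uniq compares only adjacent lines, which is why the answer sorts first:
# after sorting, the two "mountain" lines sit next to each other and -u drops both.
unique_lines=$(sort f.txt | uniq -u)
echo "$unique_lines"
```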

The "Swiss Army knife" of the Linux command line

拈花ヽ惹草 submitted on 2020-01-08 21:11:30
The "Swiss Army knives" here are those one-line commands that do the work of a whole page of code in a higher-level language. The following is a summary by Quora user Joshua Levy:

Getting the intersection, union, and difference of two files with sort/uniq: suppose a and b are two text files that are already free of internal duplicates. This is the most efficient approach and works on files of any size, even multi-gigabyte ones. (sort makes no particular demands on memory, though you may need the -T option.) For comparison, consider how many lines of Java it would take to merge files on disk.

cat a b | sort | uniq > c      # c is the union of a and b
cat a b | sort | uniq -d > c   # c is the intersection of a and b
cat a b b | sort | uniq -u > c # c is the difference: lines in a but not in b

Summing the numbers in the third column of a text file (about 3x faster than doing it in Python, with a third of the code):

awk '{ x += $3 } END { print x }' myfile

To see the sizes and modification dates of the files in a directory tree, the following is equivalent to running "ls -l" in each directory, with more readable output than "ls -lR":

find . -type f -ls

Use the xargs command. This command is extremely powerful
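The three set operations above can be verified on tiny stand-in files; the file names and contents here are made up, but the pipelines are exactly the ones from the summary:

```shell
#!/bin/sh
# Small stand-in files (already duplicate-free, as the text assumes)
printf 'a\nb\nc\n' > seta
printf 'b\nc\nd\n' > setb

cat seta setb | sort | uniq > union.txt          # union of the two sets
cat seta setb | sort | uniq -d > inter.txt       # intersection: lines present in both
cat seta setb setb | sort | uniq -u > onlya.txt  # a minus b: b is listed twice, so none of its lines can remain unique
```

The `uniq -u` trick in the third line is the clever part: any line of b appears at least twice in the concatenation, so only lines exclusive to a survive.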

lodash uniq - choose which duplicate object to keep in array of objects

纵然是瞬间 submitted on 2020-01-04 02:26:08
Question: Is there any way to specify which array item to keep based on a key being non-empty? It seems uniq just keeps the first occurrence. E.g.:

var fruits = [
  {'fruit': 'apples', 'location': '', 'quality': 'bad'},
  {'fruit': 'apples', 'location': 'kitchen', 'quality': 'good'},
  {'fruit': 'pears', 'location': 'kitchen', 'quality': 'excellent'},
  {'fruit': 'oranges', 'location': 'kitchen', 'quality': ''}
];
console.log(_.uniq(fruits, 'fruit'));
/* output is: Object { fruit="apples", quality="bad",

How do the -f and -s options work with the uniq command?

≯℡__Kan透↙ submitted on 2019-12-25 17:42:50
Question: According to the manual page for uniq, the -f option is for skipping fields and the -s option is for skipping characters. Can someone explain, with relevant examples, how these two options actually work?

Answer 1: Vanilla uniq:

/tmp$ cat > foo
foo
foo
bar
bar
bar
baz
baz
/tmp$ uniq foo
foo
bar
baz

uniq -s to skip over the first character:

/tmp$ cat > bar
1foo
2foo
3bar
4bar
5bar
6baz
7baz
/tmp$ uniq -s1 bar
1foo
3bar
6baz

uniq -f to skip over the first field of the input (here, hosts):

/tmp$ cat > baz
127.0.0
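The answer's -s example, plus a -f example (the original's -f transcript is cut off, so the host names below are invented), can be run as a script:

```shell
#!/bin/sh
# Reproducing the answer's -s example end to end
printf '1foo\n2foo\n3bar\n4bar\n5bar\n6baz\n7baz\n' > bar.txt
# -s1 skips the first character of each line, so "1foo" and "2foo" compare equal
s_out=$(uniq -s1 bar.txt)

# A -f example with hypothetical host names in the first field
printf 'host1 up\nhost2 up\nhost3 down\n' > baz.txt
# -f1 skips the first whitespace-separated field, comparing only "up"/"down"
f_out=$(uniq -f1 baz.txt)
```

Note that the skipped prefix of the first line in each run is what gets printed, which is why `1foo` survives rather than `2foo`.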

The sort and uniq commands

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-25 13:05:44
The Linux uniq command checks for and removes duplicated lines in a text file. It is generally used together with the sort command, since it only compares adjacent lines.

Syntax:

uniq [-cdu][-f<fields>][-s<chars>][-w<chars>][--help][--version][input file][output file]

Options:

-c or --count: prefix each line with the number of times it occurred.
-d or --repeated: only print lines that are repeated.
-f<fields> or --skip-fields=<fields>: skip the given number of fields before comparing.
-s<chars> or --skip-chars=<chars>: skip the given number of characters before comparing.
-u or --unique: only print lines that appear exactly once.
-w<chars> or --check-chars=<chars>: compare at most the given number of characters per line.
--help: display help.
--version: display version information.
[input file]: an already-sorted text file; if omitted, data is read from standard input.
[output file]: the file to write the result to; if omitted, the result is printed to standard output (the terminal).

Example: lines 2, 3, 5, 6, 7, and 9 of the file testfile are duplicates of other lines. To remove the repeated lines with uniq, run:

uniq testfile

The original contents of testfile:

$ cat testfile
test 30
test 30
test 30
Hello
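The -c, -d, and -u options described above can all be demonstrated on the surviving fragment of the article's testfile:

```shell
#!/bin/sh
# The fragment of testfile that survives in the excerpt
printf 'test 30\ntest 30\ntest 30\nHello\n' > testfile

dedup=$(uniq testfile)                        # duplicates collapsed to one line
counts=$(uniq -c testfile | awk '{$1=$1}1')   # awk normalizes the count padding
dups=$(uniq -d testfile)                      # only the repeated line
once=$(uniq -u testfile)                      # only the line that appears once
```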

How to find single entries in a txt file?

人盡茶涼 submitted on 2019-12-24 11:56:07
Question: I have a txt file with 12 columns. Some lines are duplicated and some are not. As an example, I copied the first 4 columns of my data:

0 0 chr12 48548073
0 0 chr13 80612840
2 0 chrX 4000600
2 0 chrX 31882528
3 0 chrX 3468481
4 0 chrX 31882726
4 0 chr3 75007624

Based on the first column, you can see that there are duplicates everywhere except for entry '3'. I would like to print only the single entries, in this case '3'. The output would be:

3 0 chrX 3468481

Is there a quick way of doing this with awk or
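The excerpt is cut off before any answer, but a common awk idiom for this kind of problem is a two-pass scan keyed on the first column. This is a sketch of that idiom, not necessarily the accepted answer:

```shell
#!/bin/sh
cat > data.txt <<'EOF'
0 0 chr12 48548073
0 0 chr13 80612840
2 0 chrX 4000600
2 0 chrX 31882528
3 0 chrX 3468481
4 0 chrX 31882726
4 0 chr3 75007624
EOF
# Pass 1 (NR==FNR is true only while reading the first copy of the file)
# counts occurrences of column 1; pass 2 prints rows whose key occurred once.
singles=$(awk 'NR==FNR { count[$1]++; next } count[$1] == 1' data.txt data.txt)
echo "$singles"
```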

How to get unique lines from a very large file in Linux?

我的梦境 submitted on 2019-12-24 00:36:49
Question: I have a very large data file (255G; 3,192,563,934 lines). Unfortunately, I only have 204G of free space on the device (and no other devices I can use). I did a random sample and found that in a given, say, 100K lines, there are about 10K unique lines... but the file isn't sorted. Normally I would use, say:

pv myfile.data | sort | uniq > myfile.data.uniq

and just let it run for a day or so. That won't work in this case because I don't have enough space left on the device for the temporary
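The excerpt ends before any answer. One standard approach to deduplicating a file that is too large to sort in the available space is hash partitioning: split the lines into buckets by a hash, then deduplicate each bucket independently. Identical lines always land in the same bucket, so the union of the deduplicated buckets equals the deduplicated file, and sort only ever needs temporary space for one bucket at a time. This is an illustrative sketch on toy data (a real run would use many more buckets and a better hash), not the original thread's accepted answer:

```shell
#!/bin/sh
# Toy input standing in for the 255G file
printf 'alpha\nbeta\nalpha\ngamma\nbeta\ndelta\n' > big.txt

# Hash-partition lines into 4 bucket files with a simple rolling hash.
awk '{
    h = 0
    for (i = 1; i <= length($0); i++)
        h = (h * 31 + index("abcdefghijklmnopqrstuvwxyz", substr($0, i, 1))) % 4
    print > ("bucket." h)
}' big.txt

# Deduplicate each small bucket separately and concatenate the results.
for b in bucket.*; do sort -u "$b"; done > big.uniq
```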

sort: string comparison failed: Invalid or incomplete multibyte or wide character

荒凉一梦 submitted on 2019-12-23 09:03:06
Question: I'm trying to use the following command on a text file:

$ sort <m.txt | uniq -c | sort -nr >m.dict

However, I get the following error message:

sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.

I'm using Cygwin on Windows 7 and was having trouble earlier editing m.txt to put each word within the file on a new line. Please see: Using AWK to place each word in
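The `\r` visible in the compared strings indicates that m.txt has Windows (CRLF) line endings, which is common on Cygwin and is what the UTF-8 locale's collation chokes on. A sketch of a fix that removes the root cause instead of masking it with LC_ALL=C (the two Welsh words are taken from the error message; the rest of the file contents are invented):

```shell
#!/bin/sh
# A CRLF-terminated stand-in for m.txt
printf 'mwy\r\nenwedig\r\nmwy\r\n' > m.txt

# tr -d '\r' strips the carriage returns before sorting; awk normalizes
# the count padding from uniq -c for easy comparison.
tr -d '\r' < m.txt | sort | uniq -c | sort -nr | awk '{$1=$1}1' > m.dict
```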

Checking how many CPU cores a machine has on Linux

做~自己de王妃 submitted on 2019-12-22 09:21:40
Number of physical CPUs:

more /proc/cpuinfo | grep "physical id" | uniq | wc -l

Cores per CPU (assuming all CPUs have the same configuration):

more /proc/cpuinfo | grep "physical id" | grep "0" | wc -l
cat /proc/cpuinfo | grep processor

1. Count the physical CPUs:
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
2. Count the logical CPUs:
cat /proc/cpuinfo | grep "processor" | wc -l
3. Cores per CPU:
cat /proc/cpuinfo | grep "cores" | uniq
4. CPU clock speed:
cat /proc/cpuinfo | grep MHz | uniq

# uname -a
Linux euis1 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 i686 i386 GNU/Linux (shows the running kernel)
# cat /etc/issue | grep Linux
Red Hat Enterprise Linux AS
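The pipelines above read the live /proc/cpuinfo, so their output depends on the machine. Running the same sort/uniq/grep logic against a made-up sample (a hypothetical two-socket, four-processor box; the field layout is simplified from the real /proc/cpuinfo format) makes the counting visible:

```shell
#!/bin/sh
cat > cpuinfo.sample <<'EOF'
processor : 0
physical id : 0
cpu cores : 2
processor : 1
physical id : 0
cpu cores : 2
processor : 2
physical id : 1
cpu cores : 2
processor : 3
physical id : 1
cpu cores : 2
EOF
# Distinct "physical id" values = number of sockets
sockets=$(grep "physical id" cpuinfo.sample | sort | uniq | wc -l)
# One "processor" line per logical CPU
logical=$(grep -c "^processor" cpuinfo.sample)
# The repeated "cpu cores" line collapses to one via uniq
cores=$(grep "cores" cpuinfo.sample | uniq)
```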