Regular Expressions and File Formatting

Submitted by 对着背影说爱祢 on 2020-02-15 08:50:36

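1. What is a regular expression

A regular expression is a way of describing patterns of characters, and the tools that use it process text one line at a time. For a system administrator, regular expressions are something you simply have to learn. One thing to keep in mind: regular expressions and shell wildcards are two unrelated things. Wildcards are expanded by the shell into file names, while regular expressions are interpreted by tools such as grep, sed and awk.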
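
To make the difference between wildcards and regular expressions concrete, here is a small sketch. The directory and the file names (apple.txt, avocado.txt, banana.log) are made up for illustration and are not part of the original exercise.

```shell
# Hypothetical file names in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/apple.txt" "$demo_dir/avocado.txt" "$demo_dir/banana.log"

# Wildcard: the shell expands a*.txt into matching file names before ls runs.
glob_matches=$(cd "$demo_dir" && ls a*.txt)
echo "$glob_matches"

# Regular expression: to grep, 'a*' means "zero or more a", which matches
# every line, so all three names are counted.
regex_count=$(ls "$demo_dir" | grep -c 'a*')
echo "$regex_count"

# A regular expression that selects the same names as the wildcard a*.txt.
regex_matches=$(ls "$demo_dir" | grep '^a.*\.txt$')
echo "$regex_matches"

rm -rf "$demo_dir"
```

The same characters, a*, mean "names starting with a" to the shell but "zero or more a" to grep, which is why the two should not be mixed up.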

2. Some advanced grep options

2.1. Basic regular expression exercises

First, download the sample data from VBird's website.

zhangsan@Aliyun:~$ wget http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt
zhangsan@Aliyun:~$ vim regular_express.txt
  1 "Open Source" is a good mechanism to develop programs.
  2 apple is my favorite food.
  3 Football game is not use feet only.
  4 this dress doesn't fit me.
  5 However, this dress is about $ 3183 dollars.^M
  6 GNU is free air not free beer.^M
  7 Her hair is very beauty.^M
  8 I can't finish the test.^M
  9 Oh! The soup taste good.^M
 10 motorcycle is cheap than car.
 11 This window is clear.
 12 the symbol '*' is represented as start.
 13 Oh!     My god!
 14 The gd software is a library for drafting programs.^M
 15 You are the best is mean you are the no. 1.
 16 The world <Happy> is the same with "glad".
 17 I like dog.
 18 google is the best tools for search keyword.
 19 goooooogle yes!
 20 go! go! Let's go.
 21 # I am VBird
 22 

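The listing above is the content of the file from VBird's link (shown in vim with line numbers; ^M marks a DOS-style line ending).
First, search for a specific string, as shown below.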

zhangsan@Aliyun:~$ grep -n 'the' regular_express.txt 
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
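
Besides -n, a few other grep options are handy in this kind of search. The following is a minimal sketch on a three-line sample taken from the file above; the temporary file and the variable names are assumptions made for the example.

```shell
# Three lines copied from regular_express.txt into a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'google is the best tools for search keyword.' \
  'The gd software is a library for drafting programs.' \
  'This window is clear.' > "$sample"

# -n prefixes each matching line with its line number.
with_numbers=$(grep -n 'the' "$sample")
echo "$with_numbers"

# -i ignores case, so "The" matches as well.
ignore_case=$(grep -in 'the' "$sample")
echo "$ignore_case"

# -v inverts the match: print the lines that do NOT contain "the".
inverted=$(grep -vn 'the' "$sample")
echo "$inverted"

rm -f "$sample"
```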

Next, we search using square brackets.

zhangsan@Aliyun:~$ grep -n 't[ea]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.

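This example shows that [] matches exactly one character out of the set it lists: [ea] means that either e or a is accepted at that position.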
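
Brackets can also hold a range or a negated set. The sketch below uses four lines copied from the sample file; the temporary file and variable names are only assumptions for the demonstration.

```shell
sample=$(mktemp)
printf '%s\n' \
  'Oh! The soup taste good.' \
  'google is the best tools for search keyword.' \
  'goooooogle yes!' \
  'You are the best is mean you are the no. 1.' > "$sample"

# [Tt] accepts either letter, so both "The" and "the" are found.
either_case=$(grep -n '[Tt]he' "$sample")
echo "$either_case"

# [^g] is a negated set: "oo" preceded by any character except g.
# "good" does not qualify, but "tools" and "goooooogle" do.
not_after_g=$(grep -n '[^g]oo' "$sample")
echo "$not_after_g"

# [0-9] is a range: any single digit.
has_digit=$(grep -n '[0-9]' "$sample")
echo "$has_digit"

rm -f "$sample"
```

Note that ^ inside brackets means negation, which is different from the line-start meaning it has outside brackets.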
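Line start ^ and line end $
In addition, the line-start anchor ^ and the line-end anchor $ let us restrict a match to the beginning or the end of a line.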

# ^tom matches lines that begin with tom
zhangsan@Aliyun:~$ grep '^tom' passwd 
tom sys:x:3:3:sys:/dev:/usr/sbin/nologin
# tom$ matches lines that end with tom
zhangsan@Aliyun:~$ grep 'tom$' passwd 
games:x:5:60:games:/usr/games:/usr/sbin/nologintom

If I want to find out which lines are blank, this is what I do:

zhangsan@Aliyun:~$ grep -n '^$' regular_express.txt
22:
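
A common use of the two anchors is stripping blank lines and comment lines from a configuration file. Here is a minimal sketch, assuming a made-up config file with # comments:

```shell
# Hypothetical configuration file with comments and blank lines.
conf=$(mktemp)
printf '%s\n' \
  '# sample config' \
  '' \
  'name=demo' \
  '' \
  '# another comment' \
  'port=8080' > "$conf"

# '^$' is "line start immediately followed by line end", i.e. an empty line.
blank_lines=$(grep -n '^$' "$conf")
echo "$blank_lines"

# Drop empty lines, then drop lines that begin with #.
active=$(grep -v '^$' "$conf" | grep -v '^#')
echo "$active"

rm -f "$conf"
```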

Any single character . and the repetition character *

zhangsan@Aliyun:~$ grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".
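
The transcript above only exercises the dot. As a sketch of how * behaves, here are a few searches on four lines copied from the sample file (the temporary file and variable names are assumptions for the example). The key point is that * repeats the preceding character zero or more times.

```shell
sample=$(mktemp)
printf '%s\n' \
  'google is the best tools for search keyword.' \
  'goooooogle yes!' \
  'I like dog.' \
  'This window is clear.' > "$sample"

# 'o*' also matches zero o's, so every line counts as a match.
all_lines=$(grep -c 'o*' "$sample")
echo "$all_lines"

# 'ooo*' is "oo followed by zero or more o", i.e. at least two o's in a row.
two_or_more=$(grep -n 'ooo*' "$sample")
echo "$two_or_more"

# '.*' is "any characters, any number of them": g, then anything, then d.
g_then_d=$(grep -n 'g.*d' "$sample")
echo "$g_then_d"

rm -f "$sample"
```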

Limiting the number of repetitions with {}

# Find strings with exactly four o's between t and m
zhangsan@Aliyun:~$ grep 'to\{4\}m' passwd 
toooom sys:x:3:3:sys:/dev:/usr/sbin/nologin
# Find strings with 4 to 9 o's between t and m
zhangsan@Aliyun:~$ grep 'to\{4,9\}m' passwd 
toooom sys:x:3:3:sys:/dev:/usr/sbin/nologin
tooooooooomsync:x:4:65534:sync:/bin:/bin/sync
tooooomman:x:6:12:man:/var/cache/man:/usr/sbin/nologin
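
The same idea in a self-contained form, with smaller counts so the strings are easy to read. The word list is invented for the example. In basic regular expressions the braces must be escaped as \{ and \}; with grep -E (extended regular expressions) they are written without backslashes.

```shell
# Hypothetical word list: one to four o's between t and m.
names=$(mktemp)
printf '%s\n' tom toom tooom toooom > "$names"

# Exactly two o's.
exactly_two=$(grep 'to\{2\}m' "$names")
echo "$exactly_two"

# Two to three o's.
two_to_three=$(grep 'to\{2,3\}m' "$names")
echo "$two_to_three"

# Two or more o's (upper bound left open).
two_or_more=$(grep 'to\{2,\}m' "$names")
echo "$two_or_more"

# The extended-syntax form of the two-to-three search.
ere_two_to_three=$(grep -E 'to{2,3}m' "$names")
echo "$ere_two_to_three"

rm -f "$names"
```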

3. The sed tool

The sed command can pick out the lines or text that meet a given condition and modify them.

zhangsan@Aliyun:~$ sed '1,2d' hosts
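# Delete the first and second lines of the hosts file
# sed reads the file, applies the edit in memory and prints the result; the file on disk is not changed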
zhangsan@Aliyun:~$ sed -i '1,2d' hosts
# With -i, the change is written directly to the file
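
Here is a small sketch of a few more sed operations on a throwaway copy, so the real hosts file is never touched. The host names and addresses are invented for the example, and -i is left out on purpose because its exact form differs between sed implementations.

```shell
# Hypothetical hosts-style data in a temporary file.
hosts_copy=$(mktemp)
printf '%s\n' \
  '127.0.0.1 localhost' \
  '::1 localhost' \
  '10.0.0.5 web01' \
  '10.0.0.6 db01' > "$hosts_copy"

# Delete lines 1-2 in the output only.
without_first_two=$(sed '1,2d' "$hosts_copy")
echo "$without_first_two"

# The file itself still has all four lines.
lines_on_disk=$(wc -l < "$hosts_copy" | tr -d ' ')
echo "$lines_on_disk"

# -n suppresses automatic printing; p prints only the selected lines 2-3.
lines_2_3=$(sed -n '2,3p' "$hosts_copy")
echo "$lines_2_3"

# s/old/new/ replaces the first match on each line.
renamed=$(sed 's/web01/web-primary/' "$hosts_copy")
echo "$renamed"

rm -f "$hosts_copy"
```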

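4. awk, a handy data-processing tool

The awk command helps us pick out the content we are interested in and lay it out. Its general form is: awk -F 'separator' 'condition {action}' file. The -F option sets the field separator and can be omitted when the fields are separated by whitespace.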

zhangsan@Aliyun:~$ ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 162.17.172.171  netmask 255.255.240.0  broadcast 162.16.165.255
        ether 00:16:3e:10:b4:c7  txqueuelen 1000  (Ethernet)
        RX packets 41449753  bytes 9972007192 (9.9 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 41068356  bytes 6352574318 (6.3 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
# Extract the 162.17.172.171 part
zhangsan@Aliyun:~$ ifconfig eth0 | grep 'inet' | awk -F" " '{print $2}'
162.17.172.171
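
To illustrate the separator, condition and action parts separately, here is a sketch on a few passwd-style lines. The sample data and the 192.0.2.10 address are placeholders, not output from a real machine.

```shell
# Hypothetical passwd-style sample.
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'sys:x:3:3:sys:/dev:/usr/sbin/nologin' \
  'games:x:5:60:games:/usr/games:/usr/sbin/nologin' > "$passwd_sample"

# -F ':' splits each line on colons; $1 is the first field, $NF the last.
name_and_shell=$(awk -F ':' '{print $1, $NF}' "$passwd_sample")
echo "$name_and_shell"

# A condition before the braces: only lines whose third field (UID) is below 5.
low_uids=$(awk -F ':' '$3 < 5 {print $1}' "$passwd_sample")
echo "$low_uids"

# awk can match a pattern itself, so a separate grep stage is optional.
inet_addr=$(printf '%s\n' \
  'eth0: flags=4163<UP>  mtu 1500' \
  '        inet 192.0.2.10  netmask 255.255.255.0' |
  awk '/inet / {print $2}')
echo "$inet_addr"

rm -f "$passwd_sample"
```

With the default separator awk ignores leading whitespace, which is why $2 is the address even though the inet line is indented.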