[Regular Expressions] A First Look at Regular Expressions

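When the String class was introduced in my Java course, one of the methods mentioned was split(). The teacher said it takes one argument and returns an array of strings (a String[]) produced by splitting the original string on that argument. What you actually pass in is a regular expression.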

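A regular expression is a way of describing a string pattern, which is then used for matching. For example, suppose I am looking for a number that may have a minus sign in front:
-?
To say "possibly a minus sign, followed by one or more digits":
-?\d+
Note the backslash "\" here. A backslash usually serves as an escape character, and that is true here as well, except that in Java source you have to write two of them: the Java compiler consumes the first as a string-literal escape, and the one that remains is what the regular expression engine sees.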

public class IntegerMatch {

	public static void main(String[] args) {
		System.out.println("-1234".matches("-?\\d+"));
		System.out.println("5678".matches("-?\\d+"));
		System.out.println("5678".matches("-\\d+"));
		System.out.println("+911".matches("-?\\d+"));
		System.out.println("+911".matches("(-|\\+)?\\d+"));
	}
}

output:
true
true
false
false
true
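
To make the double backslash concrete, here is a minimal sketch that prints the pattern as the regular expression engine receives it. The class name EscapeDemo and the helper patternText() are made up for this illustration.

```java
class EscapeDemo {
    // Returns the pattern text after the Java compiler has processed the literal.
    static String patternText() {
        return "-?\\d+"; // two backslashes in source, one backslash in the actual string
    }

    public static void main(String[] args) {
        String regex = patternText();
        System.out.println(regex);                  // prints -?\d+
        System.out.println(regex.length());         // prints 5
        System.out.println("-1234".matches(regex)); // prints true
    }
}
```

The string holds only five characters, so the regex engine sees a single backslash in front of d.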

Back to split(): a better description of what it does is "split the string at every place where the regular expression matches".

import java.util.Arrays;

public class Splitting {
	public static String knights = "Then, when you have found the shrubbery, you must cut down the mightiest tree int the fores... with... a herring!";
	
	public static void split(String regex) {
		System.out.println(Arrays.toString(knights.split(regex)));
	}

	public static void main(String[] args) {
		split(" ");
		split("\\W+");
		split("n\\W+");
	}
}
output:
[Then,, when, you, have, found, the, shrubbery,, you, must, cut, down, the, mightiest, tree, int, the, fores..., with..., a, herring!]
[Then, when, you, have, found, the, shrubbery, you, must, cut, down, the, mightiest, tree, int, the, fores, with, a, herring]
[The, whe, you have found the shrubbery, you must cut dow, the mightiest tree int the fores... with... a herring!]

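1. The first call simply splits on spaces.
2. The second and third calls both use "\W", which means a non-word character. In the second call every run of punctuation and whitespace is treated as a delimiter, so the punctuation disappears from the result.
3. The third pattern means the letter n followed by one or more non-word characters. Whatever the pattern matches is removed, so the n is gone from the pieces too (for example "Then, " leaves "The").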
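
Because the argument to split() is a regular expression, delimiters that are regex metacharacters need escaping. The following is a small sketch of that pitfall; the class name SplitEscape, the helper splitOn() and the sample string "1.2.3" are assumptions chosen for illustration.

```java
import java.util.Arrays;

class SplitEscape {
    // Thin wrapper around String.split so the two calls are easy to compare.
    static String[] splitOn(String text, String regex) {
        return text.split(regex);
    }

    public static void main(String[] args) {
        String version = "1.2.3";
        // An unescaped "." matches any character, so every character is a
        // delimiter; only empty pieces remain and split() drops trailing empties.
        System.out.println(Arrays.toString(splitOn(version, ".")));   // prints []
        // Escaped, the dot is a literal dot.
        System.out.println(Arrays.toString(splitOn(version, "\\."))); // prints [1, 2, 3]
    }
}
```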

Example:

public class Rudolph {

	public static void main(String[] args) {
		for(String pattern : new String[] {"Rudolph", "[rR]udolph", "[rR][aeiou][a-z]ol.*", "R.*"})
			System.out.println("Rudolph".matches(pattern));
	}
}
output:
true
true
true
true

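This brings up the question of how many times something should match, which is what quantifiers control. The original post showed a table of them here:
[Image: table of quantifiers]
For example, to match one or more repetitions of the sequence abc you have to write (abc)+. If you write abc+ instead, it matches ab followed by one or more c.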
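
The difference between the two patterns can be checked with a short sketch. The class name GroupQuantifier and the sample inputs are made up for illustration; matches() is used, so the whole input has to fit the pattern.

```java
class GroupQuantifier {
    // True when the entire input is described by the regex.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("(abc)+", "abcabc")); // prints true
        System.out.println(matchesWhole("abc+", "abcabc"));   // prints false
        System.out.println(matchesWhole("abc+", "abccc"));    // prints true
        System.out.println(matchesWhole("(abc)+", "abccc"));  // prints false
    }
}
```

The parentheses make + apply to the whole group; without them it applies only to the c directly in front of it.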

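Pattern and Matcher
To build more powerful regular expression objects, compile the regular expression with Pattern.compile(). It takes the regular expression as a String and produces a Pattern object. Pass the string you want to search to that object's matcher() method and you get a Matcher object, which offers many more operations: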

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegularExpression {

	public static void main(String[] args) {
		String[] arg = new String[] {"abcabcabcdefabc", "abc+", "(abc)+", "(abc){2,}"};
		if(arg.length < 2) {
			System.out.println("Usage:\njava TestRegularExpression characterSequence regularExpression+");
			System.exit(0);
		}
		System.out.println("Input: \"" + arg[0] +"\"");
		for(String str : arg) {
			System.out.println("Regular expression : \"" + str + "\"");
			Pattern p = Pattern.compile(str);
			Matcher m = p.matcher(arg[0]);
			while(m.find()) {
				System.out.println("Match \"" + m.group() +"\" at positions " + m.start() + "-" + (m.end() - 1));
			}
		}
	}
}
output:
Input: "abcabcabcdefabc"
Regular expression : "abcabcabcdefabc"
Match "abcabcabcdefabc" at positions 0-14
Regular expression : "abc+"
Match "abc" at positions 0-2
Match "abc" at positions 3-5
Match "abc" at positions 6-8
Match "abc" at positions 12-14
Regular expression : "(abc)+"
Match "abcabcabc" at positions 0-8
Match "abc" at positions 12-14
Regular expression : "(abc){2,}"
Match "abcabcabc" at positions 0-8
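
Besides find(), a Matcher also offers matches() and lookingAt(). The sketch below compares the three on the same input string used above; the class name MatcherMethods and its helper names are assumptions for this illustration.

```java
import java.util.regex.Pattern;

class MatcherMethods {
    // Compiled once and reused for every check.
    static final Pattern ABC = Pattern.compile("(abc)+");

    // matches(): the whole input must match.
    static boolean whole(String s) { return ABC.matcher(s).matches(); }

    // lookingAt(): a match must start at the beginning, but may stop early.
    static boolean prefix(String s) { return ABC.matcher(s).lookingAt(); }

    // find(): a match may occur anywhere in the input.
    static boolean anywhere(String s) { return ABC.matcher(s).find(); }

    public static void main(String[] args) {
        String input = "abcabcabcdefabc";
        System.out.println(whole(input));     // prints false
        System.out.println(prefix(input));    // prints true
        System.out.println(anywhere(input));  // prints true
        System.out.println(prefix("xxabc"));  // prints false
        System.out.println(anywhere("xxabc")); // prints true
    }
}
```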

Using regular expressions in SQL statements

select prod_name
from products
where prod_name REGEXP '1000'
order by prod_name;

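This statement looks very much like LIKE; whatever follows REGEXP is treated as a regular expression. One difference is that LIKE has to match the whole column value, while REGEXP matches anywhere inside it.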

select prod_name
from products
where prod_name REGEXP '.000'
order by prod_name;

This matches any single character followed by 000, for example 1000 and 2000.

OR matching

select prod_name
from products
where prod_name REGEXP '1000|2000'
order by prod_name;

Matching one of several characters

select prod_name
from products
where prod_name REGEXP '[123] Ton'
order by prod_name;

--output:
1 ton anvil
2 ton anvil

Here is a similar-looking query:

select prod_name
from products
where prod_name REGEXP '1|2|3 Ton'
order by prod_name;

--output:
1 ton anvil
2 ton anvil
JetPack 1000
JetPack 2000
TNT (1 stick)

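This time the output is not what we wanted, and it differs from the previous query. The reason is that this pattern means 1, or 2, or 3 Ton, so any name containing a 1 or a 2 also matches. Unless the alternatives are enclosed in a set, as in [123], the | applies to the whole pattern.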
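
The same effect can be reproduced in Java. This is only a sketch that imitates the queries: the class name RegexpFilter and the helper regexpFilter() are made up, the product list contains just the names that appear in the outputs of this post (the real table may hold more rows), and CASE_INSENSITIVE is an assumption added so that Ton matches ton as it does in the outputs shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexpFilter {
    // Only the product names visible in this post's outputs (assumed subset).
    static final List<String> PRODUCTS = Arrays.asList(
            ".5 ton anvil", "1 ton anvil", "2 ton anvil",
            "JetPack 1000", "JetPack 2000", "TNT (1 stick)", "TNT (5 sticks)");

    // Keeps every name in which the regex matches somewhere, ignoring case.
    static List<String> regexpFilter(String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String name : PRODUCTS) {
            if (p.matcher(name).find()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(regexpFilter("[123] Ton"));
        // prints [1 ton anvil, 2 ton anvil]
        System.out.println(regexpFilter("1|2|3 Ton"));
        // prints [1 ton anvil, 2 ton anvil, JetPack 1000, JetPack 2000, TNT (1 stick)]
        System.out.println(regexpFilter("(1|2|3) Ton"));
        // prints [1 ton anvil, 2 ton anvil]
    }
}
```

Grouping the alternatives with parentheses limits the | to the digits, which gives the same result as the set [123].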

A character set can also be negated: [123] matches 1, 2 or 3, while [^123] matches any character except those.

Matching ranges

select prod_name
from products
where prod_name REGEXP '[1-5] Ton'
order by prod_name;

--output:
.5 ton anvil
1 ton anvil
2 ton anvil

The substring 5 ton matches the pattern, so .5 ton anvil is returned.
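
To see which part of the name actually matched, a Java sketch can print the matched substring. The class name RangeMatch and the helper matchedPart() are made up for illustration, and CASE_INSENSITIVE is an assumption added so that Ton matches ton as in the output above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeMatch {
    // Returns the first substring matched by the regex, or null if none.
    static String matchedPart(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchedPart("[1-5] Ton", ".5 ton anvil")); // prints 5 ton
        System.out.println(matchedPart("[1-5] Ton", "1 ton anvil"));  // prints 1 ton
        System.out.println(matchedPart("[1-5] Ton", "JetPack 1000")); // prints null
    }
}
```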

Matching special characters

-- escaping special characters
select vend_name
from vendors
where vend_name REGEXP '.'
order by vend_name;

--output:
ACME
Anvils R us
Furball Inc.
Jet Set
Jouets Et Ours

Because . matches any character, every row is returned. To match a literal period, escape it with two backslashes:

select vend_name
from vendors
where vend_name REGEXP '\\.'
order by vend_name;

--output:
Furball Inc.

Matching multiple instances
Suppose I want to find a word and also allow an optional trailing s:

select prod_name
from products
where prod_name REGEXP '\\([0-9] sticks?\\)'
order by prod_name;

--output:
TNT (1 stick)
TNT (5 sticks)

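'\\(' matches a literal '('
'[0-9]' matches any digit
'sticks?' matches both stick and sticks (the s is optional, because ? matches zero or one occurrence of whatever character precedes it)
'\\)' matches a literal ')'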
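
The same pattern works in Java, where the doubled backslashes come from Java's string-literal escaping. This is a minimal sketch; the class name OptionalS and the helper found() are made up, and the third input is an invented counterexample.

```java
import java.util.regex.Pattern;

class OptionalS {
    // Literal "(", one digit, a space, "stick" with an optional "s", literal ")".
    static final Pattern STICKS = Pattern.compile("\\([0-9] sticks?\\)");

    // True when the pattern matches somewhere in the name.
    static boolean found(String name) {
        return STICKS.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(found("TNT (1 stick)"));  // prints true
        System.out.println(found("TNT (5 sticks)")); // prints true
        System.out.println(found("TNT 1 stick"));    // prints false (no parentheses)
    }
}
```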


select prod_name
from products
where prod_name REGEXP '[[:digit:]]{4}'
order by prod_name;

--output:
JetPack 1000
JetPack 2000

[:digit:] matches any digit, and {4} requires exactly four occurrences of what precedes it (any digit). So the pattern matches any four consecutive digits.
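
Java's regex syntax does not use the [[:digit:]] bracket form; its counterpart is \p{Digit}, and [0-9] or \d behave the same way by default. The sketch below assumes that substitution; the class name FourDigits and the helper hasFourDigits() are made up for illustration.

```java
import java.util.regex.Pattern;

class FourDigits {
    // True when the regex finds four consecutive digits somewhere in the text.
    static boolean hasFourDigits(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Three equivalent spellings in Java's default mode.
        String[] spellings = {"\\p{Digit}{4}", "[0-9]{4}", "\\d{4}"};
        for (String regex : spellings) {
            System.out.println(hasFourDigits(regex, "JetPack 1000")); // prints true
            System.out.println(hasFourDigits(regex, "1 ton anvil"));  // prints false
        }
    }
}
```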

Anchors
Everything so far matches text anywhere inside a string. What if you want to match at a particular position?
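Suppose I want products whose name starts with a number, including ones that start with a decimal point. A plain [0-9\\.] or [[:digit:]\\.] will not do, because it matches anywhere in the string. Putting ^ in front anchors the match to the start of the string: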

select prod_name
from products
where prod_name REGEXP '^[0-9\\.]'
order by prod_name;

--output:
.5 ton anvil
1 ton anvil
2 ton anvil
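
The effect of the anchor can be sketched in Java as well. The class name StartsWithNumber and the helper filter() are made up, and the product list is limited to the names that appear in this post's outputs (the real table may hold more rows). Inside a Java character class the dot needs no escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class StartsWithNumber {
    // Only the product names visible in this post's outputs (assumed subset).
    static final List<String> PRODUCTS = Arrays.asList(
            ".5 ton anvil", "1 ton anvil", "2 ton anvil",
            "JetPack 1000", "JetPack 2000", "TNT (1 stick)", "TNT (5 sticks)");

    // Keeps every name in which the regex matches somewhere.
    static List<String> filter(String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String name : PRODUCTS) {
            if (p.matcher(name).find()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Anchored: the digit or dot must be the first character.
        System.out.println(filter("^[0-9.]"));
        // prints [.5 ton anvil, 1 ton anvil, 2 ton anvil]
        // Unanchored: a digit anywhere is enough, so all seven names match.
        System.out.println(filter("[0-9.]").size()); // prints 7
    }
}
```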

To be continued…

Source: CSDN

Author: 鬼鬼@L

Link: https://blog.csdn.net/Mr_Ghost812/article/details/104153646
