dfa | 易学教程

Compiler Principles: Constructing a DFA (Deterministic Finite Automaton)

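Code: https://github.com/pxjw/Principles-of-Compiler/tree/master/consDFA

Original assignment:

1. Define a simple language or a right-linear regular grammar of your own, for example (for reference only):
   G[S]: S→aU|bV  U→bV|aQ  V→aU|bQ  Q→aQ|bQ|e
2. Construct its deterministic finite automaton (the example diagram is not reproduced here).
3. Use the behavior-simulation algorithm for a deterministic finite automaton M=(K,Σ,f,S,Z) to test any given string: if the string belongs to the language, the procedure halts after finitely many steps and answers "yes"; otherwise it halts and answers "no".

    K := S;
    c := getchar;
    while c <> eof do
      { K := f(K, c); c := getchar; };
    if K is in Z then return ('yes') else return ('no')

Time to code!

1. The transition class. current is the current state and next is the next state.

    class TransTile {
    public:
        char current;  // current state
        char next;     // next state
        char input;    // input symbol that triggers the transition
        TransTile(char C, char I, char Ne)
            : current(C), next(Ne), input(I) {}
    };

2. The DFA class. It holds the DFA's data set, the alphabet, and the definition of procedure P, including initialization, traversing the transitions …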
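The simulation loop in step 3 can be sketched in Python for the sample grammar G[S]. This is only an illustration, not the code from the linked repository: the names `TRANSITIONS`, `START`, `FINAL` and `accepts` are invented here, and it assumes that the `e` in `Q→aQ|bQ|e` stands for the empty string, which makes Q the only final state. Characters with no transition are treated as a rejection.

```python
# Transition function f for G[S], read off the productions:
# a production X -> cY becomes f(X, c) = Y.
TRANSITIONS = {
    ("S", "a"): "U", ("S", "b"): "V",
    ("U", "a"): "Q", ("U", "b"): "V",
    ("V", "a"): "U", ("V", "b"): "Q",
    ("Q", "a"): "Q", ("Q", "b"): "Q",
}
START = "S"
FINAL = {"Q"}  # assumption: Q -> e means Q can derive the empty string


def accepts(text: str) -> bool:
    """Run the DFA over text and report whether it ends in a final state."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition for this character: reject
            return False
    return state in FINAL
```

Under these assumptions `accepts("abba")` is true and `accepts("abab")` is false: the automaton reaches Q exactly when the input contains two identical letters in a row.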

C# Lexical Analyzer (5): Converting to a DFA

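Series navigation: (1) Introduction to Lexical Analysis, (2) Input Buffering and Code Location, (3) Regular Expressions, (4) Constructing the NFA, (5) Converting to a DFA, (6) Constructing the Lexical Analyzer, (7) Summary

The previous article produced an NFA equivalent to the regular expressions. This article explains how to convert that NFA to a DFA, and how to simplify the DFA and the character classes.

I. Representing the DFA

A DFA is represented much like an NFA but is considerably simpler: all it needs is a method for adding a new state. The Dfa class looks like this:

    namespace Cyjb.Compilers.Lexers {
        class Dfa : IList<DfaState> {
            // Creates a new state in the current DFA.
            DfaState NewState() { /* body omitted in the source */ }
        }
    }

A DFA state is also fairly simple, with only two essential properties: the symbol indices and the state transitions. A symbol index records which regular expression the current accepting state corresponds to. Because one DFA state may correspond to several NFA states (see the subset construction below), the symbol index of a DFA state is an array; for an ordinary (non-accepting) state it is an empty array. The state transitions describe how to move from the current state to the next one. Since the character classes were already partitioned when the NFA was built, the DFA simply uses an array to record the transition for each character class (a DFA has no $\epsilon$ transitions, and each character class has exactly one transition). The NFA state definition also has a state-type property, but in …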
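The state layout described above can be sketched in Python as follows. This is a rough illustration rather than the Cyjb.Compilers API: the names `symbol_indices`, `transitions`, `char_class_count` and `new_state` are invented for the sketch, and using `None` for a transition that has not been filled in yet is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DfaState:
    index: int
    # Indices of the regular expressions this state accepts;
    # left empty for an ordinary (non-accepting) state.
    symbol_indices: List[int] = field(default_factory=list)
    # transitions[c] is the index of the target state for character class c,
    # or None while no target has been assigned (an assumption of this sketch).
    transitions: List[Optional[int]] = field(default_factory=list)


class Dfa(list):
    """A DFA stored as a list of states, with one method for adding a state."""

    def __init__(self, char_class_count: int):
        super().__init__()
        self.char_class_count = char_class_count

    def new_state(self) -> DfaState:
        state = DfaState(index=len(self),
                         transitions=[None] * self.char_class_count)
        self.append(state)
        return state
```

Storing transitions in a plain array indexed by character class keeps each lookup at constant time, which is the point of partitioning the character classes beforehand.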

From Regular Expression to Minimized DFA: An Explanation

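The overall procedure has three steps: (1) convert the regular expression to an NFA (nondeterministic finite automaton); (2) convert the NFA to a DFA with the subset construction; (3) minimize the DFA with the partition method.

Step 1 is simple: repeatedly apply the rules in the figure below [Figure 1] until the NFA is obtained. Here is a worked example, taken from Google Books, which this article follows throughout. [Figure 2]

Step 2: the subset construction. For the same example, we make the converted NFA deterministic. [Figure 3] This table is indispensable when going from the NFA to the DFA. The I in the first row of the first column is the set of nodes reachable from the NFA's start node through any number of ε-moves. Ia is the set reachable from that set on one a, where "on one a" means that ε-moves before and after the a may be skipped. Likewise, Ib is the set reachable on one b, again skipping any number of ε-moves before and after it.

As for how the I entries in the second and later rows are determined, I only understood this after working through some exercises: look at which of the Ia and Ib sets above have not yet appeared in the I column, bring each of them down as a new row, and derive that row's Ia and Ib exactly as described above. If this is still unclear, just look at the figure. You will find that the entries in the I column all appear among the Ia and Ib entries, each as a complete set.

Once this step is done, the sets need labels before the final DFA can be drawn, for example 1, 2, 3 or A, B, C. My usual method is to number the whole I column 1, 2, 3, ... in increasing order, then find which sets under Ia and Ib are identical to the set labeled 1 and label them 1 as well, and continue downward in the same way. The result is a table like this. [Figure 4] At this point the DFA can be written down. Following the table above, from node 0 an a leads to 1 …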
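The table-filling procedure of step 2 can be sketched in Python. Since the article's figures are not available, the sample NFA below is a made-up one, and all names (`EPS`, `eps_closure`, `move`, `subset_construction`) are chosen for this sketch only. The first row is the ε-closure of the start node; each new set that shows up under some symbol is numbered and queued as a later row. Numbering starts at 0 here.

```python
from collections import deque

EPS = ""  # label used for ε-moves in this sketch


def eps_closure(states, nfa):
    """All NFA states reachable from `states` through ε-moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol, nfa):
    """NFA states reachable from `states` on exactly one `symbol`."""
    return {t for s in states for t in nfa.get((s, symbol), ())}


def subset_construction(nfa, start, accepting, alphabet):
    start_set = eps_closure({start}, nfa)
    numbering = {start_set: 0}   # the "I" column: subset -> row label
    table = {}                   # (row label, symbol) -> row label
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = eps_closure(move(current, symbol, nfa), nfa)
            if not target:
                continue         # no transition on this symbol
            if target not in numbering:
                numbering[target] = len(numbering)
                queue.append(target)
            table[(numbering[current], symbol)] = numbering[target]
    dfa_accepting = {n for subset, n in numbering.items() if subset & accepting}
    return numbering, table, dfa_accepting


# Hypothetical NFA for (a|b)*a with an extra ε-move from a start node "X".
sample_nfa = {
    ("X", EPS): {"0"},
    ("0", "a"): {"0", "1"},
    ("0", "b"): {"0"},
}
numbering, table, dfa_accepting = subset_construction(
    sample_nfa, start="X", accepting={"1"}, alphabet="ab")
```

For this sample NFA the construction yields three rows: {X, 0} as row 0, {0, 1} as row 1 and {0} as row 2, with row 1 the only accepting one.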

C# Lexical Analyzer (3): Regular Expressions

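Regular expressions are an important notation for describing lexemes. Although they cannot express every possible pattern (for example, "strings consisting of equal numbers of a's and b's"), they describe the kinds of patterns needed for processing tokens very efficiently.

I. Definition of regular expressions

A regular expression can be built recursively from smaller regular expressions according to a set of rules. Each regular expression $r$ denotes a language $L(r)$, and a language can be regarded as a set of strings. Regular expressions have the following two basic elements:

$\epsilon$ is a regular expression, and $L(\epsilon) = \{ \epsilon \}$; that is, the language contains only the empty string (the string of length 0).

If $a$ is a character, then $\mathbf{a}$ is a regular expression, and $L(\mathbf{a}) = \{ a \}$; that is, the language contains only the single string $a$ of length $1$.

Larger regular expressions are built from smaller ones in the following four ways. Suppose $r$ and $s$ are regular expressions denoting the languages $L(r)$ and $L(s)$ respectively. Then:

$(r)|(s)$ is a regular expression denoting the language $L(r) \cup L(s)$, that is, the set of strings belonging to $L(r)$ together with the strings belonging to $L(s)$ ($L(r) \cup L(s) =$ …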
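The three rules visible in this excerpt can be mirrored with Python sets, treating a language as a set of strings. This is only a toy model: it works for finite languages, the function names are invented for the sketch, and the empty Python string stands in for $\epsilon$.

```python
EPSILON = ""  # the empty string plays the role of ε


def lang_epsilon():
    """L(ε): the language containing only the empty string."""
    return {EPSILON}


def lang_char(a):
    """L(a): the language containing only the one-character string a."""
    return {a}


def lang_union(lang_r, lang_s):
    """L((r)|(s)) = L(r) ∪ L(s)."""
    return lang_r | lang_s


# The expression (a)|(ε) denotes the language {"a", ""}.
optional_a = lang_union(lang_char("a"), lang_epsilon())
```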

Why does Regexp have a timeout method, while in theory they shouldn't?

Question: This is a theoretical Computer Science question (Computation Theory). I know that RegExps can take a very long time to evaluate. However, from Theory of Computation we know that matching with a Regular Expression can be done extremely fast in a few clock cycles. If RegExps are equivalent to Finite Automata, why do RegExps have (or require) a timeout method? Using a DFA, the computation time for matching can be extremely fast. By RegExps I mean the Regular Expressions pattern matching classes

DFA based regular expression matching - how to get all matches?

Question: I have a given DFA that represents a regular expression. I want to match the DFA against an input stream and get all possible matches back, not only the leftmost-longest match. For example:

regex: a*ba|baa
input: aaaaabaaababbabbbaa
result: aaaaaba aaba ba baa

Answer 1: Assumptions. Based on your question and later comments you want a general method for splitting a sentence into non-overlapping, matching substrings, with non-matching parts of the sentence discarded. You also seem to want optimal
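One simple way to split an input into non-overlapping matches with a DFA is a greedy leftmost-longest scan, sketched below in Python. This is not the method of the truncated answer, which is not available here; it is a baseline that happens to reproduce the question's expected result on the sample input. The DFA for `a*ba|baa` is built by hand, and its state names are made up for the sketch.

```python
# Hand-built DFA for a*ba|baa; missing entries mean "no transition".
DELTA = {
    ("q0", "a"): "A",  ("q0", "b"): "B",
    ("A", "a"): "A",   ("A", "b"): "Ab",
    ("Ab", "a"): "Acc",
    ("B", "a"): "Ba",
    ("Ba", "a"): "Baa",
}
ACCEPT = {"Acc", "Ba", "Baa"}


def scan(text):
    """Return non-overlapping leftmost-longest matches, skipping unmatched input."""
    matches = []
    pos = 0
    while pos < len(text):
        state = "q0"
        last_end = None  # end of the longest match starting at pos, if any
        i = pos
        while i < len(text) and (state, text[i]) in DELTA:
            state = DELTA[(state, text[i])]
            i += 1
            if state in ACCEPT:
                last_end = i
        if last_end is None:
            pos += 1  # nothing matches here: discard one character
        else:
            matches.append(text[pos:last_end])
            pos = last_end
    return matches
```

On the question's input, `scan("aaaaabaaababbabbbaa")` returns `['aaaaaba', 'aaba', 'ba', 'baa']`. A greedy scan does not guarantee an optimal split in general, which is presumably why the answer goes on to discuss optimality.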

Efficient matching of text messages against thousands of regular expressions

Question: I am solving a problem where I have a text message to match against thousands of regular expressions of the form <some string> {0 or 300 chars} <some string> {0 or 300 chars} e.g. "on"[ \t\r]*(.){0,300}"."[ \t\r]*(.){0,300}"from" or a real example can be "Dear"[ \t\r]*"Customer,"[ \t\r]*"Your"[ \t\r]*"package"[ \t\r]*(.){0,80}[ \t\r]*"is"[ \t\r]*"out"[ \t\r]*"for"[ \t\r]*"delivery"[ \t\r]*"via"(.){0,80}[ \t\r]*"Courier,"[ \t\r]*(.){0,80}[ \t\r]*"on"(.){0,80}"."[ \t\r]*"Delivery"[ \t\r]*"will"[ \t

Regular Expression for Binary Numbers Divisible by 3

Question: I am self-studying regular expressions and found an interesting practice problem online that involves writing a regular expression to recognize all binary numbers divisible by 3 (and only such numbers). To be honest, the problem asked to construct a DFA for such a scenario, but I figured that it should be equivalently possible using regular expressions. I know that there's a little rule in place to figure out if a binary number is divisible by 3: take the number of ones in even places in the
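The DFA the original problem asks for can be sketched in Python as a three-state machine whose state is the remainder modulo 3 of the bits read so far. This uses the standard remainder construction rather than the ones-counting rule the question starts to describe. It assumes the bits are read most significant first, and that the empty string counts as 0; the function name is invented for the sketch.

```python
def divisible_by_3(bits: str) -> bool:
    """Simulate the 3-state remainder DFA over a string of '0'/'1' characters."""
    remainder = 0  # start state: remainder 0
    for ch in bits:
        if ch not in "01":
            raise ValueError(f"not a binary digit: {ch!r}")
        # Appending a bit b turns the value n into 2n + b,
        # so the new remainder is (2 * remainder + b) mod 3.
        remainder = (2 * remainder + int(ch)) % 3
    return remainder == 0  # state 0 is the only accepting state
```

Because the state only ever takes the values 0, 1 and 2, the update rule is a finite transition table in disguise; converting that table to a regular expression is the part the question is really about.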

DFAs vs Regexes when implementing a lexical analyzer?

Question: (I'm just learning how to write a compiler, so please correct me if I make any incorrect claims) Why would anyone still implement DFAs in code (goto statements, table-driven implementations) when they can simply use regular expressions? As far as I understand, lexical analyzers take in a string of characters and churn out a list of tokens which, in the language's grammar definition, are terminals, making it possible for them to be described by a regular expression. Wouldn't it be easier to
