RE

How to use regex re.compile Match() or findall() in list comprehension

Submitted by 自闭症网瘾萝莉.ら on 2020-07-09 12:04:17
Question: I am trying to use regex in a list comprehension without needing the pandas extract() function. I want to use regex because my code might change in a way that requires more complex pattern matching. A kind user here suggested I use the str accessor functions, but again that mainly works because the current pattern is simple enough. As of now, I need to return the pandas rows that either contain NaN or whose values under ODFS_FILE_CREATE_DATETIME are not 10-digit strings, i.e. do not consist of exactly ten digits.
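
A minimal sketch of the list-comprehension approach, assuming the column name from the question; the sample values are hypothetical stand-ins for the asker's real data:

    import re
    import numpy as np
    import pandas as pd

    # Hypothetical sample data standing in for the asker's DataFrame.
    df = pd.DataFrame({"ODFS_FILE_CREATE_DATETIME": ["2020070912", "20200709", np.nan, "abc4567890"]})

    ten_digits = re.compile(r"^\d{10}$")   # exactly ten digit characters

    # Keep rows whose value is NaN or does not consist of exactly ten digits.
    mask = [pd.isna(v) or not ten_digits.match(str(v)) for v in df["ODFS_FILE_CREATE_DATETIME"]]
    print(df[mask])

Because the pattern is compiled once and only referenced inside the comprehension, swapping in a more complex regex later only means changing the pattern string.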

Regex: delete between brackets, but only if below character length

Submitted by 谁说胖子不能爱 on 2020-07-03 10:05:14
Question: I have strings such as: "this is a text ( with parts in brackets ) . This is another string ( with a very long string between brackets that should not be removed because it is too long being over 100 characters )". Desired output: "this is a text . This is another string ( with a very long string between brackets that should not be removed because it is too long being over 100 characters )". I can match the bracket content (with the goal of replacing it with an empty string to remove it) with \s\(.+ …
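
A hedged sketch of one way to do this with re.sub, assuming "too long" means more than 100 characters between the parentheses, as in the example above:

    import re

    text = ("this is a text ( with parts in brackets ) . This is another string "
            "( with a very long string between brackets that should not be removed "
            "because it is too long being over 100 characters )")

    # Remove a space plus a parenthesised group only when the content between
    # the brackets is at most 100 characters (and contains no nested ')').
    cleaned = re.sub(r"\s\([^)]{0,100}\)", "", text)
    print(cleaned)

The bounded quantifier {0,100} enforces the length limit: if the bracketed content is longer, the following \) can never be reached and that bracket is left alone.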

What are all the ILLEGAL_CHARACTERS from openpyxl?

Submitted by 爱⌒轻易说出口 on 2020-06-29 04:07:42
Question: We are running into a problem when parsing emails from Outlook with Python. Sometimes the emails contain characters that cannot be appended to an Excel worksheet using openpyxl; the error raised is just IllegalCharacterError. I am trying to force it to print out the actual characters that are considered "illegal". While digging through the openpyxl source I found, in cell.py, the line that raises the error: if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
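
If the goal is just to see which characters trip that check, the same compiled regex can be reused directly. A small sketch, noting that the exact import path of ILLEGAL_CHARACTERS_RE is an assumption and may differ between openpyxl versions (in recent releases it lives in openpyxl.cell.cell):

    # Import path is an assumption for recent openpyxl versions.
    from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE

    value = "subject with a stray control character \x07 in it"   # hypothetical email text

    # Print the repr and position of every character openpyxl considers illegal.
    for m in ILLEGAL_CHARACTERS_RE.finditer(value):
        print(repr(m.group()), "at index", m.start())

    # Or strip them before appending the row to the worksheet.
    clean_value = ILLEGAL_CHARACTERS_RE.sub("", value)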

Unable to follow a user in a website using some script built upon requests

Submitted by 只谈情不闲聊 on 2020-06-29 03:40:18
Question: I'm trying to follow a user on Instagram using a script built upon requests. The script can log me in successfully, but it can't follow that user. I tried my best to mimic in the script the process I could see in dev tools while following that user manually. User that I wish to follow using the script. This is how the …

Split a string and keep the delimiters as part of the split string chunks, not as separate list elements

Submitted by 夙愿已清 on 2020-06-28 09:21:22
Question: This is a spin-off from "In Python, how do I split a string and keep the separators?" Given rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00', how can I split this rawByteString into parts using "\\!" as the delimiter without dropping the delimiters, so that I get [b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']? I do not want to use [b'\\!' + x for x in rawByteString.split(b'\\!')][1:], as that relies on plain split() and is just a workaround; that is why this …
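
One re-based approach is to split on a zero-width lookahead for the delimiter, so nothing is consumed and the delimiter stays attached to the chunk that follows it. A sketch (splitting on an empty match like this requires Python 3.7+):

    import re

    rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'

    # (?=\\!) matches the empty position just before each b'\\!' delimiter.
    parts = [p for p in re.split(rb'(?=\\!)', rawByteString) if p]
    print(parts)
    # [b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']

The filter drops the single empty chunk produced because the string itself starts with the delimiter.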

Compare two Dataframes of sentences and return a third one

Submitted by Deadly on 2020-04-30 06:34:02
Question: I want to compare two long DataFrame columns of sentences and return a third DataFrame that looks like the snapshot shown below. My first approach was long-winded, only worked for single instances, and failed when I applied it to the DataFrame; it can be found in a previous question. The logic is: for words in both c1 and c2 the new value is 1, and for words only in c1 the value is set to zero. sentences = tra_df['Sent1']; context = tra_df['Sent2']; Sent1[0] = "I am completely happy with the plan you have …
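
Since the snapshot of the desired output is not reproduced here, the sketch below assumes the output is one 1/0 flag per Sent1 word; the sample rows are hypothetical stand-ins for tra_df:

    import pandas as pd

    # Hypothetical stand-in for the asker's tra_df.
    tra_df = pd.DataFrame({
        "Sent1": ["I am completely happy with the plan you have"],
        "Sent2": ["I am happy with the plan"],
    })

    def word_flags(sent1, sent2):
        # 1 for a Sent1 word that also appears in Sent2, 0 for a word only in Sent1.
        sent2_words = set(str(sent2).lower().split())
        return {w: int(w.lower() in sent2_words) for w in str(sent1).split()}

    flags = pd.DataFrame(word_flags(a, b) for a, b in zip(tra_df["Sent1"], tra_df["Sent2"]))
    print(flags)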

Study notes on the Python re module

Submitted by 好久不见. on 2020-03-02 02:52:32
1. Common methods

re.compile(pattern, flags=0): Compiles a regular expression and returns a regex object, on which match() and search() can then be called. This lets the compiled pattern be reused.

re.search(pattern, string, flags=0): Scans through the string looking for a match of the regular expression. Returns a match object (_sre.SRE_Match), or None if nothing matches.

re.match(pattern, string, flags=0): Tests whether the beginning of the string matches the regular expression. Returns a match object (_sre.SRE_Match), or None if it does not.

re.split(pattern, string, maxsplit=0): Splits the string by occurrences of the pattern. If the pattern is wrapped in parentheses (a capturing group), the matched delimiters are also included in the returned list. maxsplit is the maximum number of splits: maxsplit=1 splits once, and the default 0 means no limit. If the pattern matches at the start or end of the string, the returned list starts or ends with an empty string. If the string contains no match at all, a list holding the whole string is returned.

re.finditer(pattern, string, flags=0): Finds all substrings matched by the RE and returns them as an iterator, yielding the matches from left to right in order. If there is no match, the iterator yields nothing.

re.sub(pattern, repl, string, count=0, flags=0): Returns the string obtained by replacing the matches of pattern in string with repl; count limits the number of replacements, with 0 (the default) meaning replace them all.
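
A short, runnable illustration of the methods above:

    import re

    pattern = re.compile(r"\d+")              # compile once, reuse the regex object

    print(pattern.match("abc123"))            # None: match() only looks at the start
    print(pattern.search("abc123").group())   # '123': search() scans the whole string

    print(re.split(r"(\d+)", "a1b22c"))       # ['a', '1', 'b', '22', 'c'], delimiters kept because of the group
    print(re.split(r"\d+", "a1b22c", maxsplit=1))   # ['a', 'b22c'], at most one split

    print([m.group() for m in re.finditer(r"\d+", "a1b22c")])   # ['1', '22'], left to right
    print(re.sub(r"\d+", "#", "a1b22c", count=1))   # 'a#b22c', count limits the replacements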

Notes on regular expression internals and optimization in Python

Submitted by 老子叫甜甜 on 2020-02-29 14:32:09
Recently I spent some time studying regular expressions. The book I chose is "Mastering Regular Expressions" (《精通正则表达式》). After reading it once, I was struck by how powerful and refined regular expressions are. The first three chapters introduce the basic rules and lay the groundwork; chapter seven onwards covers usage in specific languages; the core of the book is chapters four, five and six. Chapter four presents the essence of regular expressions, the backtracking idea behind the traditional NFA engine; chapter five deepens that understanding through examples; chapter six studies efficiency, which again comes back to backtracking.

This article is my summary of the official documentation of Python's re module combined with that book. The official documentation: http://docs.python.org/3.3/library/re.html

Since I practice and use regexes in Python, the questions below are raised from Python's point of view, so I do not cover the other regex flavors discussed in the book. According to the book, Python's flavor is close to Perl's: a traditional NFA engine, i.e. "expression-directed", which uses backtracking and stops at the first match it finds (it is order-sensitive, unlike a POSIX NFA, which returns the leftmost-longest match).

When discussing backtracking, and matching in general, always think of the engine's position as sitting between characters rather than on a character itself. For example, ^ corresponds to the "empty" position before the first character.

Introduction to the basic rules

Interference from Python's own escape characters

In Python, the interactive command line, scripts and so on all process escape sequences themselves …
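
A short illustration of both points, assuming standard CPython re behavior. The alternation example shows the "first alternative wins" order sensitivity of a traditional NFA, and the backslash example shows how Python's own escape processing interferes with the backslashes meant for the regex engine:

    import re

    # Order-sensitive alternation: the traditional NFA stops at the first
    # alternative that succeeds, not at the leftmost-longest match (POSIX NFA).
    print(re.match(r"cat|category", "category").group())   # 'cat', not 'category'
    print(re.match(r"category|cat", "category").group())   # 'category'

    # Escape interference: to match a literal backslash followed by 'n',
    # the regex engine needs \\n; in an ordinary string literal that has to be
    # written "\\\\n", while a raw string keeps the backslashes untouched.
    print(re.search("\\\\n", r"a\nb").group())   # matches '\' + 'n'
    print(re.search(r"\\n", r"a\nb").group())    # same match, easier to read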