findall

Python regular expression findall *

两盒软妹~` submitted on 2019-12-02 13:24:07
I am not able to understand the following code behavior.

    >>> import re
    >>> text = 'been'
    >>> r = re.compile(r'b(e)*')
    >>> r.search(text).group()
    'bee'    # makes sense
    >>> r.findall(text)
    ['e']    # makes no sense

I read some existing questions and answers about capturing groups, but I am still confused. Could someone please explain this to me?

The answer is simplified in the Regex Howto. As you can read here, group returns the string matched by the regular expression:

    group() returns the substring that was matched by the RE.

But the behavior of findall is justified in the documentation: If one or
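The difference can be reproduced in a few lines. The sketch below reuses the pattern and text from the question; the variable names `capturing`, `non_capturing` and `outer` are made up for illustration. It relies on two documented behaviors of the `re` module: when a pattern contains groups, `findall` returns the groups rather than the whole match, and a repeated group keeps only its last repetition.

```python
import re

text = "been"

# One capturing group, repeated: group 1 only remembers its last repetition.
capturing = re.compile(r"b(e)*")
match = capturing.search(text)
print(match.group())            # whole match: 'bee'
print(match.group(1))           # last repetition of group 1: 'e'
print(capturing.findall(text))  # findall returns the group: ['e']

# A non-capturing group leaves the pattern without groups,
# so findall falls back to returning whole matches.
non_capturing = re.compile(r"b(?:e)*")
print(non_capturing.findall(text))  # ['bee']

# Wrapping the whole pattern in an outer group yields one tuple per match.
outer = re.compile(r"(b(e)*)")
print(outer.findall(text))          # [('bee', 'e')]
```

If the whole match is what you want, switching to `(?:...)` or iterating with `finditer` and calling `group()` on each match are two common options.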

Python BeautifulSoup Getting a column from table - IndexError List index out of range

限于喜欢 submitted on 2019-12-02 07:00:27
Question: Python newbie here. Python 2.7 with beautifulsoup 4. I am trying to parse a webpage to get columns using BeautifulSoup. The webpage has tables inside tables, but table 4 is the one that I want; it does not have any headers or th tags. I want to get the data into columns.

    from bs4 import BeautifulSoup
    import urllib2

    url = 'http://finance.yahoo.com/q/op?s=aapl+Options'
    htmltext = urllib2.urlopen(url).read()
    soup = BeautifulSoup(htmltext)
    #Table 8 has the data needed; it is nested under other

beautifulsoup find_all bug?

a 夏天 submitted on 2019-12-02 04:16:39
I am using Beautiful Soup to parse an HTML page, but sometimes find_all returns fewer results than the page actually contains. For example, this page http://www.totallyfreestuff.com/index.asp?m=0&sb=1&p=5 has 18 headline spans, but the following code finds only two. Can anybody tell me why? Thank you in advance!

    soup = BeautifulSoup(page, 'html.parser')
    hrefDivList = soup.find_all("span", class_ = "headline")
    #print hrefDivList
    print len(hrefDivList)

You can try using a different parser for BeautifulSoup.

    import requests
    from bs4 import BeautifulSoup

    url = "<your url>"
    r =

Python BeautifulSoup Getting a column from table - IndexError List index out of range

烈酒焚心 submitted on 2019-12-02 03:33:11
Python newbie here. Python 2.7 with beautifulsoup 4. I am trying to parse a webpage to get columns using BeautifulSoup. The webpage has tables inside tables, but table 4 is the one that I want; it does not have any headers or th tags. I want to get the data into columns.

    from bs4 import BeautifulSoup
    import urllib2

    url = 'http://finance.yahoo.com/q/op?s=aapl+Options'
    htmltext = urllib2.urlopen(url).read()
    soup = BeautifulSoup(htmltext)
    #Table 8 has the data needed; it is nested under other tables though
    # specific reference works as below:
    print soup.findAll('table')[8].findAll('tr')[2]

Why is re.findall not being specific in finding triplet items in string. Python

[亡魂溺海] submitted on 2019-12-01 14:29:20
So I have four lines of code

    seq= 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'
    OR_0 = re.findall(r'ATG(?:...){9,}?(?:TAA|TAG|TGA)',seq)

Let me explain what I am attempting to do first. I'm sorry if this is confusing, but I am going to try my best to explain it. I'm looking for sequences that START with 'ATG', followed by units of 3 of any word char [e.g. 'GGG', 'GTT', 'TTA', etc.] until it encounters either a 'TAA', 'TAG' or 'TGA'. I also want them to be at least 30 characters long, hence the {9,}?. This works to some degree, but if you notice in seq that there is ATG GAA GTT GGA TGA AAG TGG
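To see why the pattern runs past in-frame stop codons, it helps to compare it with a stricter variant. The sketch below is an assumption, not the answer from the original thread: it adds a negative lookahead so that no repeated triplet can itself be a stop codon. The names `loose`, `strict` and `clean_orf` are hypothetical.

```python
import re

seq = "ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA"

# Pattern from the question: `...` matches ANY three characters,
# so it happily consumes the in-frame TGA and TAA on the way.
loose = re.compile(r"ATG(?:...){9,}?(?:TAA|TAG|TGA)")

# Assumed alternative: the lookahead rejects a triplet that is a stop codon,
# so the repetition cannot step over one.
strict = re.compile(r"ATG(?:(?!TAA|TAG|TGA)...){9,}?(?:TAA|TAG|TGA)")

loose_hits = loose.findall(seq)
strict_hits = strict.findall(seq)
print(loose_hits)   # the whole 42-character sequence is one match
print(strict_hits)  # [] because a stop codon appears after only 3 triplets

# Hypothetical sequence with nine non-stop triplets before the stop codon.
clean_orf = "ATG" + "GCC" * 9 + "TAA"
clean_hits = strict.findall(clean_orf)
print(clean_hits)   # the full 33-character sequence
```

Both patterns use only non-capturing groups, so `findall` returns whole matches here rather than group contents.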

Why is re.findall not being specific in finding triplet items in string. Python

一个人想着一个人 submitted on 2019-12-01 12:58:11
Question: So I have four lines of code

    seq= 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'
    OR_0 = re.findall(r'ATG(?:...){9,}?(?:TAA|TAG|TGA)',seq)

Let me explain what I am attempting to do first. I'm sorry if this is confusing, but I am going to try my best to explain it. I'm looking for sequences that START with 'ATG', followed by units of 3 of any word char [e.g. 'GGG', 'GTT', 'TTA', etc.] until it encounters either a 'TAA', 'TAG' or 'TGA'. I also want them to be at least 30 characters long, hence

extbase repository findAll() returns result null

折月煮酒 submitted on 2019-11-30 18:54:50
I have several controllers, such as CategoryController and NewsController, as well as the domain models for category and news and repositories for both. In the NewsController I do a dependency injection like this (the same way as in CategoryController):

    /**
     * categoryRepository
     *
     * @var Tx_MyExtension_Domain_Repository_CategoryRepository
     */
    protected $categoryRepository;

    /**
     * injectCategoryRepository
     *
     * @param Tx_MyExtension_Domain_Repository_CategoryRepository $CategoryRepository
     * @return void
     */
    public function injectCategoryRepository(Tx_MyExtension_Domain_Repository_CategoryRepository

extbase repository findAll() returns result null

杀马特。学长 韩版系。学妹 submitted on 2019-11-30 03:35:51
Question: I have several controllers, such as CategoryController and NewsController, as well as the domain models for category and news and repositories for both. In the NewsController I do a dependency injection like this (the same way as in CategoryController):

    /**
     * categoryRepository
     *
     * @var Tx_MyExtension_Domain_Repository_CategoryRepository
     */
    protected $categoryRepository;

    /**
     * injectCategoryRepository
     *
     * @param Tx_MyExtension_Domain_Repository_CategoryRepository $CategoryRepository
     *

Word boundary with regex - cannot extract all words

耗尽温柔 submitted on 2019-11-29 16:03:21
I need to extract Male-Cat twice:

    a = "Male-Cat Male-Cat Male-Cat-Female"
    b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
    print (b)
    ['Male-Cat ']
    c = re.findall(r'\bMale-Cat\b', a)
    print (c)
    ['Male-Cat', 'Male-Cat', 'Male-Cat']

I need to extract Male-Cat three times:

    a = "Male-Cat Male-Cat Male-Cat"
    b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
    print (b)
    ['Male-Cat ', ' Male-Cat']
    c = re.findall(r'\bMale-Cat\b', a)
    print (c)
    ['Male-Cat', 'Male-Cat', 'Male-Cat']

Other strings which are parsed correctly by the first approach:

    a = 'Male-Cat Female-Cat Male-Cat-Female Male-Cat'
    a = 'Male-Cat-Female'
    a =
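Both attempts fail for different reasons: the first pattern consumes the separating whitespace, so adjacent occurrences cannot both match, while `\b` treats the hyphen as a boundary and therefore accepts `Male-Cat-Female`. One possible fix, offered here as an assumption rather than as the thread's accepted answer, is to use zero-width lookarounds that require "no non-whitespace character" on either side. The helper name `count_whole` is hypothetical.

```python
import re

# Lookarounds assert the context without consuming it:
# (?<!\S) = not preceded by a non-whitespace character (or at string start)
# (?!\S)  = not followed by a non-whitespace character (or at string end)
WHOLE_TOKEN = re.compile(r"(?<!\S)Male-Cat(?!\S)")


def count_whole(text):
    """Return every whitespace-delimited occurrence of 'Male-Cat'."""
    return WHOLE_TOKEN.findall(text)


first = count_whole("Male-Cat Male-Cat Male-Cat-Female")
second = count_whole("Male-Cat Male-Cat Male-Cat")
third = count_whole("Male-Cat-Female")

print(first)   # ['Male-Cat', 'Male-Cat']
print(second)  # ['Male-Cat', 'Male-Cat', 'Male-Cat']
print(third)   # []
```

Because the lookarounds match no characters, the results also come back without the stray leading or trailing spaces seen in the first attempt.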

C# FindAll VS Where Speed

好久不见. submitted on 2019-11-29 00:58:39
Does anyone know of any speed differences between Where and FindAll on List? I know Where is part of IEnumerable and FindAll is part of List; I'm just curious which is faster.

The FindAll method of the List<T> class actually constructs a new list object and adds results to it. The Where extension method for IEnumerable<T> will simply iterate over an existing list and yield an enumeration of the matching results without creating or adding anything (other than the enumerator itself). Given a small set, the two would likely perform comparably. However, given a larger set, Where should outperform FindAll,