Question
I tried the Python code from Rasha Ashraf's article "Scraping EDGAR with Python". It uses urllib2, which, as far as I can tell, is no longer available in Python 3, so I changed it to urllib.
I can retrieve the EDGAR page below, but every word count comes out as 0 no matter how I change the code. Please help me fix this. FYI, I checked the page manually: "ADDRESS", "TYPE", and "transaction" occur 5, 9, and 49 times, respectively. Nevertheless, my script reports 0 for all three words.
Here is Rasha Ashraf's code as amended by me (only the urllib part and the URL). The original URL points to a very large text document, so I switched to a simpler page.
import time
import csv
import sys
CIK = '0001018724'
Year= '2013'
string_match1= 'edgar/data/1018724/000112760220028651/0001127602-20-028651.txt'
url3= 'http://www.sec.gov/Archives/'+string_match1
import urllib.request
response3= urllib.request.urlopen(url3)
#output = response3.read()
#print(output)
words= ['ADDRESS','TYPE', 'transaction']
count= {}
for elem in words:
    count[elem]= 0
for line in response3:
    elements= line.split()
    for word in words:
        count[word]= count[word] + elements.count(word)
print (CIK)
print (Year)
print (url3)
print (count)
=> The output of my code so far:
0001018724
2013
http://www.sec.gov/Archives/edgar/data/1018724/000112760220028651/0001127602-20-028651.txt
{'ADDRESS': 0, 'TYPE': 0, 'transaction': 0}
Answer 1:
To get the correct count of the number of times each of your 3 strings (not words!) appears in the filing, try something like this, which counts case-insensitive substring matches over the whole text:
import requests
url = "http://www.sec.gov/Archives/edgar/data/1018724/000112760220028651/0001127602-20-028651.txt"
req = requests.get(url)
words = ['address','type','transaction']
filing = req.text
for word in words:
    print(word,': ',filing.lower().count(word))
Output:
address : 5
type : 9
transaction : 49
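As for why the code in the question prints 0: in Python 3, iterating over the object returned by urllib.request.urlopen yields bytes lines, so line.split() produces bytes tokens, and a bytes token never compares equal to a str such as 'TYPE'. Below is a minimal sketch of a urllib-style fix that decodes each line first. The helper name count_words, the UTF-8 encoding, and the io.BytesIO stand-in for the HTTP response are my assumptions for illustration, not part of the original article.

```python
import io

def count_words(byte_lines, words, encoding="utf-8"):
    """Count whole-token occurrences of each word in an iterable of bytes lines."""
    counts = {word: 0 for word in words}
    for raw_line in byte_lines:
        # Decode bytes to str so tokens can compare equal to the str search words.
        tokens = raw_line.decode(encoding, errors="replace").split()
        for word in words:
            counts[word] += tokens.count(word)
    return counts

# Hypothetical stand-in for the response: iterating it yields bytes lines,
# just as iterating the result of urllib.request.urlopen() does.
fake_response = io.BytesIO(b"TYPE 4\nADDRESS line one\nno transaction here\n")

# Without decoding, a bytes token never equals a str word, hence the zeros.
print(b"TYPE 4".split().count("TYPE"))  # 0

print(count_words(fake_response, ["ADDRESS", "TYPE", "transaction"]))
# {'ADDRESS': 1, 'TYPE': 1, 'transaction': 1}
```

Note that this still counts whitespace-separated tokens that match exactly and case-sensitively, so on the real filing it may report lower numbers than the substring count above. If the goal is to match what a manual text search shows, counting substrings on the decoded text is likely the closer fit.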
Source: https://stackoverflow.com/questions/64812162/word-count-from-web-text-document-result-in-0