parsing | 易学教程

Writing an HTML Parser

Question: I am currently attempting (or planning to attempt) to write a program, as simple as possible, that parses an HTML document into a tree. After googling, I have found many answers saying "don't do it, it's been done" (or words to that effect), references to examples of HTML parsers, and a rather emphatic article on why one shouldn't use regular expressions. However, I haven't found any guides on the "right" way to write a parser. (This, by the way, is something I'm attempting more as a learning
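
One common hand-written approach is to split the work into a tokenizer and a stack-based tree builder. The following is a minimal sketch of that idea, not a conforming HTML parser: the names `Node`, `tokenize`, `parse` and `VOID_TAGS` are made up for illustration, attributes are ignored, and comments or doctypes are simply skipped.

```python
from dataclasses import dataclass, field

# Elements that never have a closing tag (a small, incomplete sample).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


@dataclass
class Node:
    tag: str                      # element name, or "#text" for text nodes
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(html):
    """Yield ("open", name), ("close", name) or ("text", data) tuples."""
    pos = 0
    while pos < len(html):
        if html[pos] == "<":
            end = html.find(">", pos)
            if end == -1:         # unterminated tag: treat the rest as text
                yield ("text", html[pos:])
                return
            body = html[pos + 1:end].strip()
            pos = end + 1
            if not body or body[0] in "!?":
                continue          # skip "<>", comments and doctypes
            if body.startswith("/"):
                yield ("close", body[1:].strip().lower())
            else:
                # Attributes are ignored here; keep only the tag name.
                parts = body.rstrip("/").split()
                if parts:
                    yield ("open", parts[0].lower())
        else:
            end = html.find("<", pos)
            if end == -1:
                end = len(html)
            yield ("text", html[pos:end])
            pos = end


def parse(html):
    root = Node("#document")
    stack = [root]                # stack[-1] is the element currently open
    for kind, value in tokenize(html):
        if kind == "text":
            if value.strip():
                stack[-1].children.append(Node("#text", text=value))
        elif kind == "open":
            node = Node(value)
            stack[-1].children.append(node)
            if value not in VOID_TAGS:
                stack.append(node)
        else:
            # Close tag: pop back to the matching open element, if any.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == value:
                    del stack[i:]
                    break
    return root


tree = parse("<html><body><p>Hello <b>world</b></p><br/></body></html>")
```

Keeping tokenizing separate from tree building makes each half easy to test on its own. Real-world HTML needs far more error recovery than the "ignore unmatched close tags" rule used here.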

json file is missing/ struct is wrong

Question: I have been trying to get this code to work for about 6 hours. I get the error: "failed to convert The data couldn’t be read because it is missing." I don't know why the file is missing. Is there something wrong in my models (structs)? Do I need to write a struct for every JSON dictionary? Currently I have only written structs for the JSON dictionaries that I actually need. The full JSON file can be found at https://api.met.no/weatherapi/sunrise/2.0/.json?lat=40.7127&lon=-74.0059&date=2020-12

Replacing a custom “HTML” tag in a Python string

Question: I want to be able to include a custom "HTML" tag in a string, such as: "This is a <photo id="4" /> string". In this case the custom tag is <photo id="4" />. I would also be fine with writing this custom tag differently if that makes it easier, e.g. [photo id:4] or something similar. I want to pass this string to a function that will extract the tag <photo id="4" /> and allow me to transform it into some more complicated template like <div class="photo"><img src="...." alt="..."><
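
For a single, fixed tag shape like this, `re.sub` with a replacement function is one possible approach. The sketch below assumes the id is always numeric and the tag is always self-closing. The `PHOTOS` lookup table, the helper names, and the closing `</div>` of the template are illustrative assumptions, since the excerpt cuts off before the full template is shown.

```python
import html
import re

# Matches <photo id="4" /> with flexible whitespace; captures the numeric id.
PHOTO_TAG = re.compile(r'<photo\s+id="(\d+)"\s*/>')

# Hypothetical lookup table standing in for a database or API call.
PHOTOS = {4: {"src": "/img/4.jpg", "alt": "A sample photo"}}


def render_photo(match):
    """Build the replacement markup for one matched <photo> tag."""
    photo = PHOTOS.get(int(match.group(1)))
    if photo is None:
        return match.group(0)  # leave tags with unknown ids untouched
    return '<div class="photo"><img src="{}" alt="{}"></div>'.format(
        html.escape(photo["src"], quote=True),
        html.escape(photo["alt"], quote=True),
    )


def expand_photos(text):
    """Replace every <photo id="N" /> tag in text with its template."""
    return PHOTO_TAG.sub(render_photo, text)


result = expand_photos('This is a <photo id="4" /> string')
```

If the custom tags later grow more attributes or nesting, a real parser is likely a better fit than extending the pattern.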

EDGAR SEC 10-K Individual Sections Parser

Question: Do you know of any API (paid or free), tool, or Python package that can parse individual sections of SEC 10-K filings? I'm looking for the individual sections of 10-K filings (e.g. ITEM 1: Business, ITEM 1A: Risk Factors, etc.) separated from the entire 10-K filing and preferably cleaned of any page headers (company name), footers (page number), and tables containing mostly numeric data. I've written a parser in Python using BeautifulSoup for entire 10-K statements but dividing them into
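
As a rough illustration of the splitting step only, the sketch below cuts already-extracted plain text at lines that look like "ITEM 1A." headings. It assumes headings start a line and end with a period or colon, which real filings do not always honor. `ITEM_RE` and `split_items` are hypothetical names, and header, footer, and table cleanup is not attempted.

```python
import re

# A heading such as "ITEM 1A. Risk Factors" or "Item 7: ..." at line start.
ITEM_RE = re.compile(
    r"^[ \t]*ITEM[ \t]+(\d{1,2}[A-C]?)[.:]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text):
    """Map item labels ("1", "1A", ...) to the text under that heading."""
    matches = list(ITEM_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        label = m.group(1).upper()
        body = text[m.start():end].strip()
        # A table of contents repeats each heading; keep the longest candidate.
        if len(body) > len(sections.get(label, "")):
            sections[label] = body
    return sections


sample = (
    "ITEM 1. Business\n"
    "ITEM 1A. Risk Factors\n"
    "\n"
    "ITEM 1. Business\n"
    "We make widgets.\n"
    "\n"
    "ITEM 1A. Risk Factors\n"
    "Demand may fall.\n"
)
sections = split_items(sample)
```

Keeping the longest candidate per label is a heuristic for skipping table-of-contents entries. It can pick the wrong span when a filing cross-references items at the start of a line.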

Error: JSON.parse: unexpected non-whitespace character after JSON data

Question: I have a problem with JSON parsing. I have seen that many users have had this problem and I have read their questions, but I couldn't work out where the error is in my code. Sorry if this is a duplicate! First file, index.html. This is in the head section of the file: <script type="text/javascript"> function ajax_json_data(){ var databox = document.getElementById("databox"); var field1 = document.getElementById("field1").value; var results = document.getElementById("results"); var x = new XMLHttpRequest(); x.open( "POST",
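
This error generally means that a complete JSON value was followed by more characters, for example when the server prints something extra after the JSON. The sketch below reproduces the same situation with Python's `json` module rather than the browser's `JSON.parse`. The response text is invented for illustration and is not taken from the question.

```python
import json

# Hypothetical server response: valid JSON followed by stray output.
response_text = '{"status": "ok", "count": 3}<br />Notice: something else was printed'

error = None
try:
    json.loads(response_text)
except json.JSONDecodeError as err:
    error = err
    # err.pos is the offset where the unexpected extra data begins.
    print(f"{err.msg} at offset {err.pos}: {response_text[err.pos:err.pos + 20]!r}")

# raw_decode parses the first JSON value and reports where it ended,
# which makes the trailing junk easy to inspect.
value, end = json.JSONDecoder().raw_decode(response_text)
trailing = response_text[end:]
```

Logging the raw response text before parsing it is usually the quickest way to see what follows the JSON.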

ANTLR best practice for finding and catching parse errors

Question: This question concerns how to get error messages out of an ANTLR4 parser in C# in Visual Studio. I feed the ANTLR parser a known bad input string, but I am not seeing any errors or parse exceptions thrown during the (bad) parse. Thus, my exception handler does not get a chance to create and store any error messages during the parse. I am working with an ANTLR4 grammar that I know to be correct because I can see correct parse operation outputs in graphical form with an ANTLR extension to

Finding First and Follow in Top Down Parsing

Question: I have a grammar and I need to find its First and Follow sets. So far I've managed to compute First, but now I'm confused about how to compute Follow. The grammar that I tried to solve:
  E -> -E | (E) | VT
  T -> -E | ε
  V -> id L
  L -> (E) | ε
The First sets that I've come up with (if something's wrong, please tell me):
  First (E) = -, (, id
  First (T) = -, ε
  First (V) = id
  First (L) = (, ε
Here are the Follow sets that I've managed to work out so far:
  Follow (E) = $, )
  Follow (T) = $, )
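
The standard fixed-point rules for First and Follow can be checked mechanically. The sketch below encodes the grammar from the question and iterates until nothing changes. The function names and the dictionary encoding are choices made here for illustration.

```python
EPS = "ε"   # empty-string marker
END = "$"   # end-of-input marker

# Grammar from the question; each production is a list of symbols.
GRAMMAR = {
    "E": [["-", "E"], ["(", "E", ")"], ["V", "T"]],
    "T": [["-", "E"], [EPS]],
    "V": [["id", "L"]],
    "L": [["(", "E", ")"], [EPS]],
}
START = "E"


def first_of_sequence(symbols, first):
    """FIRST of a symbol sequence, given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable, or the sequence was empty
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, productions in GRAMMAR.items():
            for prod in productions:
                new = first_of_sequence(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, productions in GRAMMAR.items():
            for prod in productions:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_sequence(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        # Everything after sym can vanish, so whatever
                        # follows the left-hand side also follows sym.
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first()
follow = compute_follow(first)
```

For this grammar the computed First sets match the ones posted, as do Follow(E) and Follow(T). Because T can derive ε in E -> VT, Follow(V) picks up both First(T) without ε and Follow(E), giving {-, $, )}, and Follow(L) inherits the same set from V -> id L.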

Wrong accented characters using Beautiful Soup in Python on a local HTML file

Question: I'm quite familiar with Beautiful Soup in Python, and I have always used it to scrape live sites. Now I'm scraping a local HTML file (link, in case you want to test the code). The only problem is that accented characters are not displayed correctly (this never happened to me when scraping live sites). This is a simplified version of the code: import requests, urllib.request, time, unicodedata, csv from bs4 import BeautifulSoup soup = BeautifulSoup(open('AH.html'), "html.parser") tables =
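
A likely cause, though the excerpt does not confirm it, is that `open('AH.html')` decodes the file with the platform's default encoding rather than the file's actual one. The standard-library sketch below reproduces that effect with a temporary file. The sample text and the choice of UTF-8 and Latin-1 are assumptions for illustration.

```python
import os
import tempfile

html = "<p>caffè, città, perché</p>"

# Write a small UTF-8 file to stand in for the local HTML document.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as f:
    f.write(html.encode("utf-8"))
    path = f.name

try:
    # Decoding UTF-8 bytes with the wrong codec produces mojibake.
    with open(path, encoding="latin-1") as fh:
        wrong = fh.read()
    # Naming the file's real encoding restores the accented characters.
    with open(path, encoding="utf-8") as fh:
        right = fh.read()
finally:
    os.remove(path)

print(wrong)
print(right)
```

With Beautiful Soup, the correctly decoded string (or a file opened with an explicit `encoding=`) can be passed to `BeautifulSoup(...)` as before. Opening the file in binary mode and letting Beautiful Soup detect the encoding is another option.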