parsing

Writing an HTML Parser

て烟熏妆下的殇ゞ submitted on 2021-02-15 10:16:43
Question: I am currently attempting (or planning to attempt) to write as simple a program as possible to parse an HTML document into a tree. After googling, I have found many answers saying "don't do it, it's been done" (or words to that effect), references to examples of HTML parsers, and a rather emphatic article on why one shouldn't use regular expressions. However, I haven't found any guides on the "right" way to write a parser. (This, by the way, is something I'm attempting more as a learning
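
As a rough illustration of the kind of program the question describes, here is a minimal hand-written sketch that scans a string for tags and uses a stack to build a tree. It is not a conforming HTML parser: attributes are discarded, comments and doctypes are skipped naively, and a `>` inside a comment or attribute value will confuse it. The names `Node`, `VOID_TAGS`, and `parse` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

# Elements that never have a closing tag (a small, incomplete subset).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


@dataclass
class Node:
    tag: str                      # element name, or "#text" for a text node
    text: str = ""                # only used by text nodes
    children: list = field(default_factory=list)


def parse(html: str) -> Node:
    """Build a tree from a simplified subset of HTML."""
    root = Node("#root")
    stack = [root]                # stack[-1] is the currently open element
    pos = 0
    while pos < len(html):
        lt = html.find("<", pos)
        if lt == -1:
            lt = len(html)
        if lt > pos:
            # Everything up to the next "<" is text.
            text = html[pos:lt].strip()
            if text:
                stack[-1].children.append(Node("#text", text))
            pos = lt
            continue
        gt = html.find(">", lt)
        if gt == -1:
            break                 # unterminated tag: stop scanning
        inner = html[lt + 1:gt].strip()
        pos = gt + 1
        if not inner or inner[0] in "!?":
            continue              # empty tag, comment, doctype, or PI
        if inner.startswith("/"):
            # Closing tag: pop back to the nearest matching open element.
            name = inner[1:].strip().lower()
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == name:
                    del stack[i:]
                    break
            continue
        self_closing = inner.endswith("/")
        parts = inner.rstrip("/").split()
        if not parts:
            continue
        node = Node(parts[0].lower())
        stack[-1].children.append(node)
        if not self_closing and node.tag not in VOID_TAGS:
            stack.append(node)
    return root


tree = parse("<html><body><p>Hello <b>world</b></p><br></body></html>")
```

The stack is the main design choice here: an opening tag pushes a node, a closing tag pops back to the matching one, and unmatched closing tags are ignored, which loosely imitates how browsers tolerate malformed markup.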

json file is missing/ struct is wrong

穿精又带淫゛_ submitted on 2021-02-15 07:48:00
Question: I have been trying to get this code to work for about 6 hours. I get the error: "failed to convert The data couldn’t be read because it is missing." I don't know why the file is missing. Is there something wrong in my models (structs)? Do I need to write a struct for every JSON dictionary? Currently I have only written structs for the JSON dictionaries that I actually need. The full JSON file can be found at https://api.met.no/weatherapi/sunrise/2.0/.json?lat=40.7127&lon=-74.0059&date=2020-12

json file is missing/ struct is wrong

大兔子大兔子 submitted on 2021-02-15 07:47:08
Question: I have been trying to get this code to work for about 6 hours. I get the error: "failed to convert The data couldn’t be read because it is missing." I don't know why the file is missing. Is there something wrong in my models (structs)? Do I need to write a struct for every JSON dictionary? Currently I have only written structs for the JSON dictionaries that I actually need. The full JSON file can be found at https://api.met.no/weatherapi/sunrise/2.0/.json?lat=40.7127&lon=-74.0059&date=2020-12

Replacing a custom “HTML” tag in a Python string

社会主义新天地 submitted on 2021-02-11 16:49:20
Question: I want to be able to include a custom "HTML" tag in a string, such as: "This is a <photo id="4" /> string". In this case the custom tag is <photo id="4" />. I would also be fine with writing this custom tag differently if that makes it easier, i.e. [photo id:4] or something similar. I want to be able to pass this string to a function that will extract the tag <photo id="4" /> and allow me to transform it into some more complicated template like <div class="photo"><img src="...." alt="..."><
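
One way to approach what the question describes is `re.sub` with a replacement function. The sketch below is only an illustration under stated assumptions: the `PHOTOS` lookup table, its paths and alt texts, the function names, and the closing `</div>` of the template are all hypothetical, since the excerpt cuts off before the full template is shown.

```python
import re
from html import escape

# Matches the custom tag exactly as written in the question: <photo id="4" />
PHOTO_TAG = re.compile(r'<photo\s+id="(\d+)"\s*/>')

# Hypothetical photo data; in a real program this might come from a database.
PHOTOS = {4: {"src": "/img/4.jpg", "alt": "A sample photo"}}


def render_photo(match: re.Match) -> str:
    """Turn one matched <photo> tag into the expanded template."""
    photo = PHOTOS.get(int(match.group(1)))
    if photo is None:
        return match.group(0)     # unknown id: leave the tag untouched
    return '<div class="photo"><img src="{}" alt="{}"></div>'.format(
        escape(photo["src"]), escape(photo["alt"])
    )


def expand_photo_tags(text: str) -> str:
    """Replace every <photo id="N" /> tag in the text."""
    return PHOTO_TAG.sub(render_photo, text)


result = expand_photo_tags('This is a <photo id="4" /> string')
```

A regular expression is adequate here because the custom tag has a single fixed shape. If the tag grew optional attributes or nesting, a real parser would likely be the safer choice.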

EDGAR SEC 10-K Individual Sections Parser

点点圈 submitted on 2021-02-11 15:54:01
Question: Do you know of any API (paid or free), tool, or Python package that can parse individual sections of SEC 10-K filings? I'm looking for the individual sections of 10-K filings (e.g. ITEM 1: Business, ITEM 1A: Risk Factors, etc.) separated from the entire 10-K filing and preferably cleaned of any page headers (company name), footers (page number), and tables containing mostly numeric data. I've written a parser in Python using BeautifulSoup for entire 10-K statements but dividing them into

Error: JSON.parse: unexpected non-whitespace character after JSON data

五迷三道 submitted on 2021-02-11 15:52:08
Question: I have a problem with JSON.parse. I have seen that many users have had this problem and I read all of their questions, but I couldn't work out where the error is in my code. Sorry if this is a duplicate! First file, index.html. This is in the head section of the file:
    <script type="text/javascript">
    function ajax_json_data(){
        var databox = document.getElementById("databox");
        var field1 = document.getElementById("field1").value;
        var results = document.getElementById("results");
        var x = new XMLHttpRequest();
        x.open( "POST",

Error: JSON.parse: unexpected non-whitespace character after JSON data

一世执手 submitted on 2021-02-11 15:50:27
Question: I have a problem with JSON.parse. I have seen that many users have had this problem and I read all of their questions, but I couldn't work out where the error is in my code. Sorry if this is a duplicate! First file, index.html. This is in the head section of the file:
    <script type="text/javascript">
    function ajax_json_data(){
        var databox = document.getElementById("databox");
        var field1 = document.getElementById("field1").value;
        var results = document.getElementById("results");
        var x = new XMLHttpRequest();
        x.open( "POST",

ANTLR best practice for finding and catching parse errors

若如初见. submitted on 2021-02-11 15:05:38
Question: This question concerns how to get error messages out of an ANTLR4 parser in C# in Visual Studio. I feed the ANTLR parser a known bad input string, but I do not see any errors or parse exceptions thrown during the (bad) parse. Thus, my exception handler does not get a chance to create and store any error messages during the parse. I am working with an ANTLR4 grammar that I know to be correct because I can see correct parse operation outputs in graphical form with an ANTLR extension to

Finding First and Follow in Top Down Parsing

冷暖自知 submitted on 2021-02-11 14:51:12
Question: I have a grammar and I need to find its First and Follow sets. So far I've managed to work out First, but now I'm confused about how to work out Follow. The grammar that I tried to solve:
    E -> -E | (E) | VT
    T -> -E | ε
    V -> id L
    L -> (E) | ε
The First sets that I've come up with (if something's wrong, please tell me):
    First (E) = -, (, id
    First (T) = -, ε
    First (V) = id
    First (L) = (, ε
Here's the Follow that I've managed to work out so far:
    Follow (E) = $, )
    Follow (T) = $, )
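
The standard fixed-point computation of First and Follow can be sketched for the grammar in this question as follows. This is an illustrative sketch, not the asker's code: it assumes `E` is the start symbol and that `id`, `-`, `(` and `)` are the terminals, and all function and variable names are made up for the example.

```python
EPS = "ε"   # empty string
END = "$"   # end-of-input marker

# The grammar from the question, one list of symbols per alternative.
GRAMMAR = {
    "E": [["-", "E"], ["(", "E", ")"], ["V", "T"]],
    "T": [["-", "E"], [EPS]],
    "V": [["id", "L"]],
    "L": [["(", "E", ")"], [EPS]],
}
START = "E"  # assumption: E is the start symbol


def first_of_sequence(symbols, first):
    """First set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result         # this symbol cannot vanish: stop here
    result.add(EPS)               # every symbol could derive ε
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                # repeat until no set grows
        changed = False
        for head, productions in GRAMMAR.items():
            for prod in productions:
                new = first_of_sequence(prod, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)        # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, productions in GRAMMAR.items():
            for prod in productions:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue  # Follow is only defined for nonterminals
                    rest = first_of_sequence(prod[i + 1:], first)
                    new = rest - {EPS}        # rule 2: First of what follows
                    if EPS in rest:
                        new |= follow[head]   # rule 3: rest can vanish
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Under these assumptions the computed sets agree with the First sets and with Follow (E) and Follow (T) as listed in the question. The Follow of a nonterminal is built from whatever can come right after it in any production, plus the Follow of the production's head when everything after it can derive ε.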

Wrong accented characters using Beautiful Soup in Python on a local HTML file

瘦欲@ submitted on 2021-02-11 14:39:37
Question: I'm quite familiar with Beautiful Soup in Python; I have always used it to scrape live sites. Now I'm scraping a local HTML file (link, in case you want to test the code). The only problem is that accented characters are not displayed correctly (this never happened to me when scraping live sites). This is a simplified version of the code:
    import requests, urllib.request, time, unicodedata, csv
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(open('AH.html'), "html.parser")
    tables =
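
One possible cause of symptoms like these, offered only as a guess because the excerpt is truncated, is that `open('AH.html')` decodes the file with the platform's default encoding rather than the encoding the file was saved in. The standard-library sketch below assumes a UTF-8 file and shows how the same bytes read back under two different encodings; the sample text and the temporary file are made up for the demonstration.

```python
import os
import tempfile

sample = "<p>café</p>"

# Write the sample as UTF-8 bytes to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as f:
    f.write(sample.encode("utf-8"))
    path = f.name

# Reading with the matching encoding round-trips the text.
with open(path, encoding="utf-8") as f:
    decoded_ok = f.read()

# Reading the same bytes as cp1252 (a common Windows default) garbles them.
with open(path, encoding="cp1252") as f:
    decoded_wrong = f.read()

os.remove(path)

# With Beautiful Soup, the same idea would be to pass an explicitly decoded
# file object, e.g. BeautifulSoup(open('AH.html', encoding='utf-8'), 'html.parser'),
# assuming the file really is UTF-8.
```

If the file is in fact saved in a different encoding, the `encoding` argument would need to name that encoding instead.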