EDGAR SEC 10-K Individual Sections Parser

Question

Do you know of any API (paid or free), tool, or Python package that can parse the individual sections of SEC 10-K filings?

I'm looking for the individual sections of 10-K filings (e.g. ITEM 1: Business, ITEM 1A: Risk Factors, etc.) separated from the rest of the filing and preferably cleaned of page headers (company name), footers (page number), and tables containing mostly numeric data. I've written a parser in Python using BeautifulSoup for entire 10-K statements, but dividing them into individual sections is proving quite challenging, though not impossible.
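To make the splitting step concrete, here is a minimal sketch. It assumes the filing has already been converted to plain text (for example with BeautifulSoup's `get_text()`) and that section headings start a line in the form "Item 1.", "ITEM 1A:", and so on. The heading pattern and the function name `split_10k_sections` are illustrative assumptions, not part of any existing library, and real filings vary enough that this will miss some headings.

```python
import re

# Assumed heading shape: "Item 1.", "ITEM 1A:", "Item 7 -" at the start of a line.
ITEM_HEADING = re.compile(
    r"^[ \t]*item[ \t]+(\d{1,2}[a-c]?)[ \t]*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def split_10k_sections(text):
    """Return {item_id: section_text} for a plain-text 10-K (hypothetical helper)."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs from its heading to the next heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        item_id = m.group(1).upper()
        body = text[m.start():end].strip()
        # The table of contents repeats every heading, so keep the
        # longest span seen for each item and discard the short TOC lines.
        if len(body) > len(sections.get(item_id, "")):
            sections[item_id] = body
    return sections
```

Keeping the longest span per item is only a heuristic for skipping the table of contents; cross-references such as "see Item 7." that happen to start a line would still split a section early.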

Before reinventing the wheel, I thought I'd ask the community whether anyone knows of an existing solution. I've found https://jodie.ai/hi/, which has 10-K statements divided into sections, but only dating back to 2009.

Thanks for the help!

Answer 1:

I just commented above about a related question of mine, where a BigQuery dataset may be the answer to your question. However, I haven't managed to get it to work for extracting individual filing sections.

The next option I found is the repository at https://sraf.nd.edu/data/. It isn't an API and so doesn't stay current, but it does go back to 1993. I can't tell yet whether the sections are broken out exactly as you're looking for, but a substantial amount of pre-cleaning has been done, which makes it an easier starting point for you, a useful check against your own parsing code, or both. The resources page there links to earlier papers analyzing the same data and to useful things like dictionaries and related word lists, and the code page includes their own Python cleaning work, which appears to be quite comprehensive.

Still not the full, clean API I think you and I are both looking for, but the best I've found.

Source: https://stackoverflow.com/questions/61772155/edgar-sec-10-k-individual-sections-parser

Tags

python

api

parsing

web-scraping