EDGAR SEC 10-K Individual Sections Parser

Posted by 点点圈 on 2021-02-11 15:54:01

Question


Do you know of any API (paid or free), tool, or Python package that can parse the individual sections of SEC 10-K filings?

I'm looking for the individual sections of 10-K filings (e.g. ITEM 1: Business, ITEM 1A: Risk Factors, etc.) separated from the rest of the filing and preferably cleaned of page headers (company name), footers (page numbers), and tables containing mostly numeric data. I've written a parser in Python using BeautifulSoup that handles entire 10-K filings, but dividing them into individual sections is proving quite challenging, though not impossible.
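
To make the section-splitting problem concrete, here is a minimal sketch of one naive approach. It assumes the filing has already been converted to plain text (for instance by a BeautifulSoup parser like the one mentioned above) and that section headings start a line with "Item N." or "ITEM N:". The heading pattern and the helper names `drop_page_number_lines` and `split_items` are hypothetical, chosen for illustration; real filings vary a lot in formatting, so this is a starting point rather than a robust solution.

```python
import re

# Assumed heading shape: "Item 1.", "ITEM 1A:", "Item 7A -" at the start of a line.
ITEM_RE = re.compile(
    r"^[ \t]*item[ \t]+(\d{1,2}[a-c]?)[ \t]*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def drop_page_number_lines(text):
    """Remove lines that contain nothing but a 1-3 digit number (likely page footers)."""
    kept = [ln for ln in text.splitlines()
            if not re.fullmatch(r"\s*\d{1,3}\s*", ln)]
    return "\n".join(kept)


def split_items(text):
    """Return a dict mapping item labels ("1", "1A", ...) to section text.

    The table of contents usually lists every item as well, so when a label
    appears more than once the longest candidate body is kept.
    """
    matches = list(ITEM_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs from the end of its heading marker to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = m.group(1).upper()
        body = text[m.end():end].strip()
        if len(body) > len(sections.get(key, "")):
            sections[key] = body
    return sections


if __name__ == "__main__":
    sample = (
        "Table of contents\n"
        "Item 1. Business\n"
        "Item 1A. Risk Factors\n"
        "\n"
        "ITEM 1. Business\n"
        "We make widgets and sell them.\n"
        "12\n"
        "ITEM 1A: Risk Factors\n"
        "Demand may fall.\n"
    )
    for label, body in split_items(drop_page_number_lines(sample)).items():
        print(label, "->", repr(body))
```

Keeping the longest body per label is only a heuristic for skipping table-of-contents entries; cross-references such as "see Item 7." at the start of a line would still produce false splits, which is part of why the task is hard.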

Before reinventing the wheel, I thought I'd ask the community whether anyone knows of an existing solution. I've found https://jodie.ai/hi/, which has 10-K filings divided into sections, but only dating back to 2009.

Thanks for the help!


Answer 1:


I just commented above about a related question of mine; the BigQuery dataset discussed there may be the answer to your question. However, I haven't managed to get it to work for extracting individual filing sections.

The next option I found, which isn't an API and thus doesn't stay current but does go back to 1993, is the repository at https://sraf.nd.edu/data/. I can't tell yet whether the sections are broken out exactly as you want, but a substantial amount of pre-cleaning has been done, which makes it an easier starting point, a useful check against your own parsing code, or both. The resources page there links to earlier papers analyzing the same data and to useful things like dictionaries and related word lists, and the code page includes their own Python cleaning code, which appears to be quite comprehensive.

It's still not the full, clean API I think you and I are both looking for, but it's the best I've found.



Source: https://stackoverflow.com/questions/61772155/edgar-sec-10-k-individual-sections-parser
