SEC 10-Q and 10-K filings are filed as text, SGML, HTML, or XBRL documents, and the layout of the tables containing financial data varies: the names of tables, the order of columns, and the labeling of rows differ from filing to filing. The goal of this project is to write a program that extracts the basic financial information ("fundamentals") from 10-Q and 10-K filings and imports it into a MySQL database. The SEC offers free access to its ftp server, where all company filings are stored ([url removed, login to view]). The proposed software will automate the process of monitoring the SEC's database for new company 10-K and 10-Q filings, downloading the filings when they occur, and parsing and importing the financial statement data into a MySQL database. The monitoring should be quite simple, as the SEC provides a free RSS feed that is updated every hour to show the latest filings. Please see ftp://[url removed, login to view] for related information.
A few items that greatly simplify the achievement of the preceding goal:
1. The SEC ftp server has a very simple and intuitive system for determining the path to company-specific documents.
2. The SEC provides an updated list of all companies and the identifiers needed to locate their data on the ftp server.
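As a rough sketch of how such a list could be used, the snippet below filters an index for 10-K and 10-Q entries and returns the path of each filing. It assumes a pipe-delimited index with the columns CIK, company name, form type, filing date, and file path; the sample rows, company names, and the names `IndexEntry` and `parse_index` are made up for illustration and should be checked against the actual files on the server.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    path: str  # path of the filing relative to the server root


# Only these form types matter for this project.
WANTED_FORMS = {"10-K", "10-Q"}


def parse_index(text: str) -> List[IndexEntry]:
    """Return the 10-K/10-Q entries found in a pipe-delimited index (assumed layout)."""
    entries = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip headers, separators, and anything that is not a 5-field data row.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        entry = IndexEntry(*parts)
        if entry.form_type in WANTED_FORMS:
            entries.append(entry)
    return entries


# Hypothetical sample data, not real filings.
SAMPLE_INDEX = """\
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------
1000001|EXAMPLE CORP|10-K|2010-03-01|edgar/data/1000001/0000000000-10-000001.txt
1000002|SAMPLE INC|8-K|2010-03-02|edgar/data/1000002/0000000000-10-000002.txt
1000003|DEMO LLC|10-Q|2010-05-10|edgar/data/1000003/0000000000-10-000003.txt
"""

for entry in parse_index(SAMPLE_INDEX):
    print(entry.form_type, entry.cik, entry.path)
```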
1) One-time step (this will only need to occur once): download all 10-K filings for all companies filing with the SEC, parse the financial statements, and import them into the MySQL database.
2) Ongoing step: monitor the SEC database for new filings, download them, parse the financial statement data, and import it into the MySQL database.
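Both steps end with the same import, and that import should be safe to repeat if a filing is processed twice. The sketch below shows one possible way to do that. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs without a server; the table name, column names, and `import_filing` function are assumptions, not part of the specification.

```python
import sqlite3

# Hypothetical schema: one row per line item per filing period.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fundamentals (
    cik        TEXT NOT NULL,
    form_type  TEXT NOT NULL,
    period_end TEXT NOT NULL,
    statement  TEXT NOT NULL,
    line_item  TEXT NOT NULL,
    value      REAL,
    PRIMARY KEY (cik, form_type, period_end, statement, line_item)
)
"""


def import_filing(conn, cik, form_type, period_end, rows):
    """Store (statement, line_item, value) rows; re-importing overwrites, never duplicates."""
    conn.execute(SCHEMA)
    conn.executemany(
        # SQLite syntax; the primary key makes a repeated import replace the old row.
        "INSERT OR REPLACE INTO fundamentals VALUES (?, ?, ?, ?, ?, ?)",
        [(cik, form_type, period_end, s, item, v) for s, item, v in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
sample_rows = [
    ("income", "Net Revenue", 1250.0),
    ("balance", "Total Assets", 9000.0),
]
import_filing(conn, "1000001", "10-Q", "2010-03-31", sample_rows)
import_filing(conn, "1000001", "10-Q", "2010-03-31", sample_rows)  # repeat is harmless
print(conn.execute("SELECT COUNT(*) FROM fundamentals").fetchone()[0])
```

In MySQL the same effect would typically come from `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE` with an equivalent primary key.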
A 10-Q or 10-K often contains all kinds of supporting tables; however, you are only interested in the following three: the Balance Sheet, the Statement of Income/Operations, and the Statement of Cash Flows. Each row in these tables starts with an accounting term that serves as the label for that row. You will need enough basic accounting knowledge to guess which XBRL tags most closely match the row labels used in the filings you are converting.
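One way to approach the label matching is a small hand-maintained synonym table combined with fuzzy string comparison. The sketch below uses `difflib` from the Python standard library. The synonym table is a tiny illustrative sample, the tag names should be verified against the taxonomy actually in use, and the `0.8` cutoff and the function names are assumptions.

```python
import difflib
import re

# Illustrative sample only: normalized row label -> candidate tag name.
SYNONYMS = {
    "net revenue": "Revenues",
    "total revenues": "Revenues",
    "net income": "NetIncomeLoss",
    "net income loss": "NetIncomeLoss",
    "total assets": "Assets",
    "total liabilities": "Liabilities",
    "cash and cash equivalents": "CashAndCashEquivalentsAtCarryingValue",
}


def normalize(label):
    """Lowercase the label and reduce punctuation and digits to single spaces."""
    label = re.sub(r"[^a-z ]+", " ", label.lower())
    return " ".join(label.split())


def guess_tag(label, cutoff=0.8):
    """Return the tag of the closest known label, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        normalize(label), list(SYNONYMS), n=1, cutoff=cutoff
    )
    return SYNONYMS[matches[0]] if matches else None


for raw in ["Net Revenue", "Net income (loss)", "Total Assets:", "Goodwill"]:
    print(raw, "->", guess_tag(raw))
```

Labels that return `None` are the ones a person with accounting knowledge would need to review and add to the table.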
Further, for each quarter's or year's filing, you are only interested in that quarter's or year's numbers (other periods are normally included for comparison), so you only need a single column of numbers from each table. In total, you are parsing out about 70 numbers at most from the whole document, along with their line item names (such as "Net Revenue").
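The single-column idea can be sketched as follows, assuming a table has already been pulled out of the filing as a header row plus data rows of strings. The sample table, the period labels, and the function names are hypothetical, and real filings will need more cleanup than this (units, footnote markers, merged header cells).

```python
def parse_amount(cell):
    """Convert a cell such as '$ 1,250' or '(75)' to a float; blanks become None."""
    text = cell.replace("$", "").replace(",", "").strip()
    if text in ("", "-"):
        return None
    # Accounting convention: parentheses mark a negative amount.
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    return float(text)


def current_period_column(header, rows, period_label):
    """Map each row label to its value in the column headed by period_label."""
    col = header.index(period_label)
    return {row[0]: parse_amount(row[col]) for row in rows}


# Hypothetical income statement fragment with a comparison column.
header = ["", "March 31, 2010", "March 31, 2009"]
rows = [
    ["Net Revenue", "$ 1,250", "$ 1,100"],
    ["Net income (loss)", "(75)", "40"],
]

print(current_period_column(header, rows, "March 31, 2010"))
```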
This realization (that you are only trying to output a list of about 70 line item names and their corresponding numbers in XML format) should help you get a much better grip on the project, as XBRL and 10-Qs initially seem daunting due to the flexibility of XBRL and all the extra text and tables in 10-Qs that you don't care about.
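To make the size of that output concrete, here is a minimal sketch that writes a list of line items as XML using Python's standard `xml.etree.ElementTree`. The element and attribute names (`filing`, `lineItem`, `cik`, `periodEnd`, `name`) are invented for illustration; this is a simple XML layout, not valid XBRL.

```python
import xml.etree.ElementTree as ET


def to_xml(cik, period_end, items):
    """Serialize (name, value) pairs into a small XML document (hypothetical layout)."""
    root = ET.Element("filing", cik=cik, periodEnd=period_end)
    for name, value in items:
        element = ET.SubElement(root, "lineItem", name=name)
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for a single filing period.
xml_text = to_xml(
    "1000001",
    "2010-03-31",
    [("Net Revenue", 1250.0), ("Net income (loss)", -75.0)],
)
print(xml_text)
```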
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, i.e., a software installation package that will install the software in ready-to-run condition on the Windows Server 2008 R2 platform.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).