Programmer for Tree Parsing/Text Mining
Job Summary
Seeking an experienced programmer for engagement in long-term freelance work. Strong tree parsing skills are essential. A background in NLP and experience with NLTK are preferred but not required. Pay is commensurate with experience and is hourly-based. As part of our hiring process, we ask that interested candidates successfully complete the tasks below to demonstrate basic competency.
Project Background
The SEC stores various text files it receives from companies on its EDGAR website. The files typically contain detailed discussions of companies’ performance as well as financial data summarizing their performance. Attached is a random sample of 15 full .txt files from 5 different years with a file type of “10-K” from EDGAR. You will find files which embed HTML, SGML, or XBRL code, in addition to tables, special characters, images, and other embedded files, such as PDFs.
Tasks
Extract the following sections from the 10-K using a tree parser: Management Discussion and Analysis (MD&A), Risk Factors, and Notes to the Financial Statements.
Flatten each section extracted to raw text. That is, remove all code, tables, images, or embedded files. Write the raw text of each section to a separate .txt file. The filename for the raw text file should be that of its parent with a suffix for each section appended (e.g., “*[login to view URL]”, “*[login to view URL]”, and “*[login to view URL]”).
Discuss any outstanding issues, questions, or concerns regarding the steps above. For example, discuss weaknesses in your approach to identifying section and sentence boundaries.
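A minimal sketch of the flattening step described in the tasks above, offered only as an illustration and not as the expected solution. It assumes the section has already been isolated as a string of HTML-like markup. To stay self-contained it uses Python's standard-library event-based `HTMLParser` in place of a true tree parser (a real submission would more likely use a tree-building library). The names `TextFlattener`, `flatten`, `section_path`, and `write_section`, the tag sets, and the `mda` suffix are all hypothetical; the posting's actual filename suffixes are not visible here.

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumption: content inside these elements is dropped entirely.
SKIP_TAGS = {"table", "script", "style"}
# Assumption: these elements separate words, so a space is inserted around them.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextFlattener(HTMLParser):
    """Collect text found outside SKIP_TAGS; the markup itself is discarded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def flatten(markup):
    """Return the visible text of `markup` with whitespace collapsed."""
    parser = TextFlattener()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def section_path(parent, suffix):
    """Build the output path: parent filename plus a section suffix."""
    return parent.with_name(f"{parent.stem}_{suffix}.txt")


def write_section(parent, suffix, markup):
    """Flatten one section and write it next to its parent file."""
    out_path = section_path(parent, suffix)
    out_path.write_text(flatten(markup), encoding="utf-8")
    return out_path
```

For example, `write_section(Path("filing.txt"), "mda", section_markup)` would write the flattened text to `filing_mda.txt`. This sketch does not locate section boundaries, and it makes no attempt to handle embedded binary files (such as encoded PDFs or images), which would need to be stripped before parsing.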
Apply
For full consideration, please upload your resume, output, and responses by April 15, 2015. We are an equal opportunity employer. Work permits or visas are not required.
Hi,
I can parse text from many types of files, including .txt, csv, pdf, doc, docx, png, jpeg, psd, rst, etc.
I am ready to do the task.
I could not see the link to the text files.
Could you give me the link to the text files?
Hi,
I am a graduate research student doing research on network programming languages. My work on NPLs involves representing network topologies as graphs, such as tree data structures, and running different algorithms on those data structures. I also have a deep understanding of NLP, as I have worked on lexicons, parsers, and regular grammars. Besides, I have 4 years of experience in software development. I can deliver the result with the quality you expect.
I haven't found any attachment. Please provide the files. I shall upload the resume, output, and responses soon after receiving the files.
Thanks,
Shahbaz
Hello,
I'm a freelance Python developer and I am very interested in being your developer for the job "Document parsing and text mining in Python".
I have worked on projects that required parsing files, and I have worked with pdf, doc, csv, docx, and odf formats. I have also worked on two projects that involved data mining, using libraries such as NumPy, SciPy, NLTK, Scrapy, Gensim, Requests, and Matplotlib. Worth mentioning is that I also performed some Natural Language Processing and semantic matching on the data.
Please refer to my portfolio for previous projects I have handled.
I'm looking forward to hearing from you,
Regards,
Aurlus I. Wedava
I can set this up in Python.
- networkx for graph object to trace extractions.
- pdf to text no problem.
Based in Toronto. Though, I'm afraid I can't commit to a skills demo without a milestone or compensation.
Great experience in NLP, text mining, contextual extraction, and sentiment analysis, using a combination of advanced tools that I wrote myself and commercial software. I could also double my hours per week if needed.
I also have several ready-to-use classification taxonomies in different subject domains from past projects. All my code currently runs as client-server Python software: it sends text to a server and receives a clean version, facts, and categories. I could also do data mining on text, statistical, or social graph information.
P.S. If needed, I could bring in a small team (they would be lucky to take part in an interesting project) to solve advanced statistical tasks using textual/numeric information, for example, parsing pharma data and checking whether symptoms of seasonal illness correlate with prices or something else, such as retail customer segmentation, telecom/banking message analysis, credit scoring models...
P.P.S. By the way, the attached data file is not available for testing right now; maybe it was deleted by the system. Could you please attach the file again so I can test my skills?