Resume parsing, also known as CV parsing, resume extraction or CV extraction, is the conversion of a free-form CV or resume document into structured information in JSON or XML format, suitable for storage, reporting and manipulation.
Input: This parser will be used to parse thousands of UNSTRUCTURED resumes in HTML, Word (DOC, DOCX), RTF, plain text and PDF formats. Resumes will be in English to start with, but the parser should also be capable of parsing resumes in other languages.
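Because the inputs arrive in several formats, a parser would typically route each file to a format-specific text extractor before any field extraction happens. A minimal sketch, assuming the file extension can be trusted (a production system would also sniff the file contents); `SUPPORTED_FORMATS` and `detect_format` are hypothetical names, and the extractors themselves are out of scope:

```python
from pathlib import Path

# Hypothetical mapping from file extension to an internal format label.
# ".doc" and ".docx" stay separate because they need different readers
# (legacy binary Word vs. Office Open XML).
SUPPORTED_FORMATS = {
    ".html": "html",
    ".htm": "html",
    ".doc": "doc",
    ".docx": "docx",
    ".rtf": "rtf",
    ".txt": "text",
    ".pdf": "pdf",
}


def detect_format(path: str) -> str:
    """Return the format label for a resume file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported resume format: {suffix or '(none)'}") from None


if __name__ == "__main__":
    for name in ["jane_doe.PDF", "cv.docx", "profile.htm"]:
        print(name, "->", detect_format(name))
```

Each label would then select the matching text extractor; unsupported files raise an error that can be recorded in the parser's report instead of being silently skipped.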
Output: JSON or XML files in which every word of the resume has been parsed and correctly structured.
The parser should also output a report with metrics on what worked and what did not. This report should help in further tuning the parser.
Accuracy: We are expecting a high degree of accuracy. Ideally the parser should reach and maintain 95% accuracy, with a minimum of 90% accuracy for each resume.
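The brief does not define how accuracy is measured. One possible definition, assumed here, is field-level accuracy: the share of fields in a hand-labelled "gold" record that the parser reproduced exactly after light normalisation. The same comparison can feed the tuning report requested above by counting failures per field. All names below are hypothetical:

```python
from collections import Counter


def normalise(value: str) -> str:
    # Case- and whitespace-insensitive comparison (an assumption; dates and
    # phone numbers would likely need their own normalisers).
    return " ".join(value.split()).lower()


def resume_accuracy(parsed: dict, gold: dict) -> float:
    """Fraction of gold-standard fields the parser extracted correctly."""
    if not gold:
        return 1.0
    correct = sum(
        1
        for field, expected in gold.items()
        if field in parsed and normalise(parsed[field]) == normalise(expected)
    )
    return correct / len(gold)


def build_report(results: list, threshold: float = 0.90) -> dict:
    """Summarise accuracy per resume and count failures per field.

    results: list of (file_name, parsed_fields, gold_fields) tuples.
    """
    per_resume = {}
    field_failures = Counter()
    for name, parsed, gold in results:
        per_resume[name] = resume_accuracy(parsed, gold)
        for field, expected in gold.items():
            if normalise(parsed.get(field, "")) != normalise(expected):
                field_failures[field] += 1
    below = sorted(n for n, acc in per_resume.items() if acc < threshold)
    return {
        "per_resume_accuracy": per_resume,
        "below_threshold": below,
        "field_failures": dict(field_failures),
    }
```

Counting failures per field, not just per resume, points tuning effort at the weakest extractors (for example, dates) rather than at individual documents.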
Method: We are not looking for a keyword-based parser. We are looking for a parser that can be trained and that improves its accuracy over time. The fields to be parsed should also be configurable, and it should be possible to extend the parser with additional fields with little effort. The parser should be able to do fuzzy lookup.
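For fuzzy lookup, one simple option is Python's standard-library `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` similarity ratio. The sketch below maps a misspelled or variant term onto a configurable canonical vocabulary; `CANONICAL_SKILLS`, `fuzzy_lookup` and the 0.8 cutoff are assumptions to be tuned against the sample resumes, not part of the brief:

```python
import difflib
from typing import List, Optional

# Hypothetical canonical vocabulary; in practice it would be loaded from
# configuration so new entries can be added without code changes.
CANONICAL_SKILLS = ["JavaScript", "TypeScript", "PostgreSQL", "MySQL", "Python", "Kubernetes"]


def fuzzy_lookup(term: str, vocabulary: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a possibly misspelled term to its closest canonical entry, or None."""
    # Compare case-insensitively, but return the canonical spelling.
    lowered = {v.lower(): v for v in vocabulary}
    matches = difflib.get_close_matches(term.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


if __name__ == "__main__":
    for raw in ["Javascrpt", "Postgres", "MySql", "Cobol"]:
        print(raw, "->", fuzzy_lookup(raw, CANONICAL_SKILLS))
```

A higher cutoff trades recall for precision; terms that fall below it could be logged in the tuning report as candidates for new vocabulary entries.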
The resume parser should be written so that it can eventually be invoked from a web-based application.
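One way to keep the parser invocable from a web application is to expose it behind a plain WSGI callable (PEP 3333), which most Python web servers can host. This is a sketch only: `parse_resume_text` is a hypothetical placeholder for the real parser, and the endpoint accepts a plain-text body rather than uploaded binary files:

```python
import json
from typing import Callable, Iterable


def parse_resume_text(text: str) -> dict:
    # Hypothetical placeholder: the real parser would return the extracted fields.
    return {"text_length": len(text)}


def application(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI entry point: POST a plain-text resume body, receive parsed JSON."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "POST")])
        return [b"POST a resume as the request body\n"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    text = environ["wsgi.input"].read(length).decode("utf-8", errors="replace")
    body = json.dumps(parse_resume_text(text), ensure_ascii=False).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Development server only; listens on localhost:8000.
    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

The standard-library `wsgiref` server is meant for development; a production deployment would host the same callable in a full WSGI server.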
In order to identify information from CVs and profiles, the extraction engine should learn to create “rules” within its machine-learning algorithms by analysing the data.
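The brief does not name a learning algorithm. As the smallest illustration of an engine that learns its "rules" from labelled data instead of hard-coded keywords, the sketch below trains a multinomial naive Bayes classifier that assigns resume lines to sections. This is a swapped-in textbook technique, not a prescribed design; the class name and section labels (`experience`, `education`, `skills`) are hypothetical, and reaching the accuracy targets would likely need richer features or sequence models:

```python
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Optional, Tuple


class SectionClassifier:
    """Multinomial naive Bayes over bag-of-words tokens, with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.label_counts = Counter()            # label -> number of training lines
        self.vocabulary = set()

    @staticmethod
    def tokenize(line: str) -> List[str]:
        # Lowercase word-like tokens; keeps "+" and "#" so "c++" and "c#" survive.
        return re.findall(r"[a-z0-9+#]+", line.lower())

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        """Learn token statistics from (line, label) pairs taken from labelled resumes."""
        for line, label in examples:
            tokens = self.tokenize(line)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def predict(self, line: str) -> Optional[str]:
        """Return the label with the highest posterior log-probability."""
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in self.tokenize(line) if t in self.vocabulary]
        total_lines = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_lines in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_lines / total_lines)  # log prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Retraining on newly corrected resumes simply updates the counts, which is one concrete way accuracy could improve as more labelled resumes are supplied.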
Fields that need to be parsed (not all resumes will have all of these fields, and parsing should not be limited to just these fields):
1. First Name
2. Middle Name
3. Last Name
4. Email Address
5. Mobile Number
6. Date of Birth
7. Social Media Profile Links
8. Zip Code
9. Current Location
10. Skills (top to bottom)
11. Current Company Name
12. Current Company Designation
13. Current Company Start Date
14. Current Company End Date
15. Current Company Roles & Responsibilities
16. Previous Company Name
17. Previous Company Designation
18. Previous Company Start Date
19. Previous Company End Date
20. Previous Company Roles & Responsibilities
21. Year of Passing
22. Marital Status
23. Languages Known
24. Soft Skills
25. Training Courses
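The field list maps naturally onto a typed record. A sketch, assuming Python dataclasses; the field names are hypothetical, previous employers are kept in a list because a candidate may have more than one, and the `extra` dictionary is one way to accept fields beyond this list without changing the schema:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Employment:
    company_name: Optional[str] = None
    designation: Optional[str] = None
    start_date: Optional[str] = None  # assumed ISO 8601, e.g. "2019-04"
    end_date: Optional[str] = None
    responsibilities: List[str] = field(default_factory=list)


@dataclass
class Resume:
    first_name: Optional[str] = None
    middle_name: Optional[str] = None
    last_name: Optional[str] = None
    email: Optional[str] = None
    mobile_number: Optional[str] = None
    date_of_birth: Optional[str] = None
    social_profiles: List[str] = field(default_factory=list)
    zip_code: Optional[str] = None
    current_location: Optional[str] = None
    skills: List[str] = field(default_factory=list)  # kept in top-to-bottom order
    current_employment: Optional[Employment] = None
    previous_employment: List[Employment] = field(default_factory=list)
    year_of_passing: Optional[int] = None
    marital_status: Optional[str] = None
    languages_known: List[str] = field(default_factory=list)
    soft_skills: List[str] = field(default_factory=list)
    training_courses: List[str] = field(default_factory=list)
    # Extension point for fields not in the list above.
    extra: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, including those inside lists.
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)
```

Missing fields serialise as `null` or empty lists, which keeps every output file structurally identical even when a resume lacks some fields.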
The parser should output a tagged XML or JSON file, one file per parsed resume, and the output file name should be the same as the input file name, with extension.
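The naming rule can be read two ways. This sketch assumes the full input name is kept and the output extension appended (`jane_doe.pdf` becomes `jane_doe.pdf.json`), which stops `jane_doe.pdf` and `jane_doe.docx` from overwriting each other; if `jane_doe.json` is intended, `Path.with_suffix(".json")` would produce it instead. `output_path` and `write_json` are hypothetical helpers, and XML output could be written along the same lines with `xml.etree.ElementTree`:

```python
import json
from pathlib import Path


def output_path(input_path: str, out_dir: str, fmt: str = "json") -> Path:
    """Keep the full input file name and append the output extension."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    return Path(out_dir) / f"{Path(input_path).name}.{fmt}"


def write_json(input_path: str, out_dir: str, record: dict) -> Path:
    """Write one parsed resume to its own JSON file and return the path."""
    target = output_path(input_path, out_dir, "json")
    target.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-English names readable in the output file.
    target.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return target
```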
All parsed fields will be uploaded into a MySQL database. The parser is also required to perform the database insertion as part of the parsing process.
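A sketch of the insertion step, assuming a normalised schema with one row per resume and one row per skill so that the top-to-bottom skill order survives. To stay runnable without a server it uses Python's built-in `sqlite3` as a stand-in for MySQL; with a MySQL DB-API driver such as mysql-connector-python, the `?` placeholders become `%s` and the DDL needs MySQL column types. Table and column names are hypothetical:

```python
import sqlite3

# sqlite3 is a stand-in for MySQL here; schema names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS resume (
    id INTEGER PRIMARY KEY,
    source_file TEXT NOT NULL UNIQUE,
    first_name TEXT,
    last_name TEXT,
    email TEXT
);
CREATE TABLE IF NOT EXISTS resume_skill (
    resume_id INTEGER NOT NULL REFERENCES resume(id),
    position INTEGER NOT NULL,  -- preserves the top-to-bottom skill order
    skill TEXT NOT NULL
);
"""


def insert_resume(conn: sqlite3.Connection, source_file: str, parsed: dict) -> int:
    """Insert one parsed resume and its skills in a single transaction."""
    with conn:  # sqlite3: commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO resume (source_file, first_name, last_name, email) "
            "VALUES (?, ?, ?, ?)",
            (source_file, parsed.get("first_name"), parsed.get("last_name"),
             parsed.get("email")),
        )
        resume_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO resume_skill (resume_id, position, skill) VALUES (?, ?, ?)",
            [(resume_id, i, s) for i, s in enumerate(parsed.get("skills", []))],
        )
    return resume_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rid = insert_resume(conn, "jane_doe.pdf", {"first_name": "Jane", "skills": ["Python", "SQL"]})
    print(conn.execute("SELECT skill FROM resume_skill WHERE resume_id = ? ORDER BY position",
                       (rid,)).fetchall())
```

Wrapping each resume in one transaction means a failure partway through leaves no half-inserted record, and the `UNIQUE` constraint on `source_file` stops the same file from being loaded twice.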
We will supply a sample set of resumes, as many as you need to be successful.