Make sure to include the following when submitting your proposal:
• Solution Approach
• Use cases or examples of work similar to this project
The client will require the worker to carry insurance, sign standard documents (e.g., NDA, IP Assignment), and pass a background screening (and possibly a drug test, depending on the project's subject matter).
Metadata informs business needs and must be accurate and complete. There is also a need to understand how that data is produced and ultimately consumed by the business.
The metadata governance team has created consistent, scalable processes for metadata across the company, but these processes currently work only for SQL databases. The team has designed and architected a way to use Python to extend them to all of the company's metadata. It has also identified how data flows through the business, from its creation to its consumption.
Develop the code that implements the extension solution for auditing metadata not stored in SQL databases, and create a web portal that displays both that information and the way data flows through the business, including any lost connections on the consumption side.
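To make "lost connections on the consumption side" concrete, the sketch below assumes a hypothetical lineage model in which each asset lists the upstream assets it reads from, and a hypothetical inventory of assets that currently exist. A lost connection is taken to mean a consumer that still references an upstream asset missing from that inventory. The function name and data shapes are illustrative assumptions, not the team's actual design.

```python
# Hypothetical lineage model: consumer asset -> list of upstream assets it reads from.
def find_lost_connections(
    lineage: dict[str, list[str]], known_assets: set[str]
) -> dict[str, list[str]]:
    """Return, for each consumer, the upstream assets it references that no longer exist."""
    lost: dict[str, list[str]] = {}
    for consumer, upstreams in lineage.items():
        # Any upstream not found in the current inventory is a broken link.
        missing = [u for u in upstreams if u not in known_assets]
        if missing:
            lost[consumer] = missing
    return lost


if __name__ == "__main__":
    # Illustrative data only.
    lineage = {
        "sales_dashboard": ["orders_clean", "customers"],
        "orders_clean": ["orders_raw"],
    }
    known = {"orders_raw", "orders_clean", "sales_dashboard"}
    print(find_lost_connections(lineage, known))  # {'sales_dashboard': ['customers']}
```

A result like this could feed the web portal directly, flagging each consumer whose lineage is broken.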
Enable the team to run a scalable, efficient process that satisfies data governance needs for all metadata within the company, and to share the results with users through the web portal.
Work Product, Python Build – Python Developer
Metadata is currently tested for quality and completeness using Informatica 10 across 6,000 tables, with 3 maps and a separate workflow for each system. The limitation is that this data governance process is compatible only with SQL databases. The team has partially designed and architected a process that uses Python to extend it beyond SQL databases and to create a more user-friendly, automated process that makes it truly dynamic.
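As an illustration of a completeness check that does not depend on a SQL database, the sketch below audits metadata records gathered from arbitrary sources (files, APIs, etc.) against a set of required fields. The field names in `REQUIRED_FIELDS` and the record shape are hypothetical assumptions; at the stated scale the same rule would presumably run in PySpark, but plain Python keeps the logic readable.

```python
# Hypothetical set of fields every metadata record is expected to populate.
REQUIRED_FIELDS = ("name", "owner", "description", "source_system")


def audit_completeness(records, required=REQUIRED_FIELDS):
    """Return one report entry per record listing which required fields are missing or blank."""
    report = []
    for rec in records:
        # A field counts as missing if it is absent, None, or only whitespace.
        missing = [f for f in required if not str(rec.get(f) or "").strip()]
        report.append({
            "name": rec.get("name", "<unnamed>"),
            "missing": missing,
            "complete": not missing,
        })
    return report


if __name__ == "__main__":
    # Illustrative records only.
    records = [
        {"name": "orders.csv", "owner": "finance",
         "description": "Daily orders", "source_system": "S3"},
        {"name": "clicks.json", "owner": "", "description": None,
         "source_system": "Kafka"},
    ]
    for entry in audit_completeness(records):
        print(entry)
```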
The need is for a standalone project that uses Python/PySpark to implement this utility against relatively well-documented and well-architected requirements. Sample data will be provided to make the work as turnkey as possible and to help deliver code that meets the requirements. The worker is expected to be available for periodic check-ins with the team at a defined interval throughout the project, before delivering the final code at its completion.
Skills & Experience
1. Required Skills
• Python, Hadoop, PySpark
2. Preferred Skills
• Excellent communication skills
3. Skill Level
4. Other Skills
5. Experience Required
• Data Governance