Health Care Service Corporation (HCSC) is a not-for-profit health insurance company in the United States. The current scope of the Data Accelerator project is data ingestion and data processing. The project ingests data from multiple sources into the data lake, applies the required business transformation rules, and then analyzes the loaded data for faulty records.
The project was developed using the Agile methodology with three-week sprints.
Role and Responsibilities:
In each sprint, we were allocated the target tables into which data had to be loaded.
Loaded fixed-length and delimited flat files into the source tables.
Designed the join criteria for the source tables according to the mapping document.
Joined one or more source tables, applied the business rules, and loaded the result into a temp table.
Wrote Scala code to perform transformations and joins on data frames and to remove duplicates before loading to the target table.
Development involved writing shell scripts, HQL scripts, and Spark-Scala code.
Used Zena for job scheduling.
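The fixed-length parsing, join, and de-duplication steps described above can be sketched in Spark-Scala as follows. This is only an illustrative sketch, not the project's actual code: the object name, column names (member_id, member_name, claim_id), field widths, sample records, and the upper-casing "business rule" are all hypothetical, and the real rules would come from the mapping document.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, substring, trim, upper}

// Hypothetical sketch: all table, column, and field-width choices are assumptions.
object LoadTargetSketch {

  // Slice each fixed-length record (held in column "value") into trimmed, named columns.
  // Assumed layout: characters 1-5 are the member id, characters 6-15 the member name.
  def parseFixedWidth(raw: DataFrame): DataFrame =
    raw.select(
      trim(substring(col("value"), 1, 5)).as("member_id"),
      trim(substring(col("value"), 6, 10)).as("member_name")
    )

  // Join two source tables on the assumed key, apply a sample rule,
  // and drop duplicate rows before the result is written to the target.
  def buildTarget(members: DataFrame, claims: DataFrame): DataFrame =
    members
      .join(claims, Seq("member_id"), "inner")
      .withColumn("member_name", upper(col("member_name"))) // stand-in business rule
      .dropDuplicates("member_id", "claim_id")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("load-target-sketch")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-ins for a fixed-length flat file and a second source table.
    val raw = Seq("M0001Alice     ", "M0002Bob       ").toDF("value")
    val claims = Seq(("M0001", "C1"), ("M0001", "C1"), ("M0003", "C9"))
      .toDF("member_id", "claim_id")

    val target = buildTarget(parseFixedWidth(raw), claims)
    target.show()

    spark.stop()
  }
}
```

In a real job the in-memory sequences would be replaced by reads from the source tables, and the final data frame would be written to the temp or target table instead of being displayed.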