A Data Science Professional and PySpark Developer with 3+ years of experience helping enterprises and clients worldwide solve big data problems with the right data platform strategy, data analytics, machine learning, artificial intelligence, and cloud-native PaaS (Platform as a Service).
My domain expertise spans Insurance, Manufacturing, E-Commerce, and Healthcare. I stand behind the accuracy and quality of my work and aim for 100% client satisfaction.
My services include:
• Data Engineering, Big Data Pipelines, Data Lakes, ETL Orchestration
• End-to-end data migration from on-premises to cloud infrastructure
• ETL Jobs in Spark using PySpark DataFrames (see the sketch after this list)
• Data cleaning and transformation with Spark and Python (NumPy, Pandas)
• Optimization and improvement of existing PySpark code
• Modifications to existing ETL jobs
• Cloud services: AWS, Databricks, and much more on demand
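As a quick illustration of the kind of DataFrame-based ETL jobs I build, here is a minimal PySpark sketch. The S3 paths and column names are placeholders for this example only, not taken from any client project:

```python
# Minimal PySpark ETL sketch: read raw CSV, clean and transform with DataFrames,
# write partitioned Parquet. Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Extract: raw CSV landed in a data lake bucket (hypothetical path)
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: deduplicate, standardize types, filter bad rows, derive a partition column
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write curated Parquet, partitioned by date (hypothetical target path)
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders/"
)
```

Real engagements add schema enforcement, data-quality checks, and orchestration around this core pattern.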
My expertise covers the following tools and technologies:
• Apache Spark (PySpark)
• Python 3, NumPy, Pandas, unit test scripts, parsing and mapping of XML and COBOL
• AWS Cloud Services: Glue, Lambda, Step Functions, EC2, ECS, S3, DynamoDB, IAM, Kafka, Firehose, SNS, SES, SQS, CloudFormation, KMS, CloudWatch, etc. (see the Glue job sketch after this list)
• CI/CD Pipelines
• SQL / MySQL
• Git, GitHub, Bitbucket
• IBM DB2
• MS Excel, MS Office
• PyCharm, VS Code, Databricks, Jupyter Notebook
• OS: Linux (Ubuntu, CentOS), Windows
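For AWS Glue work, the skeleton I typically start a PySpark job from looks like the sketch below. The Data Catalog database, table name, and S3 path are illustrative assumptions, not a specific deliverable:

```python
# Sketch of a typical AWS Glue (PySpark) job: read from the Glue Data Catalog,
# remap columns, and write Parquet to S3. Names and paths are placeholders.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the source table from the Glue Data Catalog (hypothetical database/table)
source = glueContext.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders"
)

# Rename and cast columns declaratively
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "string", "amount", "double"),
    ],
)

# Write the curated output to S3 as Parquet (hypothetical path)
glueContext.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```

In production this skeleton is wired into Step Functions or Glue workflows, with CloudWatch alarms and CI/CD deployment around it.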