ETL Pipeline Overview:
- Importing data into HDFS from an RDBMS (MySQL) using Sqoop and from the local file system using Linux commands, including direct imports into Hive via Sqoop (see the Sqoop sketch below).
- Cleansing the data with PySpark, then manipulating it according to business needs with Spark SQL; performing further data manipulation with Hive SQL (see the PySpark sketch below).
- Exporting data from HDFS to the RDBMS and to the local file system using Sqoop and Linux commands (also covered in the Sqoop sketch below).
- Analyzing the data in Power BI and Tableau, preparing dashboards, and delivering them to the client.

Additional Responsibility:
- Worked on severity-2 management incidents and change requests.
- SQL Server and T-SQL experience: joins, data warehousing, data modeling, OLTP, OLAP.
- Handled three databases through MSSQL Management Studio.
- Excel: HLOOKUP, VLOOKUP, pivot tables, and other advanced functions.
- Worked on production and non-production deployments.

Feb 2021 to Present: Azure Data Factory (ADF) pipeline:
- Uploading the raw data/files to Azure Data Lake Storage.
- Creating notebooks in Azure Databricks and running Python (PySpark) scripts from them.
- Ingesting data in multiple formats (CSV, JSON, Parquet, XML) through the Spark DataFrame reader API (see the reader sketch below).
- Cleansing and manipulating the data in PySpark using the Spark APIs.
- Exporting the data to the data sink (MySQL) through the Spark DataFrame writer API (see the writer sketch below).
- Analyzing data with Pandas and NumPy and building reports with Matplotlib, Seaborn, and Power BI.
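A minimal sketch of the Sqoop import and export steps above. The connection string, credentials, database (retail_db), tables (orders, orders_clean), and HDFS paths are all hypothetical placeholders.

    # Import a MySQL table into HDFS with 4 parallel mappers
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/retail_db \
      --username etl_user -P \
      --table orders \
      --target-dir /user/etl/orders \
      -m 4

    # Direct import into a Hive table instead of a plain HDFS directory
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/retail_db \
      --username etl_user -P \
      --table orders \
      --hive-import \
      --hive-table staging.orders

    # Export processed data from HDFS back to MySQL
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/retail_db \
      --username etl_user -P \
      --table orders_clean \
      --export-dir /user/etl/orders_clean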
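A sketch of the PySpark cleansing and Spark SQL manipulation steps, assuming hypothetical column names (order_id, order_date) and an HDFS input path; the same GROUP BY query would also run as Hive SQL against a Hive table through the metastore.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-cleanse").getOrCreate()

    # Read the raw import from HDFS (hypothetical path)
    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("hdfs:///user/etl/orders/"))

    # Basic cleansing: drop exact duplicates and rows missing key fields
    clean = raw.dropDuplicates().dropna(subset=["order_id", "order_date"])

    # Register a temp view and apply business logic with Spark SQL
    clean.createOrReplaceTempView("orders")
    daily_totals = spark.sql("""
        SELECT order_date, COUNT(*) AS order_count
        FROM orders
        GROUP BY order_date
    """)
    daily_totals.show()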
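A sketch of ingesting the four formats with the DataFrame reader API. The paths are hypothetical, and the XML read assumes the external spark-xml package (com.databricks:spark-xml) is attached to the cluster.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-ingest").getOrCreate()

    # CSV, JSON, and Parquet are supported out of the box
    csv_df = spark.read.option("header", "true").csv("/mnt/raw/sales.csv")
    json_df = spark.read.json("/mnt/raw/events.json")
    parquet_df = spark.read.parquet("/mnt/raw/snapshots/")

    # XML requires the external spark-xml data source
    xml_df = (spark.read.format("xml")
              .option("rowTag", "record")
              .load("/mnt/raw/feed.xml"))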
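A sketch of the DataFrame writer step exporting to the MySQL sink over JDBC. The URL, table, and credentials are placeholders, the MySQL JDBC driver is assumed to be on the cluster classpath, and `clean` is the cleansed DataFrame from the sketch above.

    # Append the cleansed data to a MySQL table over JDBC
    (clean.write
          .format("jdbc")
          .option("url", "jdbc:mysql://dbhost:3306/reporting")
          .option("dbtable", "orders_clean")
          .option("user", "etl_user")
          .option("password", "changeme")
          .option("driver", "com.mysql.cj.jdbc.Driver")
          .mode("append")
          .save())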
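A sketch of the reporting step with Pandas, NumPy, Matplotlib, and Seaborn, assuming a hypothetical aggregated extract (daily_totals.csv) produced by the pipeline.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Load an aggregated extract produced by the pipeline (hypothetical file)
    pdf = pd.read_csv("daily_totals.csv", parse_dates=["order_date"])

    # Example NumPy transform: log-scale the counts for skewed volumes
    pdf["log_count"] = np.log1p(pdf["order_count"])

    # Plot the daily trend; the same aggregate feeds the Power BI dashboard
    sns.lineplot(data=pdf, x="order_date", y="order_count")
    plt.title("Orders per day")
    plt.tight_layout()
    plt.savefig("orders_per_day.png")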