I am an IT professional with 3.7 years of experience in Big Data and the Hadoop ecosystem.
Experienced in providing Big Data solutions using Hadoop and related technologies such as HDFS, Sqoop, Flume, Hive, Spark SQL, Spark Streaming, Kafka, Cassandra, Scala, AWS, MapReduce, YARN, Apache Drill, HBase, and Oozie.
Experienced in writing Spark Scala code to load various file formats such as CSV, JSON, and Parquet.
I have worked with RDDs, Spark SQL, and DataFrames to transform and analyze data.
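For example, a minimal Spark Scala sketch of that kind of batch loading and transformation; the file paths, column names, and view name are illustrative assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder.appName("format-loading-demo").getOrCreate()

    // Load the same logical dataset from different file formats (paths are placeholders)
    val csvDf     = spark.read.option("header", "true").option("inferSchema", "true").csv("/data/input.csv")
    val jsonDf    = spark.read.json("/data/input.json")
    val parquetDf = spark.read.parquet("/data/input.parquet")

    // DataFrame transformation: filter out non-positive amounts, then count rows per region
    val summary = csvDf
      .filter(col("amount") > 0)
      .groupBy(col("region"))
      .count()

    // The same analysis expressed in Spark SQL against a temporary view
    csvDf.createOrReplaceTempView("orders")
    spark.sql("SELECT region, COUNT(*) AS cnt FROM orders WHERE amount > 0 GROUP BY region").show()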
I also have experience with Spark Streaming from Kafka and Kinesis, and with loading the results into Cassandra and DynamoDB.
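A hedged sketch of that streaming path, assuming Spark Structured Streaming with the DataStax Spark Cassandra Connector on the classpath; the broker address, topic, keyspace, and table names are made up:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder.appName("kafka-to-cassandra").getOrCreate()

    // Consume a Kafka topic as a streaming DataFrame (broker and topic are placeholders)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

    // Write each micro-batch to Cassandra through the DataStax connector
    events.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "demo")
          .option("table", "events")
          .mode("append")
          .save()
      }
      .start()
      .awaitTermination()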
Key Abilities:
• Experience in analyzing data using HiveQL and Pig Latin to meet business requirements.
• Wrote Hive queries to create managed and external tables, preprocess right-shifted records, and load data with multiple delimiters using Hive SerDes and regular expressions.
• Implemented partitioning, bucketing, and map-side joins in Hive to optimize query performance (see the sketch after this list).
• Imported and exported data between HDFS and relational databases using Sqoop.
• Good knowledge of collecting streaming data and storing it in HDFS using Apache Flume.
• Involved in performing data transformations using Pig operators.
• Good understanding of Hadoop architecture, the MapReduce programming model, the Hadoop Distributed File System (HDFS), and various file formats.
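To illustrate the Hive items above, here is a minimal sketch that issues HiveQL through a Hive-enabled SparkSession. The table names, columns, paths, and the '||' delimiter are illustrative assumptions, and the MultiDelimitSerDe class path shown is the Hive contrib location, which varies across Hive versions:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("hive-demo")
      .enableHiveSupport() // route DDL and queries through the Hive metastore
      .getOrCreate()

    // External table over raw files whose fields use a multi-character delimiter
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (id INT, amount DOUBLE, region STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
      WITH SERDEPROPERTIES ('field.delim' = '||')
      LOCATION '/data/sales_raw'
    """)

    // Managed table partitioned by region so queries prune irrelevant partitions
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
      PARTITIONED BY (region STRING)
      STORED AS ORC
    """)

    // Map-side (broadcast) joins: let Hive auto-convert joins against small tables
    spark.sql("SET hive.auto.convert.join=true")

    // Bucketing (CLUSTERED BY ... INTO n BUCKETS in the DDL) likewise speeds up joins
    // and sampling; some Spark versions reject bucketed Hive DDL, in which case the
    // same statement runs in the Hive CLI instead.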