I need Spark code that generates test data and writes it to an HDFS path. Being able to store the output in AWS as well would be even more useful.
Requirements:
We need to specify the names of the columns to be created.
We need to specify the number of rows to be generated.
The output format can be CSV, Parquet, TXT, or JSON.
The data for the created columns must come from another file.
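A minimal PySpark sketch of these requirements follows. It is an illustration under stated assumptions, not a finished tool. The function names (`load_column_values`, `make_row`, `generate_test_data`) are hypothetical. It assumes the source file is a CSV on the driver's local disk, with a header row whose names match the requested columns; each generated cell is sampled from that column's values. Rows are built on the executors from `range(num_rows)`, and each row is seeded by its index, so a given seed gives reproducible output. Writing to AWS assumes an `s3a://` path with the Hadoop S3A connector already configured.

```python
import csv
import random
from typing import Dict, List, Sequence, Tuple

# Output formats named in the requirements.
SUPPORTED_FORMATS = {"csv", "parquet", "txt", "json"}


def load_column_values(path: str) -> Dict[str, List[str]]:
    """Read the source CSV (assumed layout: header row = column names) and
    return, per column, the list of its non-empty values."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        pools: Dict[str, List[str]] = {name: [] for name in reader.fieldnames or []}
        for row in reader:
            for name in pools:
                value = row.get(name)
                if value:
                    pools[name].append(value)
    return pools


def make_row(index: int, columns: Sequence[str],
             pools: Dict[str, List[str]], seed: int) -> Tuple[str, ...]:
    """Build one row by sampling each column's pool. Seeding by row index keeps
    the result the same no matter which executor generates the row."""
    rng = random.Random(f"{seed}:{index}")
    return tuple(rng.choice(pools[c]) for c in columns)


def generate_test_data(columns: Sequence[str], source_file: str, num_rows: int,
                       output_path: str, fmt: str, seed: int = 0) -> None:
    """Hypothetical entry point: validate inputs, generate rows in Spark, write them."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}")
    cols = list(columns)
    pools = load_column_values(source_file)
    missing = [c for c in cols if not pools.get(c)]
    if missing:
        raise ValueError(f"source file has no values for columns: {missing}")
    needed = {c: pools[c] for c in cols}  # ship only the pools we use

    # Imported here so the helpers above work without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import concat_ws
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("test-data-generator").getOrCreate()
    schema = StructType([StructField(c, StringType(), True) for c in cols])
    rdd = spark.sparkContext.parallelize(range(num_rows)).map(
        lambda i: make_row(i, cols, needed, seed))
    df = spark.createDataFrame(rdd, schema)

    if fmt == "txt":
        # DataFrameWriter.text() accepts a single string column, so join the fields.
        df.select(concat_ws(",", *cols).alias("value")) \
          .write.mode("overwrite").text(output_path)
    elif fmt == "csv":
        df.write.mode("overwrite").option("header", True).csv(output_path)
    else:  # parquet or json
        df.write.mode("overwrite").format(fmt).save(output_path)
```

For example, `generate_test_data(["name", "city"], "values.csv", 1000, "hdfs:///tmp/test_data", "parquet")` would write 1,000 rows. The file name and paths here are placeholders. Every column is typed as a string for simplicity. Casting to real types would need a schema for the source file, which the requirements do not specify.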
Hi,
I am a big data engineer with extensive experience building data pipelines and processing data on Hadoop, Azure, and AWS using PySpark and Java.
I can build a simple script that meets your requirements, and we can build a solid pipeline on top of it.
I am also a certified Azure Data Engineer.
Hi,
I am a certified Azure Solutions Architect and Data Engineer with extensive experience with on-premises Spark and with Databricks on Azure. I have 10+ years of experience working in data and analytics using ETL, SQL, and Spark.