I want to create a job that scrapes four (4) websites and stores the resulting files in an S3 bucket using AWS Lambda. I will provide the S3 bucket, but you will need to schedule the Lambda to start and stop daily. In total, the bucket will receive close to 15,000 files per day, with column counts ranging from 10 to 6,000 per file. Files will also need to be named with timestamps, as they will feed an ETL job downstream. The data will need to be output as both CSV and JSON, and will need export functionality so I can run manual analysis.
The scraper will need to run multiple steps to generate the full raw data set; this will not be a simple, direct page-to-database scrape.
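Since the scrape is multi-step rather than a single fetch, one way to structure it is as a pipeline where each step enriches a shared state (discovered URLs, session data, partial records) before the final data set is serialized. The steps themselves are placeholders here, since the actual sites and flow are not specified in this posting.

```python
from typing import Callable


def run_pipeline(seed: dict, steps: list[Callable[[dict], dict]]) -> dict:
    """Chain scrape steps: each step takes the state built so far and
    returns an enriched copy, so the full raw data set is assembled
    incrementally instead of in one page-to-db pass."""
    state = seed
    for step in steps:
        state = step(state)
    return state


# Dummy stand-ins for real steps such as login, listing discovery,
# and per-item detail fetches.
example_steps = [
    lambda s: {**s, "pages_found": len(s["start_urls"])},
    lambda s: {**s, "complete": True},
]
```

A structure like this also makes each step independently testable, which matters when a failure partway through a daily run should not corrupt the files already landed in S3.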