Hi,
I'd like to scrape the course contents from two university websites.
The website URLs are RESTful and the sites are static (no JavaScript, no user interaction, etc.). A master page contains links to all the course content that needs to be scraped, and all course content is presented in the same format.
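As a rough illustration of the master-page step (a sketch, not a required design), the links could be collected with the Python standard library alone. The class and function names, the `https://university.example` host and the `/courses/` path prefix are all placeholders; the real URL pattern has to be taken from the actual sites.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:           # keep order, drop duplicates
                self.links.append(url)


def extract_course_links(master_html, base_url, path_prefix="/courses/"):
    """Return the links on the master page whose path starts with path_prefix.

    path_prefix is a hypothetical filter; adjust it to the real URL scheme.
    """
    collector = LinkCollector(base_url)
    collector.feed(master_html)
    return [u for u in collector.links if urlparse(u).path.startswith(path_prefix)]


# Hypothetical master page, inlined so the sketch runs without network access.
SAMPLE_MASTER = """
<ul>
  <li><a href="/courses/cs101">CS101</a></li>
  <li><a href="/courses/cs102">CS102</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

if __name__ == "__main__":
    for link in extract_course_links(SAMPLE_MASTER, "https://university.example"):
        print(link)
```

In a real run the master page HTML would be downloaded first (for example with `urllib.request.urlopen`), and each returned link would then be fetched and parsed in turn.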
This should be a very easy task for someone with web scraping experience.
I have attached a walkthrough that outlines which items on the website should be mapped to which fields in the database/CSV file. Please look through the walkthrough before responding.
The code should ideally be written in Python or Ruby. Java and other open source languages are OK.
The code should follow good software engineering principles and design patterns, and should be easily extendable to other sites with similar formats (I will be reviewing the code afterwards and developing it further).
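One possible way to keep the code extendable, sketched here under assumptions, is to put everything that differs between sites into a small configuration object and keep the parsing and CSV code generic. The field names (`title`, `description`), the regular expressions and the example URL below are hypothetical; the real mapping comes from the attached walkthrough, and a proper HTML parser may be a better fit than regular expressions for the real pages.

```python
import csv
import io
import re
from dataclasses import dataclass


@dataclass
class SiteConfig:
    """Everything that differs between sites lives here (values are hypothetical)."""
    name: str
    master_url: str
    # CSV column name -> regex with one capture group holding the value.
    field_patterns: dict


def parse_course(html, config):
    """Apply each field's pattern to one course page; missing fields become ''."""
    row = {}
    for field, pattern in config.field_patterns.items():
        match = re.search(pattern, html, re.DOTALL)
        row[field] = match.group(1).strip() if match else ""
    return row


def write_csv(rows, config, out):
    """Write rows to a file-like object, one column per configured field."""
    writer = csv.DictWriter(out, fieldnames=list(config.field_patterns))
    writer.writeheader()
    writer.writerows(rows)


# Supporting another site with a similar format means adding another SiteConfig.
EXAMPLE_SITE = SiteConfig(
    name="example-university",
    master_url="https://university.example/courses/",
    field_patterns={
        "title": r"<h1>(.*?)</h1>",
        "description": r'<div class="description">(.*?)</div>',
    },
)

if __name__ == "__main__":
    page = '<h1>Intro to Programming</h1><div class="description">Basics of Python.</div>'
    buffer = io.StringIO()
    write_csv([parse_course(page, EXAMPLE_SITE)], EXAMPLE_SITE, buffer)
    print(buffer.getvalue())
```

The design choice is that the generic functions never mention a specific site, so reviewing or extending the scraper later mostly means editing configuration rather than logic.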
Thanks,
Henry