We need a way to scrape data from personal wedding websites listed on theknot.com. All of the websites are in a directory: [login to view URL]. The information we'd like is:
1. The URL of the wedding page
2. Location of wedding events
3. Date of wedding event
4. Full names of bride and groom
5. Contact information: e-mail & phone numbers
Our primary data points are the location of the event, the date of the event, and the names of the bride and groom; the others are secondary. Our goal is to be able to sort and filter the results by date range and location.
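To make the sorting goal concrete, here is a rough sketch of how the collected data could be stored and queried. It is only an illustration, not a requirement: the `WeddingRecord` fields and the `filter_records` helper are hypothetical names chosen for this example, and the location match is a simple case-insensitive substring check.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class WeddingRecord:
    # Primary data points
    url: str
    location: str
    event_date: date
    bride: str
    groom: str
    # Secondary data points (may be missing on many sites)
    email: Optional[str] = None
    phone: Optional[str] = None


def filter_records(
    records: List[WeddingRecord],
    start: date,
    end: date,
    location: Optional[str] = None,
) -> List[WeddingRecord]:
    """Return records dated within [start, end], optionally limited to a
    location substring, sorted by date and then by location."""
    matches = [
        r
        for r in records
        if start <= r.event_date <= end
        and (location is None or location.lower() in r.location.lower())
    ]
    return sorted(matches, key=lambda r: (r.event_date, r.location))
```

For example, `filter_records(records, date(2025, 6, 1), date(2025, 6, 30), "Austin")` would return only the June 2025 weddings whose location mentions Austin, earliest first.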
We need a way to do this periodically so that we can scrape new wedding sites as they are added and update existing sites that we have data for. We are open to hiring someone to do this manually or having someone write a script that will crawl on a weekly or monthly basis.
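For the scripted option, each periodic run would need to add new sites and refresh the ones already on file. A minimal sketch of that step is below, assuming results are kept in a SQLite database keyed by the wedding page URL. The table layout and the `open_store` and `upsert` names are hypothetical, and the actual crawling and parsing are deliberately left out.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; the URL is the unique key for a site."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS weddings (
               url TEXT PRIMARY KEY,
               location TEXT,
               event_date TEXT,   -- ISO format, e.g. 2025-06-14
               couple TEXT,
               last_seen TEXT     -- date of the crawl that last saw this site
           )"""
    )
    return conn


def upsert(conn, url, location, event_date, couple, seen_on) -> str:
    """Insert a newly found site or refresh an existing one.

    Returns "added" or "updated" so a run can report what changed.
    """
    exists = conn.execute(
        "SELECT 1 FROM weddings WHERE url = ?", (url,)
    ).fetchone()
    if exists:
        conn.execute(
            "UPDATE weddings SET location = ?, event_date = ?, couple = ?, "
            "last_seen = ? WHERE url = ?",
            (location, event_date, couple, seen_on, url),
        )
        status = "updated"
    else:
        conn.execute(
            "INSERT INTO weddings (url, location, event_date, couple, last_seen) "
            "VALUES (?, ?, ?, ?, ?)",
            (url, location, event_date, couple, seen_on),
        )
        status = "added"
    conn.commit()
    return status
```

A weekly or monthly job (cron, Task Scheduler, or similar) could call `upsert` once per scraped page, so repeated runs never create duplicate rows for the same site.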
If successful, we would like to crawl several other wedding hosting sites.