There are 3-4 sites I want to extract info from dynamically (data refreshed once a week). The script should pull a small subset of info from these sites, and when the user searches along a few dimensions, the extracted info should load dynamically (AJAX, like kayak[dot]com) according to those search criteria, similar to what vertical search engines such as Simply Hired do. The script should be flexible enough to incorporate RSS feed data along with the extracted data.
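To illustrate the RSS part, here is a rough sketch of how feed items could be folded in with the extracted records. It assumes plain RSS 2.0 feeds and that each record is identified by its link; the function names and the record fields (title, link, summary) are only placeholders, not a required design.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of dicts (title, link, summary) from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return items


def merge_records(scraped, feed_items):
    """Combine scraped records and feed items, keeping the first record per link."""
    merged = {}
    for record in list(scraped) + list(feed_items):
        # setdefault keeps the earlier record, so scraped data wins on a clash.
        merged.setdefault(record["link"], record)
    return list(merged.values())
```

Scraped records are listed first, so they take priority when a feed item points at the same link.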
Crawl and collect the event info summary details.
It should handle retrieving info from results that span multiple pages.
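For the multi-page results, something along these lines is what we picture. This is only a sketch: it assumes the result pages link to the next page with an `<a rel="next">` link, which may not match the actual sites, and `fetch` is a placeholder for whatever does the download (curl or otherwise).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Finds the href of the first <a rel="next"> link (an assumed convention)."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") == "next" and self.next_href is None:
            self.next_href = attrs.get("href")


def crawl_result_pages(start_url, fetch, max_pages=50):
    """Yield (url, html) for each result page, following rel="next" links.

    `fetch` is a caller-supplied function that takes a URL and returns HTML.
    """
    url, seen = start_url, set()
    # Stop on a missing next link, a repeated URL, or the page cap.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        parser = NextLinkParser()
        parser.feed(html)
        url = urljoin(url, parser.next_href) if parser.next_href else None
```

The `max_pages` cap and the `seen` set are there so a site with looping pagination cannot keep the crawler running forever.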
Also, how will you do this: using curl? We want to retrieve updated info only once per week (assuming those sites publish new content at that frequency).
Save the data to a MySQL database and update the database with new content only.
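As an illustration of "new content only", here is a small sketch. It uses Python's built-in sqlite3 purely as a stand-in so the example runs anywhere; on MySQL the equivalent statement would be `INSERT IGNORE`. The table layout and the choice of the event URL as the unique key are assumptions for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the stand-in database and create the events table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " url TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " summary TEXT NOT NULL)"
    )
    return conn


def save_new_events(conn, events):
    """Insert only events whose url is not stored yet; return how many were added."""
    added = 0
    for event in events:
        cur = conn.execute(
            "INSERT OR IGNORE INTO events (url, title, summary) VALUES (?, ?, ?)",
            (event["url"], event["title"], event["summary"]),
        )
        added += cur.rowcount  # 1 if the row was inserted, 0 if it already existed
    conn.commit()
    return added
```

Running the weekly job twice over the same pages would then add each event once and skip it the second time.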
The crawler must not overload the selected sites.
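By "not overload" we mean roughly the following: leave a pause between requests to the same site. A minimal sketch, assuming a fixed per-host delay (the 5-second default and the class name are just illustrative, and the right delay would depend on each site):

```python
import time
from urllib.parse import urlparse


class PoliteThrottle:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.last_hit = {}      # host -> time of the most recent request

    def wait(self, url):
        """Sleep if the host was hit too recently; return the seconds waited."""
        host = urlparse(url).netloc
        now = self.clock()
        last = self.last_hit.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, self.min_delay - (now - last))
            if waited:
                self.sleep(waited)
        self.last_hit[host] = now + waited
        return waited
```

The crawler would call `wait(url)` right before each download, so requests to different sites are not held up by each other.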
We just want it to be easy to add additional URLs ourselves by modifying these tags.
Also, to be clear, it should search and load dynamically from these three sites each time info is accessed, not do a one-time fetch and save of the data.
(Same as [url removed, login to view], which crawls different travel sites and displays the info after each search. It does not need to be real time, though; we just need the data updated every week.)
Please place your bid amount.