I have an existing scraper running with a back-end management system. Both the scraper and the back end are coded in PHP with a MySQL database.
The scraper is for a real estate website and collects sales data, property addresses and details, and the associated photos. It is designed to track the same property over time and log when the property changes price, is sold, or is rented. The same property can therefore be scraped many times over the course of a few weeks, and only the changes in price etc. should be logged (see Summary photo for an example). This scraper already EXISTS and I would like to fix bugs to make it operate better.
Please quote on the following. I would like to amend the scraper to fix a few bugs and make adjustments to the back-end system, including:
1. Every time the scraper runs, it collects the photos of the same property again even though it shouldn't. See the attached photo of the summary page, which shows duplicate photos.
2. An algorithm for deleting duplicate photos for a given property, both the database records and the photo files on the server. Duplicate files have been collected due to the scraper error in item 1. To the extent possible, I would like to reverse this and keep just one set of photos. I don't want to do this manually!
3. Adding a delete link for each photo on the summary page so I can see which photo I am deleting
4. Error checking and a review of the scraper for improvements and efficiency. Each scraper run time is currently tracked and monitored. This will lead to follow-on work if I am happy with the result.
It is critical for this project that the database (and photo files) remain intact and are not compromised, other than the removal of duplicate photos.
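
To illustrate the kind of logic I have in mind for item 2, here is a rough sketch. It is written in Python only for brevity; the actual fix would be made in the existing PHP code. It rests on assumptions I have not confirmed: that duplicate photos are byte-identical files, and that each photo record can be read as a (photo_id, property_id, path) tuple. That tuple layout and the function names are hypothetical, not the real schema. The sketch only reports which records would be kept or removed and deletes nothing, so the result can be reviewed first.

```python
import hashlib


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_photos(photo_rows):
    """Split photo records into those to keep and those to remove.

    photo_rows: iterable of (photo_id, property_id, path) tuples
    (a hypothetical layout, not the real table structure).
    Two photos count as duplicates only if they belong to the same
    property AND have identical file contents. The record with the
    lowest photo_id in each group is kept.
    """
    seen = set()
    keep, remove = [], []
    # Sorting puts the lowest photo_id first, so the earliest record survives.
    for photo_id, property_id, path in sorted(photo_rows):
        key = (property_id, file_hash(path))
        if key in seen:
            remove.append(photo_id)
        else:
            seen.add(key)
            keep.append(photo_id)
    return keep, remove
```

The same content check could presumably be reused for item 1, so the scraper skips a photo whose hash is already stored for that property. If the website re-encodes or resizes its images between runs, exact hashes would not match and a different comparison would be needed.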