I'm looking for a web application / script that will scrape a website's cached pages through Google. It must be able to scrape and edit an unlimited number of pages.
I'd like to input a domain and have the script go to Google, scrape all of the HTML from Google's cached links, and save it in .txt or .html format, sorting the content into separate folders that mirror the website's structure.
For example: if the site had content in a folder named /blog, the software will save those pages in HTML or text format in the respective folder. The URLs should be saved exactly as they appear in the cache.
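To illustrate the folder layout I have in mind, here is a rough sketch. It is in Python only for brevity (the delivered tool should still be PHP). The function name `local_path_for`, the `index` file name used for folder URLs, and the example.com domain are my own placeholders, not requirements.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, extension: str = ".html") -> str:
    """Map a page URL to a relative file path that mirrors the site's folders."""
    parts = urlsplit(url)
    path = parts.path
    # Assumption: a URL ending in "/" (or with no path) is a folder,
    # so its page is stored as "index" inside that folder.
    if path == "" or path.endswith("/"):
        path += "index"
    relative = PurePosixPath(parts.netloc) / path.lstrip("/")
    # Keep an existing file extension; otherwise add the chosen one.
    if relative.suffix == "":
        relative = relative.with_suffix(extension)
    return str(relative)


if __name__ == "__main__":
    for example in (
        "http://example.com/blog/my-post",
        "http://example.com/blog/",
        "http://example.com/about.php",
    ):
        print(example, "->", local_path_for(example))
```

With this mapping, a page under /blog ends up in a blog folder on disk, and pages that already have a file name such as about.php keep it unchanged.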
Ideally, all of this data should also be saved to a database.
It also needs to be able to extract any HTML or code from each page and save it under the original file name in its respective folder.
This is because some websites have images, JavaScript, and other code that will not work after the hosting is transferred. For example, Google adds its own HTML at the top of these cached pages, and most pages will have their own contact forms, etc., that must be removed.
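As a rough idea of the cleanup step, here is a minimal sketch, again in Python purely for illustration. The two banner markers are hypothetical placeholders: the real markup Google puts at the top of a cached page would have to be inspected and substituted. The function names are my own, and the regular expression for forms is a simplification that a proper HTML parser would handle more reliably.

```python
import re

# Hypothetical markers. The actual cache banner markup must be checked
# on real cached pages and substituted here.
BANNER_START = "<!-- cache-banner-start -->"
BANNER_END = "<!-- cache-banner-end -->"

# Matches a whole <form>...</form> element, case-insensitively, across lines.
FORM_RE = re.compile(r"<form\b.*?</form\s*>", re.IGNORECASE | re.DOTALL)


def remove_banner(html: str, start: str = BANNER_START, end: str = BANNER_END) -> str:
    """Cut out everything from the start marker through the end marker."""
    begin = html.find(start)
    if begin == -1:
        return html  # no banner found, leave the page untouched
    finish = html.find(end, begin + len(start))
    if finish == -1:
        return html  # unterminated banner, safer not to cut anything
    return html[:begin] + html[finish + len(end):]


def remove_forms(html: str) -> str:
    """Drop every form element, such as a contact form."""
    return FORM_RE.sub("", html)


def clean_cached_page(html: str) -> str:
    """Apply both cleanup steps to one cached page."""
    return remove_forms(remove_banner(html))


if __name__ == "__main__":
    sample = (
        BANNER_START + "<div>cached copy notice</div>" + BANNER_END
        + "<html><body><p>Hi</p><form action='/contact'><input></form></body></html>"
    )
    print(clean_cached_page(sample))
```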
There also needs to be proxy support so the scraper doesn't get banned.
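By proxy support I mean something along the lines of the sketch below, which cycles through a list of proxies so that consecutive requests go out through different addresses. It is Python for illustration only. The proxy addresses are placeholders, `rotating_openers` is a name I made up, and a real tool would also need to handle dead proxies and request delays.

```python
import itertools
import urllib.request


def rotating_openers(proxies):
    """Yield (proxy, opener) pairs forever, cycling through the proxy list."""
    for proxy in itertools.cycle(proxies):
        # Route both plain and TLS requests through the current proxy.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield proxy, urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder addresses, not real proxies.
    pool = rotating_openers(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
    for _ in range(3):
        proxy, opener = next(pool)
        # A real run would fetch a page here with opener.open(url, timeout=...).
        print("next request would go through", proxy)
```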
I would like the application / script to be written in PHP / MySQL and related languages as needed.
Upon completion, I would like you to provide a screencast demonstration video of the software in action and transfer full rights to the software.
If changes and updates are needed in the future, I want to be assured that this person or team is willing and interested in working at a reasonable and fair rate.
I want to be able to build a long-term relationship, as I have many other projects for the right team or person.
Please provide any examples of similar projects.
To be CONSIDERED, please include the words "I can help you scrape" in your reply. Those who do not will not be considered.
If you are this person or team, please apply.
Please ask any questions you may have.