I would like a script written in PHP for a Linux server.
For easy understanding, I'll call it a site replicator.
I would like it to do the following:
- Be given a starting URL to begin from.
- Crawl every link it finds within that page, download everything it finds, and save it on my server, keeping the directory structure the same.
- I want to be able to specify the number of layers it should crawl (how deep the scan should go).
- I want to be able to exclude certain file types if necessary.
- I want it to get the HTML pages AND THE IMAGES FOUND ON THOSE PAGES.
- I want it to be able to follow dynamic URLs as well. If pages are generated by a CGI or PHP script, I want those downloaded too.
- A bonus would be an option to either stay within the root URL only, or allow it to go outside the given URL.
In short: it should start at the starting URL, download everything it finds, and save it to my server.
Pretty simple, right?
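
To give an idea of the kind of logic I'm picturing, here is a very rough sketch in PHP. It is not tested, and the settings and helper function names are just placeholders I made up, but it shows the shape of it: a recursive, depth-limited crawl that skips excluded file types, can optionally stay on the starting host, and saves pages and images under a local directory that mirrors the site's structure.

<?php
// Very rough sketch only -- names and settings are placeholders.
$startUrl       = 'https://example.com/';
$maxDepth       = 2;                      // how many layers deep to crawl
$excludedExts   = ['zip', 'pdf', 'exe'];  // file types to exclude
$stayInsideRoot = true;                   // stay within the starting host or not
$saveRoot       = __DIR__ . '/mirror';    // local folder to replicate the site into
$visited        = [];

crawl($startUrl, 0);

function crawl(string $url, int $depth): void
{
    global $startUrl, $maxDepth, $excludedExts, $stayInsideRoot, $visited;

    if ($depth > $maxDepth || isset($visited[$url])) {
        return;                            // depth limit reached or already fetched
    }
    $visited[$url] = true;

    // Skip excluded file types, judged by the extension in the URL path.
    $ext = strtolower(pathinfo((string) parse_url($url, PHP_URL_PATH), PATHINFO_EXTENSION));
    if (in_array($ext, $excludedExts, true)) {
        return;
    }

    // Option to stay on the same host as the starting URL.
    if ($stayInsideRoot
        && parse_url($url, PHP_URL_HOST) !== parse_url($startUrl, PHP_URL_HOST)) {
        return;
    }

    $body = @file_get_contents($url);      // a real version should use cURL with error handling
    if ($body === false) {
        return;
    }
    saveLocally($url, $body);

    // Images and other assets are saved but not parsed for further links.
    if (in_array($ext, ['jpg', 'jpeg', 'png', 'gif', 'css', 'js'], true)) {
        return;
    }

    // Pull href/src targets out of <a> and <img> tags and recurse one layer deeper.
    $doc = new DOMDocument();
    @$doc->loadHTML($body);
    foreach (['a' => 'href', 'img' => 'src'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $link = trim($node->getAttribute($attr));
            if ($link !== '') {
                crawl(makeAbsolute($url, $link), $depth + 1);
            }
        }
    }
}

// Save the body under a local path that mirrors the URL's directory structure.
// Dynamic URLs (script.php?id=5 etc.) get the query string folded into the filename.
function saveLocally(string $url, string $body): void
{
    global $saveRoot;
    $parts = parse_url($url);
    $path  = $parts['path'] ?? '/';
    if ($path === '' || substr($path, -1) === '/') {
        $path .= 'index.html';
    }
    if (!empty($parts['query'])) {
        $path .= '_' . preg_replace('/[^A-Za-z0-9._-]/', '_', $parts['query']);
    }
    $local = $saveRoot . '/' . ($parts['host'] ?? 'unknown-host') . $path;
    if (!is_dir(dirname($local))) {
        mkdir(dirname($local), 0777, true);
    }
    file_put_contents($local, $body);
}

// Naive relative-to-absolute URL resolution; just enough to show the idea.
function makeAbsolute(string $base, string $link): string
{
    if (parse_url($link, PHP_URL_SCHEME) !== null) {
        return $link;                      // already absolute
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (strpos($link, '/') === 0) {
        return $origin . $link;            // root-relative link
    }
    $dir = rtrim(dirname($parts['path'] ?? '/'), '/');
    return $origin . $dir . '/' . $link;   // path-relative link
}

The real script would of course need proper error handling, politeness delays, and smarter URL resolution, but that is the general flow I have in mind.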