I need an MP3 crawler/spider script that crawls websites recursively and adds every audio file it finds to a database. Each audio link must be stored along with related keywords, calculated from the keyword density of the page where the link was found, the file's ID3 tags, the page title, and so on. Basically, I want a fine-tuned algorithm that returns the best results!
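To make the keyword requirement concrete, here is a rough Python sketch of what I mean, using only the standard library. It is an illustration under assumptions, not the expected implementation: the names `PageParser`, `keyword_density` and `analyse_page`, the extension list, and the tiny stopword set are all hypothetical placeholders.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed values for illustration only; the real lists would be tuned.
AUDIO_EXTENSIONS = (".mp3", ".ogg", ".wav", ".m4a")
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this"}


class PageParser(HTMLParser):
    """Collects audio links, ordinary links, the title and visible text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.audio_links = []
        self.page_links = []   # candidates for the recursive crawl
        self.title = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                url = urljoin(self.base_url, href)
                path = url.split("?", 1)[0].lower()
                if path.endswith(AUDIO_EXTENSIONS):
                    self.audio_links.append(url)
                else:
                    self.page_links.append(url)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data
        self.text_parts.append(data)


def keyword_density(text, top_n=5):
    """Return (word, share of all counted words) for the top_n words."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]
    if not words:
        return []
    total = len(words)
    return [(w, c / total) for w, c in Counter(words).most_common(top_n)]


def analyse_page(html, base_url):
    """Parse one page and return its links, title and keywords."""
    parser = PageParser(base_url)
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "audio_links": parser.audio_links,
        "page_links": parser.page_links,
        "keywords": keyword_density(" ".join(parser.text_parts)),
    }
```

The crawler would call something like `analyse_page` on each fetched page, store `audio_links` with the keywords, and queue `page_links` for the recursive crawl. ID3 tags and smarter weighting (for example, counting title words more heavily) are left out of this sketch.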
The database must also contain the ID3 information for each file and must stay fast and efficient as it grows, so I am not sure which database would be best for this. Basically, I need a script like the ones behind emp3world and similar sites.
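As a starting point for discussion, the storage side might look roughly like the sketch below. SQLite is used here only because it ships with Python; it is a stand-in, not a decision on which database to use. The table layout and the function names `init_db`, `add_track` and `search` are assumptions for illustration, and the `id3` dictionary is assumed to come from whatever tag-reading library you choose.

```python
import sqlite3

# Hypothetical schema: one row per audio file, one row per (file, keyword).
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id           INTEGER PRIMARY KEY,
    url          TEXT NOT NULL UNIQUE,
    title        TEXT,
    artist       TEXT,
    album        TEXT,
    is_alive     INTEGER NOT NULL DEFAULT 1,
    last_checked TEXT
);
CREATE TABLE IF NOT EXISTS keywords (
    track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
    keyword  TEXT NOT NULL,
    weight   REAL NOT NULL,
    PRIMARY KEY (track_id, keyword)
);
-- Index so keyword lookups stay fast as the table grows.
CREATE INDEX IF NOT EXISTS idx_keywords_keyword ON keywords(keyword);
"""


def init_db(conn):
    """Create the tables and index if they do not exist yet."""
    conn.executescript(SCHEMA)


def add_track(conn, url, id3, keywords):
    """Insert a track (ignored if the URL is already known) and its keywords."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tracks (url, title, artist, album) "
            "VALUES (?, ?, ?, ?)",
            (url, id3.get("title"), id3.get("artist"), id3.get("album")),
        )
        track_id = conn.execute(
            "SELECT id FROM tracks WHERE url = ?", (url,)
        ).fetchone()[0]
        conn.executemany(
            "INSERT OR REPLACE INTO keywords (track_id, keyword, weight) "
            "VALUES (?, ?, ?)",
            [(track_id, k.lower(), w) for k, w in keywords],
        )
    return track_id


def search(conn, keyword, limit=20):
    """Return URLs of live tracks for a keyword, highest weight first."""
    rows = conn.execute(
        "SELECT t.url FROM keywords k JOIN tracks t ON t.id = k.track_id "
        "WHERE k.keyword = ? AND t.is_alive = 1 "
        "ORDER BY k.weight DESC LIMIT ?",
        (keyword.lower(), limit),
    )
    return [row[0] for row in rows]
```

The unique URL column keeps the same file from being added twice when the crawler revisits a page, and the `is_alive` and `last_checked` columns are there for the link checker. The same layout should carry over to a server database if that turns out to be the better fit.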
The script must also be able to check the links in the database and confirm the files still exist, running as a cron job during off-peak traffic hours so it won't choke the server.
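For the link check, I have something like the following in mind. This is only a sketch under assumptions: `url_is_alive` and `check_links` are hypothetical names, a HEAD request is assumed to be enough to tell whether a file still exists (some servers reject HEAD, so a real version may need a fallback), and the delay value is a placeholder.

```python
import time
import urllib.error
import urllib.request


def url_is_alive(url, timeout=10):
    """Return True if a HEAD request to the URL succeeds."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors, network failures and malformed URLs all count as dead.
        return False


def check_links(urls, probe=url_is_alive, delay_seconds=0.5):
    """Probe each URL in turn, pausing between requests to limit load.

    Returns a dict mapping each URL to True (alive) or False (dead).
    """
    results = {}
    for url in urls:
        results[url] = probe(url)
        if delay_seconds:
            time.sleep(delay_seconds)
    return results
```

A crontab entry such as `0 3 * * * /usr/bin/python3 /path/to/check_links.py` (the path is a placeholder) would run the check at 03:00 each night. The pause between requests is what keeps the job from hammering the server, and the results would be written back to the database so dead links can be hidden or removed.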
The script will be hosted on a high-end dedicated server, so server resources shouldn't be a concern, but I still expect the script to be well optimized, commented, indented, and easy to extend (multiple databases, more features, etc.).
People with previous experience will be preferred. I will also require a portfolio or examples of previous work.
This will be integrated into a high-traffic music site, so it should be able to handle huge amounts of traffic.
Thanks so much!