Facial recognition project:
We are seeking technology that automatically recognises particular faces in images.
For each person (e.g. Bill Clinton), we will provide reference pictures as input, along with a list of URLs where pictures of that person may exist. Your job is to:
1. Download these URLs (there will be 1-30 of these per person).
2. Extract the image URLs from the HTML.
3. Download the images.
4. For each image, determine whether it shows the same person as the provided pictures.
5. Output the image files together with the URLs they came from.
This must be automated, as you will need to follow the above process for many thousands of people.
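As a rough illustration of steps 1 and 2 above, the following Python sketch fetches a page and collects the image URLs and alt text from its img tags using only the standard library. The names fetch, ImageCollector and extract_images are hypothetical, the 10-second timeout is an arbitrary choice, and the face comparison in step 4 is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a URL and return the raw bytes (step 1, and step 3 for images)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


class ImageCollector(HTMLParser):
    """Collects (absolute image URL, alt text) pairs from <img> tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:
            # Resolve relative paths against the page the tag was found on.
            absolute = urljoin(self.page_url, src)
            self.images.append((absolute, attr_map.get("alt") or ""))


def extract_images(page_url, html):
    """Return the (image URL, alt text) pairs found in an HTML string (step 2)."""
    collector = ImageCollector(page_url)
    collector.feed(html)
    return collector.images
```

Keeping the page URL next to each image URL makes step 5 straightforward, since every downloaded file can be written out alongside the address it came from.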
This is a data-oriented task. No user interface is necessary. In production, it will run as a web service, using either SQL queries against our database or XML, whichever you are most comfortable with.
We have developed a simple algorithm that looks for images within certain width and height ranges and then looks for strings matching the person's name in the filename and alt tag. It shows incorrect photos (i.e. a different person, or not a person at all) only about 10% of the time, but it misses a large proportion of the available photos. You may use this algorithm as a base, with weights added by image recognition, if you wish.
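A filter of this kind could look roughly like the Python sketch below. It is not our actual implementation: the size limits are placeholder values, the function names are hypothetical, and it assumes that a name match in either the filename or the alt text is enough.

```python
import re
from urllib.parse import unquote, urlparse

# Placeholder size limits in pixels; the real ranges are not given here.
MIN_WIDTH, MAX_WIDTH = 80, 800
MIN_HEIGHT, MAX_HEIGHT = 80, 1000


def name_tokens(text):
    """Lower-case a string and split it into alphanumeric tokens."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def matches_name(text, name):
    """True if every token of the person's name appears as a token in text."""
    available = set(name_tokens(text))
    return all(token in available for token in name_tokens(name))


def is_candidate(image_url, alt, width, height, name):
    """Apply the size check first, then the filename / alt text name check."""
    if not (MIN_WIDTH <= width <= MAX_WIDTH and MIN_HEIGHT <= height <= MAX_HEIGHT):
        return False
    filename = unquote(urlparse(image_url).path.rsplit("/", 1)[-1])
    # Assumption: a match in either place is sufficient.
    return matches_name(filename, name) or matches_name(alt, name)
```

Token matching of this sort accepts bill_clinton.jpg but not billclinton.jpg, which is one way such a filter can miss valid photos.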
You will be expected to catch at least 90% of valid images and to show incorrect images no more than 5% of the time.
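One way to check these targets against a hand-labelled test set is sketched below in Python. It assumes that "catch" means the share of valid images that are shown, and that the 5% figure is the share of shown images that are incorrect. The function names and the idea of a labelled set are illustrative only.

```python
def evaluate(shown, valid):
    """Return (catch rate, error rate) for one labelled test set.

    shown: identifiers of the images the system output
    valid: identifiers of the images that really show the person
    """
    shown, valid = set(shown), set(valid)
    catch_rate = len(shown & valid) / len(valid) if valid else 1.0
    error_rate = len(shown - valid) / len(shown) if shown else 0.0
    return catch_rate, error_rate


def meets_targets(shown, valid, min_catch_rate=0.90, max_error_rate=0.05):
    """True if both the 90% catch target and the 5% error target are met."""
    catch_rate, error_rate = evaluate(shown, valid)
    return catch_rate >= min_catch_rate and error_rate <= max_error_rate
```

For example, showing 9 of 10 valid images and nothing else meets both targets, while showing those same 9 plus one wrong image fails the error target, because 1 of the 10 shown images is incorrect.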
Linux is the preferred platform, but we will consider Windows bids.