The overall requirement is to build a job search engine, e.g. [url removed, login to view]
Ideally it should be able to grab jobs from any type of site.
Points to consider:
Suggest which is feasible: a real-time crawl, or a delay of up to, say, 24h.
Write screen-scraping rules for each website or group of websites, or suggest an alternative.
Sites change and XPaths become invalid. Some kind of admin notification system might be in order, so that you are informed when certain hosts suddenly start returning no information.
You would certainly write config rules for all of the sites.
A smart ranking algorithm; the user can browse through all of the results (in combination with facets).
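
To illustrate the per-site config rules and the admin notification idea from the points above, here is a minimal Python sketch. It is only an assumption about how this could be structured, not a required design. The names `SiteRule`, `extract_jobs` and `hosts_needing_attention`, the example hosts, and the sample markup are all hypothetical. It uses the standard library's ElementTree, which supports only a limited XPath subset and needs well-formed markup; a real crawler would more likely use Scrapy selectors or lxml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SiteRule:
    """Hypothetical per-site scraping rule kept in config, not in code."""
    host: str
    item_path: str      # path selecting one element per job posting
    field_paths: dict   # field name -> path relative to the item element


def extract_jobs(rule, markup):
    """Apply one site's rule to a well-formed page and return job dicts."""
    root = ET.fromstring(markup)
    jobs = []
    for item in root.findall(rule.item_path):
        job = {
            name: (item.findtext(path) or "").strip()
            for name, path in rule.field_paths.items()
        }
        if any(job.values()):  # skip items where every field came back empty
            jobs.append(job)
    return jobs


def hosts_needing_attention(rules, pages):
    """Return hosts that yielded no jobs: candidates for an admin alert."""
    silent = []
    for rule in rules:
        markup = pages.get(rule.host)
        try:
            jobs = extract_jobs(rule, markup) if markup is not None else []
        except ET.ParseError:
            jobs = []  # unparseable page is treated like an empty result
        if not jobs:
            silent.append(rule.host)
    return silent


# Hypothetical rules and pages; the second site has changed its markup,
# so its rule no longer matches anything.
RULES = [
    SiteRule("jobs.example.com", ".//div[@class='job']",
             {"title": "h2", "location": "span[@class='loc']"}),
    SiteRule("careers.example.org", ".//li[@class='posting']",
             {"title": "a"}),
]

PAGES = {
    "jobs.example.com": (
        "<html><body><div class='job'><h2>Data Engineer</h2>"
        "<span class='loc'>Berlin</span></div></body></html>"
    ),
    "careers.example.org": (
        "<html><body><ul><li class='vacancy'><a>QA Lead</a></li></ul>"
        "</body></html>"
    ),
}

if __name__ == "__main__":
    for host in hosts_needing_attention(RULES, PAGES):
        print(f"ALERT: {host} returned no jobs; its rule may be outdated")
```

Keeping the paths in data rather than code means a broken site can be fixed by editing its rule, and the same empty-result check can feed whatever notification channel is chosen (email, dashboard, etc.).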
Most importantly (MUST requirement):
Scrape Taleo-based sites, e.g.
URL: [url removed, login to view]
Apache Nutch / Solr / Selenium on Ubuntu; we currently have two nodes.
Please recommend any other software.
Be ready for a Skype interview.
Apache Nutch, Solr, Scrapy, Selenium