We are looking for a programmer who can write/configure a web crawler to crawl a website and retrieve its list of records.
We are considering Apache Nutch (with Selenium) for the crawling; other crawlers are possible.
These records need to be parsed so that the information (id, title, introtext, date, ...) can be stored in a database.
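To illustrate the parse-and-store step, here is a minimal sketch in Python using only the standard library. It is not tied to Nutch and is not the required implementation. The HTML markup (an `li` element with class `record`, a `data-id` attribute, and `span` children named after the fields), the table name `records`, and the helper names `parse_records` and `store_records` are all assumptions made up for this example; the real site markup and target database will differ.

```python
import sqlite3
from html.parser import HTMLParser

# Assumed (hypothetical) markup for one record on the list page:
#   <li class="record" data-id="..."><span class="title">...</span>
#   <span class="introtext">...</span><span class="date">...</span></li>
FIELDS = ("title", "introtext", "date")


class RecordParser(HTMLParser):
    """Collects one dict per record element found in the page."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built, or None
        self._field = None    # field whose text is being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "record":
            # Start a new record with every field present but empty.
            self._current = {"id": attrs.get("data-id")}
            self._current.update({name: "" for name in FIELDS})
        elif tag == "span" and self._current is not None:
            if attrs.get("class") in FIELDS:
                self._field = attrs["class"]

    def handle_data(self, data):
        if self._current is not None and self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            for name in FIELDS:
                self._current[name] = self._current[name].strip()
            self.records.append(self._current)
            self._current = None


def parse_records(html):
    """Return a list of record dicts extracted from the page HTML."""
    parser = RecordParser()
    parser.feed(html)
    parser.close()
    return parser.records


def store_records(conn, records):
    """Insert or update the records in an SQLite table named 'records'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id TEXT PRIMARY KEY, title TEXT, introtext TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO records (id, title, introtext, date) "
        "VALUES (:id, :title, :introtext, :date)",
        records,
    )
    conn.commit()


# Made-up sample page, for illustration only.
SAMPLE = """
<ul>
  <li class="record" data-id="a1"><span class="title">First record</span>
    <span class="introtext">Short intro</span><span class="date">2015-01-01</span></li>
  <li class="record" data-id="a2"><span class="title">Second record</span>
    <span class="introtext">Another intro</span><span class="date">2015-01-02</span></li>
</ul>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_records(conn, parse_records(SAMPLE))
    for row in conn.execute("SELECT id, title, date FROM records ORDER BY id"):
        print(row)
```

Using the record id as the primary key with `INSERT OR REPLACE` means a repeated crawl updates existing rows instead of duplicating them, which is one possible choice for recurring crawls.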
If this job is completed successfully, the crawler can be extended to crawl more websites (100+) in a follow-up project.
We are looking for a long-term partnership with good software engineers. Our company consists of 20+ software engineers and is looking to expand the team via outsourcing.
- Budget: max. 100 euro
- Preferably developers from Eastern Europe
- Use of Apache Nutch or a similar web crawler is required (no custom crawl script), as we want other programmers to be able to find documentation on the internet and to take full advantage of plugins.
- Ideal project for computer science engineering students looking for a job on the side. We are looking for data management / analytics people. If this job goes well, we can work together on further analysis projects.
14 freelancers are bidding an average of €146 for this job
Hello Sir, I have good experience in scraping data from specified websites. I can also give a demo of this. Please send me a message for further information.
Hi, I'm a Java developer and I'm interested in this project. I have some experience in web scraping/automation, including Selenium. Contact me to discuss more details.