We have a list of companies with their physical addresses (about 1 million records). We would like a system that finds each company's web page.
We have built a system, but it is slow and complicated to configure. The goal is a system that is fully automatic and does not need manual review.
E.g.: company name, address.
[url removed, login to view]+ikea+badalona&fr=ush_on_omg
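The first manual step, turning a name and an address into a search query, can be automated. Below is a minimal Python sketch. It assumes the search engine accepts the words joined with `+`, as in the `+ikea+badalona` fragment of the example URL. The function name `build_query` is hypothetical, and the base search URL is omitted because the posting elides it.

```python
from urllib.parse import quote_plus


def build_query(name: str, address: str) -> str:
    """Join company name and address into one URL-encoded query string.

    Spaces become '+', and characters such as ',' or '/' are percent-encoded.
    """
    # Collapse repeated whitespace so "IKEA   Badalona" and "IKEA Badalona" give the same query.
    text = " ".join(f"{name} {address}".split())
    # Lowercase to match the example query; search engines generally ignore case anyway.
    return quote_plus(text.lower())
```

For example, `build_query("IKEA", "Badalona")` yields `ikea+badalona`, which can then be appended to whatever search URL is used.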
We would prefer to use Google, but it blocks our IP very quickly.
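One common way to reduce blocking is to space requests out instead of sending them as fast as possible. The sketch below is a hypothetical `Throttle` helper that enforces a minimum delay between searches. The rate a search engine tolerates is not known here, so any interval you choose is an assumption to tune. Slowing down may delay blocking but does not guarantee it will not happen.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between consecutive search requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        # Clock and sleep are injectable so the class can be tested without real waiting.
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough that min_interval seconds separate this call from the last one."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Usage: create one instance, e.g. `throttle = Throttle(min_interval=10.0)` (10 seconds is an arbitrary placeholder), and call `throttle.wait()` right before every search request.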
Our current system takes the list, converts each entry into Yahoo's query format (e.g.: [url removed, login to view] etc.) and processes it. We take the first 10 results and save them in Excel, so company X has 10 results.
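Saving the first 10 results per company does not require copy-pasting into Excel. This minimal sketch writes them to a CSV file that Excel opens directly. It assumes the results are already collected as a mapping from company to a ranked list of URLs. The name `save_results` and the column names are hypothetical.

```python
import csv
from typing import Mapping, Sequence


def save_results(path: str, results: Mapping[str, Sequence[str]], top_n: int = 10) -> int:
    """Write one row per (company, rank, url), keeping at most top_n URLs per company.

    Returns the number of data rows written (header excluded).
    """
    rows = 0
    # 'utf-8-sig' adds a byte-order mark so Excel detects UTF-8 (accents, ñ, etc.).
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["company", "rank", "url"])
        for company, urls in results.items():
            for rank, url in enumerate(urls[:top_n], start=1):
                writer.writerow([company, rank, url])
                rows += 1
    return rows
```

A long "one row per result" layout like this is easy to filter and group later, and it avoids a fixed set of 10 URL columns per company.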
The system must filter out directories that recur across different searches and try to identify the unique domain that hosts the company's web page.
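One way to detect directories automatically is to count, for each domain, how many *different* companies it appears for. The heuristic assumed here is that a company's own site shows up mainly in its own searches, while directories recur across many. The sketch below uses hypothetical function names and an arbitrary threshold (`min_companies=3`) that would need tuning on real data. It treats `es.example.com` and `example.com` as different domains, since it does not apply public-suffix rules.

```python
from collections import defaultdict
from typing import Dict, List, Set
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercase host name without a leading 'www.' (port and user info dropped)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def directory_domains(results: Dict[str, List[str]], min_companies: int = 3) -> Set[str]:
    """Domains that appear in the results of at least `min_companies` different companies."""
    seen_in: Dict[str, Set[str]] = defaultdict(set)
    for company, urls in results.items():
        for url in urls:
            seen_in[domain_of(url)].add(company)
    return {d for d, companies in seen_in.items() if d and len(companies) >= min_companies}


def candidate_domains(urls: List[str], directories: Set[str]) -> List[str]:
    """Unique non-directory domains for one company, kept in search-rank order."""
    out: List[str] = []
    for url in urls:
        d = domain_of(url)
        if d and d not in directories and d not in out:
            out.append(d)
    return out
```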
Take the name and address from Excel.
Translate them into a Yahoo search query.
Copy and paste the query into the program.
Visually check that the search is not blocked.
When blocked, restart the modem.
Wait 2 minutes, and so on.
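The manual loop of checking for a block, waiting and retrying can be scripted. The sketch below is a hypothetical `fetch_with_retry` that waits 2 minutes after the first block and doubles the wait after each further one. The `fetch` and `is_blocked` callables are placeholders you would supply. How to recognise a block (for example a CAPTCHA page or an HTTP 429/503 response) depends on the search engine and is an assumption here. Restarting the modem to obtain a new IP is not portable, so the sketch only waits.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[str], str],
    query: str,
    is_blocked: Callable[[str], bool],
    wait_seconds: float = 120.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Fetch a results page, waiting and retrying whenever the response looks blocked.

    Returns None if every attempt was blocked, so the caller can log the company and move on.
    """
    for attempt in range(1, max_attempts + 1):
        page = fetch(query)
        if not is_blocked(page):
            return page
        if attempt < max_attempts:
            # Wait 120 s after the first block, then 240 s, 480 s, ... (exponential backoff).
            sleep(wait_seconds * 2 ** (attempt - 1))
    return None
```

Returning `None` instead of raising lets a batch of a million records keep running, with failed companies retried in a later pass.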
We need a better system.
Possible improvements: a single file with all results (ideally able to handle more than 1 million records, so that we can simply let the system run and forget about it). Each row should contain the company name, the address, the other columns and the most likely web page.
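To pick "the most likely web page", one simple heuristic is to prefer the domain that contains words from the company name. When nothing matches, it falls back to the highest-ranked remaining result. This sketch assumes the candidate domains for each company are already free of directories and in search-rank order. The function names, the 3-character token minimum and the `match_score` column are hypothetical choices. A score of 0 flags rows worth a quick manual look.

```python
import csv
import re
from typing import List, Optional, Tuple


def best_domain(company: str, candidates: List[str]) -> Tuple[Optional[str], int]:
    """Pick the candidate domain containing the most company-name tokens.

    On a tie the higher-ranked candidate wins; score 0 means no name token matched.
    """
    # Words of 3+ characters, so legal suffixes like "SA" or "SL" are ignored.
    tokens = {t for t in re.findall(r"\w+", company.lower()) if len(t) >= 3}
    best: Optional[str] = candidates[0] if candidates else None
    best_score = 0
    for domain in candidates:
        score = sum(1 for t in tokens if t in domain)
        if score > best_score:
            best, best_score = domain, score
    return best, best_score


def write_summary(path: str, rows: List[Tuple[str, str, List[str]]]) -> None:
    """Write one row per company: name, address, most likely web page and its match score."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["company", "address", "web_page", "match_score"])
        for company, address, candidates in rows:
            domain, score = best_domain(company, candidates)
            writer.writerow([company, address, domain or "", score])
```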
17 freelancers are bidding an average of €374 for this job
Hello Sir, we are experts in Java-based scrapers and have successfully completed more than 60 bots for different websites. Please check your PMB for more details.