Develop parsing software that can handle a predefined set of websites.
One of the key websites is [url removed, login to view]
- Ability to proceed to the next page when parsing lists of products (a typical list of ads runs to about 30-50 pages).
- Ability to go to the description page of each product and parse all information from that page.
- The product page includes a hidden phone number. It is revealed by clicking a button, which opens a pop-up where the number is displayed as an image, not text. The image is easily readable, however (it is not a captcha-style picture; you can't even tell it's a picture until you try to copy it). Also, when you browse the website from a phone, it lets you make a call by tapping that number, so the number is presumably present as text somewhere in the page code. It would be great to parse that text instead of doing OCR on the picture.
- Other fields on the product page are pretty much straightforward, so the phone number is the only challenge here.
- Data from some fields needs to be processed in specific ways, but nothing complicated (for example, some text data comes crammed into one field and needs to be separated using simple rules). Only text data needs to be stored for all fields; images need to be saved only as URLs.
- The software needs to be able to parse millions of ads in a reasonable timeframe. Also, the question is whether all parsed data can be exported to Excel and CSV in chunks so as not to overload memory, or whether a lot of space will be required to store that data.
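To make the chunked-export question concrete, here is a minimal Python sketch of one way it could work, using only the standard library. The column names in FIELDS, the file naming pattern, and the chunk size are all assumptions for illustration, not part of the requirements. Records are consumed lazily from an iterator, so only one chunk is held in memory at a time.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Dict, Iterable, Iterator, List

# Hypothetical column names; the real fields depend on the target website.
FIELDS = ["title", "price", "phone", "image_urls"]


def chunked(records: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `size` records without loading the whole input."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def export_in_chunks(records: Iterable[Dict[str, str]], out_dir: str,
                     chunk_size: int = 50_000) -> List[Path]:
    """Write records to numbered CSV files, one file per chunk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, chunk in enumerate(chunked(records, chunk_size)):
        path = out / f"ads_{index:05d}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            # Keys not listed in FIELDS are ignored; missing keys become empty cells.
            writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(chunk)
        paths.append(path)
    return paths
```

Since an Excel worksheet holds at most 1,048,576 rows, keeping each chunk below that size means every CSV file can also be opened in Excel directly. Text-only records with image URLs are small, so disk usage should mostly scale with the length of the ad descriptions.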
I need a balance of cost and functionality. It does not have to be beautiful, but it has to work well and be easy to extend to new websites.
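As a rough illustration of what "upgrade for new websites" could look like, the Python sketch below keeps one parser function per site in a registry, so adding a site means adding one function. It also shows the phone number idea from the list above, under the unverified assumption that the mobile version of the page exposes the number as a link whose href starts with "tel:". The domain "example.com" and the names PARSERS, register, find_phone and parse_ad are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional


class _TelLinkFinder(HTMLParser):
    """Collect phone numbers from <a href="tel:..."> links."""

    def __init__(self) -> None:
        super().__init__()
        self.numbers: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("tel:"):
                self.numbers.append(value[4:].strip())


def find_phone(html: str) -> Optional[str]:
    """Return the first tel: number in the page, or None if there is none."""
    finder = _TelLinkFinder()
    finder.feed(html)
    return finder.numbers[0] if finder.numbers else None


# Registry mapping a site's domain to the function that parses its ad pages.
PARSERS: Dict[str, Callable[[str], dict]] = {}


def register(domain: str):
    """Decorator that adds a site-specific parser to the registry."""
    def decorator(func: Callable[[str], dict]) -> Callable[[str], dict]:
        PARSERS[domain] = func
        return func
    return decorator


@register("example.com")  # hypothetical site
def parse_example_ad(html: str) -> dict:
    # Only the phone field is shown; other fields would be added here.
    return {"phone": find_phone(html)}


def parse_ad(domain: str, html: str) -> dict:
    """Dispatch to the parser registered for the given domain."""
    return PARSERS[domain](html)
```

For example, parse_ad("example.com", '<a href="tel:+1234567">Call</a>') would return a record with the phone field set to "+1234567". If the real site does not use tel: links, only find_phone needs to change.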
48 freelancers are bidding an average of $2223 for this job
Hi, I have developed similar scraper/crawler and data/web automation projects. Please let me know if you are interested and I am available to start right away.