1. Each website has fewer than 10,000 products (sometimes far fewer)
2. The fields and links to extract are described on the later pages
3. Robust XPath and CSS selectors, as given in [login to view URL] & [login to view URL], and/or whatever .html or .json paths are relevant, should be used in the project
4. The code should be self-explanatory, with relevant comments and explanations included in the delivery
5. The initial step is validating the crawled data against what is available on the websites
6. If the data matches and covers all the products on the given websites, the delivery will be the code along with the data (for a couple of days)
7. A demo/document explaining how to execute the code is needed
8. Support for 3 days in executing the code and fetching the results would be appreciated
9. For most of the above websites, the first step is selecting a location (e.g. Bangalore/Bengaluru), which determines which products are available and at what prices. The code must provide for this, with the location given as an input to the Python code; for this project the input can be set to Bangalore or Bengaluru. There must also be a provision to supply more than one location, with the code looping over all of them (very important; see the spider and runner sketches after this list)
10. The download delay (time delay per request) must be configurable as an input, so as not to overload the websites (see the settings sketch after this list)
11. Provision to incorporate Tor (Tor & Privoxy), proxy IPs, middleware, etc., as per your knowledge, is needed so that scraping proceeds without getting blocked, along with documentation that can be replicated here (see the proxy sketch after this list)
12. For Torifying, hiding, or rotating IPs, open-source tooling is sought rather than paid proxy providers. Advice on scraping without getting blocked is sought in the delivery document.
13. A crawl spider or generic spider can be used with link extractors/followers to extract all data from all the categories (see the spider sketch after this list)
14. Data output must be in both .csv and .json formats (shown in the settings sketch after this list)
15. The code takes the city name as input (e.g. Bangalore), runs, and writes out 11 output files, one for each of the 11 websites. Fields are described on the subsequent pages. (A single script for all the websites or one per website, either is fine.)
16. Scrapy should automatically follow all categories one by one, as described on the later pages. If categories are added, deleted, or renamed, Scrapy should still crawl all of them and publish the relevant data.
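Spider sketch (items 9, 13, 16). A minimal CrawlSpider, assuming a placeholder domain, spider name, selectors, and field names; the real ones come from the selector files and field lists referenced above. Because the category rule extracts links from the menu generically rather than hard-coding category names, added, renamed, or deleted categories are still followed.

```python
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class ProductSpider(CrawlSpider):
    name = "products"                              # placeholder name
    allowed_domains = ["example-grocery.com"]      # placeholder domain
    start_urls = ["https://example-grocery.com/"]

    rules = (
        # Follow every link in the category menu; nothing is hard-coded,
        # so new/renamed/deleted categories are picked up automatically.
        Rule(LinkExtractor(restrict_css=".category-menu")),  # placeholder selector
        # Pages matching the product URL pattern are parsed as items.
        Rule(LinkExtractor(allow=r"/product/"), callback="parse_product"),
    )

    def __init__(self, location="Bangalore", *args, **kwargs):
        # The city arrives as a spider argument:
        #   scrapy crawl products -a location=Bengaluru
        super().__init__(*args, **kwargs)
        self.location = location

    def parse_product(self, response):
        yield {
            "location": self.location,
            "name": response.css("h1::text").get(),       # placeholder field
            "price": response.css(".price::text").get(),  # placeholder field
            "url": response.url,
        }
```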
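Settings sketch (items 10, 14). Per-run settings inside the spider class, assuming Scrapy 2.1+ for the FEEDS setting; the output paths are placeholders. The delay can also be overridden per run from the command line with -s DOWNLOAD_DELAY=5.

```python
# Inside the spider class: DOWNLOAD_DELAY keeps the request rate polite
# (item 10) and FEEDS writes both formats in one run (item 14).
custom_settings = {
    "DOWNLOAD_DELAY": 2,            # seconds between requests; tune per site
    "AUTOTHROTTLE_ENABLED": True,   # back off further if the site slows down
    "FEEDS": {
        # %(name)s / %(location)s are filled from spider attributes at run time.
        "output/%(name)s_%(location)s.csv": {"format": "csv"},
        "output/%(name)s_%(location)s.json": {"format": "json"},
    },
}
```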
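Proxy sketch (items 11, 12), using only open-source pieces: Tor exposes a SOCKS proxy on 127.0.0.1:9050, and Privoxy exposes an HTTP proxy on 127.0.0.1:8118, pointed at Tor with the config line "forward-socks5t / 127.0.0.1:9050 ." in /etc/privoxy/config. The middleware name, module path, and priority below are my own choices, not from the brief.

```python
# middlewares.py -- route every request through Privoxy -> Tor.
class TorProxyMiddleware:
    def process_request(self, request, spider):
        # Privoxy's default listen address; it forwards to Tor's SOCKS port.
        request.meta["proxy"] = "http://127.0.0.1:8118"


# settings.py -- run before the built-in HttpProxyMiddleware (priority 750),
# which actually applies the proxy set in request.meta.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.TorProxyMiddleware": 100,  # placeholder module path
}
```

To rotate the exit IP without a paid provider, Tor's control port (9051) can be sent a NEWNYM signal, e.g. with the open-source stem library, which fits the constraint in item 12.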
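Runner sketch (items 9, 15): looping one crawl per city from a plain script inside the Scrapy project. The spider name matches the placeholder above, and the city list is the location input; with the FEEDS pattern above, each city gets its own .csv and .json files.

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def main():
    locations = ["Bangalore", "Bengaluru"]  # the location input (item 9)
    process = CrawlerProcess(get_project_settings())
    for city in locations:
        # Queue one crawl of the same spider per city.
        process.crawl("products", location=city)
    process.start()  # blocks until every queued crawl has finished


if __name__ == "__main__":
    main()
```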
P.S. Other details will be shared once we start collaborating. Looking for a cost-effective collaboration. Thanks.