I need a crawler that will crawl a list of domains loaded from a CSV file. The crawler must crawl ONLY THE LANDING PAGE - not the entire site - capture the following, and write the results to a CSV file stored in Dropbox:
1) Does the URL have Google Analytics code - yes or no.
Use a search for "Google Analytics" in the source of the page.
2) Does the page link to a privacy policy - yes or no.
Use a search for the word "Privacy" in the link text.
3) How many unique internal URL links are present on the page? Return the link count.
4) Is the URL secure (SSL) - yes or no.
5) Is the URL mobile-friendly - yes or no.
Use a search for the tag <meta name="viewport"> in the source of the page.
6) Is the domain parked - yes or no.
Look for parking-related keywords or phrases in the page source.
7) Is a phone number present on the page - yes or no. If so, capture the phone number.
8) URL being crawled.
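A minimal Python sketch of checks 1-8 above, applied to landing-page HTML that has already been fetched. The function and output field names are hypothetical, and the substring/regex searches simply follow the hints given in the list (so they are deliberately naive):

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical phrases that commonly indicate a parked domain.
PARKED_PHRASES = ("domain is parked", "buy this domain", "domain for sale")

# Simple North American phone pattern; international formats need more work.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}")

def analyze_landing_page(url: str, html: str) -> dict:
    """Apply checks 1-8 from the spec above to a fetched landing page."""
    source = html.lower()
    host = urlparse(url).netloc.lower()

    # 3) Unique internal links: hrefs that resolve back to the same host.
    hrefs = re.findall(r'href=["\']([^"\']+)["\']', html, flags=re.I)
    internal = {h for h in hrefs
                if urlparse(urljoin(url, h)).netloc.lower() == host}

    phone = PHONE_RE.search(html)

    return {
        "url": url,                                                # 8)
        "has_google_analytics": "google analytics" in source
            or "google-analytics.com" in source,                   # 1)
        "has_privacy_link": bool(
            re.search(r"<a[^>]*>[^<]*privacy[^<]*</a>", source)),  # 2)
        "internal_link_count": len(internal),                      # 3)
        "is_secure": url.lower().startswith("https://"),           # 4)
        "is_mobile_friendly": 'name="viewport"' in source,         # 5)
        "is_parked": any(p in source for p in PARKED_PHRASES),     # 6)
        "phone_number": phone.group(0) if phone else None,         # 7)
    }
```

Each returned dict maps directly onto one CSV row; writing the rows out and uploading to Dropbox would be a separate step.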
The crawler must be capable of crawling 70,000 URLs per hour.
To verify this, the script will be tested against 70,000 URLs in one hour.
18 freelancers are bidding an average of $221 for this job
I will develop a spider for you using the Python Scrapy framework. The framework supports asynchronous web requests, which will meet the 70,000/hr requirement. Message me to discuss further.
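As a rough sizing sketch: 70,000 URLs per hour is about 19.4 requests per second, so with roughly one-second average response times a concurrency of a few dozen leaves headroom. The setting names below are real Scrapy options, but the values are assumptions, not a tuned configuration, and `load_domains` is a hypothetical helper for the CSV input:

```python
import csv
import io

REQUIRED_PER_HOUR = 70_000
required_per_second = REQUIRED_PER_HOUR / 3600  # ~19.4 requests/second

# Assumed Scrapy settings sized for that rate: concurrency is roughly
# rate x average latency, padded with headroom for slow hosts.
SCRAPY_SETTINGS = {
    "CONCURRENT_REQUESTS": 64,
    "DOWNLOAD_TIMEOUT": 15,   # don't let slow domains stall throughput
    "RETRY_ENABLED": False,   # one shot per landing page
    "ROBOTSTXT_OBEY": False,  # skip the extra robots.txt fetch per domain
}

def load_domains(csv_text: str) -> list[str]:
    """Read one domain per row from the first CSV column, normalized to a URL."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue
        domain = row[0].strip()
        if not domain.startswith(("http://", "https://")):
            domain = "http://" + domain
        urls.append(domain)
    return urls
```

The URL list would feed the spider's start requests; whether 64 concurrent requests actually sustains the target rate depends on the real latency distribution of the 70,000 domains.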