Enhance Universal Scrapy Project via Scrapy_Playwright
$30-250 USD
Closed
Posted 4 months ago
Paid on delivery
Overview: We seek an experienced developer proficient in Scrapy and the scrapy_playwright API to enhance our universal web scraper, which collects critical text content from diverse URLs. Improving our scraper's reliability is essential to our business, enabling timely data-driven decisions. This role offers continuous opportunities for collaboration and ongoing projects.
Job Description: Each day, we scrape around 10,000 websites, with approximately 900 failing due to various technical challenges. We've documented these unsuccessful scrapes with relevant metadata in Excel spreadsheets. Your primary task is to resolve these failures using Scrapy and Scrapy-Playwright, significantly improving our scraping accuracy and consistency.
Some examples of quick fixes include:
• Adding regular expressions (e.g., r'\d{1,2} \w+ \d{4}' for DD Month Year dates) to enhance date extraction.
• Utilizing domain-specific US-based proxies to bypass 403 status codes
(e.g.
# [login to view URL]
`domain = urlparse(url).netloc`
…
`if domain_location == "US":`
`    [login to view URL]['proxy'] = US_proxy`
).
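As a rough illustration of the two quick fixes above, here is a minimal Python sketch, not the project's actual code. The names `DOMAIN_LOCATIONS`, `US_PROXY`, `extract_dates`, and `request_meta_for` are hypothetical, as are the example domain and proxy address. The returned dictionary is meant to be passed as the `meta` argument of a Scrapy request, where the `proxy` key is read by Scrapy's built-in HttpProxyMiddleware; whether that matches the hidden part of the snippet above is an assumption.

```python
import re
from urllib.parse import urlparse

# Pattern from the listing: one or two digits, a word, four digits (DD Month Year).
DATE_RE = re.compile(r"\d{1,2} \w+ \d{4}")

# Hypothetical lookup table and proxy URL; neither comes from the listing.
DOMAIN_LOCATIONS = {"example.com": "US"}
US_PROXY = "http://us-proxy.example:8080"


def extract_dates(text):
    """Return every substring that looks like 'DD Month Year'."""
    return DATE_RE.findall(text)


def request_meta_for(url):
    """Build a meta dict, adding the US proxy when the domain is mapped to 'US'."""
    domain = urlparse(url).netloc
    meta = {}
    if DOMAIN_LOCATIONS.get(domain) == "US":
        meta["proxy"] = US_PROXY
    return meta
```

Note that `\w+` also matches digits and underscores, so the pattern accepts strings such as "12 34 5678"; a stricter version would list month names explicitly. Requests rendered through Scrapy-Playwright may need the proxy configured on the browser or context instead of on the request.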
Skills and Qualifications:
1. Fluent in English with strong interpersonal skills
2. Experience scraping diverse URLs simultaneously.
3. Proficiency in Python, Scrapy Framework, and Scrapy-Playwright API.
4. Experience in handling web scraping challenges (CAPTCHA, rate limiting, proxies, etc.).
5. Highly responsive and timely communicator.
6. Experience with [login to view URL] is advantageous
How to Apply?
To apply, review the attached Task Outline Document and provide a brief, high-level explanation (dot points are acceptable) describing how you would approach each task. Your application will be reviewed promptly, and if shortlisted, we'll contact you to arrange a brief, casual interview with our Lead Developer. During this session, you'll walk through the tasks together, ensuring alignment on expectations.
Why work with us?
We're a dynamic, ambitious start-up on a mission to achieve exponential growth, which means endless opportunities lie ahead if you're the right fit. We deeply value personal and professional growth, providing an environment where your ideas and contributions truly matter. While we're dedicated and hardworking, we also embrace a relaxed, collaborative approach that encourages creativity, innovation, and balance. Join us, and become part of a close-knit team committed to making an impact while enjoying the journey.
Dear client,
I am interested in enhancing your Universal Scrapy project with JavaScript execution and browser automation.
I am an experienced Python developer specializing in web scraping, Scrapy framework, and Selenium/Playwright integration. I can optimize your scraping pipeline for dynamic content, ensuring efficient and accurate data extraction.
I’d love to contribute my expertise to your project. Let’s discuss the details and get started!
Thank you,
Yaroslave