
Open
Ends in 1 day
Paid on delivery
Project Title: Large-Scale Course Data Scraping from Udemy, Coursera, and YouTube

Project Overview:
I am looking for an experienced data scraping specialist who can collect a large dataset of online courses and tutorials from platforms such as Udemy, Coursera, and YouTube. The dataset should include both technical and non-technical tutorials. The objective is to build a structured dataset that can be used for research and analytics purposes.

Project Requirements:
The freelancer will scrape approximately 200,000 course/tutorial records from the following platforms:
- Udemy
- Coursera
- YouTube

The collected dataset should contain the following information for each record:
1. Course / Tutorial Title
2. Course URL / Video URL
3. Platform Name (Udemy / Coursera / YouTube)
4. Category (Technical or Non-Technical)
5. Paid or Free Indicator

Technical Expectations:
- Data scraping can be performed using tools such as Python, BeautifulSoup, Selenium, Scrapy, or other efficient scraping frameworks.
- The freelancer must handle pagination, dynamic loading, and scrolling where required.
- Duplicate entries should be avoided.
- The final dataset should be clean and well-structured.

Output Format:
The final deliverable should be provided in one of the following formats:
- CSV
- Excel
- JSON
The dataset must contain approximately 200,000 unique entries.

Timeline:
The project should be completed within 2 days from the start date.

Additional Notes:
- Data accuracy and proper formatting are very important.
- The freelancer should ensure that the scraping process is efficient and capable of handling large-scale data extraction.

Skills Required:
- Web Scraping
- Python
- Selenium / BeautifulSoup / Scrapy
- Data Cleaning & Structuring
- Handling Large Datasets
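The five required fields, the no-duplicates rule, and the CSV output option can be sketched together as follows. This is only an illustration of one possible approach: the `CourseRecord` name, the `is_paid` column name, and the URL canonicalization rules (lowercased host, trailing slash and `utm_` parameters dropped) are assumptions, not something the posting specifies.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


@dataclass(frozen=True)
class CourseRecord:
    """One row of the dataset (hypothetical field names)."""
    title: str
    url: str
    platform: str   # "Udemy", "Coursera", or "YouTube"
    category: str   # "Technical" or "Non-Technical"
    is_paid: bool   # True for paid, False for free


def canonical_url(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    # Drop tracking parameters; keep real ones such as YouTube's "v".
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
        urlencode(query),
        "",  # fragment is ignored
    ))


def deduplicate(records):
    """Keep the first record seen for each canonical URL."""
    seen = set()
    unique = []
    for record in records:
        key = canonical_url(record.url)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def to_csv(records) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CourseRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Deduplicating on the URL rather than the title is a design choice here: different courses can share a title, while a canonical URL should identify one course or video.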
Project ID: 40375939
19 proposals
Open for bidding
Remote project
Active 8 hours ago
19 freelancers are bidding on average ₹1,120 INR for this job

Hi, Let's connect over chat. I have more than 9 years of experience building custom platforms in Python. I will walk you through my work samples as well. I am online right now. Thanks, Ali
₹1,050 INR in 1 day
5.3

Hi there, Strong alignment with this project comes from experience handling large-scale data scraping and processing where accuracy, efficiency, and scalability are essential. I have a clear understanding of the requirement to extract structured course data from multiple platforms, handle dynamic content, and ensure clean, deduplicated datasets. Hands-on expertise with Python, Selenium, BeautifulSoup, and scalable scraping workflows ensures efficient data collection and well-structured outputs. Risk is minimized by implementing deduplication logic, validating datasets, and ensuring consistent formatting across all records. I am available to start immediately and happy to share a quick demo or discuss next steps. Recent work: https://www.freelancer.com/u/chiragardeshna Regards, Chirag
₹2,000 INR in 2 days
4.4

I have reviewed the complete description of your job post and I'm confident I will meet your expectations and give you 100% satisfaction and quality work. I'm all set and available to start working on it right away. I can begin work immediately and aim to have it completed within one day. I am confident that I can provide high-quality work in a short amount of time. Thank you, and I look forward to the opportunity. Best regards, Jaweria
₹600 INR in 7 days
4.1

Udemy and Coursera both have JSON endpoints behind their course pages. Hitting those directly is faster and more stable than DOM scraping with Selenium, since the pages re-render often and break selectors. YouTube is the easiest of the three if you only need metadata: the Data API covers most fields without scraping at all.
I'd build it in Python with httpx for the API endpoints and Scrapy for anything that genuinely needs HTML parsing. Output to CSV or JSON, your call.
- Working scraper for all three sources
- Configurable categories and result limits
- Short README on running it
2 days. What fields do you need per course?
₹1,200 INR in 2 days
2.8

As an AI-driven software solutions company, my team and I are proficient in all the skills required to get your project done. With years of experience across web scraping, Python, and several efficient scraping frameworks like BeautifulSoup, Selenium, and Scrapy, we're well positioned not only to handle your 200,000 course/tutorial records on platforms (Udemy/Coursera/YouTube) but also to do so efficiently. We’ll ensure pagination, dynamic loading, and scrolling are handled properly while preventing duplicate entries using proper data cleaning techniques. In addition to our technical expertise in scraping large datasets, we hold a strong tradition of delivering high-quality work ahead of schedule. This can be attested to by our extensive clientele who trust us with valuable projects. We thrive on resolving complex challenges, and the efficient extraction of research-focused structured data has been one of our strong suits. Data accuracy and formatting are key focuses for us. By engaging us in this project, you're ensuring not only a well-structured dataset in formats such as CSV, Excel, or JSON but also a stress-free and timely delivery. Don't hesitate to contact us at Automivex; let's build not just software but a valuable asset from this dataset for you.
₹700 INR in 2 days
1.0

Hi, I can efficiently scrape and structure 200,000+ course records from Udemy, Coursera, and YouTube using Python (Scrapy/Selenium), ensuring clean, deduplicated, and well-formatted data. I’ll deliver a high-quality dataset (CSV/JSON/Excel) within 2 days with proper handling of dynamic content, pagination, and accuracy checks.
₹999 INR in 2 days
0.7

Hello, As an expert Full Stack Developer who specializes in web scraping and Python automation, I am confident in my ability to successfully complete your Large-Scale Course Data Scraping project. Having developed scalable SaaS platforms for companies such as Microsoft and Deloitte, I understand the delicate balance between structuring large datasets and maintaining data accuracy. With my 8+ years of experience, I not only apply advanced web scraping techniques using Python, BeautifulSoup, and Scrapy, but also handle pagination, dynamic loading, and any other complexities that may arise. What differentiates me from others is my AI-focused approach, which aligns well with your project's requirement of extracting both technical and non-technical courses. My successful implementations span from AI-powered automation pipelines to RAG-based systems aimed at enhancing accuracy while minimizing hallucination. For a dataset of this magnitude, I have sound knowledge of managing large datasets efficiently, right from extraction to cleaning and structuring, which are key skills for avoiding duplicates. In the end, it's the quality and format of the final deliverable that truly adds value. My clients praise me for ensuring clean, well-structured output, whether in CSV, Excel, or JSON format. Given my track record of designing systems that scale with real data and users, hiring me would guarantee a seamless transition of your project into a reliable solution rea Thanks!
₹933 INR in 4 days
0.0

Hi — your project is specifically about scraping the target website into structured output in CSV/Excel/JSON, and that is exactly the kind of extraction workflow I build. I’d handle pagination, login/dynamic content if needed, and add retry/error handling so the data is complete and clean. Relevant stack: Python, Selenium, BeautifulSoup, Scrapy. A quick question before I start: is this a one-time scrape or something you want to run repeatedly? I can start immediately and deliver clean results fast.
₹1,100 INR in 7 days
0.0

I’ve reviewed your requirement for large-scale course data scraping from Udemy, Coursera, and YouTube. I have strong experience in Python-based scraping using Scrapy, Selenium, and BeautifulSoup, including handling dynamic content, pagination, and large datasets. However, I’d like to highlight that collecting ~200,000 clean and unique records across these platforms within 2 days requires a well-optimized scraping pipeline and proper handling of rate limits and anti-bot protections.
Here’s how I can help:
• Scalable scraping setup for large data extraction
• Clean, structured dataset (CSV/Excel/JSON)
• Duplicate removal & data validation
• Efficient handling of dynamic pages and APIs (where possible)
I can start immediately and deliver a high-quality dataset. We can also break the project into phases to ensure accuracy and performance. Let’s discuss the best approach to achieve your target efficiently. Best regards
₹1,050 INR in 7 days
0.0
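The rate-limit handling mentioned in the bid above is commonly done with retries and exponential backoff. The sketch below is a generic illustration, not the bidder's method: `RateLimited` is a hypothetical exception that a caller-supplied `fetch` function is assumed to raise when the server signals throttling (for example an HTTP 429 response), and the delay values are arbitrary defaults.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by `fetch` when the server throttles us."""


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff plus jitter.

    `fetch` is any callable supplied by the caller (an assumption of this
    sketch); `sleep` is injectable so the function can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt: base, 2*base, 4*base, ... plus jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The random jitter spreads retries out so that several workers throttled at the same moment do not all retry at the same moment.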

Hi, I can deliver a reliable scraping workflow for your course data with anti-breakage handling (pagination, retries, schema checks, and clean export format).
Milestone 1 (2 days):
1) source structure mapping
2) scraper build with retry + validation
3) normalized output (CSV/Excel/JSON)
4) quick runbook for repeat execution
Share the source URLs + required fields and I’ll start with a pilot scrape. Best, Stoyan
₹1,250 INR in 2 days
0.0
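The "schema checks" and validation step named in the bid above could look roughly like this. It is a sketch under assumptions: the field names (`title`, `url`, `platform`, `category`, `is_paid`) are hypothetical, and the host check assumes every record URL lives on the platform's main domain, so short links such as `youtu.be` would need an extra entry.

```python
from urllib.parse import urlsplit

# Main domain per platform (assumption: records use these hosts only).
PLATFORM_HOSTS = {
    "Udemy": "udemy.com",
    "Coursera": "coursera.org",
    "YouTube": "youtube.com",
}
CATEGORIES = {"Technical", "Non-Technical"}
REQUIRED_FIELDS = ("title", "url", "platform", "category", "is_paid")


def validate(record: dict) -> list:
    """Return a list of problems found in one record (empty if it is valid)."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if errors:
        return errors  # later checks need every field to be present

    if not str(record["title"]).strip():
        errors.append("empty title")

    expected = PLATFORM_HOSTS.get(record["platform"])
    if expected is None:
        errors.append(f"unknown platform: {record['platform']}")
    else:
        host = urlsplit(record["url"]).hostname or ""
        # Accept the bare domain or any subdomain such as "www.".
        if host != expected and not host.endswith("." + expected):
            errors.append(f"url host {host!r} does not match platform")

    if record["category"] not in CATEGORIES:
        errors.append(f"unknown category: {record['category']}")
    if not isinstance(record["is_paid"], bool):
        errors.append("is_paid must be a boolean")
    return errors
```

Returning a list of problems instead of raising on the first one makes it easy to log every defect in a record during a pilot scrape.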

Hi, I can help you build a structured dataset of 200,000+ courses from Udemy, Coursera, and YouTube with high accuracy and scalability. I will use efficient scraping methods (Python + APIs where possible) to extract course title, URL, platform, category (technical/non-technical), and paid/free status. The data will be cleaned, deduplicated, and organized in a well-structured CSV or Excel format, ready for analysis. I will also provide a sample dataset within 24–48 hours for your review before proceeding with full extraction. I ensure reliable delivery, clear communication, and quality-focused results. Looking forward to working with you. Best regards, Sumbul Naz
₹1,050 INR in 7 days
0.0

Large-scale course data scraping from Udemy, Coursera, and YouTube is something I've done before, and I know the quirks of each platform (dynamic loading, rate limits, anti-bot measures).
My approach for 200,000 records:
Udemy: Use their public REST API (course catalog endpoint), which is much faster and cleaner than scraping HTML. Can pull 10k+ records/hour with pagination.
Coursera: Combine their GraphQL API + search endpoint to extract course metadata efficiently.
YouTube: Use the official YouTube Data API v3 for tutorial metadata (title, URL, category, free indicator), which is reliable and scalable.
For dynamic pages I'll use Scrapy + Selenium as fallback. All data will be deduplicated, cleaned, and structured.
Deliverables:
- Clean CSV / Excel / JSON with all 5 required fields (title, URL, platform, category, paid/free)
- ~200,000 unique entries
- Python scripts included so you can re-run anytime
I can complete this within 2 days. Ready to start immediately. Please confirm the preferred output format.
₹1,500 INR in 2 days
0.0
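For the YouTube part of the bid above, token-based pagination over the YouTube Data API v3 `search` endpoint might be sketched as below. The helper names (`build_search_url`, `parse_search_page`, `collect`) and the injected `get_json` callable are assumptions made for illustration; a real run also needs an API key, and search requests are quota-metered, so the achievable volume should be checked against the key's quota before relying on this for a 200,000-record target.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, page_token=None):
    """Build a search request URL for video results, 50 per page."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "maxResults": 50,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def parse_search_page(payload):
    """Turn one response body into rows plus the next page token (or None)."""
    rows = []
    for item in payload.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if not video_id:
            continue  # skip anything that is not a video result
        rows.append({
            "title": item["snippet"]["title"],
            "url": f"https://www.youtube.com/watch?v={video_id}",
            "platform": "YouTube",
        })
    return rows, payload.get("nextPageToken")


def collect(get_json, api_key, query, limit):
    """Follow nextPageToken until `limit` rows are gathered or pages run out.

    `get_json` is a caller-supplied function (hypothetical) that performs the
    HTTP GET and returns the decoded JSON body as a dict.
    """
    rows, token = [], None
    while len(rows) < limit:
        page, token = parse_search_page(get_json(build_search_url(api_key, query, token)))
        rows.extend(page)
        if not token or not page:
            break
    return rows[:limit]
```

Keeping the HTTP call behind `get_json` means the pagination logic can be exercised with canned responses before any quota is spent.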

Hello, I am writing to express my strong interest in your large-scale data scraping project. With a background in Backend Development and Automation Testing, I specialize in building robust, efficient scrapers that handle dynamic content and large-scale data structuring. For this project, I will implement a high-performance solution to extract 200,000 unique records from Udemy, Coursera, and YouTube. My approach ensures data integrity while meeting your aggressive 48-hour timeline.
My Technical Approach:
Framework Selection: I will utilize Scrapy for its asynchronous speed on Coursera/Udemy and Selenium/Playwright for YouTube’s heavy dynamic scrolling requirements.
Efficiency & Anti-Blocking: To hit the 200k mark quickly, I will implement concurrent requests and handle pagination via direct API endpoint inspection (where available) to bypass slow UI rendering.
Data Quality: I will build a post-processing pipeline using Python (Pandas) to remove duplicates, categorize content (Technical vs. Non-Technical), and ensure the final CSV/JSON output is perfectly structured.
Reliability: My experience in automation testing allows me to build "self-healing" scripts that can handle unexpected UI changes without crashing.
I am ready to begin immediately and can provide a sample set of 1,000 records within the first few hours to confirm the data format meets your exact needs. Best regards, Marwan
₹1,050 INR in 4 days
0.0
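The "concurrent requests" idea in the bid above can be illustrated with a bounded-concurrency helper built on the standard library's `asyncio`. This is a generic sketch, not the bidder's implementation: `fetch` stands for any caller-supplied coroutine function that downloads one URL, and the default limit of 10 is an arbitrary assumption.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=10):
    """Run fetch(url) for every URL with at most `max_concurrency` in flight.

    `fetch` is a coroutine function provided by the caller (hypothetical);
    results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # The semaphore blocks here once the concurrency limit is reached.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(url) for url in urls))
```

Capping the number of in-flight requests keeps throughput high while making it less likely that the scraper trips a platform's rate limits.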

❤️❤️❤️ Large-scale course data scraping, cleaned and delivered in a structured dataset. ❤️❤️❤️ ⫸ I’m a Python web-scraping specialist with experience building reliable, high-volume data pipelines for clean analytics-ready datasets. ◆ What I’ll deliver: → Scrape course/tutorial records from Udemy, Coursera, and YouTube with deduplication. → Classify entries by platform, category, and free/paid status. → Return a clean CSV/Excel/JSON dataset ready for research use. ◆ Deliverables: → Around 200,000 unique records. → Title, URL, platform, category, and paid/free fields for each entry. → Clean, structured output with duplicates removed. → Final delivery in CSV, Excel, or JSON. I’ll use an efficient scraping and cleaning workflow so the dataset is consistent, deduplicated, and usable for analytics. The project is large, but I can organize it into a focused extraction pipeline and validate the output as I go. For a dataset of this size, I would keep the process fast but carefully structured to avoid duplicates and formatting issues.
₹1,050 INR in 7 days
0.0

Hi! I carefully read your project: you need a clean, structured dataset (200,000 unique records) from Udemy, Coursera, and YouTube with these fields: Title, URL, Platform, Category (Technical/Non-Technical), Paid/Free, delivered as CSV/Excel/JSON, with duplicates removed and efficient large-scale extraction. I can deliver this in 3 days using a reliable Python scraping pipeline.
1. First 6–12 hours: I’ll send you a sample dataset (5,000–10,000 records) in your chosen format so you can verify the structure and accuracy.
2. Next: full-scale extraction + cleaning + dedup.
3. Final delivery: one combined file (CSV/Excel/JSON) with ~200,000 unique entries + a short summary (counts per platform + duplicates removed).
A few quick questions (to ensure the exact output you want):
1. For YouTube, should I mark all entries as Free (since videos are free to access), or do you want a different rule?
2. How do you want me to classify Technical vs Non-Technical? Do you have preferred categories/keywords, or should I apply a standard mapping based on keywords/topics?
I’m ready to start immediately and will keep you updated during the run to ensure everything matches your requirements.
₹1,399 INR in 3 days
0.0
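The keyword-based Technical vs Non-Technical mapping and the per-platform summary raised in the bid above could be prototyped as follows. The keyword list is purely a placeholder assumption (the client has not supplied one), and matching on titles alone is a simplification.

```python
import re
from collections import Counter

# Placeholder keyword list; the real mapping would come from the client.
TECHNICAL_KEYWORDS = (
    "python", "java", "javascript", "sql", "programming",
    "machine learning", "data science", "aws", "docker",
)
# One pattern with word boundaries so "java" does not match "javanese".
_TECHNICAL_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in TECHNICAL_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def classify(title: str) -> str:
    """Label a title Technical if it contains any keyword as a whole word."""
    return "Technical" if _TECHNICAL_PATTERN.search(title) else "Non-Technical"


def counts_per_platform(records):
    """Count records per platform for the delivery summary."""
    return Counter(record["platform"] for record in records)
```

A keyword rule like this is cheap to run over 200,000 titles, but it will mislabel edge cases, so spot-checking a sample against the agreed definition would still be needed.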

Hi, I can build an efficient scraping pipeline to collect ~200,000 course records from Udemy, Coursera, and YouTube with clean and structured output. I will use a hybrid approach with APIs where available and Scrapy/Selenium for dynamic content, ensuring fast extraction, proper pagination handling, and no duplicates. The dataset will include title, URL, platform, category, and paid/free classification, delivered in CSV/JSON/Excel. I can start immediately and deliver a clean, validated dataset within the timeline. Quick question: do you need this as a one-time dataset or a reusable scraping script as well?
₹1,200 INR in 2 days
0.0

Hey, I build scrapers: Python, Selenium, clean structured output. Fast delivery, no back and forth.
₹1,050 INR in 7 days
0.0

Pune, India
Member since Apr 16, 2026