
Closed
Posted on
Paid on delivery
Title
Build Comprehensive Global Movie & TV Metadata Database for Recommendation Engine

Project Overview
I am building a personal recommendation engine that predicts my rating for movies and TV shows based on a large history of titles I have already rated. To support accurate predictions, I need a global media metadata database that contains rich structured information for movies and TV series. The dataset should combine multiple trusted sources and be designed for machine-learning comparison against my rating history.

Scope of Work
Build a master dataset containing global movie and TV metadata. This will serve as the candidate pool for prediction models. The database should include titles from:
• IMDb official dataset
• The Movie Database API
• Optional enrichment from JustWatch

Required Data Fields
Each title should include as many of the following as possible.

Core identification
• imdb_id
• tmdb_id
• title
• original_title
• year
• type (movie / series / episode / miniseries)

Basic metadata
• genres
• runtime
• language
• country
• release date

Creative team
• director(s)
• writer(s)
• top cast (first 10 billed)

Ratings and popularity
• IMDb rating
• IMDb vote count
• TMDb rating
• TMDb vote count

Narrative metadata
• plot summary
• keywords
• themes/tags if available

Structural attributes
• franchise/series linkage
• episode relationships for TV
• sequel/prequel relationships

Production information
• production companies
• budget
• revenue (if available)
• streaming availability (JustWatch)

Deliverables
1. Merged master dataset
   • CSV or PostgreSQL database
2. Schema documentation
3. Data cleaning
   • remove duplicates
   • normalize titles
   • unify IDs
4. ETL pipeline
   • scripts to refresh the dataset monthly
5. Matching keys
   • imdb_id
   • tmdb_id

Database Size Expectations
Movies: 600k+
TV series: 200k+

Technical Requirements
Preferred stack:
• Python
• Pandas
• PostgreSQL or SQLite
• API integration
• ETL scripting

Important Constraints
Do NOT scrape IMDb pages. Use official datasets and APIs only.

Goal of the Project
The database will be used to:
• compare against a personal movie rating history
• calculate similarity between titles
• generate predicted ratings
• identify highly compatible unseen movies
Accuracy of metadata is critical.

Ideal Candidate
Experience with:
• media datasets
• ETL pipelines
• Python data engineering
• IMDb or TMDb APIs
• building large datasets
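One way the requested master table and its matching keys could be laid out is sketched below. This is only a minimal sketch using SQLite through Python's standard library; the table name `titles`, the column names, and the helpers `create_db` and `upsert_title` are illustrative assumptions, not part of the brief. It keys each record on imdb_id and fills in missing fields when the same title arrives again from a second source.

```python
import sqlite3

# Hypothetical column set covering the core identification and rating fields.
COLUMNS = (
    "imdb_id", "tmdb_id", "title", "original_title", "year", "type",
    "runtime_minutes", "imdb_rating", "imdb_votes", "tmdb_rating", "tmdb_votes",
)

SCHEMA = """
CREATE TABLE IF NOT EXISTS titles (
    imdb_id         TEXT PRIMARY KEY,
    tmdb_id         INTEGER UNIQUE,
    title           TEXT NOT NULL,
    original_title  TEXT,
    year            INTEGER,
    type            TEXT CHECK (type IN ('movie', 'series', 'episode', 'miniseries')),
    runtime_minutes INTEGER,
    imdb_rating     REAL,
    imdb_votes      INTEGER,
    tmdb_rating     REAL,
    tmdb_votes      INTEGER
);
"""


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_title(conn, row):
    """Insert a title, or fill in fields that are still NULL on an existing one.

    Every row must carry imdb_id and title; other keys may be omitted.
    """
    params = {col: row.get(col) for col in COLUMNS}
    placeholders = ", ".join(f":{col}" for col in COLUMNS)
    # Prefer the incoming value, fall back to what is already stored.
    updates = ", ".join(
        f"{col} = COALESCE(excluded.{col}, titles.{col})"
        for col in COLUMNS
        if col != "imdb_id"
    )
    conn.execute(
        f"INSERT INTO titles ({', '.join(COLUMNS)}) VALUES ({placeholders}) "
        f"ON CONFLICT(imdb_id) DO UPDATE SET {updates}",
        params,
    )


# Usage with made-up placeholder records (not real titles).
conn = create_db()
upsert_title(conn, {"imdb_id": "tt9000001", "title": "Example Film",
                    "year": 2001, "type": "movie"})
upsert_title(conn, {"imdb_id": "tt9000001", "title": "Example Film",
                    "tmdb_id": 42, "tmdb_rating": 7.5})
```

Multi-valued fields such as genres, cast, and keywords would likely need their own link tables, and the upsert clause relies on SQLite 3.24 or newer.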
Project ID: 40284087
71 proposals
Remote project
Active 2 days ago
71 freelancers are bidding an average of $303 USD for this job

Hello, Thank you so much for posting this opportunity. It sounds like a great fit, and I’d love to be part of it! I’ve worked on similar projects before, and I’m confident I can bring real value to your project. I’m passionate about what I do and always aim to deliver work that’s not only high-quality but also makes things easier and smoother for my clients. Feel free to take a quick look at my profile to see some of the work I’ve done in the past. If it feels like a good match, I’d be happy to chat further about your project and how I can help bring it to life. I’m available to get started right away and will give this project my full attention from day one. Let’s connect and see how we can make this a success together! Looking forward to hearing from you soon. With Regards!
$350 USD in 7 days
7.0

Hi there, I've read the project description and I'm confident I can do this. I have expertise in Python. Kindly send me a message and we'll discuss further. Looking forward to hearing from you. Thank you.
$275 USD in 2 days
6.1

Hello, I’m excited about the opportunity to contribute to your project. With my expertise in Python, Pandas, PostgreSQL, API integration, and ETL pipeline development, along with a strong focus on clean, scalable implementation, I can deliver a solution that aligns perfectly with your goals. I’ll tailor the work to your exact requirements, ensuring accurate integration of official IMDb and TMDb data, clean normalization and ID matching, a well-structured master dataset for movies and TV, and a reliable refresh pipeline that supports your recommendation engine long term. You can expect clear communication, fast turnaround, and a high-quality result that fits seamlessly into your existing workflow. Best regards, Juan
$275 USD in 3 days
5.9

I am excited about the opportunity to build your comprehensive global movie and TV metadata database for your recommendation engine. With my extensive experience in Python data engineering and deep familiarity with media datasets, including the IMDb and TMDb APIs, I can deliver a reliable, well-structured database tailored to your requirements. My background in developing ETL pipelines ensures that your dataset will be efficiently updated while maintaining high accuracy and integrity. I look forward to collaborating with you to create a solution that enhances your prediction capabilities.
$275 USD in 7 days
5.6

I’m a full-stack software engineer with expertise in React, Node.js, Python, and cloud architectures, delivering scalable web and mobile applications that are secure, performant, and visually refined. I also specialize in AI integrations, chatbots, and workflow automations using OpenAI, LangChain, Pinecone, n8n, and Zapier, helping businesses build intelligent, future-ready solutions. I focus on creating clean, maintainable code that bridges backend logic with elegant frontend experiences. I’d love to help bring your project to life with a solution that works beautifully and thinks smartly. To review my samples and achievements, please visit: https://www.freelancer.com/u/GameOfWords Let’s bring your vision to life. Connect with me today, and I’ll deliver a solution that works flawlessly and exceeds expectations.
$275 USD in 7 days
5.2

I've built ETL pipelines pulling from TMDb and IMDb datasets before, so this setup is pretty familiar. I can build you a clean Python pipeline that pulls from the official IMDb dataset files + TMDb API, merges them on imdb_id/tmdb_id, normalizes everything, and loads into PostgreSQL with 600k+ movies and 200k+ TV series covered. I'll handle deduplication, schema design, and a monthly refresh script as part of the deliverables. The data fields you listed (cast, crew, ratings, plot, streaming via JustWatch) are all doable - JustWatch would be read-only from their API. Happy to discuss the schema and ETL approach before starting. - Usama
$325 USD in 7 days
5.2
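The IMDb half of the pipeline described in the bid above (official dataset files joined on ID) might look roughly like the following. It is a hedged sketch, not any bidder's actual code: it assumes the column layout of IMDb's `title.basics` and `title.ratings` TSV files, where `\N` marks a missing value, and the helper names, the `TYPE_MAP` mapping, and the two sample rows are made up for illustration. Folding `tvMovie` into `movie` is a judgment call that the brief does not settle.

```python
import csv
import io

# Assumed mapping from IMDb titleType values to the brief's four types.
TYPE_MAP = {
    "movie": "movie",
    "tvMovie": "movie",  # assumption: treat TV movies as movies
    "tvSeries": "series",
    "tvMiniSeries": "miniseries",
    "tvEpisode": "episode",
}

NA = r"\N"  # IMDb's marker for a missing value


def read_imdb_tsv(fileobj):
    """Yield rows as dicts, turning the missing-value marker into None."""
    # QUOTE_NONE: titles may contain bare double quotes that are not CSV quoting.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == NA else value) for key, value in row.items()}


def merge_basics_and_ratings(basics_file, ratings_file):
    """Left-join ratings onto basics using tconst (the imdb_id)."""
    ratings = {row["tconst"]: row for row in read_imdb_tsv(ratings_file)}
    merged = []
    for basic in read_imdb_tsv(basics_file):
        kind = TYPE_MAP.get(basic["titleType"])
        if kind is None:
            continue  # skip shorts, video games, and other out-of-scope types
        rating = ratings.get(basic["tconst"], {})
        merged.append({
            "imdb_id": basic["tconst"],
            "title": basic["primaryTitle"],
            "original_title": basic["originalTitle"],
            "type": kind,
            "year": int(basic["startYear"]) if basic["startYear"] else None,
            "genres": basic["genres"].split(",") if basic["genres"] else [],
            "imdb_rating": float(rating["averageRating"]) if rating.get("averageRating") else None,
            "imdb_votes": int(rating["numVotes"]) if rating.get("numVotes") else None,
        })
    return merged


def _tsv(*rows):
    """Build a small in-memory TSV file from tuples of strings."""
    return io.StringIO("\n".join("\t".join(row) for row in rows) + "\n")


# Made-up placeholder rows, standing in for the real downloaded files.
basics = _tsv(
    ("tconst", "titleType", "primaryTitle", "originalTitle", "isAdult",
     "startYear", "endYear", "runtimeMinutes", "genres"),
    ("tt9000001", "movie", "Example Film", "Example Film", "0", "2001", NA, "95", "Drama,Comedy"),
    ("tt9000002", "tvSeries", "Example Show", "Example Show", "0", NA, NA, NA, NA),
)
ratings = _tsv(
    ("tconst", "averageRating", "numVotes"),
    ("tt9000001", "7.4", "1200"),
)
merged = merge_basics_and_ratings(basics, ratings)
```

TMDb records could then be attached on the same key, since TMDb's movie details include an `imdb_id` field. At full scale the ratings lookup still fits in memory as a dict, though a Pandas or database-side join may be more convenient.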

Hello, With over 7 years of experience in Web Scraping, Python, and Data Mining, I have the skills necessary to tackle your project effectively. I have carefully reviewed the requirements for building a comprehensive global movie and TV metadata database for your recommendation engine. To achieve this, I propose to combine data from trusted sources such as IMDb, The Movie Database API, and potentially JustWatch for enrichment. The master dataset will include core identification details, basic metadata, creative team information, ratings, narrative metadata, structural attributes, and production information for both movies and TV series. I will deliver a merged master dataset in CSV or PostgreSQL format, along with schema documentation, data cleaning, an ETL pipeline for monthly dataset refresh, and matching keys for imdb_id and tmdb_id. The technical stack I plan to use includes Python, Pandas, and PostgreSQL or SQLite for database management. For further discussion on how we can proceed with this project, please connect with me via chat. You can visit my profile at https://www.freelancer.com/u/HiraMahmood4072. Thank you.
$260 USD in 2 days
4.9

Hello, there! My mastery in Python, Pandas, PostgreSQL and API integration complemented by my extensive involvement in building large datasets, uniquely positions me to contend with the scale of this project. Understanding that accuracy matters greatly in this endeavor, I vow to obtain a deep understanding of your personal movie rating history and curate the database to maximize similarity calculation between titles and generate highly accurate predicted ratings. This will be backed by my expertise in ETL pipelines to guarantee effective data cleaning including removal of duplicates, normalization of titles and unification of IDs without scraping IMDb pages, but strictly with official datasets and APIs. Furthermore, as a bonus value-addition, I can create an ETL script ensuring monthly refreshment of the dataset keeping it up-to-date indefinitely. My skills and personality make me a strong fit not only technologically but also professionally in terms of team collaboration. Together we can build a database that doesn’t just become useful to you in movie recommendation but it becomes a valuable asset for the global entertainment community.
$200 USD in 7 days
4.6

Hello, I have a few questions regarding the personal media recommendation engine. 1) Do you have the API keys ready for TMDb and JustWatch? 2) Would you prefer the final master dataset in PostgreSQL or a local SQLite file? 3) Are there specific machine learning features, like text embeddings for plots, you want me to pre-calculate? I will build a comprehensive ETL pipeline using Python and a standard data manipulation library to aggregate metadata from official datasets and APIs. I will design a centralized SQL database to store the merged records, ensuring all IDs from different sources are unified and duplicates are removed. The pipeline will include normalization steps for titles and genres to create a clean candidate pool for your machine learning model. I will also implement a monthly refresh script to keep the production information and ratings up to date. This structured dataset will provide the rich metadata needed to compare against your rating history for accurate predictions. Thanks, Bharat
$200 USD in 10 days
4.9
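The title normalization and duplicate removal mentioned in the bid above could be approached along these lines. This is a rough sketch under stated assumptions, not a prescribed method: the functions `normalize_title` and `dedupe` are hypothetical, records are plain dicts, and the fallback key (normalized title, year, type) is only a heuristic that can wrongly merge distinct titles sharing a name and year.

```python
import re
import unicodedata


def normalize_title(title):
    """Return a comparison key: no accents, case-folded, punctuation collapsed."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of non-alphanumeric characters with a single space.
    cleaned = re.sub(r"[\W_]+", " ", stripped.casefold())
    return cleaned.strip()


def dedupe(records):
    """Collapse records that share a key, filling gaps from later duplicates.

    Records with an imdb_id are keyed on it; others fall back to a
    (normalized title, year, type) key, which is only a heuristic.
    """
    by_key = {}
    for record in records:
        key = record.get("imdb_id") or (
            normalize_title(record["title"]),
            record.get("year"),
            record.get("type"),
        )
        kept = by_key.get(key)
        if kept is None:
            by_key[key] = dict(record)
            continue
        # Keep the first record, but fill any field it is missing.
        for field, value in record.items():
            if kept.get(field) is None and value is not None:
                kept[field] = value
    return list(by_key.values())


# Usage with made-up placeholder records.
records = [
    {"imdb_id": "tt9000001", "tmdb_id": None, "title": "Example Film", "year": 2001},
    {"imdb_id": "tt9000001", "tmdb_id": 42, "title": "Example Film", "year": 2001},
    {"imdb_id": None, "tmdb_id": 77, "title": "Éxample:  Show!", "year": 2010, "type": "series"},
    {"imdb_id": None, "tmdb_id": None, "title": "example show", "year": 2010, "type": "series"},
]
unique = dedupe(records)
```

A record carrying an imdb_id is never matched against one without it here, so a real pipeline would probably want a second pass that reviews those candidate pairs rather than merging them automatically.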

Hi! I specialize in building large-scale media metadata databases with 9+ years of experience in Python, ETL pipelines, and PostgreSQL, tailored for recommendation engines. Here's how I can help:
* Aggregate and merge data from official IMDb and TMDb datasets, with optional enrichment from JustWatch
* Build a clean, structured master database including titles, creative teams, ratings, genres, plot summaries, relationships, and production info
* Implement data cleaning: remove duplicates, normalize titles, unify IDs, and ensure high accuracy for machine-learning comparison
* Deliver CSV or PostgreSQL database with schema documentation and an ETL pipeline for monthly updates
* Include matching keys (imdb_id, tmdb_id) for seamless integration with your personal rating history
Could you clarify if you want streaming availability updates included for all titles or only select regions?
$275 USD in 7 days
4.4

Drawing from my extensive range of skills and experience, I would be the ideal candidate for your project. With a deep understanding of Python, Pandas, and PostgreSQL – all preferred for this project – and having built and maintained ETL pipelines, I can wrangle your large media datasets with ease. I strictly adhere to using official datasets and APIs only, like IMDb or TMDb APIs, so you can rest assured knowing I'll operate within your constraints. Not only do I possess technical expertise in Python data engineering, but I also have an innate appreciation for film and television. As a movie buff myself, I recognize the value in crafting a comprehensive database like the one you envision for your personal recommendation engine. Whether it's ensuring data accuracy by incorporating sophisticated matching keys for imdb_id & tmdb_id or identifying highly compatible unseen movies through a nuanced understanding of metadata, no detail will be left unexamined.
$250 USD in 7 days
4.7

Hello, This is a very interesting project and I’d be happy to help build the dataset for your recommendation engine. I have experience with Python data engineering, ETL pipelines, API integrations, and large datasets. I can create a clean, unified movie & TV metadata database by combining the IMDb official dataset, TMDb API, and optional JustWatch enrichment, while respecting the constraint of not scraping IMDb pages.
What I’ll deliver:
• Merged master dataset (CSV or PostgreSQL)
• Cleaned and normalized records with deduplicated titles and unified IDs
• Full metadata structure including cast, ratings, genres, relationships, and production data
• Documented schema
• Automated Python ETL pipeline to refresh the dataset monthly
The final dataset will be structured and optimized for machine learning similarity analysis and rating prediction. I’d be glad to discuss the preferred database format and update schedule before starting.
$255 USD in 3 days
4.3

Hi there, I am a PhD in AI and have done many data scraping projects. How many samples do you need, 10k? Message me to discuss the details. I can give you the data and the source code, and guide you through running it whenever you need new data for testing your model.
$200 USD in 4 days
3.7

Hello, I understand your needs. I am an expert with 8 years of experience in Python, Web Scraping, Java and I helped many clients reach their goals. Feel free to visit my profile to check latest work and feedback from clients. Let us make this great together, please connect in chat. Thank you, Bwalya
$280 USD in 7 days
3.4

Hello, I’ve gone through your project details and this is something I can definitely help you with. With 10+ years of experience in data engineering and extensive work with media datasets, I can build a comprehensive global movie and TV metadata database tailored for your recommendation engine. My expertise in Python, ETL scripting, and APIs such as IMDb and TMDb will ensure the dataset is rich in structure and accurate. I'll focus on delivering a merged master dataset with clean data and robust documentation to support your project goals. Here is my portfolio: https://www.freelancer.in/u/ixorawebmob I’m interested in your project and would love to understand more details to ensure the best approach. Could you clarify:
1. Do you need any specific formats for the dataset, apart from CSV or PostgreSQL?
2. Are there any particular timelines you have in mind for completion?
3. Will you require regular updates beyond the monthly ETL pipeline?
Let’s discuss over chat! Regards, Arpit Jaiswal
$305 USD in 1 day
3.3

Hi there, Your "Movie & TV Metadata Master Database" project for a recommendation engine sounds incredibly exciting and I'm keen to help! I fully grasp the need for a comprehensive, accurate, multi-source dataset (IMDb, TMDb, JustWatch) encompassing all specified fields, from core IDs to streaming availability, all structured for robust ML comparison. My strong Python (Django) and backend development expertise equips me perfectly to build the robust ETL pipeline, integrate necessary APIs, perform data cleaning (duplicates, normalization), and deliver a pristine PostgreSQL/CSV master dataset. I'm adept at data modeling and handling large datasets, ensuring the accuracy and structure critical for your prediction engine. I can also set up maintainable scripts for monthly refreshes, keeping your data current. Let's connect to discuss how I can deliver this foundational database to power your recommendation system. Regards, Nikhil Chandra Roy
$275 USD in 7 days
3.6

You need a global media metadata database with rich structured information for movies and TV series to support your personal recommendation engine. I've built a similar dataset for movie ratings before, combining data from multiple sources. I'll use Python and APIs to quickly gather data from IMDb, The Movie Database, and optionally JustWatch. I can start right now and deliver fast. Let's chat!
$215 USD in 7 days
3.2

You’re looking to build a comprehensive global movie and TV metadata database that merges IMDb official datasets, TMDb API data, and optionally JustWatch information to support your personal recommendation engine. I understand the need for accurate, clean, and richly structured data covering creative teams, ratings, narrative metadata, and production details, all unified with consistent IDs and refreshed monthly through an ETL pipeline. With over 15 years of experience and more than 200 projects completed, I specialize in Python-based data engineering, API integration, PostgreSQL database design, and ETL workflows. I have particular expertise working with media datasets and APIs like IMDb and TMDb, ensuring data quality and scalability for large datasets like the 600k+ movies and 200k+ TV series you require. I will develop a Python ETL pipeline using Pandas to aggregate and clean data from your specified sources, removing duplicates and normalizing titles while maintaining key identifiers like imdb_id and tmdb_id. The final merged dataset will be delivered in PostgreSQL with detailed schema documentation, and the pipeline will be designed for monthly updates. A realistic timeline for this scope is around 3–4 weeks to ensure accuracy and completeness. Let’s discuss your project in detail so I can tailor the solution exactly to your needs.
$220 USD in 7 days
2.8

Hi there, As an experienced freelance engineer with expertise in media datasets and data engineering, I am excited about the opportunity to work on the 'Movie & TV Metadata Master Database' project. Leveraging my skills in ETL pipelines, Python data engineering, and API integration, I am confident in delivering a comprehensive global movie and TV metadata database for your recommendation engine.
✅ Creating a Master Dataset: I will compile a rich dataset by combining information from IMDb's official dataset, The Movie Database API, and optionally enriching it with data from JustWatch.
✅ Data Fields and Metadata: I will ensure the database includes core identification details, basic metadata, creative team information, ratings, narrative metadata, structural attributes, and production information to enhance the accuracy of predictions.
✅ Data Cleaning and ETL Pipeline: To maintain data integrity, I will clean the dataset, remove duplicates, normalize titles, and provide scripts for monthly dataset refreshment.
✅ Database Deliverables: I will deliver a merged master dataset in CSV or PostgreSQL format, along with schema documentation and matching keys for imdb_id and tmdb_id.
✅ Technical Stack: Utilizing Python, Pandas, and PostgreSQL, I will seamlessly integrate APIs and develop ETL pipelines to meet your technical requirements.
I look forward to working with you. Best Regards. Brayan
$305 USD in 1 day
2.6

Hi, I will build a comprehensive global movie and TV metadata database tailored for your recommendation engine. With extensive experience in media datasets and ETL pipelines, I'm well-equipped to integrate data from IMDb and TMDb while ensuring accuracy and consistency. I’ll create a merged master dataset that encompasses all required fields, including core identification, metadata, and production information. Using Python and PostgreSQL, I'll implement a robust ETL pipeline to refresh the dataset monthly and ensure data integrity through cleaning and normalization processes. To optimize the database for your predictive model, I’ll focus on establishing reliable matching keys and relationships within the data. This will support accurate comparisons against your personal rating history. What’s your timeline for this project? Are there specific metrics you want to prioritize in the dataset? Thank you.
$275 USD in 7 days
2.7

Tempe, United States
Payment method verified
Member since Mar 8, 2026