
Open
Ends in 2 days
I'm seeking an experienced programmer to create spatio-temporal diffusion transformers (ST-DiT) for video generation.
The project involves:
- Generating new videos from scratch
- Enhancing existing videos
- Transforming videos based on input prompts
Requirements:
- Proficiency in PyTorch or Python
- Some experience with ST-DiT models
- Ability to assist in acquiring necessary datasets
Ideal candidates should have some background in machine learning, particularly in video generation and diffusion models.
Project ID: 40491291
45 proposals
Open for bidding
Remote project
Active 1 day ago
45 freelancers are bidding on average $34 USD/hour for this job

⭐⭐⭐⭐⭐ Create Spatio-Temporal Diffusion Transformers for Video Generation ❇️ Hi My Friend, I hope you are doing well. I've reviewed your project requirements and noticed you're looking for an experienced programmer to create spatio-temporal diffusion transformers (ST-DiT) for video generation. Look no further; Zohaib is here to help you! My team has completed 50+ similar projects in video generation and machine learning. I will generate new videos, enhance existing ones, and transform videos based on your input prompts. ➡️ Why Me? I can easily do your video generation project as I have 5 years of experience in Python and machine learning, focusing on video generation and diffusion models. My expertise includes working with PyTorch, dataset handling, and model implementation. Not only this, I have a strong grip on other relevant technologies, ensuring I can provide the best results for your project. ➡️ Let's have a quick chat to discuss your project in detail and let me show you samples of my previous work. Looking forward to discussing with you in chat. ➡️ Skills & Experience: ✅ Python Programming ✅ PyTorch ✅ Machine Learning ✅ Video Generation ✅ ST-DiT Models ✅ Data Acquisition ✅ Model Training ✅ Video Enhancement ✅ Dataset Management ✅ Prompt-based Transformation ✅ Algorithm Development ✅ Problem Solving Waiting for your response! Best Regards, Zohaib
$30 USD in 40 days
7.9

Hello, I understand you need a Python/PyTorch developer to build ST-DiT-based video generation workflows for creating videos from scratch, enhancing existing videos, and transforming videos from text prompts. I have experience with diffusion models, transformer architectures, PyTorch training pipelines, video preprocessing, model fine-tuning, dataset preparation, inference optimization, and research-to-prototype ML development. I will help select or prepare suitable datasets, set up the ST-DiT architecture, build reproducible training/inference scripts, support prompt-based video transformation, document the workflow, and leave clear hooks for future tuning. Q1: Do you want to train from scratch or fine-tune an existing video diffusion model? Q2: What target video resolution and duration should the first prototype support? Q3: Do you already have GPU infrastructure, or should I recommend a cloud setup? Best regards, Stratos
$38 USD in 40 days
7.2

Hello, As an AI professional with a substantial background in machine learning and video generation, I can assure you that I have the skills needed to complete this project. My expertise in Python and PyTorch aligns with your needs for spatio-temporal video generation, and my ability to enhance existing videos likewise reflects my experience with ST-DiT models. Having served a global clientele for years, I appreciate the significance of data quality in deep learning models. My team can help you acquire any datasets your project needs, so that you are backed by the best resources. What makes me stand out from others is not only my proficiency in AI, ML, and data science but also my dedication to delivering top-notch solutions on time. By choosing me for your project, you're selecting a reliable partner who will go above and beyond to provide you with precise, efficient, and scalable results. Decisions backed by data build successful businesses, and I'm here to empower you in that mission. Thanks!
$50 USD in 169 days
6.8

Dear , We have carefully studied the description of your project, and we can confirm that we understand your needs and are interested in it. Our team has the resources to start as soon as possible and complete the project in a very short time. We have been in this business for 25 years, and our technical specialists have strong experience in Python, Video Services, C++ Programming, Video Editing, Data Science, Artificial Intelligence, Data Collection, AI Model Development, AI Research, AI Development and other technologies relevant to your project. Please review our profile (https://www.freelancer.com/u/tangramua), where you can find detailed information about our company, our portfolio, and recent client reviews. Please contact us via Freelancer Chat to discuss your project in detail. Best regards, Sales department Tangram Canada Inc.
$30 USD in 5 days
7.5

Hi there, ★★★ Python Expert ★★★ 8+ Years of Experience ★★★ I can create spatio-temporal diffusion transformers for video generation, including generating new videos, enhancing existing ones, and transforming videos based on input prompts. This will include:
- Developing ST-DiT models for video generation
- Enhancing video quality and features
- Implementing transformations based on user prompts
- Assisting in dataset acquisition for training
My approach will involve using PyTorch for model development, ensuring efficient training and testing processes, and applying machine learning techniques to optimize video generation. I am ready to start once you provide access to the necessary datasets and any specific requirements you have in mind. Thanks!
$38 USD in 40 days
6.4

With a strong background in machine learning and hands-on experience in video generation using diffusion models, I am well-equipped to develop spatio-temporal diffusion transformers (ST-DiT) tailored to your needs. My proficiency in PyTorch and Python, coupled with a proven track record in creating and enhancing video content, empowers me to deliver innovative solutions that align perfectly with your project requirements. I am committed to ensuring high-quality outcomes and can assist with dataset acquisition to ensure comprehensive results.
$38 USD in 40 days
5.5

Most failures in video diffusion projects come from treating temporal consistency as an afterthought: spatial quality can be good, but flicker, motion collapse, and prompt drift ruin outputs. Addressing spatio-temporal structure up front is the real problem here. I’d build a pipeline that combines a pretrained image diffusion backbone with a temporal transformer module (ST-DiT style) operating on latent frames, training with both reconstruction and temporal consistency losses. For enhancement and prompt-driven transforms, use classifier-free and cross-attention conditioning so the same model supports generation from scratch, inpainting-style edits, and guided edits of existing clips. Recommended stack: PyTorch + Hugging Face diffusers extensions, a latent VAE (or VQ-VAE) to work in latent space, transformer temporal blocks, CUDA + NCCL, optional DeepSpeed/FairScale for larger runs, FFmpeg for IO, and W&B for experiments. I can help source/curate WebVid/Kinetics/UCF-101 and prepare license-compliant crawls. Design will be modular so you can swap backbones, resume fine-tuning, or expose fast lower-res inference. I’ve built ProgramPro (AI-driven generation/adaptation SaaS) — core work was building robust model pipelines, online adaptation logic, and production training workflows using Python/Django and PyTorch. If that approach fits, I can draft a 1-week POC plan. Quick question: what target resolution, typical clip length, and available GPU(s) do you plan to use for training and inference?
$37.50 USD in 7 days
4.8
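Two ideas named in the bid above, classifier-free guidance and a temporal consistency loss, can be sketched in a few lines. This is a minimal illustration in plain Python, not the bidder's implementation: the bid does not define its loss, so the frame-difference penalty here is an assumed stand-in, and the names `cfg_combine`, `temporal_consistency_loss`, and the flattened-list `Frame` type are hypothetical. A real pipeline would operate on PyTorch tensors of latent frames.

```python
from typing import List

# Hypothetical representation: one latent frame flattened to a list of floats.
Frame = List[float]


def cfg_combine(eps_uncond: Frame, eps_cond: Frame, scale: float) -> Frame:
    """Classifier-free guidance: start from the unconditional noise
    prediction and move toward the prompt-conditioned one by `scale`.
    scale = 0 ignores the prompt; scale = 1 returns the conditional output."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def temporal_consistency_loss(frames: List[Frame]) -> float:
    """Mean squared difference between consecutive latent frames.
    Assumed stand-in for a temporal consistency term: it penalizes
    flicker, but it also penalizes genuine motion, so in practice it
    would be weighted lightly or applied after motion compensation."""
    if len(frames) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            total += (b - a) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    guided = cfg_combine([0.0, 1.0], [1.0, 3.0], scale=2.0)
    clip = [[0.0, 0.0], [1.0, 1.0], [1.0, 3.0]]
    print(guided, temporal_consistency_loss(clip))
```

In a training loop, a term like this would typically be added to the usual denoising loss with a small weight, while the guidance combination is applied only at inference time.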

Let's chat: a free consultation with no obligation. I understand you need a clean, professional, and user-friendly solution for your "Spatio-Temporal Video Generation Expert" project. My skills in PHP, Java, and JavaScript are a perfect fit for this project. While I am new to freelancer.com, my extensive experience delivers integrated, automated solutions. Regards, Jason McLachlan
$25 USD in 3 days
4.6

Hi, Your project on developing spatio-temporal diffusion transformers (ST-DiT) for video generation is truly fascinating and right in my wheelhouse. With hands-on experience in PyTorch and Python, alongside practical engagement with diffusion-based models, I’m confident in my ability to help generate, enhance, and creatively transform videos as per your prompts. Moreover, I have experience assisting in dataset acquisition that ensures robust model training. I propose starting with a clear roadmap, including dataset sourcing and preliminary model setup, aiming to deliver initial results within a week. This timeline will allow us to iterate and refine efficiently. Could you share more about the types of videos or datasets you want to prioritize for generation and enhancement? Best regards,
$25 USD in 21 days
4.3

As an experienced web developer with a strong background in artificial intelligence and Python, I offer the perfect skill set for your spatio-temporal video generation project. My proficiency in PyTorch and a range of other technologies gives me the edge to create and enhance videos using ST-DiT models. Although my direct experience with ST-DiT models is moderate, I am confident in my ability to pick up new concepts and contribute fully to the project. My approach merges simplicity, efficiency, and maintainability. I ensure high-performance output without compromising quality or reliability, attributes that align with your project's needs. Additionally, my experience working with automation tools such as N8N and Make can prove helpful in acquiring datasets or streamlining any aspect of the project. Choosing me for this job would mean selecting not just a coder but an adaptable problem-solver eager to grasp complexity and transform it into valuable outcomes. Let's partner to generate astounding videos by bringing out the best of spatio-temporal diffusion transformers.
$38 USD in 40 days
4.0

Hello! This is James from Hollywood, and I’m excited about your project on spatio-temporal diffusion transformers (ST-DiT). I’ve carefully read the description and have a solid grasp of what you’re looking for. With over 15 years of experience in AI development, Python, and video services, I’m confident I can deliver high-quality results tailored to your needs. To ensure I fully understand your requirements, could you please clarify the following? 1. What specific features or outcomes do you envision for the ST-DiT model? 2. Are there any existing datasets or video sources you’d like to use for training the model? My approach will include analyzing your requirements, designing the model architecture, and implementing the necessary algorithms to achieve your project goals. I focus on clear communication and structured milestones to ensure we stay aligned throughout the process. I’ve worked on various projects that involved AI model development and video processing, such as a custom video analysis tool and a data-driven video summarization app, both of which successfully enhanced user engagement. I look forward to the opportunity to discuss your project further and explore how I can contribute to its success. Let’s chat!
$50 USD in 10 days
3.4

Hi there, This project instantly caught my eye, so I had to reach out. The idea of creating spatio-temporal diffusion transformers for video generation sounds fascinating. I noticed you're looking for someone proficient in PyTorch or Python with experience in ST-DiT models. I specialize in both and have worked on similar projects before. I believe my skills in machine learning, especially in video generation, align perfectly with what you need. I'm committed to fast communication and a quick turnaround. Let me know if you are available for a quick chat! Regards, XRProConnect
$25 USD in 7 days
3.3

Hi, I am excited about the opportunity to assist in developing spatio-temporal diffusion transformers (ST-DiT) for video generation. With a solid background in machine learning and experience in video generation, I understand the complexities involved in generating new videos, enhancing existing footage, and transforming videos based on specific prompts. My proficiency in Python and PyTorch aligns well with your project's requirements. I have previously worked with ST-DiT models, developing and fine-tuning them for various applications. I am also adept at gathering and preparing datasets, ensuring that the generated content is robust and relevant. I suggest a structured approach where we outline the specific goals for the generated videos. This will guide the development process efficiently. I can start a simple demo or a prototype of one of the components within 12 hours of commencement to provide you with early insights into the project's direction. I am available to communicate in real-time according to your time zone to ensure seamless collaboration. Q1: What specific features do you envision for the spatio-temporal models? Q2: Do you have particular datasets in mind, or would you need assistance in sourcing them? Q3: Are there specific input prompts you want the system to focus on initially? Looking forward to your feedback on my approach! Best regards, Cindy Viorina
$25 USD in 37 days
2.2

Rahul here, I can help develop and fine-tune ST-DiT-based video generation pipelines for text-to-video generation, video enhancement, and prompt-driven video transformation using PyTorch. I have experience working with deep learning, diffusion models, computer vision, model training pipelines, dataset preparation, and GPU-accelerated workflows. I can assist with model architecture implementation, dataset acquisition and preprocessing, training/inference optimization, evaluation, and deployment strategies. The focus will be on building a reproducible framework that supports future experimentation, fine-tuning, and scalability while delivering high-quality video generation results. I’m ready to discuss the target use case, available compute resources, preferred ST-DiT architecture, and dataset requirements.
$25 USD in 40 days
0.0

Hi, I’d be glad to help build an ST-DiT-based video generation system for creating new videos, enhancing existing footage, and transforming videos from text prompts. I have experience with PyTorch, diffusion models, transformer-based generative architectures, video preprocessing, and scalable training/inference workflows. For your project, I can help adapt an existing ST-DiT-style architecture or develop a custom pipeline with dataset acquisition, data cleaning, model training, evaluation, and deployment-ready inference. I can also support prompt conditioning, video-to-video transformation, temporal consistency improvements, and quality enhancement modules. To make the model effective, I’d first want to understand your target video style, resolution, and whether you prefer fine-tuning an existing model or training more heavily from curated data. Do you already have any sample videos or target examples that show the quality and motion style you want? I’m confident I can help turn this into a practical video generation system with a strong technical foundation. Best regards!
$38 USD in 40 days
0.0

DeKalb, United States
Payment method verified
Member since Oct 23, 2022