
Closed
Posted on
Payment on delivery
I need a researcher who can build a production-ready model that listens to a baby's cry, watches the paired video, and decides, reliably, whether the cause is hunger, discomfort, or simple attention seeking. Audio and video must be fused inside one architecture; running them in parallel but independently will not satisfy our accuracy goals. You may use the deep-learning stack you trust most (PyTorch, TensorFlow, Keras, OpenCV, torchaudio, etc.) provided the final network can run in real time on an edge device and be exported to ONNX or TFLite.

I will share product constraints and a small proprietary data set; you will expand it through public sources or augmentation, perform rigorous cross-validation, and refine the model until we consistently exceed 90% precision and recall on an unseen hold-out set.

When you apply, show me past work: links to papers, GitHub repos, Kaggle solutions, or shipped features demonstrating experience with cry detection, sound-event recognition, emotion analysis, or any other multimodal perception problem. A concise paragraph with links is enough; no full proposal is needed at this stage.

Deliverables
• Well-documented training pipeline and source code
• Trained model file(s) plus lightweight export (ONNX/TFLite)
• Inference script or microservice, ready for product integration
• Evaluation report: confusion matrix, per-class metrics, brief methodology
• Integration guide detailing inputs, outputs, and runtime footprint

Payment is released as soon as the artefacts are reviewed and meet the stated accuracy target.
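The evaluation deliverables above (confusion matrix, per-class metrics, and the 90% precision/recall gate) can be computed from hold-out predictions in a few lines of NumPy. The sketch below is illustrative only; the class names and the mock labels are assumptions, not part of the brief.

```python
import numpy as np

CLASSES = ["hunger", "discomfort", "attention"]  # the three causes named in the brief

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Per-class precision and recall read straight off the confusion matrix."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # row sums = true counts
    return precision, recall

# Tiny worked example on a mock hold-out set
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred)
prec, rec = per_class_metrics(cm)
target_met = bool((prec >= 0.9).all() and (rec >= 0.9).all())  # the 90% gate
```

On this mock data one "discomfort" clip is misclassified as "attention", so the 90% gate fails; the real report would run the same computation over the unseen hold-out set.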
Project ID: 40224725
14 proposals
Remote project
Active 25 days ago
14 freelancers are bidding an average of ₹24,779 INR for this job

Hello, I trust you're doing well. I have nearly a decade of hands-on experience with machine-learning algorithms. My expertise lies in developing artificial-intelligence algorithms, including the kind you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications on this subject. My portfolio, which showcases my past work, is available for your review: https://www.freelancer.com/u/sajjadtaghvaeifr. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss in detail. Warm regards.
₹25,000 INR in 7 days
7.3

As a data scientist specialising in deep learning, I believe I'm the perfect fit for your project. My industry experience includes complex multimodal perception problems similar to your cry-classification task; I have successfully tackled cry detection, sound-event recognition, and emotion analysis. My proficiency with neural networks, especially CNNs and RNNs, will be instrumental in fusing and training audio and video inputs within a single architecture.

I'm well versed in deep-learning stacks such as PyTorch and TensorFlow, and I'm experienced in deploying models to edge devices and exporting them to formats like ONNX and TFLite. This offers two crucial benefits for your project: real-time on-device processing and easy integration into your final product.

Having worked in biomedical data science domains such as CT scans and cancer detection, I'm also familiar with handling sensitive data: your proprietary dataset will remain secure while being effectively augmented from comprehensive public sources. My training pipelines are robust and well documented, and my evaluation reports provide precise insights through confusion matrices and detailed per-class metrics, well suited to your stringent project demands. Allow me to exceed your expectations with my expertise!
₹25,000 INR in 7 days
6.1

I have hands-on experience building multimodal audio-visual classifiers for edge deployment, including sound-event recognition and behavior-state inference pipelines where audio and video are fused in a single network (not late independent voting). I typically use PyTorch + torchaudio + OpenCV, with temporal fusion via cross-modal attention or lightweight transformer heads, then optimize for edge inference and export to ONNX/TFLite with INT8 quantization when needed. Relevant work includes: (1) real-time acoustic event detection pipelines with noisy-field robustness, (2) video+audio behavior classification prototypes with synchronized clip sampling and temporal augmentation, (3) production inference services packaged for low-latency deployment. I can share code samples and implementation structure (training/eval/export/inference) aligned to your deliverables: documented training pipeline, trained checkpoints, ONNX/TFLite exports, integration-ready inference service, and full evaluation reporting (confusion matrix + per-class precision/recall + methodology). My workflow is metric-driven with strict hold-out validation and cross-validation to meet target reliability before handoff.
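The mid-level fusion this bid describes (gated fusion of audio and video embeddings) can be illustrated with a minimal NumPy forward pass; all dimensions, weights, and inputs below are random placeholders standing in for trained encoder outputs, not an actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_a, d_v, d_f = 64, 128, 96  # audio, video, and fused embedding dims (placeholders)

# Pretend encoder outputs for one clip; in practice these would come from
# the audio network and the video backbone, respectively.
audio_emb = rng.standard_normal(d_a)
video_emb = rng.standard_normal(d_v)

# Gated fusion: a learned gate decides, per fused dimension, how much of
# each modality's projection to pass through.
W_a = rng.standard_normal((d_f, d_a)) * 0.1        # audio projection
W_v = rng.standard_normal((d_f, d_v)) * 0.1        # video projection
W_g = rng.standard_normal((d_f, d_a + d_v)) * 0.1  # gate weights over both modalities

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

gate = sigmoid(W_g @ np.concatenate([audio_emb, video_emb]))       # in (0, 1)
fused = gate * np.tanh(W_a @ audio_emb) + (1.0 - gate) * np.tanh(W_v @ video_emb)
```

Because the gate is a convex combination of two tanh-bounded projections, the fused embedding stays in [-1, 1] regardless of input scale, which is one reason gated fusion is a stable choice before a classification head.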
₹25,000 INR in 5 days
5.7

Hi, I'm an Applied ML Engineer specialising in building edge-ready multimodal models and shipping them as ONNX/TFLite. My approach for your project (true fusion):
• Data + labeling: segment synced AV windows (e.g. 2 s windows with 0.5 s stride), enforce subject-wise splits (no baby leakage), and unify targets as Cry/No-cry + {No pain, Moderate, Pain}.
• Relabel open audio: pretrain the audio encoder on public cry-reason datasets, then map them into your taxonomy via a defensible mapping table plus weak-supervision confidence (pain → Pain, discomfort → Moderate, non-cry → No-cry) to boost robustness.
• Fusion architecture: lightweight audio CRNN/DS-CNN + tiny video backbone + mid-level fusion (cross-attention / gated fusion) → shared embedding with multitask heads (cry + pain).
• Generalization + metrics: heavy real-world augmentation (noise, low-light/blur), rigorous CV, confusion matrix + per-class P/R/F1, and an uncertainty-threshold option to keep precision high.
• Edge deployment: ONNX/TFLite export, INT8 quantization (QAT/PTQ), latency profiling, and a clean inference microservice/API.
Relevant shipped work:
• On-device audio event detector: streaming log-mel pipeline, smoothing/cooldown logic, INT8 quantization, real-time mobile inference.
• Multimodal perception for safety monitoring: audio+video temporal fusion with gating for noisy/low-visibility conditions, subject-independent evaluation.
• Edge CV pipeline: optimized detection/classification with ONNX Runtime/TensorRT-style speedups and reproducible training + export.
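The "no baby leakage" point in this bid comes down to splitting by subject rather than by clip, so no infant contributes data to both train and test. A minimal stdlib sketch follows; the mock dataset and field names are illustrative assumptions.

```python
import random

def subject_wise_split(clips, test_frac=0.2, seed=0):
    """Split clips so that no subject (baby) appears in both train and test."""
    subjects = sorted({c["subject"] for c in clips})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_frac))
    test_subjects = set(subjects[:n_test])
    train = [c for c in clips if c["subject"] not in test_subjects]
    test = [c for c in clips if c["subject"] in test_subjects]
    return train, test

# Mock dataset: 10 babies with 5 clips each
clips = [{"subject": f"baby_{i}", "clip": j} for i in range(10) for j in range(5)]
train, test = subject_wise_split(clips)
```

A random clip-level split would leak each baby's cry signature across the boundary and inflate hold-out metrics; splitting at the subject level is what makes the 90% target meaningful.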
₹27,500 INR in 7 days
4.1

Hi, I can develop a multimodal deep-learning model that fuses baby-cry audio and paired video within a single architecture, optimised for real-time edge deployment and exportable to ONNX or TFLite. The solution will include a documented training pipeline, cross-validated evaluation exceeding your precision/recall targets, trained model files, an inference script, and a clear integration guide for product use. Please let me know if you'd like more detail. Thanks
₹25,000 INR in 10 days
3.6

With my extensive 8+ years of experience in Data Analytics and Science, I bring to the table a unique blend of skills for your High-Accuracy Multimodal Cry Classifier project. Not only do I have an excellent understanding of Python and its various scientific libraries, including TensorFlow and PyTorch, but I also possess deep expertise in handling complex datasets and building predictive analytics models from scratch. This aligns perfectly with your needs for creating a production-ready model that integrates audio and video through a streamlined deep-learning architecture. I understand the importance of accuracy goals in this project, especially when it involves classifying different emotions based on sound and video cues with high precision. To demonstrate my capabilities, I encourage you to explore my GitHub repository (link here) where you can see my previous work on sound-event recognition and emotion analysis – two domains that closely relate to your project at hand.
₹20,000 INR in 5 days
3.3

I’m a senior AI/ML engineer with experience building production-ready multimodal systems combining audio and video for real-time inference. I’ve developed and deployed streaming voice bots (LLM + ASR + TTS), sound-event detection pipelines using torchaudio and PyTorch, and computer-vision systems using OpenCV and deep CNN/Transformer architectures. My work includes real-time inference optimization, ONNX export, edge deployment, and robust evaluation with cross-validation and per-class metrics. I’ve built end-to-end training pipelines with augmentation, modality fusion (early/late/cross-attention), and deployment-ready microservices. Relevant work samples and repositories can be shared upon request.
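Cross-attention, one of the fusion strategies this bid names, can be sketched in a few lines of NumPy: each audio frame queries the video frames via scaled dot-product attention and receives a video context vector. All shapes and features below are random illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
T_a, T_v, d = 20, 8, 32  # audio frames, video frames, shared model dim (placeholders)

audio = rng.standard_normal((T_a, d))  # audio-frame features (queries)
video = rng.standard_normal((T_v, d))  # video-frame features (keys and values)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product cross-attention: each audio frame attends over video frames.
scores = (audio @ video.T) / np.sqrt(d)  # (T_a, T_v) similarity matrix
weights = softmax(scores, axis=-1)       # each row is a distribution over video frames
attended = weights @ video               # (T_a, d): per-audio-frame video context
```

In a real model the queries, keys, and values would each pass through learned projections first; the sketch keeps only the attention mechanics.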
₹25,000 INR in 7 days
3.0

With 7 years of experience in the field, I am the best fit to complete this project. I have the relevant skills and have worked on similar solutions in the past.
How I will complete this project:
- I will build a production-ready model that listens to a baby's cry, watches the paired video, and reliably classifies the cause.
- I will fuse audio and video inside one architecture to achieve the accuracy goals.
- The final network will run in real time on an edge device and be exported to ONNX or TFLite format.
Tech stack I will use:
- Deep-learning frameworks such as PyTorch, TensorFlow, Keras, OpenCV, and torchaudio.
- The proprietary dataset will be expanded through public sources and augmentation.
- Rigorous cross-validation will be performed to refine the model until it consistently exceeds 90% precision and recall on an unseen hold-out set.
Deliverables:
- Well-documented training pipeline and source code.
- Trained model file(s) with lightweight export to ONNX/TFLite.
- Inference script or microservice ready for product integration.
- Evaluation report including confusion matrix, per-class metrics, and brief methodology.
- Integration guide detailing inputs, outputs, and runtime footprint.
I have the expertise and experience to deliver a high-accuracy multimodal cry classifier that meets your requirements.
₹13,750 INR in 7 days
1.2

Hello Deepak S., I checked your project, and it looks interesting. This is something we already work on, so the requirements are clear from the start. We mainly work with Python, Data Processing, Algorithms, Data Science, Keras, Computer Vision, Deep Learning, and Natural Language Processing. We focus on making things simple, reliable, and actually useful in real life, not overcomplicated. Let's connect in chat and see if we're a good fit for this. Best Regards, Ali nawaz
₹50,000 INR in 8 days
0.0

I am an excellent fit for your project, having successfully completed similar work in the past. I understand you need a seamless, integrated audio-video model to classify baby cries by cause with over 90 percent precision and recall, running efficiently on edge devices. My expertise includes deep learning frameworks like PyTorch and TensorFlow, along with experience in sound-event recognition and multimodal fusion. Even though I am new here, I have worked on numerous projects outside of freelancer and developed the skills necessary to complete this work effectively. I’d be glad to discuss your project—at best, we find a strong fit to work together; at minimum, you receive a complimentary consultation. Regards, Keagan
₹17,250 INR in 30 days
0.0

I have just completed a similar project. I developed a multimodal deep learning system that seamlessly fuses audio and video streams within a unified architecture, reaching over 92% precision and recall on real-time edge devices. You won’t find a specialist better aligned with what you’re looking for. I understand the importance of achieving reliable, high-accuracy classification in a production-ready, resource-constrained environment. I specialize in transforming complex business requirements into high-converting, user-centric digital assets. I’d love to chat about your project! The worst that can happen is you walk away with a free consultation. Regards, Bjork Bronkhorst
₹28,150 INR in 30 days
0.0

With my background in strategic product development and the cutting-edge application of Python, I'm well equipped to take on your multimodal cry-classification project. At Prime Code Tech, we have extensive experience in sound-event recognition and deep learning, which complements the rigorous and complex nature of this task. I have personally been involved in several projects that required significant data augmentation for improved model performance and accuracy, aligning well with your need to expand the proprietary dataset. Throughout a career ranging from startups to established enterprises, I've demonstrated a keen eye for detail without compromising agile, efficient execution, a key requirement for your project. Being no stranger to delivering under strict time frames, I assure you of a well-documented training pipeline and source code, trained model files and exports, an inference script ready for integration, and a detailed evaluation report with methodology, all on schedule and adhering to your specific product constraints.
₹12,500 INR in 3 days
0.0

Satna, India
Payment method verified
Member since Feb 14, 2021