
Closed
Posted on
Paid on delivery
Freelance RAG Engineer (LLM Systems) – Evaluation & Optimization

Project Overview

We are building Yuktha, an AI-driven women’s metabolic health platform (starting with PCOS). Our system uses a Retrieval-Augmented Generation (RAG) pipeline to deliver personalized recommendations (diet, supplements, lifestyle, coaching) via mobile app and WhatsApp.

A baseline RAG system is already developed. We are looking for an expert to audit, optimize, and scale the system for production-grade performance.

---

Scope of Work

1. RAG System Audit
- Review current architecture (retrieval, embeddings, prompting, orchestration)
- Identify:
  - Hallucination points
  - Retrieval failures
  - Latency bottlenecks
  - Context leakage / irrelevant responses

---

2. Retrieval Optimization
- Improve:
  - Chunking strategy
  - Embedding selection
  - Query rewriting / expansion
- Optimize vector search (recall vs precision tradeoff)
- Implement hybrid retrieval if needed (semantic + keyword)

---

3. Prompt Engineering & Response Quality
- Redesign prompts for:
  - Clinical-style accuracy (PCOS domain)
  - Structured outputs (plans, recommendations)
- Reduce hallucinations
- Ensure consistency across sessions

---

4. Personalization Layer
- Improve user-context handling:
  - Symptoms
  - Test results
  - History
- Implement memory-aware responses

---

5. Evaluation Framework
- Build evaluation metrics:
  - Answer accuracy
  - Relevance
  - Safety
- Create automated + manual evaluation pipeline

---

6. Integration Support
- Ensure smooth integration with:
  - Mobile app
  - WhatsApp workflows (via APIs)
- Optimize response latency (<2–3 seconds target)

---

Expected Deliverables
- Detailed audit report (issues + recommendations)
- Improved RAG pipeline (code + architecture)
- Prompt library (modular + reusable)
- Evaluation dashboard / framework
- Documentation for internal team

---

Required Skills

Must-Have
- Strong experience with RAG systems in production
- Hands-on with:
  - or
  - Vector databases ( / )
- Experience with LLM APIs ( / )
- Prompt engineering for structured outputs
- Debugging hallucinations and retrieval errors

---

Good to Have
- Experience in healthcare / wellness AI
- Knowledge of knowledge graphs
- Experience with multilingual systems (Indian languages)
- WhatsApp / conversational AI integrations

---

Engagement Model
- Duration: 4–8 weeks (initial engagement)
- Mode: Remote
- Commitment: 20–40 hours/week
- Potential for long-term engagement

---

Selection Criteria
- Demonstrated work in real RAG systems (not demos)
- Ability to explain trade-offs (precision vs recall, cost vs latency)
- Strong debugging and system thinking skills

---

How to Apply

Please share:
1. Relevant RAG projects (GitHub / case studies)
2. Your approach to improving an existing RAG system
3. Tech stack familiarity
4. Availability and expected compensation

---

What Success Looks Like
- Significant reduction in hallucinations
- Improved relevance and personalization
- Faster response times
- Production-ready, scalable system

---

Note: This is not a basic chatbot project. We are building a high-trust health AI system, and accuracy + reliability are critical.
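The scope item on hybrid retrieval (semantic + keyword) can be illustrated with a minimal sketch. The posting does not say how the two result lists would be merged; reciprocal rank fusion is used here only as one common option, and the document IDs, the two hit lists, and the constant `k=60` are all illustrative assumptions rather than details of the Yuktha system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a vector search and a keyword search for one query.
semantic_hits = ["doc_insulin", "doc_diet", "doc_sleep"]
keyword_hits = ["doc_diet", "doc_inositol", "doc_insulin"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever is still kept, which is the recall vs precision balance the scope refers to.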
Project ID: 40354186
34 proposals
Remote project
Active 19 hours ago
34 freelancers are bidding an average of ₹54,374 INR for this job

RAG (Retrieval-Augmented Generation) systems optimization is our forte at CnELIndia. With over 18 years of experience in the industry, we've had extensive opportunities to work on a variety of complex projects, very similar to your AI-driven women's metabolic health platform. Our deep understanding of retrieval strategies, vector search optimization, and context management make us the ideal choice for this project. We have a strong track record of identifying and resolving system issues such as hallucinations, retrieval failures, and latency bottlenecks. Our expertise extends beyond just RAG. We are well-versed with healthcare and wellness AI domains, which aligns perfectly with your project's objective. Additionally, our familiarity with knowledge graphs adds value to the complexity around storage and retrieval of data within healthcare projects. We're deeply committed to not just meeting expectations but exceeding them. Should you choose us for this engagement, we'll ensure that not only are all your requirements met but we also deliver a production-ready and scalable system capable of significant reduction in hallucinations, improved relevance and personalization and faster response times. Given the trust factor associated with health AI systems, our accuracy-centric approach and strong debugging skills make us well-suited to deliver a successful project for you!
₹56,250 INR in 15 days
5.5

Noticed Yuktha's goal of supporting PCOS through a RAG system. Recently optimized a similar pipeline for a wellness app, improving response times and personalization accuracy. Could you share more about the current bottlenecks in performance or user feedback you're tackling? Might be worth diving into your API integration setup to ensure seamless delivery across platforms. Can start reviewing the system right away. Let me know if you’d like to hop on a quick call to discuss further.
₹37,500 INR in 7 days
5.0

Hi there, I’ve carefully reviewed the requirements for your GenAI project and I’m confident that my expertise in building NLP pipelines using Hugging Face and LangChain can meet your expectations. My experience includes working with large language models (LLMs) for Retrieval-Augmented Generation (RAG), as well as fine-tuning models with custom datasets to enhance text generation. I’ve successfully completed similar projects where I applied these techniques in Python to build robust, client-specific solutions. I would love the opportunity to discuss how I can leverage my skills to develop a tailored solution for your project. Feel free to take a look at my portfolio to get a sense of the work I’ve done: Portfolio: https://www.freelancer.com/u/webmasters486/AI-automation Looking forward to hearing from you! Best regards, Muhammad Adil
₹55,000 INR in 8 days
4.8

Hi there, Strong alignment with this project comes from experience optimizing production-grade RAG systems with a focus on accuracy, latency, and domain-specific reliability. I have a clear understanding of the requirement to audit your pipeline, reduce hallucinations, improve retrieval quality, enhance personalization, and build a robust evaluation framework for a health-focused AI system. Hands-on expertise ensures improvements across chunking, embeddings, hybrid retrieval, prompt design, and memory-aware responses with scalable architecture. Risk stays controlled through structured evaluation metrics, safety-focused prompt engineering, latency optimization, and rigorous testing for clinical-grade reliability. I am available to start immediately and happy to review your system and outline an optimization strategy. Recent work: https://www.freelancer.com/u/chiragardeshna Regards, Chirag
₹37,500 INR in 7 days
4.4

Hello There,

I have reviewed your requirement and can help audit and improve your RAG system for better accuracy, speed, and scalability.

What I Will Do
- Audit current system (hallucination, retrieval, latency issues)
- Improve retrieval, prompts, and personalization
- Optimize performance and response time
- Support mobile & WhatsApp integration

Deliverables
- Audit report
- Optimized RAG pipeline
- Prompt library
- Evaluation framework

I have strong experience in RAG, LLM APIs, and AI systems and can share a demo of similar work. I’m available for ongoing work and long-term support.

Best regards,
Mohammed J.
₹120,000 INR in 25 days
3.0

Hello,

Yuktha’s focus on a high-trust PCOS RAG system aligns strongly with our experience in optimizing production-grade LLM pipelines.

Approach:

RAG Audit
- End-to-end review of retrieval, embeddings, prompts, and orchestration to identify hallucinations, weak grounding, latency bottlenecks, and context leakage.

Retrieval Optimization
- Improved chunking (semantic + hierarchical)
- Embedding benchmarking (domain vs general)
- Query rewriting and expansion
- Hybrid retrieval (vector + keyword) to balance recall and precision

Prompt Engineering
- Clinical-style, structured outputs (diet and lifestyle plans)
- Strong grounding instructions to reduce hallucinations
- Consistent multi-turn responses

Personalization Layer
- Context-aware responses using symptoms, reports, and history
- Memory design with session and persistent user state

Evaluation Framework
- Metrics for accuracy, relevance, and safety
- Automated and manual evaluation pipeline

Performance and Integration
- Latency optimization under 2–3 seconds
- API-ready for mobile app and WhatsApp workflows

Tech Stack
LangChain or LlamaIndex, OpenAI or Claude, Pinecone or Weaviate, Redis, FastAPI

Experience
We have improved real RAG systems with measurable gains in grounding, relevance, and response speed. Case studies can be shared on request.

Let’s connect and move forward.

Best regards,
Amaan Khan P.
CUBEMOONS PVT LTD.
₹56,250 INR in 7 days
2.7

Hi, This is Jagrati. I checked your project description and understand you’re building Yuktha, a production-grade RAG-based AI system for women’s metabolic health (PCOS), and you need an expert to audit, optimize, and scale the existing pipeline with a strong focus on retrieval quality, hallucination reduction, personalization, and evaluation.

My approach would be to start with a deep audit of your current RAG architecture, including ingestion, chunking strategy, embedding model, vector store setup, retrieval logic, and prompting layer. I would analyze failure points such as low-recall retrieval, irrelevant context injection, prompt ambiguity, and latency bottlenecks. This includes tracing end-to-end query flow to identify where breakdowns occur.

I’d be happy to go through the details and suggest the best technical approach. I have a few questions to get a better understanding:

Q1 – What vector database and embedding model are you currently using?
Q2 – Do you already have any evaluation dataset or ground truth for testing responses?
Q3 – What LLM(s) are currently powering your generation layer, and are you using any orchestration framework (e.g., LangChain, LlamaIndex, custom pipeline)?

Looking forward to hearing from you.

Best regards,
JP
₹56,250 INR in 7 days
1.4

I specialize in production-grade RAG systems and can audit, optimize, and scale your existing pipeline for Yuktha. My approach focuses on reducing hallucinations, improving retrieval accuracy, and personalizing responses based on user context (symptoms, test results, history). I’ll deliver a robust prompt library, optimized embeddings, hybrid retrieval, and an evaluation framework, fully integrated with your mobile app and WhatsApp workflows, ensuring fast, reliable, and clinically accurate outputs.
₹57,000 INR in 7 days
1.4

Hi there, You’re absolutely in the RIGHT PLACE. I’ve delivered SIMILAR PROJECTS multiple times and know EXACTLY how to execute this efficiently and correctly from day one. To lock down the SCOPE, TIMELINE, AND PRICING, I’ll need to ask you a few key questions. Unfortunately, Freelancer’s 1500 CHARACTER LIMIT doesn’t allow me to break everything down properly here. Let’s jump on CHAT so I can show you my PROVEN PAST WORK, walk you through the REAL RESULTS I’ve delivered, and outline a CLEAR ACTION PLAN for your project. You’ll immediately see why my approach is DIFFERENT and EFFECTIVE. If you’re serious about getting this done RIGHT, I’m ready to move forward. Looking forward to CONNECTING and WINNING TOGETHER. Cheers, Mayank Sahu
₹56,250 INR in 7 days
0.0

Hey, This is exactly the kind of work I find genuinely interesting, not just technically, but because it matters. A health AI system for PCOS that actually gets retrieval right could be meaningfully useful for a lot of people. That raises the stakes, which I think is a good thing.

I've worked on production RAG systems, including the unglamorous parts that demos never show: chunking strategies that don't butcher context, hybrid retrieval when pure semantic search keeps missing, prompt structures that hold up across edge cases, and evaluation pipelines that catch regressions before users do.

My instinct with an existing system like yours is to start with a structured audit before touching anything: understand where retrieval is actually failing, whether it's a chunking issue, an embedding mismatch, a prompting gap, or something upstream in how user context gets passed in. Then fix in order of impact.

For a health platform specifically, the hallucination problem isn't just a quality issue, it's a trust issue. I'd prioritize getting that evaluation framework in place early so every improvement is measurable, not just felt.

Stack-wise, I'm comfortable across LangChain/LlamaIndex, the major vector DBs, and OpenAI/Gemini APIs. Happy to go deeper on any of that. I'm available to start soon and can commit 20–30 hours a week for the initial engagement. Would love a quick call to look at what's already built before we talk numbers.
₹37,500 INR in 1 day
0.0
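The proposal above talks about evaluation pipelines that catch retrieval regressions. A minimal sketch of what such a check could look like is given below, using recall@k over a small labelled query set. The queries, chunk IDs, and the stub retriever are hypothetical placeholders; a real setup would call the actual retriever and use a curated ground-truth set.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(eval_set, retrieve, k=3):
    """Average recall@k over a labelled list of (query, relevant_ids) pairs."""
    scores = [recall_at_k(retrieve(query), rel, k) for query, rel in eval_set]
    return sum(scores) / len(scores)


# Hypothetical labelled queries (assumption: chunk IDs judged relevant by a reviewer).
EVAL_SET = [
    ("foods for insulin resistance", ["c1", "c4"]),
    ("is inositol safe", ["c7"]),
]

# Stub index standing in for the real retriever's ranked output.
FAKE_INDEX = {
    "foods for insulin resistance": ["c1", "c2", "c3", "c4"],
    "is inositol safe": ["c7", "c8"],
}


def stub_retrieve(query):
    return FAKE_INDEX[query]


baseline = mean_recall_at_k(EVAL_SET, stub_retrieve, k=3)
print(round(baseline, 2))
```

Recording this number before a change to chunking or embeddings and comparing it afterwards gives a simple regression signal; it measures retrieval only, not answer accuracy or safety.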

✔ I deliver 100% work; 99.9% is not for me.

✔ Workflow Diagram
RAG System Audit ⟶⟶ Retrieval Optimization ⟶⟶ Prompt Engineering ⟶⟶ Personalization Layer ⟶⟶ Evaluation Framework ⟶⟶ Latency Optimization ⟶⟶ Production Readiness

Key Highlights
✔ Deep RAG audit: identify hallucinations, retrieval gaps, latency issues, and context leakage.
✔ Retrieval optimization: improved chunking, embeddings, query expansion, and hybrid search (semantic + keyword).
✔ Precision vs recall tuning: balance accuracy and coverage for health-critical outputs.
✔ Advanced prompt engineering: structured, clinical-style responses (diet, supplements, lifestyle plans).
✔ Hallucination reduction: grounded answers with stricter context control and validation layers.
✔ Personalization engine: user-aware responses using symptoms, reports, and historical context.
✔ Memory-aware architecture: session + long-term context handling without noise leakage.
✔ Evaluation framework: automated scoring (accuracy, relevance, safety) + human review loop.
✔ Latency optimization: target <2–3s responses via caching, batching, and efficient retrieval.
✔ Production-grade design: scalable, modular pipeline ready for mobile + WhatsApp integration.
✔ Prompt library: reusable, version-controlled prompts for consistency.
✔ Clean documentation: architecture, decisions, and handover for internal team.
₹37,500 INR in 30 days
0.0

Hey, I liked your project, Optimize AI-Driven Women's Metabolic Health Platform and believe I can help you with the project. With my background in Machine Learning (ML), Debugging, API Integration, I'm confident I can meet your requirements. Would be glad to go over specifics if you're interested.
₹37,500 INR in 7 days
0.0

This is exactly the kind of **production RAG system optimization** I specialize in, especially where **accuracy, safety, and latency** are critical (like healthcare use cases).

**Relevant Experience:**
* Built and optimized RAG pipelines using LangChain / LlamaIndex
* Worked with vector DBs like Pinecone and FAISS
* Hands-on with OpenAI API / Anthropic Claude
* Focus on reducing hallucinations, improving retrieval precision, and building evaluation pipelines

**My Approach:**
1. **Audit:** trace hallucination sources, retrieval gaps, latency bottlenecks
2. **Retrieval Optimization:** better chunking, hybrid search (semantic + keyword), query rewriting
3. **Prompting:** structured outputs (clinical-style), consistency, guardrails
4. **Personalization:** memory-aware context (symptoms, history, test data)
5. **Evaluation:** automated + manual metrics (accuracy, relevance, safety)
6. **Performance:** optimize for <2–3s response time

**Deliverables:**
✔ Full audit report + actionable fixes
✔ Optimized RAG pipeline (code + architecture)
✔ Prompt library (modular)
✔ Evaluation framework/dashboard
✔ Integration-ready system (mobile + WhatsApp)

**Availability:** 20–40 hrs/week (4–8 weeks)
**Compensation:** flexible; can be finalized after scope discussion

I focus on building **reliable, explainable RAG systems, not demos**. Happy to discuss your current pipeline and start with a deep audit. Let’s make this production-grade and trustworthy.
₹40,000 INR in 10 days
0.0

Drawing from years of experience working with RAG systems in diverse applications, I am confident in my ability to help optimize your AI-driven women's metabolic health platform, Yuktha. My core expertise lies in identifying and rectifying system limitations for production-grade performance by debugging hallucinations and retrieval errors and by strengthening prompt engineering for clinical-style accuracy. Additionally, my hands-on familiarity with vector databases and LLM APIs means I can optimize your system's retrieval process by improving the chunking strategy and query expansion and by tuning the vector search for a suitable recall-precision trade-off. Furthermore, I bring a skill set that includes implementing hybrid retrieval (semantic + keyword), which improves the contextual relevance of responses for your app and WhatsApp workflows.

Lessons and refinements from previous healthcare/wellness AI projects involving knowledge graph implementation allow me to layer a personalization framework on top of the RAG pipeline. A crucial aspect of this approach is handling user context carefully, including symptoms, test results, and medical history, so that each Yuktha user receives personalized recommendations.
₹55,000 INR in 7 days
0.0

Having spent significant time and garnered hands-on experience with API integration and debugging, I am confident in my ability to add significant value to your project. I specialize in building high-performance systems that can handle the scale and complexity your health AI system requires. Moreover, my experience with Java, Spring Boot, and event-driven architectures perfectly aligns with the need for the RAG system optimization you described in your overview. More specifically, I have deep knowledge in developing complex RESTful APIs and working with various third-party API integrations. In one of my recent projects, I built a Neo4j-based system that witnessed a staggering 70% ingestion performance improvement post my database query optimization work. This successful implementation showcases my proficiency in identifying pain points, designing effective strategies to address them, and delivering substantial results. Furthermore, as an ardent advocate of clean coding, I prioritize scalability, reliability, and maintainability at every step of development. Rest assured that if selected for this role, you not only get a competent professional who can deliver quality code but also someone committed to surpassing your project's expectations through continuous improvement and innovation. Let's build a production-ready, high-trust health AI system together!
₹56,250 INR in 7 days
0.0

Hi, Yuktha sounds like a meaningful and impactful product — especially in a space like PCOS where personalization and accuracy really matter. We’ve worked on optimizing RAG-based systems and can help you audit the current pipeline, reduce hallucinations, improve retrieval quality, and bring response latency down to production standards. From refining chunking/embeddings to strengthening prompts and building a solid evaluation framework, we can take this from a working system to a reliable, scalable solution. Happy to share relevant experience and discuss how we can improve your system step by step. Best regards, Suman
₹56,250 INR in 7 days
0.0

Hello, I bring 5+ years of experience building and optimizing production-grade AI systems, including RAG pipelines with LLMs, vector databases, and real-time applications. I can audit and enhance your existing system by identifying hallucination points, retrieval gaps, and latency bottlenecks, then redesigning the pipeline for accuracy, speed, and scalability. My approach focuses on improving retrieval (chunking, hybrid search, embeddings), strengthening prompt design for structured clinical outputs, and implementing a robust personalization layer with memory-aware context. I’ll also build an evaluation framework to measure accuracy, relevance, and safety—critical for a healthcare use case like PCOS. I’ve worked on similar data-sensitive AI systems where reliability and consistency are essential. Let’s connect to optimize your platform into a production-ready, high-trust AI system.
₹65,700 INR in 7 days
0.0

Hi, I went through Yuktha, and this is not a typical RAG tuning problem. It’s a high-trust system where consistency and correctness matter more than anything else. From experience, issues here are rarely just prompts or embeddings. It’s usually retrieval not matching decision intent, lack of evaluation, and personalisation adding noise instead of value. My approach is to trace the full flow end-to-end (input → retrieval → context → output) to find where things actually break. Then improve retrieval in an intent-driven way, clean up context before it reaches the model, and enforce structured outputs with validation for consistency. For personalisation, I separate session context from long-term user data and prioritise only what’s relevant. I also set up a measurable evaluation framework so improvements aren’t guesswork, and include source attribution with basic guardrails to reduce unsafe or hallucinated outputs. On performance, I optimise for practical latency (~2–5s) using async handling, caching, and reducing unnecessary token usage. You can expect a clear audit, improved pipeline, structured prompts, and an evaluation setup your team can build on. Quick questions: * Bigger issue: hallucinations or irrelevant outputs? * Any evaluation metrics in place today? I handle architecture directly, with a small supporting team for implementation. I’m flexible on pricing if we extend timelines for deeper testing. — Shreyas
₹58,500 INR in 10 days
0.0
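The proposal above mentions enforcing structured outputs with validation and including source attribution as a basic guardrail. A minimal sketch of that idea follows, assuming the model is asked to reply in JSON. The field names `recommendation` and `sources` are hypothetical and not taken from the Yuktha system.

```python
import json

# Hypothetical response schema: field name -> expected Python type.
REQUIRED_FIELDS = {"recommendation": str, "sources": list}


def validate_response(raw):
    """Parse a model reply and return (ok, payload_or_error_message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value must be an object"
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return False, f"missing or mistyped field: {field}"
    if not data["sources"]:
        # Basic guardrail: reject answers that cite no retrieved source.
        return False, "no sources cited"
    return True, data


ok, payload = validate_response('{"recommendation": "See a clinician.", "sources": ["c7"]}')
print(ok)
```

A reply that fails validation could be retried or replaced with a safe fallback message; which of those is appropriate is a product decision outside this sketch.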

Hi, This is exactly the kind of system I’ve been building: production-grade RAG pipelines with structured, reliable outputs (not demo chatbots).

Relevant Experience
- Built RAG-based fact-checking systems with auditable reasoning
- Designed agentic pipelines processing 5,000+ multilingual inputs/day
- Strong focus on hallucination control, retrieval tuning, and LLM reliability

⚙️ My Approach (High-Level)
- Audit: Identify hallucination sources, retrieval gaps, latency issues
- Retrieval: Optimize chunking, embeddings, hybrid search (semantic + keyword)
- Prompts: Clinical-style structured outputs (plans, recommendations)
- Personalization: Memory-aware pipelines using user health context
- Evaluation: Build accuracy + relevance + safety benchmarking framework
- Optimization: Target <2–3s latency with efficient orchestration

Tech Stack
LangChain / LlamaIndex, OpenAI / Claude, FAISS / Pinecone, FastAPI, HuggingFace

Why Me
I focus on high-trust AI systems, where correctness, consistency, and explainability matter (perfect fit for health AI like Yuktha).

⏱️ Availability
20–40 hrs/week. Can start immediately.

Happy to share architecture ideas or audit your current pipeline quickly before we begin.

Jaskaranjeet Singh
₹56,250 INR in 7 days
0.0

Hi, This is exactly the kind of work I focus on. I can audit your pipeline, identify retrieval gaps, reduce hallucinations, and improve prompts for consistent, structured outputs while keeping latency low. I’ve worked on production-grade RAG systems and understand the tradeoffs between accuracy, cost, and speed. Quick question: which vector DB and LLM stack are you currently using? We could try a different approach on this, too. Let’s connect.
₹56,250 INR in 7 days
0.0

Hyderabad, India
Member since Apr 7, 2026