
Closed
Posted on
---------------------- PROJECT UPDATE – PLEASE READ ----------------------

Thank you for your interest. We should say up front that we do not have much experience, which is exactly why we need help, but we do know what we need. Because our time and resources are limited, we are updating the posting with Addendum 1 and Addendum 2, as well as this introductory note.

Introductory note

Once we get in touch with freelancers/providers who can offer the requested information, we would like to see which products they have already built, so that we can understand which of those existing products we could reuse or adapt. We are interested in solutions ranging from very simple ones up to somewhat more complex ones, including agents. We are also interested in extracts/integrations for certain web pages and in one mobile application for parking. For now, we are extending the project with the two addendums below.

*** ADDENDUM 1 – Clarification of the future project (business side)

I am looking for paid assistance from a person or company who can:
- refer me to an expert or company (or tell me if you have done this yourself) that has already built a similar project (multi-LLM comparison, RAG, legal/medical domain), whether as a public SaaS product or a private in-house solution, or
- point me to an existing software product with comparable capabilities that can be demonstrated.

The task is to:
- connect me with such a person/company, or
- point me to such software (in production, or as a custom solution built for another client), so that it can be presented (demo, walkthrough) and its capabilities clearly shown.

If such software is available for sale or licensing, I am also interested in exploring purchase/licensing options.

*** ADDENDUM 2 – Precise technical minimal scope (MVP)

Minimal scope (MVP) of the system I want to build/use:

1. Orchestration
- A central orchestrator, preferably built with Vellum Workflows, though I am open to another commercial or custom orchestrator (no open-source frameworks such as LangChain or LlamaIndex in the core).
- Clearly separated modes of operation:
  - Normal mode: queries go only to the primary model (OpenAI).
  - Compare mode: a manual trigger that compares OpenAI vs Anthropic on the same prompt.
  - Web-check mode: a manual trigger that sends a Perplexity/web-research call (never run in parallel by default, only when explicitly requested).

2. LLM providers
- Integration with:
  - OpenAI (primary model for generation and reasoning).
  - Anthropic (secondary model for answer comparison/sanity checks).
  - Perplexity (used exclusively for additional web checks/research).
- Configurable parameters per model: temperature, max tokens, timeout, number of retries.

3. RAG layer
- Ingestion pipeline for documents (PDF, DOCX) with basic cleaning (encoding fixes, removal of headers/footers where feasible).
- Document chunking plus metadata (e.g. source, date, author, document type, jurisdiction/medical domain).
- Vector database:
  - primarily pgvector on PostgreSQL, or
  - alternatively Pinecone as a managed solution, with a reasoned justification for the choice.
- RAG queries must return citations in the answer (link to the document + ID + page/paragraph range).

4. Database model (SQL)
Minimum entities:
- user (at least 2 users)
- case/matter (legal or medical question)
- user profile memory (preferences, answer style, language, etc.)
- case memory (history of queries and key conclusions per case)
- session summaries (session-level summaries for long-term context retention)

5. Security and audit
- Authentication and authorization for 2 users, with roles: admin and user.
- An audit log for every call: timestamp, model, provider, user_id, case_id, document_ids used, and mode (normal/compare/web-check).
- Encryption in transit (HTTPS/TLS).
- A backup strategy for the databases (SQL + vector store) with an approximate RPO/RTO.

6. Evaluations
- Prepare and implement a minimal test set (at least 15 legal/medical questions) with reference answers, or at least the expected key citations.
- Evaluate:
  - citation accuracy (the model must cite real documents and the relevant sections), and
  - basic guardrails against hallucinations (e.g. answering "I do not know / not present in the documents" when there is no relevant context).

7. Requirements for candidates
- No open-source LLM frameworks (LangChain, LlamaIndex, etc.) in the core orchestration; I prefer custom code or a commercial platform.
- Vellum experience is a plus but not mandatory; I am open to strong alternative suggestions.
- Please apply only if you have already built a similar multi-LLM RAG system (legal/medical domain experience is a strong plus) and can show the architecture or anonymized examples.

---------------------- FIRST INFO (ORIGINAL BRIEF) ----------------------

I'd like expert guidance in choosing and setting up an application that lets me ask the same question to several AI models and view their text-based answers side by side. My priorities are a clean, user-friendly interface; built-in support for multiple models (OpenAI, Anthropic, etc.); and flexible comparison parameters so I can tweak temperature, context length, or other settings before each run.

What I need from you first:
• A shortlist of two to three suitable tools (commercial or open source) that genuinely meet the criteria above, something in the spirit of AI Arena but more complete.
• A concise evaluation of each option: installation or subscription steps, model availability, limits, and any caveats you have discovered in real-world use.
• Step-by-step help getting my chosen solution running in my Mac/Chrome environment, plus a quick walk-through of its core features so I can start comparing answers immediately.

Once that is in place, I'd also like a brief roadmap for connecting with active Discord communities focused on AI experimentation; suggestions for vetted servers and an outline of how to integrate their resources into the comparison workflow would be appreciated. If everything goes smoothly, there will be follow-up tasks for me and a few friends who are exploring related AI topics, so clear documentation and ongoing availability for questions will be a plus.
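To make the Addendum 2 scope concrete, here is a minimal sketch of the three orchestrator modes and the per-model parameters (temperature, max tokens, timeout, retries) the posting asks for. It is illustrative only: the provider callables are stand-ins for real OpenAI/Anthropic/Perplexity SDK calls, and all names here (`ModelConfig`, `Orchestrator`) are hypothetical, not part of Vellum or any existing product.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModelConfig:
    """Per-model knobs listed in Addendum 2, item 2."""
    temperature: float = 0.2
    max_tokens: int = 1024
    timeout_s: float = 30.0   # carried in config; a real client would enforce it
    max_retries: int = 2

# A provider is anything that takes (prompt, config) and returns text.
# In practice each would wrap the OpenAI, Anthropic, or Perplexity SDK.
Provider = Callable[[str, ModelConfig], str]

class Orchestrator:
    def __init__(self, providers: Dict[str, Provider],
                 configs: Dict[str, ModelConfig]) -> None:
        self.providers = providers
        self.configs = configs

    def _call(self, name: str, prompt: str) -> str:
        """Invoke one provider with a simple retry loop."""
        cfg = self.configs[name]
        last_err = None
        for _ in range(cfg.max_retries + 1):
            try:
                return self.providers[name](prompt, cfg)
            except Exception as err:   # retry on any provider error
                last_err = err
        raise RuntimeError(f"{name} failed after {cfg.max_retries} retries") from last_err

    def normal(self, prompt: str) -> dict:
        """Normal mode: the query goes only to the primary model."""
        return {"mode": "normal", "openai": self._call("openai", prompt)}

    def compare(self, prompt: str) -> dict:
        """Compare mode: manual trigger, OpenAI vs Anthropic on the same prompt."""
        return {"mode": "compare",
                "openai": self._call("openai", prompt),
                "anthropic": self._call("anthropic", prompt)}

    def web_check(self, prompt: str) -> dict:
        """Web-check mode: Perplexity only, never run in parallel by default."""
        return {"mode": "web-check", "perplexity": self._call("perplexity", prompt)}
```

The design point is that each mode is an explicit, separately triggered entry point rather than a parallel fan-out, which matches the "never running in parallel by default" requirement; swapping the lambdas for real SDK clients would not change the mode logic.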
Project ID: 40238597
24 proposals
Remote project
Active 19 days ago
24 freelancers are bidding an average of €15 EUR/hour for this job

Hello, I’m excited about the opportunity to contribute to your project. With my expertise in multi-model LLM tooling and a strong focus on clean, practical setup, I can shortlist 2–3 reliable side-by-side comparison apps, evaluate each for model support, limits, and real-world caveats, and get your chosen option running smoothly on Mac/Chrome with the key parameters you want (temperature, context, etc.). I’ll tailor the work to your exact requirements by providing a step-by-step setup walkthrough, a quick feature tour so you can start comparing immediately, and a simple roadmap for joining vetted AI experimentation Discord communities and folding their prompt/eval resources into your workflow. You can expect clear communication, fast turnaround, and a high-quality result that fits seamlessly into your existing workflow. Best regards, Juan
€15 EUR in 40 days
3.2

Zagreb, Croatia
Payment method verified
Member since Feb 17, 2026