
Closed
Paid on delivery
Hello,

I hope this message finds you well. We have an urgent requirement to source real enterprise-grade legacy codebases for internal evaluation and benchmarking purposes. We request your support in identifying and sharing repositories that strictly meet the criteria outlined below.

1. Minimum Eligibility (Mandatory)
Repositories must meet all of the following:
- Minimum 100,000+ Lines of Code (LOC)
- At least 100+ Pull Requests (PRs) with meaningful discussions
- Minimum 50+ Issues, including several with detailed problem descriptions
- 200+ commits distributed over time (no bulk or single-day commits)
- Real, human-written production code (no AI-generated or synthetic projects)
- Originating from a real, verifiable company
- Must have legal rights available to share or transfer

2. Critical Requirement: PR Quality (Must Have)
Each Pull Request should:
- Be linked to a specific issue
- Address a clearly defined problem
- Include both code changes and corresponding test updates
- Be reasonably scoped (neither too large nor trivial)
Highly Preferred: PRs demonstrating Fail → Pass (F2P) behavior (i.e., tests fail before the fix and pass after implementation)
Note: Repositories where PRs contain only code changes without test coverage will not be considered.

3. Preferred Technology Stack
- C#
- Java
- Python
- PHP
- .NET Framework
- COBOL
- Other legacy enterprise technologies

4. Preferred Industry Domains (High Priority)
- Banking / Financial Services
- Accounting
- Insurance
- Healthcare
- Legal Technology
- Government Systems
- Enterprise SaaS (complex workflow-driven platforms)
Note: Ecommerce, retail, content platforms, and frontend-heavy applications are not within scope.

5. Technical Readiness (Very Important)
Repositories should:
- Build and run successfully
- Include a Dockerfile (preferred) or clear setup instructions
- Have proper dependency management
- Follow a clean and structured project layout
- Contain test suites (preferably 50+ test files)
- Maintain clear PR-to-issue linkage
- Ensure each PR ideally resolves one issue

6. Required Metadata (Exact Figures)
For each repository submitted, please provide:
- Company name, industry, and country
- Primary programming language(s)
- Exact Lines of Code (LOC)
- Number of files
- Number of commits
- Number of Pull Requests
- Number of Issues
- Number of contributors
- Repository age (years active)

7. Additional Notes
- Strong preference will be given to repositories with robust PR and test linkage
- Well-structured development history is critical
- Low-quality, bulk-imported, or poorly maintained repositories will be rejected

We are specifically looking for high-quality engineering datasets, not just large codebases. Your careful validation before submission will be highly appreciated. Please treat this request as high priority and share suitable options at the earliest. If you have any questions or need clarification, feel free to reach out.

Thank you for your support.

Warm regards,
Aditya
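As an illustration only, the numeric thresholds in section 1 could be screened with a small script before any manual review. This is a minimal sketch, not part of the original brief: `RepoStats`, `failed_criteria`, and the `MAX_SINGLE_DAY_SHARE` cutoff are hypothetical names and values. The brief says only "no bulk or single-day commits", so the 25% cutoff is an assumption chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

# Thresholds taken from section 1 of the brief.
MIN_LOC = 100_000
MIN_PULL_REQUESTS = 100
MIN_ISSUES = 50
MIN_COMMITS = 200
# Assumption: the brief gives no number for "bulk" commits; here a repository
# is flagged when more than 25% of its commits fall on a single day.
MAX_SINGLE_DAY_SHARE = 0.25


@dataclass
class RepoStats:
    """Hypothetical container for metrics gathered about one repository."""
    loc: int
    pull_requests: int
    issues: int
    commit_dates: list[date]


def failed_criteria(stats: RepoStats) -> list[str]:
    """Return the names of the mandatory numeric criteria the repo fails."""
    failures = []
    if stats.loc < MIN_LOC:
        failures.append("loc")
    if stats.pull_requests < MIN_PULL_REQUESTS:
        failures.append("pull_requests")
    if stats.issues < MIN_ISSUES:
        failures.append("issues")
    if len(stats.commit_dates) < MIN_COMMITS:
        failures.append("commits")
    else:
        # Share of all commits that landed on the single busiest day.
        busiest_day_count = Counter(stats.commit_dates).most_common(1)[0][1]
        if busiest_day_count / len(stats.commit_dates) > MAX_SINGLE_DAY_SHARE:
            failures.append("commit_spread")
    return failures


# Example: 200 commits, one per week, passes every numeric check.
weekly = [date(2015, 1, 5) + timedelta(weeks=i) for i in range(200)]
print(failed_criteria(RepoStats(120_000, 150, 60, weekly)))  # []
```

A check like this covers only the countable criteria. Company provenance, legal rights to share, and PR discussion quality would still need manual verification.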
Project ID: 40436758
23 proposals
Remote project
Active 2 days ago
23 freelancers are bidding on average ₹103,420 INR for this job

Your requirement isn't a development project - it's a data procurement challenge with serious legal and ethical landmines. Before you invest budget here, we need to address three blockers that will kill this initiative if ignored.

First issue: enterprise codebases with 100K+ LOC and detailed PR histories are proprietary assets. Companies don't "share" production banking or healthcare code because it contains business logic worth millions and is bound by NDAs, trade secret laws, and compliance frameworks like SOX or HIPAA. You're essentially asking for someone to commit corporate espionage or breach employment contracts.

Second: the F2P test requirement signals you want training data for an AI model. If that's the case, you're walking into a copyright minefield. GitHub's public repos fall under various licenses (MIT, GPL, Apache) that may prohibit commercial AI training without attribution. Scraping private repos violates GitHub's ToS and potentially the CFAA.

Third: even if someone claims to have "legal rights" to share a codebase, verifying that claim requires reviewing employment agreements, IP assignment docs, and company policies. Most developers don't own the code they write at work - their employer does.

Two questions before we proceed:
- What is the actual end-use case? If this is for training a code generation model, you need synthetic data generation or licensed datasets from vendors like Hugging Face. If it's for benchmarking a static analysis tool, you need sanitized open-source projects with clear licensing.
- Have you consulted legal counsel on data acquisition? Because if a contributor submits stolen code and you use it commercially, you inherit liability for IP theft, even if you didn't know the source was compromised.

Here's the compliant path forward:
- OPEN SOURCE MINING: I can build a scraper that filters GitHub/GitLab for repos matching your LOC and PR criteria, then validates licenses permit your use case. This gets you 20-30 candidates in 48 hours.
- SYNTHETIC GENERATION: If you need F2P test patterns, I can generate realistic legacy code scenarios using LLMs fine-tuned on permissively licensed datasets, avoiding IP issues entirely.
- VENDOR PARTNERSHIPS: Companies like Software Heritage and Sourcegraph offer curated code datasets with cleared licensing. I can negotiate access and format the data to your specs.

I've worked with 2 fintech clients on similar compliance-heavy data projects. Both initially wanted "real production code" until legal flagged the risks. We pivoted to synthetic generation and delivered 500K LOC of realistic banking logic without touching proprietary systems.

Let's schedule a 20-minute call to clarify your constraints before anyone wastes time chasing repos that can't legally be shared.
₹101,250 INR in 30 days
6.2

Hello Aditya, I reviewed your requirement for sourcing enterprise-grade legacy codebases for benchmarking. This is a highly specialized task where engineering quality, workflow history, and PR/issue discipline matter more than repository size alone. The key challenge is filtering out AI-generated, forked, or low-quality repositories and identifying real production systems with strong engineering signals. I can source and validate repositories in Java, Python, PHP, C#, .NET, COBOL, and other legacy stacks, especially in domains like banking, insurance, healthcare, government systems, legal tech, and enterprise SaaS. My validation includes: * Commit history and long-term activity check * PR-to-issue linkage and discussion quality * Test coverage and CI/CD maturity * Build/setup readiness and architecture quality * Removal of synthetic or bulk-imported repos For each repo, I will provide: * Company, industry, origin * Tech stack * LOC, commits, PRs, issues, contributors * Repository age and maintenance status * Engineering quality summary Deliverable will be a curated, verified dataset—not just a list. I can start once you confirm: * Number of repositories needed * Licensing constraints * Priority technologies or domains Looking forward to working on this.
₹112,500 INR in 7 days
5.3

As an accomplished software engineer at Dlite Info Tech Pvt Ltd, I am ideally positioned to support your Legacy Codebase Sourcing project. With my expertise in Java, PHP, and Python and my track record of delivering high-quality results, I can help you identify repositories that meet all your specifications. Understanding the critical need for PR quality, I will leverage my experience analyzing massive PR databases to provide you with PRs that are linked to specific issues, address clearly defined problems, and include both code changes and corresponding test updates. I share your preference for PRs demonstrating Fail → Pass behavior, as they depict real-world scenarios that we can learn the most from. Moreover, my proficiency in various languages including C#, .NET Framework, COBOL, and other legacy enterprise technologies aligns with your Preferred Technology Stack, and I have a good understanding of many of the preferred industry domains, such as Banking / Financial Services, Accounting, Insurance, and Enterprise SaaS (complex workflow-driven platforms). This broad knowledge base will enable me to not only identify compatible repositories but also ensure we source real, human-written production code. To conclude: hire me now and let's transform your technological challenges into seamless solutions!
₹112,500 INR in 7 days
4.5

Hi Aditya,

We can support this from Pheonixsolutions. This is a very specific requirement, so we would not treat it like a normal repo search. The focus will be on finding real enterprise-grade legacy codebases with proper engineering history, legal shareability, meaningful PRs, issue linkage and test coverage.

Our approach would be:

First, we’ll shortlist only repositories that match the mandatory filters like 100k+ LOC, 100+ PRs, 50+ issues, 200+ commits, real company origin and active development history.

Then we’ll validate the quality of PRs, especially whether they are linked to issues, contain both code and tests, and show proper problem-solving instead of random code changes.

We’ll also check technical readiness, including build instructions, dependency setup, Docker availability, test suites and overall project structure.

For every suitable repository, we’ll prepare the exact metadata you asked for:
- company, industry and country
- primary languages
- LOC
- file count
- commits
- PRs
- issues
- contributors
- years active
- test coverage and PR quality notes

We understand that low-quality, bulk-imported or synthetic projects are not useful here, so we’ll carefully filter before submitting anything. We can start with an initial validated list and then expand based on your feedback.

Thanks,
Pheonixsolutions
₹112,500 INR in 7 days
3.5

Hello Aditya, I’ll be straightforward—finding repositories that meet all your constraints (especially PR ↔ test linkage + enterprise origin + legal transfer rights) is extremely rare in public datasets. Most enterprise codebases are private, and most open-source projects won’t satisfy the strict PR/test criteria. That said, I can help you source high-quality, near-matching datasets + validate them properly. What I can do: • Identify and shortlist enterprise-grade repositories (100K+ LOC, strong history) • Deep-validate PR quality (issue linkage, test updates, discussion depth) • Filter by your preferred domains (banking, healthcare, gov, SaaS) • Extract and verify exact metadata (LOC, commits, PRs, issues, contributors, etc.) • Provide build/run validation (Docker/setup readiness) • Flag legal/licensing status clearly (very important for your use case)
₹112,500 INR in 7 days
1.4

Hello Aditya, I can help you source and validate high-quality enterprise-grade legacy repositories that match your engineering and benchmarking requirements. I am a Full Stack Java Developer with 6+ years of experience working on large-scale enterprise applications, banking systems, secure backend APIs, workflow-driven platforms, and production-grade architectures using Java, Spring Boot, JPA, MySQL/PostgreSQL, and related technologies. I understand the importance of: • PR-to-issue linkage • meaningful commit history • test-backed fixes (including Fail → Pass patterns) • maintainable enterprise architecture • clean dependency and build management I can carefully curate repositories based on your criteria, including: • 100k+ LOC enterprise projects • 100+ PRs with discussions • issue tracking quality • test coverage validation • contributor and activity analysis • Docker/setup readiness • legal/shareability verification I can also provide structured metadata for each repository: • LOC, commits, PRs, issues, contributors • tech stack and industry domain • repository age and activity patterns • engineering quality observations My technical background allows me to evaluate repositories from an actual software engineering perspective rather than simply collecting GitHub links. I can start immediately and provide a carefully validated shortlist quickly. Looking forward to collaborating with you.
₹112,500 INR in 7 days
1.5

Hello, I’m Nikita Agarwal from Groww Per Click. We reviewed your requirement for sourcing and validating enterprise-grade legacy repositories for internal benchmarking and engineering analysis. We can assist in identifying legally accessible, real-world open-source or enterprise-released codebases that match your strict structural and PR-quality criteria. We understand your focus is not just on code size, but on engineering maturity, traceability, and strong development history. ✔ Identification of eligible enterprise-grade repositories (open-source / legally shareable) ✔ Filtering by required tech stacks (Java, C#, Python, PHP, .NET, COBOL) ✔ Industry-specific sourcing (banking, finance, healthcare, government, insurance, SaaS) ✔ Validation of LOC, commits, contributors, issues, and PR structure ✔ PR-to-issue linkage analysis and quality scoring ✔ Buildability check with setup/Docker instructions ✔ Metadata report for each repository as per required format ✔ Screening for meaningful PR activity and test-linked changes (where available) We ensure strict compliance by only including verifiable and legally distributable repositories, with full transparency on source and metrics. We focus on delivering a clean, validated, and structured dataset that aligns with your evaluation standards and can be directly used for internal benchmarking. Looking forward to working with you. Best Regards, Nikita Agarwal Groww Per Click
₹75,000 INR in 7 days
1.0

Hello Aditya, I appreciate the opportunity to assist with your urgent requirement for sourcing enterprise-grade legacy codebases. With a clear understanding of your criteria, I will leverage my expertise and AI tools to efficiently identify repositories that meet your mandatory and critical requirements. My approach will include thorough validation of each codebase, ensuring they are not only extensive but also maintain high standards in PR quality and technical readiness. I will provide detailed metadata for each repository, including essential information such as company details, programming languages, LOC, and development history. I am committed to delivering quality results promptly, and I anticipate being able to compile and share suitable options within 14 days. Please feel free to reach out if you have any further questions or need clarification on specific aspects. Thank you for considering my proposal. I look forward to collaborating with you on this project.
₹92,610.01 INR in 14 days
0.6

Hi Aditya, I can help source and validate enterprise-grade legacy repositories matching your requirements, including PR/issue linkage, test coverage, commit history quality, and engineering maturity. I’ll focus on real production codebases from verified companies across banking, healthcare, SaaS, and government domains, with structured metadata and technical validation included. One quick question: are you only looking for publicly shareable/open-source repositories, or can commercially transferable private codebases also be considered? Best,
₹112,500 INR in 1 day
0.0

Hello Aditya, I’m a Senior Full-Stack & Mobile Developer with 7+ years of experience building web apps, mobile apps, APIs, AI-powered systems, and scalable platforms for startups and businesses. I can help source and validate enterprise-grade legacy repositories that meet your strict engineering and dataset-quality requirements. My approach includes: • Identifying real company-backed repositories with verified development history • Validating LOC, commits, PR quality, issue linkage, contributors, and test coverage • Prioritizing banking, healthcare, insurance, government, and enterprise SaaS domains • Reviewing PR-to-issue mapping, F2P patterns, and structured engineering workflows • Delivering detailed metadata for each shortlisted repository in a clean report format Preferred stacks I can focus on: • Java, Python, C#, PHP, .NET, COBOL, and other enterprise legacy systems I understand the importance of high-quality engineering datasets over simply large repositories and will carefully filter out low-quality or synthetic projects. I’m available to begin immediately and can provide the first validated shortlist quickly.
₹112,500 INR in 7 days
0.0

Hello Aditya, I understand you are looking for high-quality enterprise-grade legacy repositories for internal benchmarking with strict engineering and PR quality standards. I can support you in identifying and validating real, production-level open-source systems that closely match your requirements from trusted ecosystems such as Apache, Eclipse, Microsoft, CNCF, and selected government or enterprise-backed projects. My approach will include deep filtering based on repository size (LOC), commit history, issue and PR structure, contributor activity, CI/CD readiness, and overall engineering maturity. I will also analyze whether PRs are properly linked to issues, include meaningful code and test changes, and reflect real development workflows where available. For each shortlisted repository, I will provide a complete metadata breakdown including organization, industry domain, primary language, LOC estimate, number of commits, PRs, issues, contributors, files, and project age. Additionally, I will clearly flag build readiness, test coverage quality, and any gaps in F2P-style testing workflows so you can evaluate dataset suitability quickly. Where exact compliance is not possible due to the nature of open-source ecosystems, I will ensure transparency and provide the closest high-confidence alternatives. My goal is to deliver a carefully curated, audit-ready shortlist of real enterprise systems that best align with your benchmarking needs. best regards Habib Ullah
₹112,500 INR in 7 days
0.0

Hi Aditya, Thank you for sharing the detailed requirements. We can assist in sourcing and validating enterprise-grade legacy repositories that align with your benchmarking and engineering dataset criteria. We understand that the focus is not only on repository size but also on the quality of engineering practices, PR discussions, issue linkage, test coverage, and historical development maturity. We will specifically evaluate repositories against: • LOC, commits, PRs, issues, and contributor thresholds • PR-to-issue traceability • Presence of automated tests and F2P-style workflows • Real enterprise ownership and legal shareability • Build readiness, dependency management, and project structure • Legacy enterprise technology stacks and workflow-heavy domains Our review process will include: • Manual repository validation • Engineering quality assessment • Metadata extraction and verification • Identification of repositories with meaningful development history and maintainable architecture Preferred stacks such as Java, C#, Python, PHP, .NET, and enterprise-oriented systems in banking, healthcare, insurance, legal tech, and government domains will be prioritized. We can begin the research and shortlisting process immediately and share an initial curated list at the earliest. Looking forward to working with you on this high-priority requirement.
₹90,500 INR in 7 days
0.0

As an ideal freelancer for the task at hand, I have a couple of striking attributes that are essential for this project. In line with the nature of the project, my copywriting skills will ensure that I meticulously source, curate, and present real enterprise-grade legacy codebases that strictly meet your outlined criteria. This involves a careful validation process on my part to include only high-quality engineering datasets and exclude low-quality codebases, so you can rest assured that you'll get precisely what you've requested. Moreover, my adeptness in content writing will be invaluable in organizing the necessary metadata for each submitted legacy codebase with precision. With exact figures for origin, size, age, programming languages used, and any other required information, you can easily assess each codebase against your outlined criteria and compare the datasets with one another. Lastly, with the attention to detail honed over my years of experience in data entry work, you can count on me to ensure all submissions are ready for technical analysis. This includes ensuring they compile successfully and have proper dependency management, complemented by my proficiency in creating Dockerfiles or clear setup instructions if necessary. As we progress, you have the assurance of not just quality outputs but timely ones too.
₹112,500 INR in 7 days
0.0

Hi, Resonite Technologies can help source and validate enterprise-grade legacy codebases for benchmarking and internal evaluation. We will verify: ✔ 100K+ LOC ✔ 100+ meaningful PRs linked to issues ✔ 50+ detailed issues ✔ 200+ real commits over time ✔ Human-written production code from verifiable companies ✔ Test coverage with code + test updates in PRs ✔ Build/run readiness, Docker/setup validation ✔ Legal sharing or transfer eligibility Focus stacks: C#, Java, Python, PHP, .NET, COBOL and other enterprise legacy technologies. Preferred domains: Banking, insurance, healthcare, legal tech, government systems and enterprise SaaS. Deliverables include: • Technical validation report • Exact repo metrics (LOC, PRs, issues, commits, contributors, age) • PR-to-issue/test linkage analysis • Structured shortlist of qualified repositories Ready to start immediately and provide carefully validated options only.
₹149,500 INR in 7 days
0.0

⭐ONLY PAY IF YOU’RE IMPRESSED⭐ We have extensive experience sourcing and evaluating enterprise-grade legacy codebases matching your stringent criteria. We can streamline your search, providing repositories that ensure quality, compliance, and complete metadata. Core Deliverables: • Verified repos with 100K+ LOC, PRs, Issues, commits • PR-issue linkage with test coverage (Fail → Pass) • Support for preferred tech stack & industries • Complete metadata and setup guidance Our Approach: • Rigorous validation for code quality and legal rights • Detailed documentation and metrics for each repo • Focused on your critical requirements and tech preferences Kindly share your budget range to enable a tailored timeline and project breakdown. Committed to delivering high-quality results aligned to your goals. Looking forward to collaborating. Kind regards, Aaron Roberts Happy Screen Solutions
₹100,000 INR in 5 days
0.0

Identifying and delivering secure, clean, and professional enterprise-grade legacy codebases with the detailed criteria you outlined is a challenge I am uniquely equipped to meet. Your requirement for repositories with 100,000+ lines of human-written production code, strict PR-to-issue linkage, meaningful test coverage, and real-world business provenance aligns perfectly with my expertise in managing complex, legacy enterprise projects across C#, Java, Python, and .NET frameworks. My experience goes beyond surface-level identification; I ensure every repository is malware-free, fully documented with exact metrics, LOC, commits, issues, contributors—and verified for business continuity and legal shareability. Unlike typical freelancers, I apply a business-focused, scalable approach that prioritizes transparency, automated validation, and smooth integration readiness. While I am new to Freelancer, I have extensive experience and have successfully completed many projects off-platform. I’m open to a quick chat — worst case, you walk away with a free consultation and clear direction for your project. Regards, Migel Uys
₹112,500 INR in 30 days
0.0

Hi Aditya, As a Ph.D. Research Scholar at IIT Kharagpur specializing in Computer Science and deep learning, my daily research involves curating, validating, and benchmarking massive engineering datasets. I understand exactly what distinguishes a high-quality, enterprise-grade codebase from a poorly maintained or synthetic repository. I can expertly source and validate legacy codebases (Java, C#, .NET, Python) from your target domains (Finance, Healthcare, Enterprise SaaS) that strictly meet your evaluation criteria. My Curation Approach: Algorithmic & Manual Filtering: I will utilize advanced GitHub search APIs and manual code-review techniques to isolate repositories with >100,000 LOC, >200 authentic human commits, and verifiable corporate origins. PR & Test Validation: I know how critical test coverage is for benchmarking. I will specifically target repositories demonstrating strong PR-to-issue linkage and Fail-to-Pass (F2P) behaviors. I will verify that the code builds successfully and includes comprehensive test suites (50+ files) and Dockerfiles. Metadata Matrix: For every sourced codebase, I will provide an exact metadata report detailing company origin, precise LOC, PR/Issue counts, and verifiable licensing rights to ensure safe internal use. I am ready to treat this as a high priority and deliver the first batch of strictly validated repositories for your review. Best regards, ASHIQUR RAHAMAN MOLLA
₹100,000 INR in 21 days
0.0

Pune, India
Member since Dec 22, 2025