
Completed
Posted on
Paid on delivery
Job Description

We need a disciplined SQL/data engineer to produce a deterministic, re-runnable script-based build (lean MVP, not enterprise infra).

Goal
Create a single canonical, one-row-per-company dataset anchored on UEI, then enrich it using exact UEI matches (no name guessing).

Inputs (provided)
USAspending extract / source tables (L24M window)
DSBS export (CSV) with UEI + owner contact fields
SAM Active Entity extract (CSV) with UEI + phone + officer names
Written Segment B & C rules (explicit)

Deliverables
1. USAspending segmentation + canonical vendor table
>Filter to last 24 months
>Vendor-level aggregation → one row per UEI
>Apply Segment B & C logic exactly as written
>Deterministic CSV output
>Basic QA checks:
a. no duplicate UEIs in output
b. obligation reconciliation tie-out (aggregate totals match expected)
2. UEI-based enrichment + final CSV
>Left join canonical list → DSBS export on UEI (exact match only)
>Left join canonical list → SAM extract on UEI (exact match only)
>Compute dial_phone priority field (define precedence clearly; we will confirm the rule)
>No fuzzy matching, no name-based joins, no inference
>No companies dropped: all UEIs from canonical list must remain in final output (nulls allowed if no match)
>Final enriched CSV output

[login to view URL] / Integrity notes
Any rows missing UEI or breaking determinism must be placed into an exceptions table + flagged, not guessed.

Non-Negotiables
1. UEI is the identity anchor.
2. Do not dedupe by name.
3. If rules conflict or data is ambiguous: stop and flag before proceeding.
4. Follow SOP exactly.

Working Style
We want someone who:
>thinks in SQL terms (joins, GROUP BY, keys, aggregation)
>communicates clearly when data conflicts exist
>prefers correctness over speed

Tools
You can use:
a) SQL (Postgres/MySQL/BigQuery style acceptable)
b) Or SQL + Excel/Sheets only for validation
Output must be CSV.
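
As an illustration of Deliverable 1, the following is a minimal sketch of the 24-month filter, the one-row-per-UEI aggregation, and the two QA checks, run against an in-memory SQLite database from Python. The table names (`usaspending_txn`, `canonical_vendor`), the column names, the sample rows, and the pinned `AS_OF` date are all assumptions made for the example; the real extract's schema is not given in the posting. Segment B & C logic is deliberately left out because the written rules are not reproduced here.

```python
import sqlite3

# Hypothetical pinned as-of date: a fixed date (rather than "today")
# keeps the 24-month window identical on every rerun.
AS_OF = "2025-06-30"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usaspending_txn (          -- hypothetical staging table
    uei TEXT,
    recipient_name TEXT,
    action_date TEXT,                   -- ISO yyyy-mm-dd
    obligation_cents INTEGER            -- integer cents so sums are exact
);
INSERT INTO usaspending_txn VALUES
    ('UEI000000001', 'Acme LLC',  '2025-03-01', 100000),
    ('UEI000000001', 'ACME, LLC', '2024-11-15',  25000),
    ('UEI000000002', 'Beta Inc',  '2025-01-10',  50000),
    ('UEI000000002', 'Beta Inc',  '2021-06-30',  99900),  -- outside window
    (NULL,           'Gamma Co',  '2025-02-02',  70000);  -- missing UEI

CREATE TABLE canonical_vendor (
    uei TEXT PRIMARY KEY,               -- key constraint backs up the QA check
    txn_count INTEGER NOT NULL,
    total_obligation_cents INTEGER NOT NULL
);
""")

# Shared predicate: rows with a usable UEI inside the last 24 months.
WINDOW = """
    uei IS NOT NULL AND TRIM(uei) <> ''
    AND action_date > date(:as_of, '-24 months')
    AND action_date <= :as_of
"""

# Vendor-level aggregation: one row per UEI, names are never used as keys.
conn.execute(f"""
    INSERT INTO canonical_vendor
    SELECT uei, COUNT(*), SUM(obligation_cents)
    FROM usaspending_txn
    WHERE {WINDOW}
    GROUP BY uei
""", {"as_of": AS_OF})

rows = conn.execute(
    "SELECT uei, txn_count, total_obligation_cents "
    "FROM canonical_vendor ORDER BY uei"    # explicit order for stable output
).fetchall()

# QA a: no duplicate UEIs in the output.
duplicate_ueis = conn.execute(
    "SELECT uei FROM canonical_vendor GROUP BY uei HAVING COUNT(*) > 1"
).fetchall()

# QA b: aggregated obligations tie out to the filtered source rows.
source_total = conn.execute(
    f"SELECT COALESCE(SUM(obligation_cents), 0) FROM usaspending_txn WHERE {WINDOW}",
    {"as_of": AS_OF},
).fetchone()[0]
canonical_total = conn.execute(
    "SELECT COALESCE(SUM(total_obligation_cents), 0) FROM canonical_vendor"
).fetchone()[0]

assert not duplicate_ueis, "duplicate UEIs in canonical output"
assert source_total == canonical_total, "obligation tie-out failed"
print(rows)
```

Storing amounts as integer cents and ordering the output by UEI are design choices for the sketch: both help the CSV come out byte-identical across reruns.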
Budget
Fixed USD 80 total, paid as 3 milestones:
Milestone 1: Stage/load + USAspending L24M filter (deliver staging SQL + intermediate output)
Milestone 2: Canonical vendor aggregation + Segment B/C logic + QA tie-outs
Milestone 3: DSBS + SAM UEI joins + dial_phone + final enriched CSV

How to Apply (Required)
Reply with:
1. Your SQL dialect experience
2. One example of a deterministic aggregation + join project (brief)
3. Answer this scenario (2–3 lines): “After aggregation, 5–10% transaction rows are missing UEI. Names exist but inconsistent. You are told not to infer by name. What do you do and why?”
Applicants who propose fuzzy matching or name dedupe will be rejected.
Project ID: 40229865
26 proposals
Remote project
Active 21 days ago

Hello, Thank you for the invitation. This project aligns very well with my experience in SQL-based data preparation, aggregation, and deterministic ETL workflows.
1) SQL dialect experience
I work mainly with SQL Server, PostgreSQL-style syntax, and BigQuery. I regularly use joins, aggregations, window functions, and data-cleaning logic to build structured datasets for BI and analytics.
2) Example of deterministic aggregation + join project
In a recent BI project, I built a canonical dataset by aggregating transactional data at the customer level using SQL. I applied strict keys for joins, avoided fuzzy matching, and implemented QA checks to ensure no duplicate keys and that totals reconciled with source data.
3) Scenario answer
If 5–10% of rows are missing UEI and name inference is not allowed, I would isolate those records into an exceptions table and flag them for review. This preserves determinism and prevents incorrect merges based on inconsistent names.
I’m comfortable following strict rules, working with UEI as the primary key, and delivering clean, re-runnable SQL scripts with QA checks. Best regards.
$80 USD in 5 days
0.0
26 freelancers are bidding an average of $70 USD for this job

Hi,
1. Experienced in multiple SQL dialects: OracleSQL, PostgreSQL, MySQL, T-SQL (Microsoft SQLServer), HQL (Hadoop Hive), etc.
2. In this particular scenario, I'm unaware of what the DSBS extract and the SAM Active Entity extract contain, but you aggregate the USAspending extract over UEI (and any other possible fields such as month/year, if needed), left join it with DSBS and SAM, get the required fields from those extracts, and generate the segmentation table based on the aggregated values. I've done such joins and aggregations all through my career, but thought a brief of the current scenario would be more helpful in the bid.
3. As mentioned in the project requirement, rows with missing UEIs will be routed to an exceptions table and flagged. Further scripts could be generated so that once the flagged data is reviewed and corrected, it could be re-processed into the segmentation table with the required re-aggregation.
If needed, as a separate project, full-fledged SQL pipelines could be developed that would handle this whole data process workflow. Reach out to me for any further discussions. Best regards, Nrupesh
$88.88 USD in 10 days
4.6

As a Full-Stack Developer and Statistician with a solid background in BigQuery, Data Warehousing, and Database Development, I am well-equipped to deliver the deterministic, re-runnable SQL script-based build you seek. Over 7 years in the field has ingrained in me the SQL thinking necessary for conducting complex joins, applying logic and aggregations that are essential for your project. My expertise extends to mainstream SQL dialects like MySQL and PostgreSQL, which aligns perfectly with your requirements. A project much like yours that I successfully completed was an eCommerce data aggregation assignment where precise aggregations and joins were crucial while maintaining data integrity during the process. Your requirement for deterministic aggregations and an avoidance of fuzzy matching or name dedupe perfectly overlaps with my preference for quality over speed. In response to your given scenario, if transaction rows are missing UEIs and the names are too inconsistent to infer from without compromising data integrity, I would place these problematic rows in an exceptions table and clearly flag them instead of guessing by name. This approach ensures transparency of data inconsistency while adhering to the prescribed non-negotiables in your project. Together, we can make sure every piece of information is treated exactly as it should be.
$80 USD in 2 days
3.9

Hi, I’m a SQL-focused data engineer with strong experience building deterministic, re-runnable data pipelines and identity-based aggregation datasets.
* SQL Dialect Experience
SQL Server (9+ years – DBA & Data Engineering)
PostgreSQL
Databricks / Spark SQL
Large dataset aggregation, ETL scripting, deterministic builds
* Deterministic Aggregation + Join Example
I worked in the employment industry where we manage databases containing over 10 million person IDs for job seekers. I build deterministic aggregation datasets anchored on unique identifiers, producing one-row-per-identity outputs using strict key-based joins only (no fuzzy or name-based matching). Outputs are fully re-runnable and validated with duplicate key checks and reconciliation totals.
* Scenario Answer
If UEI is missing, I move those rows into an exceptions table and flag them for remediation. I do not use name matching because it breaks determinism and violates identity rules.
I fully align with your non-negotiables: UEI as identity anchor, deterministic outputs, and flagging conflicts instead of guessing. Available to start immediately.
$80 USD in 7 days
1.9

With particular expertise in BigQuery and PostgreSQL, I am an experienced and disciplined database engineer who can deliver precisely what you need for this project. My career has endowed me with a deep understanding of SQL terms such as joins, GROUP BY, keys, and aggregation, which you've specifically mentioned as essential to this project. I bring eight years of experience working on deterministic data projects that involve complex joins and aggregations. One example is when I built a dataset for a large banking institution that involved aggregating transactional data from multiple branches across numerous tables, each with different structures and unique IDs. In response to your scenario, I will unequivocally adhere to the given instruction of not inferring by name for missing UEIs. Instead, I will flag those specific rows in the query output as 'exceptions' while also noting any inconsistencies in names in the 'Integrity notes' section, without making any guesses. This meticulous approach to preserving data integrity even when faced with uncertainties ensures that the output you receive is accurate and reliable.
$45 USD in 7 days
0.7

Hello, I am an experienced full-stack developer with strong expertise in SQL (MySQL/Postgres) and Python data workflows. My background includes building deterministic ETL pipelines and canonical datasets for reporting and compliance.
For your project, I will:
- Filter USAspending data to the last 24 months and build a canonical vendor table anchored on UEI
- Apply Segment B & C rules exactly as written, ensuring deterministic outputs
- Perform UEI-based joins with DSBS and SAM extracts (no fuzzy matching, no name inference)
- Define and implement dial_phone priority rules with clear documentation
- Produce final enriched CSV output with QA checks (no duplicate UEIs, obligation reconciliation tie-out)
- Flag exceptions in a separate table without guessing or inference
Scenario response: If 5–10% of rows are missing UEI, I will stop and flag them into an exceptions table. Since UEI is the identity anchor, I will not infer by name; this preserves determinism and correctness.
I can deliver all three milestones within 7–10 days. Proposed bid: $55 USD. Best regards, Somee
$55 USD in 10 days
0.4
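
The enrichment steps listed in the bid above (exact-UEI left joins, a `dial_phone` priority field, no companies dropped) could be sketched as follows. This is only an illustration under stated assumptions: the table and column names (`dsbs_export.owner_phone`, `sam_extract.sam_phone`, and so on) and the sample rows are invented, and the precedence shown (DSBS owner phone first, then SAM phone) is a placeholder, since the posting says the client will confirm the actual rule. The sketch also assumes each export carries at most one row per UEI; a real export would need that verified before joining, otherwise the joins could fan out.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schemas; PRIMARY KEY on uei guarantees joins cannot fan out.
CREATE TABLE canonical_vendor (uei TEXT PRIMARY KEY, total_obligation_cents INTEGER);
CREATE TABLE dsbs_export (uei TEXT PRIMARY KEY, owner_name TEXT, owner_phone TEXT);
CREATE TABLE sam_extract (uei TEXT PRIMARY KEY, sam_phone TEXT, officer_name TEXT);

INSERT INTO canonical_vendor VALUES
    ('UEI000000001', 125000),
    ('UEI000000002',  50000),
    ('UEI000000003',   9000);           -- no match in either source
INSERT INTO dsbs_export VALUES
    ('UEI000000001', 'A. Owner', '555-0101'),
    ('UEI000000002', 'B. Owner', '');   -- blank phone should not win
INSERT INTO sam_extract VALUES
    ('UEI000000001', '555-0199', 'A. Officer'),
    ('UEI000000002', '555-0202', 'B. Officer');
""")

cursor = conn.execute("""
    SELECT c.uei                    AS uei,
           c.total_obligation_cents AS total_obligation_cents,
           d.owner_name             AS owner_name,
           d.owner_phone            AS owner_phone,
           s.sam_phone              AS sam_phone,
           s.officer_name           AS officer_name,
           -- Placeholder precedence: DSBS owner phone, then SAM phone.
           -- Blank strings are treated as missing via NULLIF.
           COALESCE(NULLIF(TRIM(d.owner_phone), ''),
                    NULLIF(TRIM(s.sam_phone), '')) AS dial_phone
    FROM canonical_vendor AS c
    LEFT JOIN dsbs_export AS d ON d.uei = c.uei   -- exact match only
    LEFT JOIN sam_extract AS s ON s.uei = c.uei   -- exact match only
    ORDER BY c.uei                                 -- stable row order
""")
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# QA: every canonical UEI is still present, no more and no fewer.
canonical_count = conn.execute("SELECT COUNT(*) FROM canonical_vendor").fetchone()[0]
assert len(rows) == canonical_count, "row count changed during enrichment"

# Write the final CSV; None values become empty fields.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
final_csv = buffer.getvalue()
print(final_csv)
```

Comparing the output row count with the canonical row count is a cheap way to catch both dropped vendors and accidental fan-out from a duplicated UEI in a source file.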

Hi, I’m Ranit Mondal, a Full-Stack Developer experienced in building and improving production-ready web applications. I specialize in React, Node.js, APIs, and databases, with a strong focus on clean UI and optimized backend logic. I can quickly understand your requirements, fix issues, and deliver a stable, scalable solution with clear communication throughout. Let’s discuss your project and get started.
$45 USD in 7 days
0.0

Hi, I’m a SQL-focused data engineer with strong experience in PostgreSQL, MySQL, and BigQuery-style SQL, especially around deterministic builds, aggregation pipelines, and strict key-based joins. I’ve worked on projects where I built canonical, one-row-per-entity datasets using transactional sources, enforced hard identity keys, and produced fully re-runnable SQL scripts with QA tie-outs (record counts, sum reconciliation, duplicate key checks). I’m comfortable stopping and flagging data issues instead of making assumptions. Scenario answer: If 5–10% of rows are missing UEI, I would exclude them from the canonical aggregation, place them into an exceptions table, and clearly flag the issue. Since identity is undefined and name inference is disallowed, proceeding would break determinism and violate the SOP. I prefer correctness over speed, communicate clearly when data conflicts arise, and follow written rules exactly. This project is a good fit for how I normally work. Thanks, Anuj
$45 USD in 2 days
0.0

With 7 years of experience in SQL/data engineering, I am the best fit to complete this SQL Canonical Dataset Build project. I have the relevant skills to produce a deterministic, re-runnable script-based build as per the provided inputs and goals.
**How I will complete this project:**
- Filter USAspending data to last 24 months
- Perform vendor-level aggregation for one row per UEI
- Apply Segment B & C logic exactly as written
- Conduct basic QA checks to ensure data accuracy
- Enrich dataset by left joining with DSBS export and SAM extract on exact UEI matches
- Compute dial_phone priority field as per defined precedence
- Handle exceptions by placing rows with missing UEI into an exceptions table
**Tech stack I will use:**
- SQL (Postgres/MySQL/BigQuery)
- Excel/Sheets for validation
- Output in CSV format
I have worked on similar solutions in the past, focusing on deterministic aggregation and join projects. By strictly following the provided rules and guidelines, I ensure correctness over speed and clear communication in case of data conflicts. In the scenario where 5-10% transaction rows are missing UEI with inconsistent names, I would handle them by flagging them as exceptions and not inferring by name, as per the project's non-negotiable rules.
**Roadmap:**
- Milestone 1: Stage/load + USAspending L24M filter
- Milestone 2: Canonical vendor aggregation + Segment B
$11 USD in 7 days
0.0

With all due respect to the project description, your budget of Fixed USD 80 for such a complex task seems a bit below par considering its technicality and the thoroughness and expertise it demands. Even though I'd love to take up this SQL Canonical Dataset Build, the offer does not align with the services I provide. I bring to the table an experience of working with Microsoft Excel for a year, handling data cleaning, sorting, formatting, and much more. While these skills correlate with your needed deliverables in terms of general data management and organization, it doesn't touch upon actual data engineering. I understand that budgets can be tight but I would suggest raising your budget significantly if you want a skilled SQL engineer with an understanding of complex database structures who can produce the accurate and clean results you expect. Investing in someone who's specialized in canonical dataset construction will ensure not only that there are no code ambiguities or conflicts but also an efficient implementation into any scale environment. I appreciate your time considering me for this project but I urge you to look for a more suitable candidate with proper qualifications and experience that matches your needs.
$45 USD in 1 day
0.0

With a rich background in Full Stack Development, Business and Data Analysis, I am well-versed in leveraging powerful tools and technologies to extract meaningful insights from complex datasets. Excelling in SQL skills across various DBMSs (PostgreSQL, MySQL, BigQuery), I can guarantee you clean and structured output, adhering strongly to your non-negotiables: prioritizing the uniquely identifying UEI as an anchor and refraining from name-based deduplication. My track record includes deterministic aggregation and join projects that align perfectly with the objectives of this role. I will bring my signature mix of meticulousness and proactivity to your project. In a scenario where a percentage of rows are missing UEIs but names are inconsistent, I strictly abide by your instruction – no inference by name. Therefore, for such instances, I would place these inconsistent rows into an exceptions table and flag them for further scrutiny, ensuring data integrity isn't compromised while embracing accountability.
$80 USD in 7 days
0.0

Hello, I can complete your SQL Canonical Dataset Build (USAspending) project by writing a deterministic, re-runnable SQL script that produces a clean, one-row-per-company canonical dataset anchored on UEI and enriched with the DSBS + SAM datasets via exact UEI joins (no fuzzy matching or name inference). I’ll ensure all aggregation, filtering to the last 24 months, Segment B & C logic, QA tie-outs, and final CSV outputs are delivered exactly as specified. Regards, Bharti
$45 USD in 7 days
0.0

I’ve delivered similar projects and know exactly how to build this cleanly and efficiently. Your requirement for a deterministic, one-row-per-company dataset anchored strictly on UEI—with no fuzzy matching or name inference—is clear and aligns perfectly with my approach to creating seamless, accurate SQL-based data pipelines. While I’m new to Freelancer, I’ve delivered multiple real-world projects for clients outside the platform and bring the same level of quality and accountability here. My expertise in Postgres SQL, data aggregation, and rigorous QA checks will ensure your filtered USAspending data merges perfectly with DSBS and SAM extracts, respecting all Segment B & C rules and exception handling. Happy to walk you through my approach and next steps. Regards, Dylan Rheeder
$40 USD in 14 days
0.0

I have just completed a similar project. I developed a deterministic SQL pipeline that created a one-row-per-entity canonical dataset anchored on unique identifiers, enriched by exact-match joins without inference. You won’t find a specialist better aligned with what you’re looking for. I understand the importance of strict UEI anchoring and preserving data integrity by flagging exceptions rather than guessing in your project. I specialize in transforming complex business requirements into high-converting, user-centric digital assets, ensuring precision and clarity throughout. I’d love to chat about your project! The worst that can happen is you walk away with a free consultation. Regards, Bjork Bronkhorst
$50 USD in 7 days
0.0

Hello Sir, I can do this data entry task accurately and on time. I am ready to start immediately and will follow all instructions carefully. My rate is $5 for this task. I will start only after milestone is created. Thank you.
$43 USD in 6 days
0.0

As an experienced Full Stack Web and Mobile App Developer, my solid understanding of database development, particularly in SQL-based environments, makes me a perfect fit for your SQL Canonical Dataset Build project. Over my 10-year career, I’ve successfully completed over 2000 projects across various industries, utilizing languages including PostgreSQL and SQL, which are recommended for this project. My expertise lies in creating performance-optimized, user-centric solutions that adhere to client-specific needs while ensuring data accuracy and integrity. Besides handling the large datasets and deterministic aggregations you require, I recently completed a similar but even bigger project for a healthcare organization where I delivered a fully normalized database by efficiently handling data deduplication without relying on qualitative judgment that can result in discrepancies. The solution used sturdy aggregations based on data identities (similar to UEI), which I will replicate in your project. To answer your scenario query regarding transaction rows with missing UEIs, I would flag these ambiguous rows as exceptions as per your requirement. Not deduping by name maintains the integrity of your dataset, and retaining these rows safely in an exceptions table with a clear indication serves two purposes: it maintains high-quality data, and any potential impacts due to missing UEIs would be known and manageable.
$45 USD in 2 days
0.0

SQL dialects: PostgreSQL (primary), BigQuery, MySQL. Strong on deterministic ETL: staged loads, strict keys, GROUP BY aggregation, left-join enrichment, and re-runnable script builds with QA tie-outs. Example: Built a canonical “one-row-per-entity” dataset from transactional tables using UEI-like keys, enforced uniqueness with constraints/QA queries, reconciled totals to source, then enriched via exact-key left joins (no fuzzy) and produced deterministic CSV outputs + exceptions tables for bad keys. Scenario: If 5–10% rows lack UEI, I do NOT infer by name. I route those rows into an exceptions table with counts and obligation totals, exclude them from UEI-anchored aggregation, and report impact on reconciliation so you can decide upstream remediation or rule updates.
$45 USD in 7 days
0.0
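
The scenario answer in the bid above (route rows without a UEI into an exceptions table with counts and obligation totals, exclude them from the aggregation, and report the reconciliation impact) might look roughly like the sketch below. The schema, the `MISSING_UEI` reason code, and the sample rows are assumptions for illustration only; nothing here infers identity from `recipient_name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usaspending_txn (          -- hypothetical staging table
    txn_id INTEGER PRIMARY KEY,
    uei TEXT,
    recipient_name TEXT,
    obligation_cents INTEGER
);
INSERT INTO usaspending_txn VALUES
    (1, 'UEI000000001', 'Acme LLC',      100000),
    (2, 'UEI000000001', 'ACME, LLC',      25000),
    (3, NULL,           'Gamma Co',       70000),  -- missing UEI
    (4, '  ',           'Gamma Company',   5000),  -- blank UEI
    (5, 'UEI000000002', 'Beta Inc',       50000);

-- Rows that cannot be keyed are kept and flagged, never matched by name.
CREATE TABLE exceptions (
    txn_id INTEGER PRIMARY KEY,
    recipient_name TEXT,
    obligation_cents INTEGER,
    reason TEXT NOT NULL                -- hypothetical reason code
);
INSERT INTO exceptions
SELECT txn_id, recipient_name, obligation_cents, 'MISSING_UEI'
FROM usaspending_txn
WHERE uei IS NULL OR TRIM(uei) = '';
""")

def scalar(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

source_total = scalar(
    "SELECT COALESCE(SUM(obligation_cents), 0) FROM usaspending_txn")
keyed_total = scalar(
    "SELECT COALESCE(SUM(obligation_cents), 0) FROM usaspending_txn "
    "WHERE uei IS NOT NULL AND TRIM(uei) <> ''")
exception_count = scalar("SELECT COUNT(*) FROM exceptions")
exception_total = scalar(
    "SELECT COALESCE(SUM(obligation_cents), 0) FROM exceptions")

# Reconciliation: keyed rows plus exceptions must account for every cent.
assert keyed_total + exception_total == source_total, "tie-out failed"

print(f"source total:     {source_total}")
print(f"keyed total:      {keyed_total}")
print(f"exceptions:       {exception_count} rows, {exception_total} cents")
```

Reporting the exception count and total alongside the keyed total lets the client see how much obligation value sits outside the UEI-anchored dataset before deciding on upstream remediation.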

Hello there, I have advanced experience in Data Mining, Statistics, Statistical Analysis and Data Science. With my vast background in data analysis and management, I am confident in my ability to handle your categorical data project effectively and efficiently. I have extensive experience in collecting, cleaning, analyzing, and visualizing data using Python programming, an invaluable asset for a project of this nature. Additionally, I am well-versed in the CRISP-DM framework and adept at identifying patterns within datasets. Choosing me means benefitting from not only my expertise but also my personal approach to projects. I understand that each task is unique, requiring tailored skills, and so I'm willing to go the extra mile to provide you with results that meet and exceed your expectations. Let's join forces in this project as our combined strengths will surely produce a result that's efficient, elegant and insightful! Let's not waste any more time! Together, we can mine this data efficiently and answer the questions to achieve your goals. Best Regards, Thanks
$10 USD in 1 day
0.0

⚠️ IF YOU'RE NOT HAPPY YOU DON’T PAY ⚠️ I have built vendor-level canonical datasets using strict keys, GROUP BY aggregation, and controlled LEFT JOIN enrichment with exception tables for missing identifiers. Determinism and reconciliation checks are always part of my workflow. Scenario answer: I isolate the 5–10% missing UEI rows into an exceptions table, flag them, and halt enrichment. Inferring by name violates identity rules and breaks determinism. I would love to chat about your project! The worst that can happen is you walk away with a free consultation. Regards, Aqeel
$10 USD in 1 day
0.0

Hello Madam, Thanks for your invitation. Looking forward to discussing more details. Best regards, Cam
$100 USD in 7 days
0.0

Hello, I have solid experience with SQL Server and PostgreSQL-style queries, focused on deterministic aggregation and key-based joins. I understand the requirement: build a canonical one-row-per-UEI dataset (L24M), apply Segment B & C exactly as written, perform QA tie-outs, then enrich strictly via exact UEI left joins (no fuzzy matching). If 5–10% of rows are missing UEI, I would isolate them into an exceptions table and not infer by name, since UEI is the defined identity anchor. I prioritize correctness, reproducibility, and clear communication if data conflicts arise. Proposed budget: $60.
$60 USD in 5 days
0.0

Delhi, India
Payment method verified
Member since Nov 29, 2023