Mock Data Gen with Machine Learning Module - 01/12/2023 02:56 EST

  • Status: Closed
  • Prize: $500
  • Entries received: 1
  • Winner: td7x

Contest Brief

*Summary*

This is a software engineering contest that leverages machine learning to solve developer experience inconveniences in creating mock data for testing and for demos within the JavaScript ecosystem.

Winning submissions will include a GitHub repo of the software, complete with documentation and CI/CD using GitHub Actions workflows.

The employer reserves all rights to the software created under this contest but will redistribute the software under an Open Source license. All dependencies must have permissive, OSI-approved licenses, and the software must be runnable offline, without depending on an external web service, an external datastore, or specialized hardware.

*Problem*

Simple faker or charade libraries can be used for mock data in software development, but using them can be labor-intensive because they require a developer to select the correct method and identify the input parameters for each data field. Developers already carry enough cognitive overhead and need a fake data solution that can use existing data models/schemas with zero configuration to create the fake data.

*Solution*

A NodeJS module that produces semantically accurate fake data from an arbitrary data model or schema with zero configuration. We are primarily a TypeScript/NodeJS shop and describe the requirements from that perspective, but submissions that are Rust-based and compile to WASM are more than welcome. Runtime portability (in-browser, Bun, Cloudflare, etc.) is preferred, but NodeJS is required.

Data model handlers for GraphQL SDL and JSONSchema are required. Extra preference will be given to submissions with additional handlers for TypeScript type definitions and protobufs.
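As a rough illustration of what a JSONSchema data model handler might do, the sketch below flattens a schema into a list of field paths and declared types. The `JsonSchema` subset, the `FieldDescriptor` shape, and the `collectFields` name are all assumptions made for this example; `$ref`, `oneOf`, and other keywords are deliberately not handled.

```typescript
// Hypothetical minimal subset of JSON Schema, enough for this sketch only.
type JsonSchema = {
  readonly type?: string;
  readonly properties?: Readonly<Record<string, JsonSchema>>;
  readonly items?: JsonSchema;
};

// A flattened leaf field: dotted path plus the declared type.
type FieldDescriptor = { readonly path: string; readonly type: string };

// Recursively collect leaf fields using flatMap instead of loops.
// Objects contribute "parent.child" paths, arrays contribute "parent[]".
const collectFields = (
  schema: JsonSchema,
  prefix = "",
): readonly FieldDescriptor[] =>
  schema.properties
    ? Object.entries(schema.properties).flatMap(([key, child]) =>
        collectFields(child, prefix ? `${prefix}.${key}` : key),
      )
    : schema.items
      ? collectFields(schema.items, `${prefix}[]`)
      : [{ path: prefix, type: schema.type ?? "unknown" }];
```

The resulting field paths are the kind of input a field-name based fake data handler could then consume.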

Various fake data handlers should be supported. Required is a handler that accepts a single field name from the data model and returns semantically correct mock data consistent with the larger data model. Extra preference will be given to submissions with additional handlers that accept a GraphQL request shape (returning a GraphQL response shape) and a handler that does not accept an argument and returns an object for the data model (that could be stringified into JSON).
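The two simplest handler shapes described above could look roughly like this. Everything here is hypothetical: the flat `DataModel` map, the injected `ValueSource` (standing in for a wrapper around FakerJs or similar), and both factory names. The sketch does not attempt the "consistent with the larger data model" requirement; it only shows the call signatures.

```typescript
type FieldType = "string" | "number" | "boolean";
type MockValue = string | number | boolean;

// Hypothetical flat data model: field name -> declared type.
type DataModel = ReadonlyMap<string, FieldType>;

// Injected value source, so the handlers stay pure and testable without mocks.
type ValueSource = (fieldName: string, fieldType: FieldType) => MockValue;

// Required handler: one field name in, one mock value out
// (undefined when the field is not part of the model).
const createFieldHandler =
  (model: DataModel, source: ValueSource) =>
  (fieldName: string): MockValue | undefined => {
    const fieldType = model.get(fieldName);
    return fieldType ? source(fieldName, fieldType) : undefined;
  };

// Optional handler: no argument in, a JSON-serialisable object
// covering every field of the model out.
const createObjectHandler =
  (model: DataModel, source: ValueSource) =>
  (): Readonly<Record<string, MockValue>> =>
    Object.fromEntries(
      Array.from(model, ([key, fieldType]) => [key, source(key, fieldType)] as const),
    );
```

Passing the value source in as an argument keeps the handlers decoupled from any particular generator library, which is one way to satisfy the offline and testability constraints in this brief.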

It is expected that this software will utilize existing generators such as FakerJs, ChanceJs, CasualJs and RandExpJs just as other higher level tools do:

- https://github.com/json-schema-faker/json-schema-faker
- https://github.com/MedAli5543/graphql-fake-data-generator
- https://github.com/danibram/mocker-data-generator

Unlike these existing tools, this software will not hard-code a static mapping that limits it to individual basic field types and requires significant configuration for non-basic field types. How we overcome this limit is the crux of what makes this software different. Perhaps NLP string or vector comparisons can be used to select the correct generator function from the field name, with only unmatched requests falling back to an LLM. LangChain seems like quite an attractive pattern and technology for this.
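One possible reading of the "string comparison first, LLM only for the leftovers" idea is sketched below. It uses plain Jaccard similarity over field-name tokens, which is only one of many comparisons that could be tried. The catalogue entries, the fixed return values, the 0.4 threshold, and the `fallback` function (a stand-in for the LLM step) are all assumptions for illustration, not part of the brief.

```typescript
// Split camelCase / snake_case / kebab-case names into lowercase tokens.
const tokenize = (fieldName: string): readonly string[] =>
  fieldName
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);

// Jaccard similarity between two token lists: |A ∩ B| / |A ∪ B|.
const jaccard = (a: readonly string[], b: readonly string[]): number => {
  const setB = new Set(b);
  const shared = Array.from(new Set(a)).filter((token) => setB.has(token)).length;
  const union = new Set(a.concat(b)).size;
  return union === 0 ? 0 : shared / union;
};

// Hypothetical catalogue; a real module would point at FakerJs, ChanceJs, etc.
type MockGenerator = {
  readonly id: string;
  readonly keywords: readonly string[];
  readonly generate: () => string;
};
const catalogue: readonly MockGenerator[] = [
  { id: "email", keywords: ["email"], generate: () => "jane.doe@example.com" },
  { id: "firstName", keywords: ["first", "name"], generate: () => "Jane" },
  { id: "phone", keywords: ["phone"], generate: () => "555-0100" },
];

type Scored = { readonly generator: MockGenerator; readonly score: number };

// Pick the best-scoring generator, or undefined when nothing clears the threshold.
const selectGenerator = (fieldName: string, threshold = 0.4): MockGenerator | undefined => {
  const tokens = tokenize(fieldName);
  const best = catalogue
    .map((generator): Scored => ({ generator, score: jaccard(tokens, generator.keywords) }))
    .reduce<Scored | undefined>(
      (top, next) => (top && top.score >= next.score ? top : next),
      undefined,
    );
  return best && best.score >= threshold ? best.generator : undefined;
};

// Stand-in for the LLM fallback; purely illustrative.
const fallback = (fieldName: string): string => `unmatched:${fieldName}`;

const mockField = (fieldName: string): string =>
  selectGenerator(fieldName)?.generate() ?? fallback(fieldName);
```

With this catalogue, `userEmail` and `phoneNumber` each score 0.5 and are matched, while `lastName` scores only 1/3 against the `firstName` keywords and is routed to the fallback instead of being mislabelled. A vector-embedding comparison could replace `jaccard` without changing the surrounding shape.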



*Code Standards*

Code will be written in strict TypeScript with strong typing and be compatible with Bun, Deno, and NodeJS. Code will be "Clean" and robust. OOP patterns are to be avoided in favor of "strategic" use of functional programming. eslint-plugin-functional/recommended is great; using additional FP libs such as fp-ts or Ramda is not required. In general:
- Small composable functions.
- No nested code.
- Avoid if statements. Branches are only ok in the simplest and unavoidable use cases. Simple clean ternaries are fine.
- Along with avoiding branching, absolutely no try/catch.
- Never throw.
- No control loops.
- No unbounded iterators.
- Use maps rather than a switch or if/else.
- Functions should be small, pure, and composable.
- Separate configuration from code.
- Use arrow function syntax.
- Avoid async/await as one can accidentally block the event loop.
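A small sketch of how several of these rules might combine in practice: a lookup map instead of a switch, a returned `Result` value instead of a thrown error, and arrow functions throughout. The `Result` type, the helper names, and the file-extension mapping are hypothetical and chosen only to illustrate the style.

```typescript
// Failures are returned as values, so callers never need try/catch.
type Result<T> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: string };

const ok = <T>(value: T): Result<T> => ({ ok: true, value });
const err = <T>(error: string): Result<T> => ({ ok: false, error });

type ModelKind = "graphql-sdl" | "json-schema";

// Configuration kept apart from logic: a map replaces switch / if-else.
// The extensions listed here are assumptions for the example.
const kindByExtension: ReadonlyMap<string, ModelKind> = new Map<string, ModelKind>([
  [".graphql", "graphql-sdl"],
  [".gql", "graphql-sdl"],
  [".json", "json-schema"],
]);

// Last ".ext" segment of a file name, or "" when there is none.
const extensionOf = (fileName: string): string =>
  (/\.[^.]+$/.exec(fileName) ?? [""])[0].toLowerCase();

// Small, pure, composable: one lookup and one simple ternary.
const detectModelKind = (fileName: string): Result<ModelKind> => {
  const kind = kindByExtension.get(extensionOf(fileName));
  return kind ? ok(kind) : err(`unsupported model file: ${fileName}`);
};
```

Supporting a new model format then means adding a map entry rather than another branch.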

*Testing*

Fine-grained testing of LangChain does not seem completely straightforward, but its testability is currently being improved and the LangSmith debugger should probably be used. Code should be decoupled so that mocks can be avoided. Vitest or Jest should be used with fast-check as well as static assertions. Strict TDD is not required but preferred. Writing tests throughout development, and not at the end, is required. The important thing is that testable code is cleaner, simpler, and more robust. Tested code is easier to change.

The test suite should also prove the software works.

Public Clarification Board

  • farhankha4548
    • 2 months ago

    I have read your code and updated the full functions, but you have awarded someone else.

  • tokibul2
    • 2 months ago

    Hi,
    Do you know freelancer.com? Also, do you know they are scammers?

    I earned 1000 GBP and 200+ USD by providing my services on this platform. But when I requested a payment withdrawal, they closed my account. They blocked me, and I couldn't chat or create any ticket.

    So, I created this account to help me get my account balance into my bank account.

    What do you think about this scammer (freelancer.com) giving me my earnings in my account?

    [ They will just block this account, because their only way of earning is taking hard-earned payments from poor freelancers. In my words, they are beggars. ]

    Check these screenshots for more: https://drive.google.com/drive/folders/1tKtg5TC4-_6q_uG73rHNmUNhezqkRiaC?usp=sharing

  • farhankha4548
    • 2 months ago

    I am working in Rust to provide you with a better solution, and I will also show you a demo video.

  • farhankha4548
    • 2 months ago

    Hello sir, would you prefer this in Node.js or Rust?
    I can also provide it in Rust if you want.

    1. dutco7
      Contest Holder
      • 2 months ago

      A Rust solution would be great. It just needs to be able to run in BunJS and CloudFlare. A WASI direction could be good; wasm-pack could help.

      https://github.com/scrippt-tech/orca and https://github.com/huggingface/candle are quite interesting.

  • dataexpert18
    • 3 months ago

    Can you explain which data you want to apply machine learning to, and what outcome you expect from it?

    1. dutco7
      Contest Holder
      • 2 months ago

      Hello Zafar, I'm not sure how to explain it better than in the description. The generative model may need to use the data model for fine-tuning, or perhaps zero-shot would work. The mock data gen function will accept a field name and return the semantically correct, generated data.
