Software Engineer

San Francisco, CA∙Full-time

Datacurve provides the frontier coding data that powers the world's most advanced models. We absorb and standardize deep, highly specialized knowledge to create the world's first autonomous data engine, allowing us to teach the next generation of models (big and small) mastery across all types of knowledge work. We work with foundational labs and large, highly specialized enterprises alike.

About the Role

As a Software Engineer at Datacurve, you will own the architectural bedrock of our data engine. You'll build the foundational systems that turn complex, one-off data workflows into highly scalable, deterministic machines. This means designing, building, and scaling the data acquisition and enrichment infrastructure that produces data unlike anything else on the market, in massive volumes. You'll work on turning knowledge previously reserved for domain experts into standardized, structured inputs for the next generation of frontier models.

You might be a fit if you're incredibly high-agency, deeply curious, and able to draw insights from a wide range of technical experience. If you're a generalist who has also developed an intense specialty in architecting large, data-intensive systems, this might be the right type of work for you. You should have (or be able to develop) an intuition for what improves model performance, and be able to map that intuition onto how you design the infrastructure you own.

Your work will materially influence the velocity of frontier model improvement within the world's leading AI labs — improvement capable of causing notable shifts in the global economy.

What we're looking for

Systems thinker: You’re able to reason about problems abstractly, no matter how big or how deep in the weeds, and somehow make them simple. You have a proven track record of designing and scaling robust infrastructure and data pipelines under heavy load. You’re able to whiteboard a problem’s lifecycle
Product intuition: Designing great systems requires understanding which product primitives are worth building, in service of fully independent products that are each responsible for their own growth, retention, and value proposition to their users
Relentless — borderline obsessive — curiosity: You're motivated to push the boundaries of what machines are capable of, in ways not yet consensus
Obsession with reliability: High engineering standards and an innate desire to build tooling that is self-healing, deterministic, and highly legible
Deeply independent — almost iconoclastic — thinker: You hold strong opinions, but understand the limits of your knowledge; you lean toward questioning consensus rather than following it

What you'll do

Instantiate entirely new training environments for language models, then construct the data acquisition pipelines capable of producing them at scale.
Design, build, and maintain parts of the internal Datacurve platform that allow researchers to extract the most value from the data we produce.
Own the reliability, performance, and monitoring of our data engine to ensure zero-fault data delivery.
Work side-by-side with researchers on some of the hardest problems stunting model improvement.

How to Apply

To apply, please email careers@datacurve.ai with your resume/GitHub/LinkedIn and a few sentences about why you’re exceptional and why the work we do excites you. We love cool projects, deep-dive writings, and unconventional backgrounds.
