Cooklist
About Cooklist
Cooklist is the AI grocery‑intelligence platform powering meal planning and shopping for millions of consumers across our consumer app and white‑label enterprise suite. Our mission is to combine the intelligence of a personal shopper, chef, and nutritionist to help people save time, eat better, and enjoy happier lives.
We’re profitable, process billions of dollars in transactions at the nation’s largest retailers, and our mobile experiences reach millions of people. We’re backed by Techstars, Mercury Fund, and industry leaders including the former CTO of Kroger and the Chief Product Officer of Amazon Fresh.
Role Overview
We’re hiring a Senior AI Engineer (LLMs & Agents) to design and operationalize the intelligence behind Cooklist’s AI grocery shopping assistant, which is embedded in the digital experiences of top US retailers and in the Cooklist app. Your primary job: own the evals, reliability, and workflow architecture that make an agentic system trustworthy at scale.
You’ll design prompts, tool‑calling strategies, retrieval pipelines, and safety guardrails; build the eval harness and real‑time monitoring systems; and drive model/latency tradeoffs that turn demos into production‑grade performance. This is a founding‑level role on a tiny team where your decisions directly impact millions of shoppers and where accuracy around allergens and nutrition is critical.
We are an AI‑leveraged org: our question is always “How do we use AI to build AI?” You’ll set patterns, tests, and guardrails that allow AI assistants to safely contribute to the codebase, compounding our output.
Responsibilities
Own LLM reliability end‑to‑end: architect prompts, tools, and reasoning workflows that meet strict accuracy, safety, and latency requirements.
Design robust evals: build offline/online eval suites for structured output, factuality, grounding, allergen sensitivity, and user‑goal attainment; define gold sets, synthetic data pipelines, and automatic failure taxonomies.
Productionize agent workflows: retrieval‑augmented generation, tool calling over GraphQL/WebSockets, function/tool schemas, and strict JSON output contracts.
Model strategy: evaluate and deploy model mixes (reasoning vs. fast paths), caching strategies, and guardrails to balance quality, latency, and cost.
Monitoring & observability: ship real‑time conversation analytics, drift detection, canary/shadow testing, incident taxonomies, and auto‑triage for misbehavior.
Safety & compliance: encode domain policies for allergens, dietary restrictions, and nutrition; implement red‑team tests, constraints, and fallbacks.
Tight product loop: partner with mobile/backend to ship agent features, collect outcome‑level telemetry, and iterate quickly (“build first, refine fast”).
Scale the system: create libraries, prompts, schemas, and tests that let AI coding assistants contribute safely; document playbooks and upgrade paths.
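To make the eval work above concrete: a minimal sketch of an offline eval harness in the spirit of this role, checking parseability and field-level accuracy against a gold set and bucketing failures into a simple taxonomy. All names here (`GOLD_CASES`, `fake_model`, `run_eval`, and the failure categories) are illustrative assumptions, not Cooklist's actual system; `fake_model` stands in for a real LLM call.

```python
import json

# Hypothetical gold set: each case pairs a user request with expected structured fields.
GOLD_CASES = [
    {"prompt": "Add peanut-free granola to my list",
     "expected": {"item": "granola", "allergen_free": ["peanut"]}},
    {"prompt": "Swap in a gluten-free pasta",
     "expected": {"item": "pasta", "allergen_free": ["gluten"]}},
]

def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; returns a JSON string."""
    if "granola" in prompt:
        return json.dumps({"item": "granola", "allergen_free": ["peanut"]})
    return json.dumps({"item": "pasta", "allergen_free": ["gluten"]})

def run_eval(model, cases):
    """Score parseability and field accuracy; bucket failures by type."""
    failures = {"parse_error": 0, "wrong_item": 0, "allergen_miss": 0}
    passed = 0
    for case in cases:
        try:
            out = json.loads(model(case["prompt"]))
        except json.JSONDecodeError:
            failures["parse_error"] += 1
            continue
        if out.get("item") != case["expected"]["item"]:
            failures["wrong_item"] += 1
        elif set(out.get("allergen_free", [])) != set(case["expected"]["allergen_free"]):
            failures["allergen_miss"] += 1
        else:
            passed += 1
    return {"pass_rate": passed / len(cases), "failures": failures}

print(run_eval(fake_model, GOLD_CASES))
```

A production version would replace the stub with real model calls, expand the failure taxonomy, and feed results into dashboards and regression gates.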
Qualifications
You’ve shipped LLM systems to production with real user impact, ideally including agentic loops, tool calling, and structured outputs at scale.
You’re fluent in Python and have built eval harnesses, automated datasets, and dashboards for LLM quality.
You’ve implemented RAG (indexing, chunking, embeddings, reranking) and understand failure modes (hallucination, grounding, duplication, drift).
You can design and enforce strict output schemas, guarantee parseability, and build deterministic fallbacks.
You’re comfortable making model tradeoffs (reasoning models vs. smaller/cheaper paths; latency budgets; cost controls) and can prove the impact.
You care about safety (allergens, dietary needs, policy adherence) and can translate product risk into tests, gates, and roll‑out controls.
You move with founder energy: high ownership, high bar for polish, gritty, and calm under production pressure.
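The "strict schemas, guaranteed parseability, deterministic fallbacks" requirement can be sketched in a few lines. The contract shape (`REQUIRED_FIELDS`, `SAFE_FALLBACK`, `parse_with_fallback`) is a hypothetical illustration, not the actual Cooklist contract: a model response either validates against the contract or is replaced by a safe, deterministic default.

```python
import json

# Hypothetical output contract for a shopping-assistant action.
REQUIRED_FIELDS = {"action": str, "item": str, "quantity": int}

# Deterministic fallback: ask the user to clarify rather than guess.
SAFE_FALLBACK = {"action": "clarify", "item": "", "quantity": 0}

def parse_with_fallback(raw: str) -> dict:
    """Validate a model response against the contract; fall back deterministically."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return SAFE_FALLBACK
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            return SAFE_FALLBACK
    return data
```

In practice this pattern sits behind every tool call: downstream code never sees malformed output, only a valid action or an explicit clarification path.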
Our Stack
What We Offer