A food delivery copilot is useful because the hard step often happens before checkout: deciding. In this pilot, we built an agent-first experience for the moment when a user has a narrow window to eat and no attention for browsing. It confirms constraints, proposes a small set of options, places the order, then monitors timing until handoff.
This case study is for teams building a food delivery app or marketplace, plus anyone shipping agent UX that must execute real transactions. The core lesson is simple: if the experience does not reduce mental load and set honest expectations, the architecture will fail in the moments that matter.
Food delivery apps are designed for browsing, yet browsing collapses under time pressure. When someone has back-to-back meetings or a narrow calendar gap, scrolling through dozens of restaurants becomes work. The user starts outsourcing the decision to whoever is nearby, or they default to something they do not even want.
We treated that moment as the primary product surface. We designed a decision path that can finish quickly, while still respecting constraints like “no meat today” and budget limits. That choice also acknowledges a daily reality: cravings change faster than preference models.
The copilot is a chat-like experience with strong guardrails. It does not try to be a general assistant. Its job is to get from intent to a placed order with the fewest possible questions, then keep the user informed if reality shifts (items unavailable, ETA changes, restaurant delays).
At a high level, the experience has three phases:
- Decide: narrow to a short list using the user profile and explicit constraints.
- Commit: build the cart and confirm the details that are expensive to get wrong (location, time window, fees, substitutions).
- Deliver: monitor the order and surface only the decisions that require a human.
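The key property of this structure is the gate between phases: the agent may not cross from commit to deliver until the expensive-to-fix details have a human confirmation on record. A minimal Python sketch of that gate (field names are illustrative; the production system is not written in Python):

```python
from enum import Enum, auto

class Phase(Enum):
    DECIDE = auto()   # narrow to a short list
    COMMIT = auto()   # build the cart, confirm the expensive-to-fix details
    DELIVER = auto()  # monitor, surface only decisions that need a human

# Fields that must be explicitly confirmed before the agent may act on them.
IRREVERSIBLE_FIELDS = {"delivery_address", "restaurant_location",
                       "time_window", "total_cost"}

def may_place_order(phase: Phase, confirmed_fields: set[str]) -> bool:
    """The agent can only cross from COMMIT to DELIVER once every
    irreversible field has an explicit human confirmation."""
    return phase is Phase.COMMIT and IRREVERSIBLE_FIELDS <= confirmed_fields
```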
That structure matters because it lets the agent act like a teammate, while still keeping the user in control of irreversible steps.
In food delivery, “personalization” breaks when it behaves like mind reading. People have dietary constraints, they get tired of their usual picks, and sometimes they want a meal that matches a mood rather than a history. The copilot follows a strict rule for commitment decisions: it can use history to propose, yet it waits for a clear constraint signal before it commits.
Concretely, we combined three inputs:
- A lightweight profile: repeats, disliked ingredients, preferred cuisines, typical spend bands, and past ratings.
- A constraint layer: “vegetarian,” “no chicken,” “under $20,” “light meal,” “delivery during my next break.”
- A provider reality check: what is actually available right now, at which location, with what ETA.
Then we made the copilot “show its work” in plain language. Each recommendation carries a reason a human can sanity-check in two seconds. The rationale stays simple and skimmable: distance, expected prep time, a match to a high-rated pattern, and which constraints it satisfied. That transparency is what keeps a food delivery agent from feeling like a slot machine.
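The division of labor between the three inputs can be sketched as follows: hard constraints filter, history only ranks, and every surviving option carries a skimmable reason. This is a minimal Python sketch, not the pilot's implementation; the option fields and scoring are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Option:
    name: str
    cuisine: str
    price: float
    vegetarian: bool
    eta_minutes: int
    reasons: list[str] = field(default_factory=list)

def propose(options, *, vegetarian=False, max_price=None,
            latest_eta=None, liked_cuisines=()):
    """Constraints are hard filters; the profile only orders the short list.
    Each survivor carries a reason a human can sanity-check in two seconds."""
    picks = []
    for o in options:
        if vegetarian and not o.vegetarian:
            continue
        if max_price is not None and o.price > max_price:
            continue
        if latest_eta is not None and o.eta_minutes > latest_eta:
            continue
        o.reasons = [f"ETA {o.eta_minutes} min"]
        if max_price is not None:
            o.reasons.append(f"${o.price:.2f}, under ${max_price:.0f}")
        if o.cuisine in liked_cuisines:
            o.reasons.append(f"matches your high-rated {o.cuisine} orders")
        picks.append(o)
    # History proposes the ordering; it never overrides a constraint.
    picks.sort(key=lambda o: (o.cuisine not in liked_cuisines, o.eta_minutes))
    return picks[:3]
```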
One small UX pattern carried a lot of weight: time windows are treated as a first-class constraint, alongside diet and budget. A user can say, “I can eat between 2:00 and 2:45, keep it vegetarian, under $20,” and the copilot will respond with three options that match the profile, then ask only what is missing to place the order.
Exact-time delivery is a UX promise that can turn into a support nightmare. In driver-side discussions around scheduled orders, one practical constraint shows up repeatedly: couriers often do not see the user’s scheduled intent. Platforms can also dispatch “scheduled” orders early, so the system starts fighting itself before the meal is even cooked.
We handled this by treating timing as an uncertainty-managed workflow. The copilot asks for a target time, then responds with one of three contracts:
- High confidence: place the order now and monitor drift against the promised window.
- Best-effort window: place the order with a wider delivery range and alert the user if the ETA moves outside it.
- Cannot meet the window: propose a faster option or a different provider before any checkout steps.
When confidence is low, the experience offers alternatives that keep the user fed: shift the window, choose a faster cuisine, or pick a different provider. That framing kept the product honest, and it also made the agent more useful, because it could steer the user away from orders that would predictably arrive too early or too late.
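The three timing contracts reduce to comparing the provider's ETA range against the user's window. A sketch of that classification, assuming the ETA arrives as a low/high range in minutes (the comparison logic is illustrative, not the pilot's exact thresholds):

```python
def timing_contract(eta_low: int, eta_high: int,
                    window_start: int, window_end: int) -> str:
    """Classify a target delivery window against the provider's ETA range
    (all values are minutes from now)."""
    if window_start <= eta_low and eta_high <= window_end:
        return "high_confidence"      # place now, monitor drift
    if eta_low <= window_end and eta_high >= window_start:
        return "best_effort_window"   # place with a wider range, alert on drift
    return "cannot_meet_window"       # propose alternatives before any checkout
```

Note that "cannot meet the window" covers both directions: an order that would predictably arrive too early fails the contract just as one that arrives too late.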
We built the copilot as a full-stack .NET (C#) application with an agent core that can safely call tools. The LLM runs on Azure OpenAI, where responses are grounded by tool outputs instead of invented details. Orchestration is handled with Semantic Kernel, and long-lived agent state is managed with Microsoft Orleans so each session behaves consistently across retries and background monitoring.
The important design choice was to keep “agent intelligence” close to real capabilities. We exposed provider operations as explicit tools: search menus, build a cart, compute fees, place an order, and poll status. Then the orchestrator plans within a narrow sandbox, which keeps the model from freelancing and makes failures recoverable.
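The sandbox idea is simple: the orchestrator can only dispatch the provider operations named above, and anything else the model "plans" is rejected before execution. A minimal Python sketch (the real tools are Semantic Kernel functions in C#; the signatures here are illustrative stubs):

```python
# Whitelisted provider operations; the orchestrator cannot call anything else.
ALLOWED_TOOLS = {
    "search_menus": lambda query, location: ...,
    "build_cart":   lambda items: ...,
    "compute_fees": lambda cart: ...,
    "place_order":  lambda cart, confirmation_token: ...,
    "poll_status":  lambda order_id: ...,
}

def dispatch(tool_name: str, **kwargs):
    """Reject any tool call outside the sandbox before it executes."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is outside the sandbox")
    return ALLOWED_TOOLS[tool_name](**kwargs)
```

Because every capability is an explicit entry in the registry, a failed or hallucinated plan degrades into a rejected call rather than an unintended side effect.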
Ordering is transactional, so “close enough” is not good enough. The copilot can move quickly during the decide phase. It slows down during commit because mistakes become expensive: wrong restaurant location, wrong address, wrong delivery window, or a surprise fee at checkout.
We used a few simple safeguards that work well with agent systems:
- Explicit confirmations for the irreversible fields (delivery address, restaurant location, time window, total cost).
- Idempotent tool calls and a session ledger so retries do not duplicate orders.
- Substitution policy prompts that are easy to answer once and reuse (“if out of stock, replace with X, or ask me”).
- Monitoring as a background responsibility, with alerts only when a choice is required.
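The idempotency safeguard is worth making concrete: each order attempt is recorded in a session ledger under a key derived from the session and cart revision, so a retried tool call returns the original order instead of placing a duplicate. A minimal Python sketch (in the pilot this state lives with the session, which Orleans manages; the key format below is illustrative):

```python
class SessionLedger:
    """Records each attempted order under an idempotency key so retries
    return the original order instead of creating a duplicate."""
    def __init__(self):
        self._orders: dict[str, str] = {}

    def place_order(self, idempotency_key: str, submit) -> str:
        if idempotency_key in self._orders:
            return self._orders[idempotency_key]  # retry: same order, no duplicate
        order_id = submit()                       # the actual provider call
        self._orders[idempotency_key] = order_id
        return order_id

ledger = SessionLedger()
key = "session-42:cart-v3"  # derived from session + cart revision, not random
first = ledger.place_order(key, lambda: "order-001")
retry = ledger.place_order(key, lambda: "order-002")  # a retry after a timeout
assert first == retry  # the retry did not place a second order
```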
We chose these controls for user experience under stress, with safety and correctness as direct consequences. When the user is in a meeting, they cannot debug a cart, so the system has to prevent the failure modes that turn a copilot into another chore.
For this pilot, we avoided publishing invented impact metrics. Instead, we defined a measurement plan that matches how agent experiences succeed or fail in the real world: speed, correctness, and on-time behavior under uncertainty.
The metrics that mattered most, with concrete triggers:
- Time-to-order: median <= 120 seconds over 50+ sessions in 2 weeks; if it exceeds 180 seconds, reduce choices per turn and tighten constraint prompts.
- On-time delivery: >= 85% within the promised window over 100+ orders in 30 days; if it falls below, widen the window and surface reason codes (prep vs dispatch vs traffic).
- Wrong-order prevention: <= 1% of agent-placed orders require cancellation or re-order; if it rises, add one more confirmation step before placing the order.
- Recommendation usefulness: >= 4/5 average “matched what I wanted” rating over 30 days; if it drops, ask one extra context question (“craving”, “light/heavy”, “spicy”) before proposing options.
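The plan above is mechanical enough to run as code: feed in the raw pilot numbers, get back the triggered actions. A small Python sketch with the thresholds taken directly from the list (the function shape and parameter names are illustrative):

```python
from statistics import median

def evaluate(sessions_sec, on_time, cancels, orders, ratings):
    """Map raw pilot measurements to the triggers in the measurement plan."""
    actions = []
    if median(sessions_sec) > 180:          # time-to-order trigger
        actions.append("reduce choices per turn; tighten constraint prompts")
    if on_time / orders < 0.85:             # on-time delivery trigger
        actions.append("widen the window; surface reason codes")
    if cancels / orders > 0.01:             # wrong-order prevention trigger
        actions.append("add one more confirmation before placing the order")
    if sum(ratings) / len(ratings) < 4.0:   # recommendation usefulness trigger
        actions.append("ask one extra context question before proposing options")
    return actions
```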
Those thresholds kept the work grounded. They also made iteration easier, because each UX change could be evaluated as a trade between fewer questions, fewer mistakes, and better timing.
The next product step is to expand the copilot’s reliability envelope without increasing user burden. That work looks boring on purpose: deeper provider integrations for better prep estimates and clearer schedule semantics. It also means calendar-aware time windows, so “deliver at 2:00” becomes a monitored contract rather than a wish. Personalization then adapts to context shifts instead of repeating yesterday’s “best match.”
This pilot also became one of the stories that shaped AiBase, our productized approach to building agent experiences where the experience design matters as much as the model. The takeaway is transferable: if an agent is supposed to execute, the UX has to carry uncertainty and confirmations. Otherwise the user ends up doing the hard work again, only now it happens under higher stakes.