AIBase in Action: Food Delivery Copilot

How we built an agent-first food delivery copilot that helps people decide what to eat and schedule delivery to a precise time window.

Overview

A food delivery copilot is useful because the hard step often happens before checkout: deciding. In this pilot, we built an agent-first experience for the moment when a user has a narrow window to eat and no attention for browsing. It confirms constraints, proposes a small set of options, places the order, then monitors timing until handoff.

This case study is for teams building a food delivery app or marketplace, plus anyone shipping agent UX that must execute real transactions. The core lesson is simple: if the experience does not reduce mental load and set honest expectations, the architecture will fail in the moments that matter.

Why is choosing food harder than ordering it?

Food delivery apps are designed for browsing, yet browsing collapses under time pressure. When someone has back-to-back meetings or a narrow calendar gap, scrolling through dozens of restaurants becomes work. The user starts outsourcing the decision to whoever is nearby, or defaults to something they do not even want.

We treated that moment as the primary product surface. We designed a decision path that can finish quickly, while still respecting constraints like “no meat today” and budget limits. That choice also acknowledges a daily reality: cravings change faster than preference models.

What we built: a food delivery copilot that can place the order

The copilot is a chat-like experience with strong guardrails. It does not try to be a general assistant. Its job is to get from intent to a placed order with the fewest possible questions, then keep the user informed if reality shifts (items unavailable, ETA changes, restaurant delays).

At a high level, the experience has three phases:
- Decide: narrow to a short list using the user profile and explicit constraints.
- Commit: build the cart and confirm the details that are expensive to get wrong (location, time window, fees, substitutions).
- Deliver: monitor the order and surface only the decisions that require a human.

That structure matters because it lets the agent act like a teammate, while still keeping the user in control of irreversible steps.
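The three phases can be sketched as a small state machine. This is an illustrative Python sketch (the production system is C#/.NET); the `Session` shape, event names, and confirmation fields are assumptions, not the real schema. The key property it shows is that the irreversible step, placing the order, is gated on explicit confirmations.

```python
from enum import Enum, auto

class Phase(Enum):
    DECIDE = auto()
    COMMIT = auto()
    DELIVER = auto()

# Fields that are expensive to get wrong; names are illustrative.
REQUIRED_CONFIRMATIONS = {"address", "restaurant_location", "time_window", "total_cost"}

class Session:
    """Minimal session: current phase plus which fields the user has confirmed."""
    def __init__(self):
        self.phase = Phase.DECIDE
        self.confirmed = set()

def advance(session, event):
    """Move the session forward; the irreversible step is gated on confirmations."""
    if session.phase is Phase.DECIDE and event == "option_selected":
        session.phase = Phase.COMMIT
    elif session.phase is Phase.COMMIT and event == "place_order":
        missing = REQUIRED_CONFIRMATIONS - session.confirmed
        if missing:
            # Refuse to commit until every expensive field is confirmed.
            raise ValueError(f"unconfirmed fields: {sorted(missing)}")
        session.phase = Phase.DELIVER
    return session.phase
```

The design choice is that the gate lives in the transition itself, so no planning path can skip it.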

[Figure: Food delivery copilot user flow showing the decide, commit, and deliver phases]

How the copilot decides (without guessing wrong)

In food delivery, “personalization” breaks when it behaves like mind reading. People have dietary constraints, they get tired of their usual picks, and sometimes they want a meal that matches a mood rather than a history. The copilot follows a strict rule for commitment decisions: it can use history to propose, yet it waits for a clear constraint signal before it commits.

Concretely, we combined three inputs:
- A lightweight profile: repeat orders, disliked ingredients, preferred cuisines, typical spend bands, and past ratings.
- A constraint layer: “vegetarian,” “no chicken,” “under $20,” “light meal,” “delivery during my next break.”
- A provider reality check: what is actually available right now, at which location, with what ETA.
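The interaction between the three inputs can be sketched as follows. This is an illustrative Python sketch (the production system is C#/.NET), and every dict shape and field name is an assumption. The point it demonstrates is the commitment rule above: constraints and availability filter hard, while history only ranks.

```python
def shortlist(candidates, profile, constraints, max_options=3):
    """Propose a short list: constraints and availability filter hard;
    the profile only ranks. All shapes here are illustrative."""
    viable = [
        c for c in candidates
        if c["available"]                                    # provider reality check
        and c["price"] <= constraints.get("max_price", float("inf"))
        and (constraints.get("diet") is None or constraints["diet"] in c["diets"])
    ]

    def fit(c):
        score = 0
        if c["cuisine"] in profile.get("preferred_cuisines", ()):
            score += 2                                       # matches a liked pattern
        if set(c["ingredients"]) & set(profile.get("disliked", ())):
            score -= 3                                       # history proposes, never forces
        return score

    return sorted(viable, key=fit, reverse=True)[:max_options]
```

Because the profile never appears in the filter, a stale preference model can only reorder options, not exclude what the user asked for today.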

Then we made the copilot “show its work” in plain language. Each recommendation carries a reason a human can sanity-check in two seconds. The rationale stays simple and skimmable: distance, expected prep time, a match to a high-rated pattern, and which constraints it satisfied. That transparency is what keeps a food delivery agent from feeling like a slot machine.

One small UX pattern carried a lot of weight: time windows are treated as a first-class constraint, alongside diet and budget. A user can say, “I can eat between 2:00 and 2:45, keep it vegetarian, under $20,” and the copilot will respond with three options that match the profile, then ask only what is missing to place the order.

Can you really deliver at an exact time?

Exact-time delivery is a UX promise that can turn into a support nightmare. In driver-side discussions around scheduled orders, one practical constraint shows up repeatedly: couriers often do not see the user’s scheduled intent. Platforms can also dispatch “scheduled” orders early, so the system starts fighting itself before the meal is even cooked.

We handled this by treating timing as an uncertainty-managed workflow. The copilot asks for a target time, then responds with one of three contracts:
- High confidence: place the order now and monitor drift against the promised window.
- Best-effort window: place the order with a wider delivery range and alert the user if the ETA moves outside it.
- Cannot meet the window: propose a faster option or a different provider before any checkout steps.
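The contract selection can be reduced to interval arithmetic over the ETA estimate and its uncertainty. This Python sketch is illustrative (the production system is C#/.NET), and the simple symmetric spread model and return labels are assumptions rather than the shipped logic.

```python
def timing_contract(eta, window_start, window_end, spread):
    """Classify an order into one of the three timing contracts.
    All values are minutes from now; `spread` is a rough uncertainty band."""
    earliest, latest = eta - spread, eta + spread
    if window_start <= earliest and latest <= window_end:
        return "high_confidence"       # place now, monitor drift
    if earliest <= window_end and latest >= window_start:
        return "best_effort_window"    # partial overlap: widen range, alert on drift
    return "cannot_meet_window"        # steer to a faster option before checkout
```

For example, an ETA of 40 minutes with a 5-minute spread fits a 30-to-60-minute window with high confidence, while an ETA of 90 minutes misses it entirely and should trigger alternatives before any checkout step.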

When confidence is low, the experience offers alternatives that keep the user fed: shift the window, choose a faster cuisine, or pick a different provider. That framing kept the product honest, and it also made the agent more useful, because it could steer the user away from orders that would predictably arrive too early or too late.

Architecture: Semantic Kernel + Azure OpenAI + Orleans in .NET

We built the copilot as a full-stack .NET application (C#) with an agent core that can safely call tools. The LLM runs on Azure OpenAI, where responses are grounded by tool outputs instead of invented details. Orchestration is handled with Semantic Kernel, and long-lived agent state is managed with Microsoft Orleans so each session behaves consistently across retries and background monitoring.

The important design choice was to keep “agent intelligence” close to real capabilities. We exposed provider operations as explicit tools: search menus, build a cart, compute fees, place an order, and poll status. Then the orchestrator plans within a narrow sandbox, which keeps the model from freelancing and makes failures recoverable.
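The sandbox idea can be illustrated with a minimal tool registry. The production tools are Semantic Kernel functions in C#; this Python sketch only shows the shape, and the `compute_fees` stub, registry, and log are hypothetical stand-ins for real provider connectors.

```python
TOOL_REGISTRY = {}
CALL_LOG = []

def tool(fn):
    """Register a provider operation as an explicitly allowed capability."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def compute_fees(subtotal):
    # Stub; a real connector would call the provider's pricing API here.
    return round(subtotal * 0.1, 2)

def invoke(name, **kwargs):
    """The only path from the planner to a capability: unknown names are
    rejected, and every call is logged so failures are recoverable."""
    if name not in TOOL_REGISTRY:
        raise PermissionError(f"tool outside sandbox: {name}")
    result = TOOL_REGISTRY[name](**kwargs)
    CALL_LOG.append((name, kwargs, result))
    return result
```

Keeping the registry explicit is what stops the model from "freelancing": a plan step that names a capability outside the registry fails loudly instead of doing something unintended.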

[Figure: System architecture diagram showing Semantic Kernel orchestration, Orleans state, Azure OpenAI, and provider connectors]

Reliability and safety: confirming intent, payments, and substitutions

Ordering is transactional, so “close enough” is not good enough. The copilot can move quickly during the decide phase. It slows down during commit because mistakes become expensive: wrong restaurant location, wrong address, wrong delivery window, or a surprise fee at checkout.

We used a few simple safeguards that work well with agent systems:
- Explicit confirmations for the irreversible fields (delivery address, restaurant location, time window, total cost).
- Idempotent tool calls and a session ledger so retries do not duplicate orders.
- Substitution policy prompts that are easy to answer once and reuse (“if out of stock, replace with X, or ask me”).
- Monitoring as a background responsibility, with alerts only when a choice is required.
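The idempotency safeguard is the one most worth sketching, because it is what makes retries safe. This is an illustrative Python sketch: in the real system this state lives in an Orleans grain rather than an in-memory dict, and the class and method names are assumptions.

```python
class OrderLedger:
    """Session ledger for idempotent order placement (in-memory stand-in
    for durable grain state)."""
    def __init__(self):
        self._orders = {}

    def place(self, key, cart, submit):
        """A retry with the same idempotency key returns the original order id,
        so the provider is never called twice for one intent."""
        if key in self._orders:
            return self._orders[key]
        order_id = submit(cart)       # the actual provider call
        self._orders[key] = order_id
        return order_id
```

With this pattern, a network timeout during checkout becomes a safe retry rather than a potential duplicate order.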

We chose these controls for user experience under stress, with safety and correctness as direct consequences. When the user is in a meeting, they cannot debug a cart, so the system has to prevent the failure modes that turn a copilot into another chore.

What changed for the team (and how to measure it)

For this pilot, we avoided publishing invented impact metrics. Instead, we defined a measurement plan that matches how agent experiences succeed or fail in the real world: speed, correctness, and on-time behavior under uncertainty.

The metrics that mattered most, with concrete triggers:
- Time-to-order: median <= 120 seconds over 50+ sessions in 2 weeks; if it exceeds 180 seconds, reduce choices per turn and tighten constraint prompts.
- On-time delivery: >= 85% within the promised window over 100+ orders in 30 days; if it falls below, widen the window and surface reason codes (prep vs dispatch vs traffic).
- Wrong-order prevention: <= 1% of agent-placed orders require cancellation or re-order; if it rises, add one more confirmation step before placing the order.
- Recommendation usefulness: >= 4/5 average “matched what I wanted” rating over 30 days; if it drops, ask one extra context question (“craving”, “light/heavy”, “spicy”) before proposing options.

Those thresholds kept the work grounded. They also made iteration easier, because each UX change could be evaluated as a trade between fewer questions, fewer mistakes, and better timing.
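Each threshold in the plan above maps to a concrete action, which makes the metric machine-checkable. A minimal sketch for the time-to-order metric, in Python for illustration (the production system is C#/.NET); the function name, sample-size gate, and action labels are assumptions mirroring the plan, not shipped code.

```python
from statistics import median

def time_to_order_action(durations_s, sessions_required=50):
    """Turn the time-to-order metric into a next action:
    median <= 120s over 50+ sessions is on target; > 180s triggers iteration."""
    if len(durations_s) < sessions_required:
        return "keep_collecting"
    m = median(durations_s)
    if m > 180:
        return "reduce_choices_and_tighten_prompts"
    return "on_target" if m <= 120 else "monitor"
```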

Next steps

The next product step is to expand the copilot’s reliability envelope without increasing user burden. That work looks boring on purpose: deeper provider integrations for better prep estimates and clearer schedule semantics. It also means calendar-aware time windows, so “deliver at 2:00” becomes a monitored contract rather than a wish. Personalization then adapts to context shifts instead of repeating yesterday’s “best match.”

This pilot also became one of the stories that shaped AIBase, our productized approach to building agent experiences where the experience design matters as much as the model. The takeaway is transferable: if an agent is supposed to execute, the UX has to carry uncertainty and confirmations. Otherwise the user ends up doing the hard work again, only now it happens under higher stakes.

Richard Mueller
Founder, Restaurant Service Startup

It took the Managed Code team five months to build the application, as initially planned. The app that Managed Code developed runs smoothly, is highly rated by users, and helps the client generate a steady profit. The team was highly communicative, and internal stakeholders were particularly impressed with Managed Code's expertise.

Vitalii Drach
CEO, RD2

Their professionalism and commitment to delivering high-quality solutions made the collaboration highly successful.
Thanks to Managed Code's efforts, the AI assistant significantly improved the client's ability to serve new and existing clients, resulting in increased customer satisfaction and higher sales. The team was responsive, adaptable, and committed to excellence, ensuring a successful collaboration.

Christopher Mecham
CTO, Legal Firm

We're impressed by their expertise and their client-focused work.
With an excellent workflow and transparent communication on Google Meet, email, and WhatsApp, Managed Code delivered just what the client wanted. They kept the client's needs in focus throughout, as well.
