Azure AI Search RAG Internal Copilot for SKU & Translation Management

Built a citation-first internal copilot on Azure AI Search (RAG) to help a global distributor answer SKU and translation questions in seconds, with sources teams can verify.

When a team manages thousands of SKUs across many markets, a "small" translation question rarely stays small. The answer is usually buried inside a policy PDF, a supplier specification, an exception clause in a document nobody can remember, and the one screenshot that lives in a chat thread. If finding the source takes longer than the decision itself, people start guessing, and the cost shows up later as rework, inconsistent listings, and internal disputes.

This case study is about building an Azure AI Search RAG internal documentation assistant that turns those fragmented documents into something teams can search by meaning, confirm by citation, and use under deadline pressure.

Overview

We built an internal copilot for a global distributor of home improvement and sanitary products. The business problem was straightforward: SKU and translation requests kept getting stuck in internal back-and-forth because the "correct" answer existed in documentation, yet manual search was slow and inconsistent across markets.

The turning point was treating citations as a product requirement, not a nice-to-have. Every answer had to point back to the exact source document and chunk it came from. That one constraint changed adoption, because teams could verify the answer in seconds and move on.

This approach is a good fit when you have a large catalog, many markets, and repeated questions that depend on written rules. If your catalog is small and the rules rarely change, a better play is usually process: consolidate documentation and tighten ownership before building a retrieval platform.

[Diagram: the internal copilot RAG architecture on Azure AI Search, from document ingestion to cited answers]

The pressure: when translation answers become a bottleneck

The client operates at a scale where translation is a workflow, not a one-off. They manage 15,000+ SKUs, work across 30+ markets, and have 500+ employees. In this environment, translation requests and SKU questions arrive from multiple teams at once, often with incomplete context and tight timelines.

Before the copilot, handling one internal request took an average of 40 minutes. The delay was not about one slow person; it was about a slow system. Knowledge was distributed across documents, markets used different wording, and "search" failed when the question started as an unclear fragment.

That friction bled into business outcomes. Product exhibitions and marketplace listings have real dates, and translation ambiguity becomes a blocker when answers cannot be found quickly and backed by a source.

Why didn’t keyword search work?

Keyword search works when people remember the right terms. Translation and SKU workflows break that assumption. In practice, internal questions look like:
- a half-remembered phrase from a spec sheet
- a SKU or part number with an inconsistent naming convention
- a market-specific rule that only appears in one PDF
- a question asked in ordinary language that never matches the document’s wording

Embedding search helps because it can retrieve by meaning, even when the query is messy. At the same time, SKU and code-heavy queries still benefit from lexical signals. That is why our retrieval design leaned on Azure AI Search capabilities that support both vector and keyword matching, then merge results for better recall.

Once recall improved, we could make ranking and citations do their job: bring the best sources to the top and keep the answer grounded in something the user can open.

Architecture: Azure AI Search RAG with citations

The system has one job: take internal documents and turn them into answers that are fast, accurate, and traceable. The architecture is simple to explain and strict in execution.

Documents are ingested, split into chunks, embedded, and indexed in Azure AI Search for vector retrieval. At question time, the copilot retrieves the best-matching chunks, then generates an answer that includes citations to those chunks. The user can open the source immediately to confirm details, resolve ambiguity, and keep the conversation moving.
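The flow above can be sketched end to end in a few functions. This is a minimal, illustrative Python sketch, not the production implementation (which runs on .NET with Semantic Kernel, Kernel Memory, and Azure AI Search); `embed` is a toy stand-in for a real embedding model, and the in-memory list stands in for the search index.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    chunk_id: int
    text: str
    vector: list  # embedding of `text`

def embed(text: str) -> list:
    # Placeholder for a real embedding model: hash characters into a
    # tiny fixed-size vector, then normalize so dot product = cosine.
    v = [0.0] * 8
    for i, ch in enumerate(text):
        v[i % 8] += ord(ch)
    norm = sum(x * x for x in v) ** 0.5 or 1.0
    return [x / norm for x in v]

def ingest(doc_id: str, text: str, chunk_size: int = 200) -> list[Chunk]:
    # Ingestion: split the document, embed each piece, keep it addressable
    # by (doc_id, chunk_id) so citations can point back to it later.
    pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [Chunk(doc_id, n, p, embed(p)) for n, p in enumerate(pieces)]

def answer(query: str, index: list[Chunk], k: int = 3):
    # Retrieval: rank chunks by similarity to the query, return the top-k
    # along with citations the user can open to verify the answer.
    qv = embed(query)
    scored = sorted(index, key=lambda c: -sum(a * b for a, b in zip(qv, c.vector)))
    hits = scored[:k]
    citations = [f"{c.doc_id}#chunk{c.chunk_id}" for c in hits]
    return hits, citations
```

The important property is that every retrieved chunk keeps its source address, so the generation step can only cite material that was actually retrieved.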

On the implementation side, we used Semantic Kernel RAG orchestration in .NET, and Kernel Memory RAG patterns to standardize ingestion, chunking, and citation-first retrieval. The service was deployed inside the client’s internal infrastructure to fit their operational and security constraints.

Retrieval design decisions that moved the needle

RAG systems usually fail in retrieval, then look like a model problem. We treated retrieval as the product, and generation as the final formatting step.

Chunking: split for retrieval, not for storage

Chunking decides what the system can retrieve. Too large, and you drag irrelevant text into the context window. Too small, and you lose the clause that changes the meaning. We treated vector search chunking as a quality lever, splitting documentation into clean, readable units that preserve local meaning and keep exceptions near their definitions, then iterating based on real internal queries.
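A simple way to keep exceptions near their definitions is paragraph-aware packing with overlap: fill a chunk with whole paragraphs up to a size budget, then carry the last paragraph into the next chunk so a modifying clause travels with the rule it modifies. This is an illustrative sketch of that idea, not the exact splitter we shipped; the size and overlap values are placeholders you would tune against real queries.

```python
def chunk_paragraphs(text: str, max_chars: int = 800, overlap: int = 1) -> list[str]:
    # Split on blank lines, pack whole paragraphs into chunks up to
    # max_chars, and carry `overlap` trailing paragraphs into the next
    # chunk so an exception clause stays near the rule it changes.
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for p in paras:
        if current and sum(len(x) for x in current) + len(p) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap else []
        current.append(p)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries rather than raw character offsets is what keeps each chunk a "clean, readable unit": no mid-sentence cuts, and the overlap gives retrieval a second chance at clauses that sit at a boundary.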

Hybrid retrieval: support both code-like and natural-language queries

Users mix SKUs and policy language in the same day. For code-heavy lookups, keyword matching matters. For messy, fragment-based queries, embeddings matter. Azure AI Search supports hybrid retrieval patterns that combine these signals so recall stays high without forcing users to "learn how to query".

In other words, hybrid search protects you from two common failure modes at once: missing the right paragraph because the query was phrased differently, and missing it because the query was mostly SKU codes.
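Azure AI Search merges the keyword and vector result lists with Reciprocal Rank Fusion (RRF), which rewards documents that rank well in either list without needing comparable scores across the two. A minimal sketch of the merge step (the service does this internally; you would not implement it yourself):

```python
def rrf_merge(keyword_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).
    # A document found by both signals accumulates score from both lists,
    # so it floats above documents found by only one.
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

This is why hybrid retrieval degrades gracefully: a SKU-heavy query that embeds poorly still surfaces through the keyword list, and a paraphrased question that shares no terms with the document still surfaces through the vector list.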

Metadata and filters: keep answers scoped to the right market

Multi-market translation rules create a common failure mode: the system retrieves a correct answer for the wrong market. We treated market, language, and document type as first-class metadata and used retrieval-time filters so results stay inside the user’s scope.
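Azure AI Search applies scoping through OData-style `$filter` expressions evaluated at retrieval time. A small sketch of building such a filter string from the user's context; the field names `market`, `language`, and `doc_type` are illustrative, not the client's actual schema:

```python
def build_scope_filter(market: str = None, language: str = None, doc_type: str = None):
    # Compose an OData-style filter string so retrieval never returns
    # a correct answer for the wrong market. Returns None when no
    # scoping applies (unfiltered search).
    clauses = []
    if market:
        clauses.append(f"market eq '{market}'")
    if language:
        clauses.append(f"language eq '{language}'")
    if doc_type:
        clauses.append(f"doc_type eq '{doc_type}'")
    return " and ".join(clauses) or None
```

The key design point is that the filter is applied by the search service before ranking, not by the model after generation, so an out-of-scope document can never reach the context window in the first place.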

Citations: design for verification and adoption

Citations were the adoption lever. We treated RAG citations as a product requirement. When a user can open the source in one click, the copilot becomes a decision tool instead of a chatbot. It also creates a feedback loop: when an answer is wrong, the team can see whether the failure came from retrieval, document quality, or an outdated source.
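The rendering side of citation-first design is deliberately boring: number the sources, attach the markers to the answer, and list each source with its chunk address so one click opens the exact passage. An illustrative sketch, with hypothetical field names:

```python
def render_answer(answer_text: str, citations: list[dict]) -> str:
    # Attach numbered markers and a source list so the user can open
    # the exact chunk behind the answer. Each citation dict carries the
    # document name and chunk id recorded at retrieval time.
    refs = "\n".join(f"[{i}] {c['doc']} (chunk {c['chunk']})"
                     for i, c in enumerate(citations, 1))
    markers = "".join(f"[{i}]" for i in range(1, len(citations) + 1))
    return f"{answer_text} {markers}\n\nSources:\n{refs}"
```

Because the citation list is built from retrieval output rather than model output, a wrong answer immediately tells you where to look: if the cited chunk is irrelevant, retrieval failed; if it is relevant but wrong, the document is the problem.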

Results

With the new copilot in place, the client reported:
- 70% of internal queries handled automatically
- average response time reduced from 40 minutes to 6 seconds
- internal search time reduced by 70%
- time needed for product exhibitions and marketplace listings reduced by 30%

These results came from a combination of retrieval quality and trust. The system could find relevant documents from unclear fragments, and citations made answers usable in real workflows where people need to validate before acting.

Is this approach worth it for your team? (Triggers + measurement plan)

If you are deciding whether to build a similar system, start with a measurement pass. Treat it as a retrieval problem first, then decide whether you need a platform.

Use a 20-business-day observation window and capture two numbers for the requests you already handle:
- time-to-answer (from question received to a confirmed answer)
- repeat rate (how often the same question returns, even with slightly different wording)

Action triggers that justify a pilot:
- your average time-to-answer is around 40 minutes per request, or higher
- you are supporting roughly 10 markets where the correct answer depends on local rules or wording
- you manage a catalog on the order of 15,000 SKUs, where tribal knowledge breaks down and policy exceptions matter

If your measurements land below those triggers, a smaller intervention often wins first: rewrite key documents for findability, standardize naming, and tighten ownership. If they land above, retrieval work has a clear path to ROI, and citations keep that ROI from being eaten by trust issues.
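The measurement pass above fits in a few lines of analysis code. This sketch assumes you log each request's question text and time-to-answer over the observation window; the repeat-rate heuristic here (exact match after whitespace and case normalization) is deliberately crude and will undercount paraphrased repeats.

```python
def pilot_triggers(requests: list[dict], markets: int, sku_count: int) -> dict:
    # requests: one dict per handled request over the 20-business-day
    # window, with 'question' and 'minutes_to_answer' fields.
    avg_minutes = sum(r["minutes_to_answer"] for r in requests) / len(requests)
    normalized = [" ".join(r["question"].lower().split()) for r in requests]
    repeat_rate = 1 - len(set(normalized)) / len(normalized)
    return {
        "avg_minutes": avg_minutes,
        "repeat_rate": repeat_rate,
        "time_trigger": avg_minutes >= 40,     # ~40 min per request or higher
        "market_trigger": markets >= 10,       # ~10 markets with local rules
        "sku_trigger": sku_count >= 15_000,    # catalog where tribal knowledge breaks
    }
```

If two or three triggers fire, the retrieval investment has a measurable baseline to beat; if none do, start with the documentation and ownership fixes instead.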

What we’d improve next

Once a citation-first copilot is live, the next gains come from evaluation and operations, not from bigger prompts.

Priorities we typically push next:
- build a retrieval test set from real questions and track recall over time
- add index refresh discipline so citations stay aligned with the latest documents
- tighten market and role-based scoping as the user base grows
- expand multilingual handling where translation policies and source docs mix languages

These improvements keep the system stable as it scales. They also make failures easier to diagnose, because the team can tell whether a regression came from retrieval, from the underlying documents, or from a change in how questions are phrased.
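The retrieval test set in the first priority reduces to one metric worth tracking release over release: recall@k over real questions. A minimal sketch, where `retrieve` is whatever your retrieval function is and each test case pairs a real internal question with the chunk that should answer it:

```python
def recall_at_k(test_set: list[tuple], retrieve, k: int = 5) -> float:
    # test_set: (query, relevant_chunk_id) pairs built from real internal
    # questions. retrieve(query, k) returns ranked chunk ids.
    # Recall@k = fraction of queries whose known-good chunk appears
    # in the top-k results.
    hits = sum(1 for query, rel in test_set if rel in retrieve(query, k))
    return hits / len(test_set)
```

Run this on every index refresh and chunking change; a drop in recall@k tells you a regression came from retrieval before any user reports a wrong answer.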

Stack

We kept the stack intentionally small. Each component exists for a clear retrieval job, and the system is only as strong as the link between retrieval quality and citation quality.

- Azure AI Search - vector search, hybrid search, and metadata filtering
- Microsoft Semantic Kernel - orchestration for the RAG workflow
- Kernel Memory - ingestion, chunking, and citation-first retrieval patterns
- .NET / C# - service implementation

If you remove any one of these, you either lose recall (retrieval quality), lose governance (scoping and filters), or lose trust (citations). The "copilot" experience depends on all three working together.

Sources (external)

These references describe the retrieval patterns behind the implementation and the citation approach used in the copilot.

- Azure AI Search hybrid search overview (https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview)
- Semantic Kernel agentic RAG with Azure AI Search (https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-rag)
- Kernel Memory citations and trust (https://github.com/microsoft/kernel-memory/wiki/Citations-and-Trust)

If you want to go deeper, these docs are the clearest place to start because they describe retrieval mechanics and citation patterns without turning the implementation into a black box.
