How We Are Building an AI That Reads Legal Texts So Humans Don’t Have To

A look at the retrieval, grounding, and citation architecture behind Immigranta, an AI assistant for navigating UK immigration information.

We have all been there.

You Google something like, “Do I need a work visa to move to the UK?” and suddenly you are drowning in tabs: a government website that looks like it has not changed since 2009, a 40-page PDF called Final_Final_v3_ReallyFinal.pdf, and a blog post from 2014 written by a guy named Kevin who may or may not still be living there legally.

Two hours later, you have three conflicting answers and zero clarity.

What if, instead, you could just ask?

What if you could ask a question in plain English and get a clear answer drawn from the actual immigration rules, with the exact legal references and links to the official sources?

That is what we are building with Immigranta: an AI assistant that reads UK immigration information so humans do not have to spend their evenings trying to decode it.

And before we go any further: it is pronounced “immigrant-A”, with the final “A” sounding like “eh”.

The Vision

For the user, the experience should feel simple.

You open a chat and ask:

What visa do I need if I am a software engineer moving to the UK?

Behind that question is a lot of complexity, but the response should not feel complex. Immigranta should return:

A clear answer in plain English.
The relevant legal references.
Links to the official sources.
A short explanation of why the information applies to the question.

No answer based on a random blog. No unexplained conclusion. No pretending that the model knows something it cannot support.

The goal is sourced, explainable information.

Immigranta is not a replacement for a regulated immigration adviser or solicitor. It is an information tool designed to make the underlying rules and guidance easier to find and understand. That boundary matters, particularly when a person’s circumstances are unusual or the consequences of a decision are serious.

The product is not only useful for people trying to understand their own immigration options. Lawyers, paralegals, and legal administrators also spend significant time locating relevant rules, following cross-references, and turning dense legal text into something a client or colleague can understand. Immigranta can support that research and explanation process by surfacing the relevant passages and keeping the source material attached.

Retrieval-Augmented Generation in Action

Large language models are good at producing fluent answers. The problem is that fluency and correctness are not the same thing.

A model can produce a confident, convincing answer that is outdated, incomplete, or simply wrong. That is inconvenient in many applications. In law, it can be dangerous.

This is why Immigranta is built around Retrieval-Augmented Generation, commonly called RAG.

Instead of asking a language model to answer from its internal memory, the system first searches a curated knowledge base of official legal information. It selects the passages that are most relevant to the question and gives those passages to the model as evidence.

The model is then instructed to answer from that evidence, cite its claims, and say when the available material does not support an answer.

At a high level, the process looks like this:

Search official immigration rules and guidance.
Retrieve the most relevant passages.
Re-rank the results for relevance and authority.
Generate an answer grounded in the selected evidence.
Attach citations, source links, and retrieval rationales.

Every answer should lead back to something the user can inspect.

Why Legal Retrieval Is Difficult

Legal and immigration texts are not ordinary documents.

They are long, hierarchical, full of cross-references, and constantly changing. A rule may depend on a definition in one appendix, an exception in another section, and guidance published on a different page. A paragraph that looks relevant in isolation may mean something different when read with its sub-paragraphs.

This means that building the system is not as simple as splitting a PDF every 500 words, creating embeddings, and sending the nearest result to an LLM.

The architecture has to preserve enough structure for the retrieved evidence to remain legally meaningful. It also has to distinguish between a passage that mentions the same words as the question and one that actually answers it.

That is where most of the interesting engineering work begins.

1. Data Ingestion and Preparation

We start with official UK government sources.

The corpus includes material such as the Immigration Rules, relevant appendices, Statements of Changes, UKVI guidance, caseworker instructions, and visa and citizenship guidance.

We intentionally prioritise primary sources. Unofficial blogs, forum posts, and social media discussions may be useful for discovering common questions, but they are not treated as legal authority.

Once collected, the documents are cleaned and normalised. Navigation text, duplicated page elements, broken formatting, and other noise are removed before the content enters the retrieval system.

The documents are then divided into structured chunks.

Chunking matters more than it sounds. If a rule is separated from its exceptions or sub-paragraphs, the retrieval system may return evidence that is technically related but misleading when read alone. We therefore try to preserve the legal hierarchy: rules stay connected to their children, definitions retain their context, and references to appendices are recorded.
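To make the idea concrete, here is a minimal sketch of structure-aware chunking in Python. It is an illustration rather than Immigranta's actual ingestion code. It assumes a simplified numbering scheme in which each rule starts with an identifier such as "EX 1.1." and every following line belongs to that rule until the next identifier appears. The names RuleChunk and chunk_by_rule and the sample text are invented for the example.

```python
import re
from dataclasses import dataclass, field

# Assumed numbering scheme for this sketch only: a rule starts with an
# identifier such as "EX 1.1." and everything up to the next identifier
# (sub-paragraphs, continuation lines) belongs to that rule.
RULE_START = re.compile(r"^([A-Z]{1,4} \d+(?:\.\d+)*)\.\s+(.*)$")


@dataclass
class RuleChunk:
    rule_id: str
    text: str
    children: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the rule followed by its sub-paragraphs as one block of text."""
        return "\n".join([f"{self.rule_id}. {self.text}", *self.children])


def chunk_by_rule(lines: list[str]) -> list[RuleChunk]:
    """Group lines so that each rule stays attached to its sub-paragraphs."""
    chunks: list[RuleChunk] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        start = RULE_START.match(line)
        if start:
            chunks.append(RuleChunk(rule_id=start.group(1), text=start.group(2)))
        elif chunks:
            # Not a new rule, so it is treated as a child of the current one.
            chunks[-1].children.append(line)
        # Lines before the first rule are ignored in this sketch.
    return chunks


# Invented placeholder text, not a quotation from the Immigration Rules.
SAMPLE = [
    "EX 1.1. An applicant must meet all of the following:",
    "(a) the first condition; and",
    "(b) the second condition.",
    "EX 1.2. A separate rule with no sub-paragraphs.",
]

chunks = chunk_by_rule(SAMPLE)
```

The point of the design is that "(a)" and "(b)" never become standalone chunks: whatever retrieves rule EX 1.1 also retrieves its conditions. Real legal text has deeper nesting and less regular numbering, so a production parser would need to be considerably more careful.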

Each chunk is enriched with metadata such as:

The official source and URL.
The section and heading.
The immigration route or topic.
The publication or effective date, where available.
A short summary of the passage.
References to related sections or appendices.

This metadata gives the retrieval layer more than raw text to work with.
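One way to picture an enriched chunk is as a small typed record. The sketch below uses Python dataclasses; the field names mirror the list above, but the exact schema, the class names, and all of the example values are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ChunkMetadata:
    source_name: str
    url: str
    section: str
    heading: str
    route: str
    effective_date: Optional[date]  # None when the source gives no date
    summary: str
    related_sections: tuple[str, ...] = ()


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    meta: ChunkMetadata


# Placeholder values for illustration only.
example = Chunk(
    chunk_id="example-0001",
    text="(text of the passage)",
    meta=ChunkMetadata(
        source_name="Example source document",
        url="https://example.org/placeholder",
        section="EX 1.1",
        heading="Example heading",
        route="skilled-worker",
        effective_date=None,
        summary="One-sentence summary of the passage.",
        related_sections=("EX 1.2",),
    ),
)

# A plain dictionary form, convenient for storing alongside a vector.
record = asdict(example)
```

Keeping the date optional reflects the "where available" caveat: a missing date is recorded as missing rather than guessed.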

2. Embeddings and Knowledge Storage

The processed chunks are converted into vector embeddings.

An embedding is a numerical representation of meaning. Passages that discuss similar concepts should sit closer together in vector space, even when they do not use exactly the same words.

For example, a user may ask about the minimum pay for a work visa while the official document uses the phrase “salary requirement” or “going rate”. Semantic retrieval helps connect those expressions.

The vectors are stored in a vector database alongside the original text and its metadata. This allows the system to search by meaning while still filtering by attributes such as route, source, section, or date.

3. The Hybrid Retrieval Layer

Semantic similarity is useful, but it is not enough for legal search.

Legal questions often include exact terms, route names, paragraph numbers, or phrases whose wording matters. An embedding model may understand the general concept while missing the importance of a specific term.

Immigranta therefore uses hybrid retrieval.

For every question, the system can run:

Semantic search to find conceptually relevant passages.
Keyword or lexical search to find exact legal language, section numbers, and named routes.

The two result sets are fused and re-ranked. The re-ranking stage considers not only textual relevance but also factors such as source authority, document freshness, and whether the passage directly addresses the user’s question.

This gives us a better chance of retrieving both the obvious evidence and the important passage that semantic search alone might overlook.
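The fusion step can be illustrated with reciprocal rank fusion (RRF), a common way to merge ranked lists without having to compare raw scores from different search engines. We use RRF here purely as an example of the technique; this article does not specify which fusion method Immigranta uses, and the constant k=60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of ids into one list.

    Each id receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Ids found by more than one search rise.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two searches.
semantic_hits = ["c1", "c2", "c3"]
lexical_hits = ["c4", "c1", "c2"]

fused = reciprocal_rank_fusion([semantic_hits, lexical_hits])
```

Here "c1" and "c2" come first because both searches found them, while "c4", which only the lexical search found, still makes the list. A separate re-ranking pass would then weigh factors such as source authority and freshness.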

4. Context-Aware Querying

People do not speak to a chat assistant as if every message were a new search form.

A conversation might look like this:

What is the Skilled Worker visa?

Then:

What is the salary requirement?

The second question makes sense only in the context of the first.

Before retrieval, the system rewrites follow-up questions into standalone search queries. In this example, it might produce:

What is the salary requirement for the UK Skilled Worker visa?

This rewritten query is used for retrieval while the original conversation is preserved for the final response.
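A minimal sketch of the rewrite step follows. The language model is represented by a hypothetical callable named complete that takes a prompt and returns text, so no particular provider or API is implied. The prompt wording and the function names are likewise assumptions.

```python
from typing import Callable

REWRITE_INSTRUCTION = (
    "Rewrite the final user question as a single standalone search query. "
    "Resolve references to earlier turns. Return only the rewritten query."
)


def build_rewrite_prompt(history: list[tuple[str, str]], follow_up: str) -> str:
    """Assemble the prompt from (role, text) turns and the new question."""
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        f"{REWRITE_INSTRUCTION}\n\n"
        f"Conversation:\n{turns}\n\n"
        f"Final user question: {follow_up}\n"
        "Standalone query:"
    )


def rewrite_query(
    history: list[tuple[str, str]],
    follow_up: str,
    complete: Callable[[str], str],
) -> str:
    """Return a standalone query, falling back to the original question."""
    if not history:
        return follow_up  # first message: nothing to resolve
    rewritten = complete(build_rewrite_prompt(history, follow_up)).strip()
    return rewritten or follow_up  # empty model output: keep the original
```

Passing the model in as a function keeps the step easy to test with a stub, and the fallback means a failed rewrite degrades to an ordinary search instead of an empty one.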

The retrieved chunks then pass through a lightweight relevance check. A smaller model or a set of scoring heuristics evaluates whether each passage actually helps answer the question. The system also records a short rationale explaining why a piece of evidence was selected.

This extra step reduces the amount of irrelevant context passed to the final model. More context is not automatically better context.
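As an illustration of the heuristic variant of this check, the sketch below keeps a passage only if it shares enough meaningful terms with the rewritten query, and records which terms matched as the rationale. The stopword list, the threshold, and the function names are assumptions, and a real check would likely be more sophisticated than term overlap.

```python
import re

# A deliberately small stopword list for the example.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "what"}


def terms(text: str) -> set[str]:
    """Lowercase word tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def check_relevance(query: str, passages: dict[str, str], min_overlap: int = 2) -> list[dict]:
    """Keep passages sharing at least min_overlap terms with the query."""
    query_terms = terms(query)
    kept = []
    for passage_id, text in passages.items():
        shared = sorted(query_terms & terms(text))
        if len(shared) >= min_overlap:
            kept.append(
                {
                    "id": passage_id,
                    "rationale": "shares query terms: " + ", ".join(shared),
                }
            )
    return kept


# Invented placeholder passages, not quotations from official sources.
passages = {
    "p1": "The salary requirement for a Skilled Worker is set out in this section.",
    "p2": "Students must show evidence of funds.",
}
kept = check_relevance(
    "What is the salary requirement for the UK Skilled Worker visa?", passages
)
```

Only "p1" survives, and its rationale lists the overlapping terms, which is the kind of short explanation a sources panel could show.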

5. Grounded Answer Generation

Once the evidence has been selected, it is passed to the language model with strict instructions.

The prompt is designed around rules such as:

Answer only using the supplied legal excerpts. Cite each factual claim with its source and section. If the excerpts do not contain enough information, say that the answer could not be established from the available sources.

The model then produces a response built from the retrieved material, with inline citations and a plain-English explanation.
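Assembling such a prompt might look like the sketch below. The instruction text is the one quoted above. The numbered-excerpt layout, the dictionary keys, and the function name are assumptions chosen for the example.

```python
INSTRUCTIONS = (
    "Answer only using the supplied legal excerpts. "
    "Cite each factual claim with its source and section. "
    "If the excerpts do not contain enough information, say that the answer "
    "could not be established from the available sources."
)


def build_grounded_prompt(question: str, excerpts: list[dict]) -> str:
    """Number each excerpt so the model can cite it as [1], [2], and so on."""
    blocks = [
        f"[{i}] {ex['source']}, {ex['section']}\n{ex['text']}"
        for i, ex in enumerate(excerpts, start=1)
    ]
    evidence = "\n\n".join(blocks) if blocks else "(no excerpts retrieved)"
    return (
        f"{INSTRUCTIONS}\n\n"
        f"Excerpts:\n{evidence}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder excerpt for illustration only.
prompt = build_grounded_prompt(
    "What is the salary requirement for the UK Skilled Worker visa?",
    [{"source": "Example source", "section": "EX 1.1", "text": "(excerpt text)"}],
)
```

Numbering the excerpts gives the model a compact way to cite and gives the system a fixed set of identifiers to check afterwards.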

These instructions do not make hallucination mathematically impossible. No responsible AI system should claim that. They do, however, constrain the model, make unsupported claims easier to detect, and give the user a direct path back to the evidence.

We can also validate the generated citations against the retrieved context before returning the response. A citation should refer to a source the system actually retrieved, not one the model invented.
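A simple version of that check, assuming the model cites excerpts with bracketed numbers such as [1] and [2], could be sketched as follows. The citation format and the function name are assumptions for the example.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def validate_citations(answer: str, num_excerpts: int) -> dict:
    """Compare cited excerpt numbers against the excerpts actually supplied."""
    cited = {int(n) for n in CITATION.findall(answer)}
    valid = set(range(1, num_excerpts + 1))
    return {
        "supported": sorted(cited & valid),  # point at a retrieved excerpt
        "invented": sorted(cited - valid),   # point at nothing we supplied
        "has_citations": bool(cited),
    }


# Hypothetical model output citing one real excerpt and one that does not exist.
report = validate_citations(
    "The requirement is described in the first excerpt [1]. "
    "A further condition also applies [4].",
    num_excerpts=2,
)
```

This only confirms that each citation points at something the system retrieved. It does not confirm that the cited passage supports the sentence it is attached to, which is a harder problem.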

6. Explainability and Traceability

Explainability is not something we want to add after the product is finished. It is part of the response format.

Each answer includes a sources panel where users can inspect:

The exact legal text used to produce the answer.
The section and document it came from.
Why the passage was selected.
A link to the official source.

This matters for two reasons.

First, users should not have to trust the AI blindly. They should be able to read the source and decide whether the explanation matches it.

Second, traceability makes the system easier to test. When an answer is weak, we can inspect the full chain: the query rewrite, retrieved candidates, ranking scores, selected evidence, prompt, citations, and final response.

That makes it possible to determine whether the problem came from ingestion, retrieval, ranking, missing source material, or generation.
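One way to make that chain inspectable is to store it as a single record per answer. The sketch below is a hypothetical shape for such a record; the class name, field names, and example values are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnswerTrace:
    original_question: str
    rewritten_query: str
    candidate_ids: list[str] = field(default_factory=list)
    ranking_scores: dict[str, float] = field(default_factory=dict)
    selected_ids: list[str] = field(default_factory=list)
    prompt: str = ""
    citations: list[str] = field(default_factory=list)
    final_response: str = ""

    def dropped_candidates(self) -> list[str]:
        """Candidates that were retrieved but not passed to the model."""
        return [c for c in self.candidate_ids if c not in self.selected_ids]

    def to_json(self) -> str:
        """Serialise the whole chain for logging or later review."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values for illustration only.
trace = AnswerTrace(
    original_question="What is the salary requirement?",
    rewritten_query="What is the salary requirement for the UK Skilled Worker visa?",
    candidate_ids=["c1", "c2", "c4"],
    ranking_scores={"c1": 0.9, "c2": 0.4, "c4": 0.2},
    selected_ids=["c1"],
)
```

With a record like this, a weak answer can be replayed stage by stage: if the right passage is missing from candidate_ids the problem lies in retrieval or ingestion, and if it is present but dropped the problem lies in ranking.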

Keeping the Knowledge Base Fresh

Immigration rules change.

Salary thresholds move. Eligible occupation codes are updated. Routes open, close, or acquire new requirements. A response that was correct six months ago may no longer be correct today.

One advantage of a retrieval-first architecture is that we can update the knowledge base without retraining the language model.

The ingestion pipeline can monitor official sources, detect changes, process new versions, and update the relevant chunks and metadata. Old versions can be retained for auditability while current rules are prioritised during retrieval.
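Change detection can be as simple as comparing content fingerprints between runs. The sketch below hashes whitespace-normalised text with SHA-256 and reports which pages are new, changed, or no longer present. It is one possible approach, not a description of Immigranta's pipeline, and the function names and example URLs are placeholders.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so reflowed pages match."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(stored: dict[str, str], fetched: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored fingerprints (url -> hash) with fetched text (url -> text)."""
    return {
        "new": sorted(url for url in fetched if url not in stored),
        "changed": sorted(
            url
            for url in fetched
            if url in stored and stored[url] != fingerprint(fetched[url])
        ),
        "removed": sorted(url for url in stored if url not in fetched),
    }


# Placeholder URLs and text for illustration only.
stored = {
    "https://example.org/a": fingerprint("old text"),
    "https://example.org/b": fingerprint("same  text"),
}
fetched = {
    "https://example.org/a": "new text",
    "https://example.org/b": "same text",
    "https://example.org/c": "brand new page",
}
changes = detect_changes(stored, fetched)
```

Only pages in the "new" and "changed" lists need to be re-chunked and re-embedded, and the previous version of a changed page can be archived rather than deleted.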

Freshness is therefore a data and retrieval problem, not something we leave to the model’s training cutoff.

Why This Architecture?

Because in law, sounding intelligent is not enough.

A confident but unsupported answer can cause real harm. The system has to be designed around evidence rather than eloquence.

Our priorities are:

Verifiability. Important statements should be traceable to an official source.

Freshness. The knowledge base can be updated as rules and guidance change, without waiting for a new model.

Control. Retrieval and prompting constrain the information available to the model and make unsupported output easier to identify.

Transparency. Users can inspect the passages behind an answer instead of being asked to trust a black box.

Honest uncertainty. When the evidence is missing, conflicting, or insufficient, the product should say so.

What Comes Next

The first version is deliberately focused: help people find and understand official UK immigration information, with sources they can inspect.

The next step is to learn how people actually use it. We want to understand the questions applicants ask, where legal professionals and administrators spend the most time, which explanations are genuinely useful, and where the current experience still leaves uncertainty.

That usage will shape what we build next.

One potential feature is refusal analysis. A user could provide an immigration refusal letter, and the system could help identify the reasons given, explain the rules and guidance referenced, and organise the relevant source material for further review.

That would require careful privacy controls, strong document handling, and an even clearer boundary between explaining a decision and giving legal advice. It is not a feature we want to add merely because an LLM can summarise a letter. We want to understand the real workflow first and build only where the product can be useful, traceable, and responsible.

Why This Matters to Us

As software engineers, we do not believe AI should be a black box, especially when it is helping people navigate law and bureaucracy.

People deserve clarity, not another layer of confusion. They deserve to know where an answer came from, how current it is, and when they need professional advice.

Immigranta is one step toward making difficult public information more accessible. The interface may be a simple chat, but the engineering underneath it is built around a more serious principle:

AI should not merely give answers. It should show its work.

Immigranta is live now.