Strategy Document · v2.1 · Updated 9 Jun 2025 · Companion to Tech Stack & Architecture

NUSX — AI-Native Web

Three pillars,
one foundation.

Making content discoverable by AI agents, freeing editors from repetitive work, and delivering personalised pages and fast on-site search — all from one content model, built to absorb whatever the AI decade brings.

§ 01

Why AI, Why Now

Cloudflare agent-readiness score, April 2026: 25 out of 100

Monthly visitors: 35K, with no way to search the site

On-site search that works: none; the current "search" redirects visitors to Google

The current NUS Enterprise site was built for a world where people found content through browsers and search engines. That world is shifting. Increasingly, people encounter NUS Enterprise content through AI agents acting on their behalf — ChatGPT, Google AI Overview, Perplexity, and enterprise agent platforms. These systems don't browse pages; they parse structured signals, consume clean text representations, and extract question-answer pairs.

A Cloudflare agent-readiness scan in April 2026 scored the current site at 25/100. The site has a valid sitemap but no support for markdown content negotiation, AI bot rules, or emerging agent discovery standards. Meanwhile, the GA4 data tells a clear story: international pages see bounce rates above 85% as partners and investors leave without finding relevant pathways, and 19K views per year hit 404 dead ends.

The redesign solves both problems at once. The same structured content model that powers AI discoverability also enables personalised pages, fast on-site search, and AI-assisted editorial workflows. Three outcomes from a single foundation.

This is not just about the current site. The content model, enrichment pipeline, and search infrastructure are designed so that NUSX is well-positioned for the AI decade. As new AI capabilities, agent standards, and inference models emerge, the website can adopt them without rearchitecting — the hard work of structuring content and building a clean foundation happens once, now.

§ 02

Three Pillars

AI integration is organised into three pillars. Each addresses a distinct audience and problem, but all three share the same content model and the same inference layer (Google Gemini as the preferred LLM provider, with the optionality to use any model as the landscape evolves).

PILLAR 01

AI Discoverability

Making NUSX content findable and understandable by AI agents — ChatGPT, Google AI, Perplexity, and the next wave of agent platforms. Content negotiation, structured data, llms.txt, AEO markup.

PILLAR 02

Content Intelligence

AI-assisted enrichment inside the CMS workflow. Auto-generated metadata, SEO titles, meta descriptions, question-answer pairs, and rewriting assistance. Editors paste approved content; the system handles the rest.

PILLAR 03

AI Features

User-facing features powered by the same content foundation. The flagship is personalised pages — both dynamic (assembled from content blocks around a visitor's intent) and pre-built archetype pages for the four most common visitor segments. Alongside it, fast two-tier search (keyword + semantic), a conversational assistant that answers questions from site content, and a codified design system that editors and creators can draw from.

Self-contained by design. All AI features operate entirely within the website's own infrastructure. The content model, the enrichment pipeline, the search indexes, and the personalisation engine all draw from the same source: the content that editors publish in WordPress. No external content dependencies. No third-party data feeds. No separate knowledge base that needs its own editorial process. Every AI feature works with what the site already has and can be extended later if circumstances allow, but nothing requires it.

§ 03

Pillar 1 — AI Discoverability

When an industry partner asks Google AI "How do I licence NUS research IP?", when a prospective investor asks ChatGPT "What deep-tech incubation programmes does NUS Enterprise offer?", or when a prospective student asks "Can I do a startup internship at NUS as a Year 3 engineering student?" — the answer depends entirely on whether the source content is machine-readable. AI agents don't navigate menus. They consume structured signals.

This pillar ensures that every page on the redesigned site is as readable by an AI agent as it is by a human visitor — same URLs, same content, different representation.

3.1 Content negotiation — Markdown for AI agents

All pages serve clean markdown when an Accept: text/markdown header is present. Browsers get HTML with full design; AI crawlers get markdown with prose, headings, lists, and tables only. Same URLs, same sitemap, no separate subdomain. Implemented as Astro middleware or an edge function at the CDN layer.

This is the single highest-impact action for AI discoverability. A markdown representation gives agents exactly what they need — structured text without navigation chrome, footers, or JavaScript payloads.
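The negotiation rule itself is small. As a minimal sketch of the decision logic only (written in Python for illustration; in production it would live in the Astro middleware or edge function mentioned above), the hypothetical helper `choose_representation` returns markdown only when a client lists `text/markdown` explicitly and weights it at least as high as `text/html`:

```python
from typing import Optional


def parse_accept(header: str) -> dict[str, float]:
    """Parse an Accept header into {media_type: q}; q defaults to 1.0."""
    weights: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        weights[media_type] = q
    return weights


def choose_representation(accept_header: Optional[str]) -> str:
    """Return "markdown" or "html" for a request's Accept header."""
    if not accept_header:
        return "html"
    weights = parse_accept(accept_header)
    markdown_q = weights.get("text/markdown", 0.0)
    html_q = weights.get("text/html", 0.0)
    # Wildcards such as */* never select markdown, so ordinary browsers
    # keep receiving the designed HTML page.
    if markdown_q > 0 and markdown_q >= html_q:
        return "markdown"
    return "html"
```

Because one URL now returns two representations, responses should also carry a `Vary: Accept` header so that CDN caches store the HTML and markdown versions separately.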

3.2 llms.txt — Discovery and reference file

A /llms.txt file at the site root: a plain-text, machine-readable summary of NUSX, its programmes, departments, and site structure. No build step, no CMS dependency. Updated when major structural changes happen. llms.txt is an emerging convention that AI agents can check when they first encounter a domain.

Beyond AI agent discovery, llms.txt serves as a reference that editors and other systems can consume. Content editors working in any tool — whether it's a CMS, a note-taking app, or an AI-assisted writing environment — can point to llms.txt to understand the site's structure, terminology, and content boundaries. It doubles as a lightweight onboarding resource for anyone working with NUSX content.

3.3 Structured data — Schema.org JSON-LD

Every page type emits typed Schema.org markup at build time, generated from WordPress fields:

Page Type	Schema.org Type	What It Captures
Programme pages	`Course`	Eligibility, duration, outcomes, credit load, locations
People profiles	`Person`	Name, role, department, research areas
Events	`Event`	Date, location, organisers, registration link
Organisation	`EducationalOrganization`	Structure, departments, programmes, contact points
Research	`ResearchProject`	Focus area, lead researchers, industry partners
FAQ pages	`FAQPage`	Question-answer pairs, canonical answers

Entity naming is consistent across markup and visible text. Build-time linting catches mismatches so AI systems don't split the institution into two entities — one from the JSON-LD and one from the page content.
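As an illustration of both ideas, the sketch below builds a `Course` object from a dictionary standing in for WordPress fields, then checks that every entity name in the markup also appears in the visible text. The field names (`title`, `summary`, `eligibility`), the canonical name held in `ORG_NAME`, and all three function names are assumptions made for this example, not the project's actual schema.

```python
import json

ORG_NAME = "NUS Enterprise"  # assumed canonical entity name


def course_jsonld(fields: dict) -> dict:
    """Map (hypothetical) CMS fields to a Schema.org Course object."""
    return {
        "@context": "https://schema.org",
        "@type": "Course",
        "name": fields["title"],
        "description": fields["summary"],
        "coursePrerequisites": fields["eligibility"],
        "provider": {"@type": "EducationalOrganization", "name": ORG_NAME},
    }


def lint_entity_names(jsonld: dict, visible_text: str) -> list[str]:
    """Return one message per entity name missing from the visible text."""
    names = [jsonld.get("name"), jsonld.get("provider", {}).get("name")]
    return [
        f"'{name}' appears in JSON-LD but not in the page text"
        for name in names
        if name and name not in visible_text
    ]


def to_script_tag(jsonld: dict) -> str:
    """Serialise the object for embedding in the page head."""
    return '<script type="application/ld+json">' + json.dumps(jsonld) + "</script>"
```

A build step could call `lint_entity_names` for each page and stop the build when the returned list is not empty.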

3.4 Answer Engine Optimisation — AEO markup

Each content page is augmented with canonical question-answer pairs generated at publish time. Pages output the question as a heading and the answer as the first paragraph — the structure answer engines extract most reliably. This is covered in detail under Content Intelligence (§04) since the enrichment pipeline generates these pairs automatically.
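The heading-then-answer layout can be shown with a very small renderer. This is a sketch under assumptions: the `question` and `answer` keys and the function name `render_aeo_section` are hypothetical, and the output is markdown so that it matches the representation served to agents in 3.1.

```python
def render_aeo_section(pairs: list[dict], heading_level: int = 2) -> str:
    """Render question-answer pairs as markdown: heading first, answer directly below."""
    prefix = "#" * heading_level
    sections = []
    for pair in pairs:
        question = pair["question"].strip()
        answer = pair["answer"].strip()
        # The answer is the first paragraph under its question heading.
        sections.append(f"{prefix} {question}\n\n{answer}")
    return "\n\n".join(sections) + "\n"
```

Keeping the answer immediately under its heading, with nothing in between, is the property the paragraph above describes as most reliably extracted.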

3.5 AI agent instruction tag

Key pages include a <meta name="ai-agent-instruction"> tag with a concise machine-readable summary of the page content. This is a lightweight signal that helps agents understand page purpose without parsing the entire document.

3.6 What this delivers

Agent-ready from day one. The Cloudflare agent-readiness score moves from 25/100 to a projected 80+ by launch.
Same URLs, dual representation. No separate API, no separate subdomain, no duplicate content. One canonical source serves both humans and machines.
Future-proof. As agent standards evolve (MCP, A2A, whatever comes next), the content negotiation layer is a single middleware update — no CMS changes needed.
Measurable. Agent-readiness scores, structured data coverage, and AI referral traffic can be tracked from launch.

§ 04

Pillar 2 — Content Intelligence

The redesigned site does not replace the editorial workflow. Content creation stays with the marketing team using their existing tools — Google Docs, Monday.com, NotebookLM. WordPress handles publication only.

What changes is what happens between paste and publish. Today, editors spend significant time on mandatory metadata work: writing SEO titles, crafting meta descriptions, generating Open Graph tags, building JSON-LD markup, and structuring FAQ entries. This is repetitive, error-prone, and adds no creative value.

The enrichment pipeline does for metadata what spellcheck does for typos — handles the mechanical work so editors can focus on the content that matters.

4.1 How it works

When an editor publishes or updates a page in WordPress, a lightweight enrichment service reads the content, sends it to an LLM (Google Gemini as the preferred inference layer, with optionality to use any model) with type-specific prompts, and writes the generated metadata back to the CMS — all within seconds. No editor intervention required.
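A minimal sketch of that read, prompt, and write-back loop follows. It makes several assumptions: the prompt wording, the two output keys, and the function name `enrich` are hypothetical, and `call_llm` is a stand-in for whichever provider client is used, injected as a plain function so that no specific SDK is implied.

```python
import json
from typing import Callable

# Hypothetical type-specific prompts; real wording would be tuned per content type.
PROMPTS = {
    "programme": "Return JSON with seo_title and meta_description for this programme page.",
    "event": "Return JSON with seo_title and meta_description for this event page.",
    "default": "Return JSON with seo_title and meta_description for this page.",
}
EXPECTED_KEYS = ("seo_title", "meta_description")


def enrich(post: dict, call_llm: Callable[[str], str]) -> dict:
    """Build the prompt, call the model, and keep only the expected string fields."""
    instruction = PROMPTS.get(post["type"], PROMPTS["default"])
    raw = call_llm(instruction + "\n\n" + post["content"])
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # leave existing metadata untouched if the reply is not JSON
    if not isinstance(data, dict):
        return {}
    return {
        key: data[key].strip()
        for key in EXPECTED_KEYS
        if isinstance(data.get(key), str)
    }
```

Returning an empty dictionary on a malformed reply is one possible design choice: publishing is never blocked by the model, and the caller writes back only the fields that passed the check.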

4.2 What gets generated

Output	Description	Who Uses It
Meta description	Concise page summary for search results and social sharing	Search engines, social platforms, AI agents
Open Graph tags	Title, description, and image metadata for social previews	Social platforms (LinkedIn, Facebook, X)
SEO title	Optimised title tag with primary keyword and institution context	Search engines
JSON-LD structured data	Typed Schema.org markup per page type (Course, Person, Event, etc.)	Search engines, AI agents
Canonical questions	3–5 questions this page answers, formatted as headings	AI answer engines, on-site search
Short direct answers	1–3 sentence direct answers per canonical question	AI answer engines (Google AI Overview, Perplexity)
Related concepts	Linked topics and entities mentioned on the page	On-site search, internal linking
Alt text suggestions	Descriptive alt text for images lacking accessibility metadata	Screen readers, search engines, AI agents

4.3 In-editor assistance

Beyond auto-generation on publish, the CMS includes lightweight AI assistance inside the editing experience. These are optional, non-intrusive features that editors can accept, modify, or ignore:

Meta description rewrite. A sidebar button that rewrites the current meta description for clarity, length, or keyword relevance.
SEO title suggestions. Alternative title tags based on the page content, ranked by likely search performance.
FAQ generation. Given a page's content, suggest question-answer pairs that the page could answer. Editor reviews and publishes the ones that make sense.
Readability check. Flag content that may be too dense for answer-engine extraction and suggest simplification.

4.4 Build-time validation

The Astro build pipeline enforces content quality rules so that incomplete metadata never reaches production:

AEO pages must have a direct answer of no more than three sentences.
Programme pages must have typed metadata fields (eligibility, duration, outcomes).
Entity names must be consistent between JSON-LD and visible text.
Canonical questions must map to actual page headings.
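Three of these rules can be sketched as a single check that returns a list of violations (entity-name consistency needs the JSON-LD and page text, so it is left out here). The page dictionary layout, the function names, and the sentence-splitting heuristic are assumptions for illustration; the three-sentence limit is taken from the 1-3 sentence range listed in 4.2.

```python
import re

MAX_ANSWER_SENTENCES = 3  # assumed limit, matching the 1-3 sentence range in 4.2
PROGRAMME_FIELDS = ("eligibility", "duration", "outcomes")


def count_sentences(text: str) -> int:
    """Rough sentence count: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def validate_page(page: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the page passes."""
    errors = []
    headings = {h.strip().lower() for h in page.get("headings", [])}
    for pair in page.get("qa_pairs", []):
        question = pair["question"].strip()
        if count_sentences(pair["answer"]) > MAX_ANSWER_SENTENCES:
            errors.append(f"answer too long: {question}")
        if question.lower() not in headings:
            errors.append(f"no matching heading: {question}")
    if page.get("type") == "programme":
        for field in PROGRAMME_FIELDS:
            if not page.get(field):
                errors.append(f"missing field: {field}")
    return errors
```

A build script could run `validate_page` over every page and exit with a non-zero status when any list is non-empty, which is what keeps incomplete metadata out of production.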

4.5 What this does NOT do

Scope boundary. The enrichment pipeline does not generate editorial content, draft pages, or write marketing copy. Content creation remains the marketing team's responsibility using their existing tools (Google Docs, NotebookLM, Monday.com). The pipeline enriches, augments, and structures content that editors have already written and approved. It is a publication-time assistant, not a content creation tool.

4.6 What this delivers

Editor time savings. Mandatory metadata work — meta descriptions, OG tags, JSON-LD, AEO pairs — goes from manual to automatic. Editors paste approved content and publish.
Consistency at scale. Every page gets the same quality of metadata regardless of which editor published it. No gaps, no forgotten meta descriptions, no empty JSON-LD blocks.
Discoverability compound effect. Canonical questions and AEO markup generated here feed directly into Pillar 1 (AI Discoverability). Better metadata means better answers in ChatGPT, Google AI, and Perplexity.
Zero maintenance. Enrichment happens automatically on publish. No editor training, no workflow changes, no additional tools to learn.

§ 05

Pillar 3 — AI Features

The first two pillars work behind the scenes — AI agents find content, and editors publish faster. The third pillar brings the payoff directly to the visitor through four capabilities that share the same content foundation: the flagship personalised pages, fast two-tier search, a conversational assistant, and a codified design system.

5.1 Semantic search

The current site has no working on-site search; its "search" simply redirects visitors to Google. Visitors navigate a flat hierarchy of 7,337 URLs and hope the right page is linked from the homepage. The GA4 data shows the cost: 19K views per year hit 404 pages, and international pages see 85%+ bounce rates as partners and investors leave without finding what they need.

Semantic search replaces navigation-by-hope with fast, concept-aware retrieval. A visitor types a query in their own words and gets relevant results drawn from across all site content, ranked by meaning rather than keywords — covering every page and content block on the site.

Example queries

"How do I partner with NUS Enterprise on AI research?" → Industry partnerships + research expertise + contact
"Startup funding for deep-tech founders" → TTI funding + incubation + grants
"Commercialise university IP" → TTI disclosure + licensing + agreements
"Can year 3 engineering students do an overseas startup internship?" → NOC eligibility + curriculum + locations
"NOC Munich semester duration" → Curriculum overview + Munich location card

How it works

Search uses a two-tier client-side approach for resilience. Pagefind provides instant keyword results from a static full-text index, with no model download, so it works on first page load. Transformers.js layers semantic understanding on top: at build time, a Python script generates embeddings for all content blocks; at runtime, the visitor's query is embedded in the browser using a ~23MB model and ranked by cosine similarity against the static index. Visitors get results immediately from Pagefind, and those results improve once the semantic model loads.

Two-tier client-side search. Pagefind delivers instant keyword matches while the semantic model downloads (~23MB, once per session). Once ready, Transformers.js re-ranks results by meaning, so queries like "commercialise university IP" find licensing pages even when the exact words don't match. No search server, no API calls, no vector database. If the model fails to load, Pagefind still works.
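The ranking and fallback logic can be sketched independently of the browser. The example below is Python for illustration only (the runtime version would be JavaScript alongside Transformers.js), and the function names, the `min_score` threshold of 0.3, and the `top_k` default are assumptions rather than tuned values. Passing `query_vec=None` models the period before the semantic model has loaded.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    keyword_hits: list[str],
    query_vec: Optional[list[float]],
    index: dict[str, list[float]],
    top_k: int = 5,
    min_score: float = 0.3,
) -> list[str]:
    """Return block ids: semantic ranking when available, keyword hits otherwise."""
    if query_vec is None:
        return keyword_hits[:top_k]  # tier 1 only: model not loaded (or failed)
    scored = sorted(
        ((cosine(query_vec, vec), block_id) for block_id, vec in index.items()),
        reverse=True,
    )
    semantic = [block_id for score, block_id in scored if score >= min_score]
    # If nothing is a confident semantic match, keep the keyword results.
    return semantic[:top_k] or keyword_hits[:top_k]
```

An empty return value is the case where the interface would show browse links and contact options instead of a blank results page.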

Capability	Description
Instant results	Pagefind keyword index returns results on first page load — no model download required
Semantic understanding	Transformers.js matches by meaning, not keywords — "commercialise IP" finds licensing pages even without the word "commercialise"
Resilient fallback	If the model fails to load, Pagefind keyword results still work — search never breaks
Block-level results	Search returns specific content sections, not just pages. A query about funding returns the funding block, not the entire 3,000-word parent page
Source attribution	Every result links back to its source page and section
No-result handling	When no confident match exists, visitors see browse links and contact options — not an empty page

Future improvements

The search setup is designed for progressive enhancement. As content and traffic grow, optional refinements can be layered in: a synonym / redirect dictionary for institutional jargon, scoped filters by audience or topic, and analytics-informed ranking tweaks. Each is optional and stays within the static-index model — the search works well without any of them, and none adds a runtime backend.

5.2 Personalised pages — flagship

Personalised pages are the flagship of this pillar: each visitor gets their own corner of the website, assembled from the same content blocks based on what they need. It is not a separate section — it is the same NUSX content, reorganised around the visitor's intent using a templated approach and the same embeddings index to select and arrange the ideal blocks for that person. Composition is deterministic slotting of real, published blocks — no AI generation in the request path, nothing to fact-check.

The flow

Visitor describes their need. A short natural-language prompt: "I'm a deep-tech founder looking for funding and workspace" or "Industry partner interested in AI research collaboration."
System retrieves relevant content blocks. The same semantic retrieval that powers search finds the blocks that match the visitor's intent, audience, and journey stage.
Page is assembled from blocks. Blocks are arranged into a template: hero, overview, key facts, main content, social proof, FAQ, call to action. The page feels intentionally assembled, not like raw search results.
Visitor saves or shares. The assembled page can be saved for later reference or shared with a colleague via a stable link. This creates an easy onboarding path: no friction, just added value.
Visitor becomes a lead. The save and share mechanism motivates visitors to leave an email, converting a high-intent visitor into a qualified lead for NUSX.

Why this works

The redesigned site is built on a block-based content model. Pages are not monolithic documents — they are ordered collections of content blocks, each with structured metadata (audience, topic, intent, journey stage). This makes assembly-based features architecturally straightforward: a personalised page is simply a different ordering and selection of the same blocks.
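Deterministic slotting can be sketched in a few lines. The slot names follow the template listed in the flow above; the `slot` and `score` keys on each block and the function name `assemble_page` are assumptions for this example, with `score` standing for the similarity already computed by retrieval.

```python
TEMPLATE = [
    "hero", "overview", "key_facts", "main_content",
    "social_proof", "faq", "call_to_action",
]


def assemble_page(scored_blocks: list[dict], template: list[str] = TEMPLATE) -> list[str]:
    """Pick the best-scoring block per slot and return block ids in template order."""
    best: dict[str, dict] = {}
    for block in scored_blocks:
        current = best.get(block["slot"])
        # Ties are broken on id so the same input always yields the same page.
        if current is None or (block["score"], block["id"]) > (current["score"], current["id"]):
            best[block["slot"]] = block
    # Slots with no candidate block are skipped rather than filled with generated text.
    return [best[slot]["id"] for slot in template if slot in best]
```

Nothing in this step generates text: the output is only a selection and ordering of published blocks, so the page order depends on the template rather than on the order in which retrieval returned the blocks.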

High-value lead capture. The personalised page doubles as an intake funnel for high-value visitors — partners, investors, and serious prospects who arrive with clear intent. Instead of navigating a static hierarchy, they describe what they need and receive a tailored experience. The sharing mechanism is particularly powerful: a partner can create a personalised page and send it to a colleague, spreading NUSX's reach through the visitor's own network. Email capture for saving pages converts intent into a qualified lead.

Archetype pages

Not every visitor needs the dynamic assembly experience. The four most common visitor segments get pre-built pages generated at build time as static HTML — no model download, no client-side assembly, instant load. Editors curate the block selection; the build produces the pages.

/for/founders — Startup founders: funding, workspace, incubation
/for/partners — Industry partners: research collaboration, licensing
/for/researchers — Faculty: commercialise IP, spin-out pathways
/for/students — Students: NOC, overseas internships, entrepreneurship

These are fully crawlable and indexable (unlike dynamic personalised pages, which are noindex). They serve as SEO entry points for the most valuable audience segments and link to the dynamic page builder for visitors whose needs fall outside the four archetypes. The archetypes handle the top of the funnel; dynamic assembly handles the long tail.
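The build step for archetype pages might look like the sketch below. The four paths come from the list above; the block ids in `ARCHETYPES`, the function names, and the bare HTML wrapper are hypothetical placeholders for the editor-curated selection and the real page templates.

```python
from html import escape

# Editor-curated block selection per archetype (block ids are hypothetical).
ARCHETYPES = {
    "/for/founders": ["funding", "workspace", "incubation"],
    "/for/partners": ["research-collaboration", "licensing"],
    "/for/researchers": ["commercialise-ip", "spin-out"],
    "/for/students": ["noc", "internships"],
}


def robots_meta(path: str) -> str:
    """Archetype pages are indexable; any other (dynamic) page is noindex."""
    content = "index, follow" if path in ARCHETYPES else "noindex"
    return f'<meta name="robots" content="{content}">'


def build_archetype_pages(blocks: dict[str, str]) -> dict[str, str]:
    """Return {path: html} for every archetype; fail the build on a missing block."""
    pages = {}
    for path, block_ids in ARCHETYPES.items():
        missing = [b for b in block_ids if b not in blocks]
        if missing:
            raise KeyError(f"{path} references unknown blocks: {missing}")
        body = "\n".join(f"<section>{escape(blocks[b])}</section>" for b in block_ids)
        pages[path] = f"<head>{robots_meta(path)}</head>\n<body>\n{body}\n</body>"
    return pages
```

Raising on a missing block id is a deliberate choice in this sketch: a curated page that silently drops a section is harder to notice than a failed build.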

5.3 Conversational assistant

A chat widget that answers visitor questions grounded in the site's own content. Not a generic chatbot that draws from its training data — this is RAG over the same content blocks that power search and personalised pages. Every answer cites its sources with links to the real pages. The chatbot also doubles as an intake funnel: as the visitor describes their needs, the system silently assembles a personalised page and offers it with a single click.

How it works

Retrieve. The visitor's message is embedded and matched against the same embeddings index used by search. The top-matching content blocks become the LLM's context.
Generate. Blocks + message are sent to Gemini via the existing Sidecar service. The LLM generates a concise, grounded answer that references specific content blocks.
Stream. The response streams to the browser, so the visitor sees the answer begin with minimal delay.
Cite. Every answer includes source attributions — which blocks and pages it drew from. Visitors can verify and navigate to the full content.
Assemble. After a few exchanges, the Sidecar has enough intent to run the same block-retrieval and template-slotting logic used by personalised pages. The widget surfaces a prompt — "We built a page for you based on our conversation" — with a button that opens the assembled page. A chat interaction becomes a shareable, persistent resource.

No new infrastructure. The chatbot is a new route in the existing Sidecar service, using the existing Gemini API and the existing embeddings index. The page-assembly feature reuses the exact same retrieval + slotting code as the standalone personalised page builder. No vector database, no separate chatbot platform, no third-party service.

Step	Component	What happens
1. Message	Browser	Visitor types a question in the chat widget
2. Retrieve	Sidecar	Message embedded server-side → cosine match against embeddings → top 8 blocks selected
3. Generate	Sidecar → Gemini	Blocks passed as context → grounded answer with citations
4. Stream	Sidecar → Browser	Response streamed to the chat widget in real time
5. Assemble	Sidecar	Once enough intent gathered, same retrieval + template assembly runs → personalised page URL returned
6. Prompt	Browser	Widget shows "We built a page for you" button → opens assembled page
7. Fallback	Browser	If Sidecar unavailable, widget shows search results instead
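Steps 2 and 3, plus a server-side analogue of the fallback in step 7, can be sketched as follows. The block dictionary layout (`url`, `text`, `vector`), the prompt wording, and the function names are assumptions; `call_llm` stands in for the Gemini call and is injected as a plain function so that no specific SDK signature is implied. Streaming and page assembly are omitted.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], blocks: list[dict], k: int = 8) -> list[dict]:
    """Return the k blocks most similar to the query vector (top 8 by default)."""
    ranked = sorted(blocks, key=lambda b: cosine(query_vec, b["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(message: str, context: list[dict]) -> str:
    """Number the retrieved blocks so the model can cite them as [1], [2], ..."""
    sources = "\n\n".join(f"[{i}] {b['text']}" for i, b in enumerate(context, start=1))
    return (
        "Answer using only the numbered sources below and cite them like [1].\n\n"
        f"{sources}\n\nQuestion: {message}"
    )


def answer(message: str, query_vec: list[float], blocks: list[dict],
           call_llm: Callable[[str], str]) -> dict:
    context = retrieve(query_vec, blocks)
    sources = [b["url"] for b in context]
    try:
        text = call_llm(build_prompt(message, context))
    except Exception:
        # Analogous to step 7: degrade to plain results when generation fails.
        return {"mode": "search", "results": sources}
    return {"mode": "answer", "text": text, "sources": sources}
```

Because `sources` is built from the retrieved blocks rather than parsed out of the model's reply, every response carries attributions even if the model omits its citation markers.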

5.4 design.md — Codified brand and identity

Google's open design.md specification — supported in their Stitch tool and Antigravity platform — codifies brand identity and design principles in a structured, machine-readable format. It combines YAML design tokens (colors, typography scale, spacing, component schemas) with markdown prose that explains the rationale behind each decision — not just what the brand looks like, but why.

Because the format is structured and lintable, it can be validated for consistency (WCAG contrast ratios, token completeness), exported directly to a Tailwind config, and consumed by AI tools to generate on-brand content. Editors, designers, and AI assistants all reference the same source of truth — whether writing a page in the CMS, drafting a partnership brief in Google Docs, or generating content that aligns with the brand.
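The contrast check is the most mechanical of these validations. The sketch below applies the WCAG 2.x relative-luminance and contrast-ratio formulas to foreground and background colour pairs; the token-pair dictionary and the function names are hypothetical, and reading the tokens out of the design.md file itself is not shown.

```python
def relative_luminance(hex_colour: str) -> float:
    """WCAG 2.x relative luminance of an sRGB colour such as "#1a2b3c"."""
    value = hex_colour.lstrip("#")
    channels = [int(value[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Ratio of the lighter to the darker luminance, each offset by 0.05."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def failing_pairs(pairs: dict[str, tuple[str, str]], minimum: float = 4.5) -> list[str]:
    """Return names of token pairs below the minimum (4.5 is WCAG AA for normal text)."""
    return [name for name, (fg, bg) in pairs.items() if contrast_ratio(fg, bg) < minimum]
```

Black on white gives the maximum ratio of 21:1, while a light grey such as `#aaaaaa` on white falls well below 4.5:1 and would be reported by `failing_pairs`.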

Like llms.txt for site structure, design.md gives the brand a single, version-controlled source of truth that humans and tools can consume. It lives alongside the content and evolves with the brand.

5.5 What this delivers

Search where there was none. The current site has no search. Two-tier search (instant keyword + semantic) gives 35K monthly visitors a way to find what they need without navigating menus.
Intent-driven, not page-driven. Visitors describe what they need in their own words and get relevant content — regardless of which department "owns" the page.
Lead capture for high-value visitors. Personalised pages surface the right content for partners, investors, and serious prospects — and the save/share/email mechanism converts intent into leads.
Conversational answers, grounded in real content. The chat widget answers questions using only site content, with source citations on every response. Not a hallucination-prone chatbot — RAG over curated content blocks.
Same foundation as Pillars 1 and 2. Search and page assembly use the same content blocks, the same enrichment metadata, and the same build-time indexes (Pagefind + embeddings). No separate infrastructure.
Brand consistency at scale. A codified design system ensures that every piece of content — whether created by editors, partners, or AI-assisted tools — aligns with NUSX's identity.

§ 06

Architecture

All three pillars share one architecture. Content flows from WordPress through a lightweight enrichment layer to three surfaces: AI agents (Pillar 1), enriched metadata (Pillar 2), and user-facing features (Pillar 3).

6.1 Self-containment principle

Every AI feature in this strategy is contained within the website's own stack. The system ingests content that editors publish in WordPress, passes it through the enrichment service, indexes it at build time, and serves it through the Astro frontend. There are no external content dependencies, no third-party data feeds, and no separate knowledge base that requires its own editorial process or maintenance budget.

This is an architectural guarantee, not just a design preference. It means:

No additional content work. AI features consume what editors already publish. No parallel content pipeline, no separate tagging workflow, no duplicate effort.
No external dependencies at launch. Everything the AI features need — content, metadata, search index — lives within the website's own infrastructure. The site is fully functional without any external data source.
Extendable by design. If the future situation allows — for example, integrating NUS research databases, external event feeds, or a dedicated knowledge base — the architecture makes it straightforward to add new content sources without rearchitecting. But nothing requires it.
Predictable maintenance. One content model, one enrichment pipeline, search indexes built at deploy time. The team maintains the website, not a growing constellation of AI-adjacent systems.

6.2 Data flow

Editor publishes in WordPress. Content is written and approved in existing tools (Monday.com, Google Docs), then pasted into WordPress for publication.
Enrichment service processes content. On publish, WordPress triggers the enrichment service, which calls an LLM to generate metadata and writes it back to the CMS.
Astro builds and generates search artifacts. Static pages are generated with enriched metadata, JSON-LD, and content-negotiation support. Build generates both search indexes: Pagefind crawls rendered pages for the keyword index, and a Python script generates embeddings for the semantic index. Four archetype pages are produced as static HTML. All artifacts are served from the Cloudflare CDN edge.
AI agents consume content. Crawlers requesting Accept: text/markdown receive clean markdown from the same URLs. Structured data and llms.txt provide additional discovery signals.
Visitors search, chat, and receive personalised pages. Pagefind delivers instant keyword results. Once the semantic model loads, Transformers.js re-ranks by meaning. Personalised pages assemble from the same indexed content blocks. Archetype pages load instantly for the four main visitor segments. The chat widget answers questions using RAG over the same embeddings + Gemini.

6.3 Component responsibilities

Component	Responsibility
WordPress (headless)	Content authoring and publication. Custom post types with ACF metadata fields. Webhook on publish. SSO integration with NUS academic identity provider.
Enrichment service	Lightweight service that calls the preferred LLM for metadata generation, question-answer extraction, and rewriting assistance. Prompt management per content type.
LLM inference layer	Google Gemini as the preferred provider, with optionality to use any model. Handles metadata generation, content enrichment, and chat responses — not search, which uses client-side indexes (Pagefind + Transformers.js).
Astro (frontend)	Static site generation with enriched metadata. Build-time Pagefind index + embeddings generation. Four archetype pages. Content negotiation middleware. Search UI component. Personalised page assembly.
Cloudflare CDN	Edge serving in Singapore. Branch previews. Markdown content negotiation at the edge.
SSO / Identity	Integration with NUS's existing academic SSO provider for WordPress admin access and future authenticated features.

6.4 Build vs. buy

Consistent with NUSX's preference for proven, maintained solutions over custom builds:

Off-the-shelf: WordPress, Astro, Google Gemini API, Cloudflare CDN, Tailwind CSS, Transformers.js (client-side semantic search), Pagefind (keyword search fallback). SSO via NUS's existing academic identity provider.
Custom build: Enrichment service, search UI component, personalised page assembler, archetype page generator, chat widget + Sidecar chat endpoint, content negotiation middleware, AEO validation rules.

The custom work is concentrated in the enrichment service and frontend components — thin layers that connect proven services. No custom ML model training, no custom vector database, no custom inference infrastructure.

6.5 Single sign-on (SSO)

WordPress admin access integrates with NUS's existing academic SSO provider. This allows editors and content managers to authenticate using their institutional credentials rather than separate WordPress accounts. The SSO integration uses standard WordPress authentication hooks and can be extended to support authenticated visitor features in the future (personalised dashboards, saved pages, partner portals) without rearchitecting the identity layer.

Guidance, not prescription. This document describes a strategic approach to building an AI-native web presence. The scope, priorities, and specific implementations can be refined, expanded, or reduced as the project progresses. The three pillars provide a framework for thinking about AI integration, not a fixed blueprint. The value is in the content model and the architecture: once the foundation is in place, individual features can be adjusted, deferred, or expanded based on what makes sense for NUSX at any given time.