Building an AI-Native Consulting Platform
The architecture decisions behind FlowState IQ — a Next.js + Supabase + AI pipeline that turns scattered consulting workflows into structured intelligence.
I built FlowState IQ because I watched a consulting methodology outgrow the tools available to support it.
The methodology is called forcing questions — a systematic questioning approach that assumes something specific should exist in an operation, then lets the gap reveal itself when it doesn't. One workshop. Twenty-eight operational gaps discovered. Twenty-eight places where people had been held accountable for processes that were never defined.
That workshop worked. But the methodology lived in spreadsheets, slide decks, and tribal knowledge. Scaling it meant building a platform that could hold the methodology's structure while adding intelligence on top. That's FlowState IQ.
The Problem
Consulting firms run on scattered infrastructure. Research lives in one place, engagement planning in another, workshop facilitation in a third, and analysis in a fourth. The connective tissue between these phases is the consultant's brain — and it doesn't scale.
I kept seeing the same failure modes:
- Research findings that never make it into engagement planning
- Workshop insights that get transcribed but never structured
- Gap discoveries that live in PDFs nobody revisits
- Methodologies that depend entirely on which consultant is in the room
The goal wasn't to replace the consultant. It was to build the infrastructure that makes the methodology portable — consistent regardless of who facilitates, with intelligence layered on top of every phase.
Architecture Overview
FlowState IQ is a Next.js 15 application backed by Supabase (Postgres + Auth + RLS) with an AI pipeline built on Anthropic's Claude. The stack is deliberately boring in the places where boring is a virtue, and opinionated where it matters.
Platform Layers
A few architecture decisions that paid off early:
Server Components by default. The entire app renders on the server unless a component needs client interactivity. This keeps the bundle small and means auth checks happen at the server layer, not in the browser. No loading spinners for data that should already be there.
Server Actions organized by table, not by page. Gap resolution logic lives in gaps.ts, not scattered across the pages that happen to touch gaps. When three different features needed to update gap status, there was one function to maintain.
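A minimal sketch of that layout (the `GapStore` interface is a stand-in for the Supabase client, and the status values are illustrative, not the real schema):

```typescript
// gaps.ts — all gap mutations live in one module, whatever page calls them.
// GapStore stands in for the Supabase client so the sketch is self-contained.
type GapStatus = 'open' | 'in_progress' | 'resolved';

interface GapStore {
  updateStatus(gapId: string, status: GapStatus): Promise<void>;
}

// The one function every feature that touches gap status goes through.
// The real server action would also revalidate the affected routes.
async function updateGapStatus(
  store: GapStore,
  gapId: string,
  status: GapStatus,
): Promise<void> {
  await store.updateStatus(gapId, status);
}
```

When a fourth feature needs to change a gap's status, it imports this function rather than reimplementing the write.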
Two Supabase clients. User-facing reads go through createClient() with full RLS enforcement. Admin writes use a service-role adminClient that bypasses RLS. This separation is load-bearing — it means I can write aggressive security policies without worrying about server-side operations getting blocked by them.
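Roughly, the split looks like this (a sketch: the env var names and config shape are assumptions; the real code hands these values to `@supabase/supabase-js`):

```typescript
// Selecting credentials for the two-client pattern. The anon key runs
// under RLS; the service-role key bypasses it and must never reach the browser.
type ClientKind = 'user' | 'admin';

interface ClientConfig {
  url: string;
  key: string;
  enforcesRls: boolean;
}

function supabaseConfig(
  kind: ClientKind,
  env: Record<string, string | undefined>,
): ClientConfig {
  const url = env.SUPABASE_URL;
  if (!url) throw new Error('SUPABASE_URL is required');
  if (kind === 'admin') {
    const key = env.SUPABASE_SERVICE_ROLE_KEY;
    if (!key) throw new Error('service-role key required for admin client');
    return { url, key, enforcesRls: false }; // bypasses RLS: server-only
  }
  const key = env.SUPABASE_ANON_KEY;
  if (!key) throw new Error('anon key required for user client');
  return { url, key, enforcesRls: true }; // full RLS enforcement
}
```

Keeping the two configurations behind one explicit function makes it hard to reach for the RLS-bypassing client by accident.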
The Intelligence Modes
FlowState IQ isn't one AI feature bolted onto a dashboard. It's four distinct intelligence modes, each designed for a specific phase of the consulting engagement lifecycle. They share a common prompt architecture but produce fundamentally different outputs.
RECON. AI-driven research that builds a structured intelligence profile for a prospective client. Industry context, technology landscape, organizational signals, competitive positioning. The output feeds directly into engagement planning — not a slide deck, a structured data model that other modes consume.
MAP. Analyzes an organization's technology stack against their identified gaps. Classifies each gap into one of four categories: system can close, partial coverage, process problem, or new capability needed. The classification drives resolution recommendations — quick wins, configuration changes, training, or new acquisitions.
FLOW. The core methodology engine. Guides a structured workshop through forcing questions organized in the WHAT→HOW→WHO sequence. Each question assumes something specific should exist. When it doesn't, the gap surfaces. The AI adapts the question sequence based on responses, classifies discovered gaps in real time, and generates readiness scores.
AUDIT. Analyzes existing process documents against five dimensions: ownership clarity, process specificity, edge case coverage, decision criteria, and completeness. Surfaces gaps that exist in documentation before a workshop even starts. Findings link directly to the gap lifecycle — same severity model, same resolution tracking.
The key insight: each mode produces structured data that other modes consume. RECON feeds MAP. FLOW feeds AUDIT. Gaps flow through a unified lifecycle regardless of which mode discovered them. The modes aren't features — they're a pipeline.
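One way to picture that pipeline (field names are illustrative, not the actual schema): every mode writes to and reads from the same gap record.

```typescript
type SourceMode = 'RECON' | 'MAP' | 'FLOW' | 'AUDIT';
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface Gap {
  id: string;
  description: string;
  severity: Severity;
  sourceMode: SourceMode;                            // who discovered it
  enrichments: Partial<Record<SourceMode, unknown>>; // who added to it since
}

// Any mode can enrich a gap regardless of which mode originated it.
function enrichGap(gap: Gap, mode: SourceMode, data: unknown): Gap {
  return { ...gap, enrichments: { ...gap.enrichments, [mode]: data } };
}
```

A gap discovered in a FLOW workshop can later carry a MAP classification without either mode knowing about the other's internals.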
Database Design
Thirty-eight tables and twelve enums, all behind Supabase's Row-Level Security. Multi-tenant by design from day one, not retrofitted.
The tenancy model chains through organizations. A consultant owns an organization. Users belong to an organization. Every data table scopes through that chain. The RLS policies use a SECURITY DEFINER helper function that resolves the current user's organization at the database level — the application layer never needs to filter by tenant manually.
consultant (user_profiles.platform_role = 'consultant')
└─ organizations (consultant_id → user_profiles.id)
   └─ user_profiles (organization_id → organizations.id)
      └─ workshop_sessions, identified_gaps, map_sessions...
         └─ All scoped via user_id → user_profiles.id

A few design decisions I'd make again:
Gaps as first-class entities. The identified_gaps table is the connective tissue of the entire platform. Gaps flow through a five-phase lifecycle — Discovery, Classification, Resolution Planning, Execution, and Verification. Multiple modes can create gaps. Multiple modes can enrich them. The lifecycle state machine doesn't care which mode originated the gap.
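The lifecycle can be sketched as a linear state machine (the phase names come from the text; the transition table itself is an assumption):

```typescript
type GapPhase =
  | 'discovery'
  | 'classification'
  | 'resolution_planning'
  | 'execution'
  | 'verification';

// Each phase has exactly one successor; verification is terminal.
const NEXT: Record<GapPhase, GapPhase | null> = {
  discovery: 'classification',
  classification: 'resolution_planning',
  resolution_planning: 'execution',
  execution: 'verification',
  verification: null,
};

function advance(phase: GapPhase): GapPhase {
  const next = NEXT[phase];
  if (next === null) throw new Error(`'${phase}' is the final phase`);
  return next;
}
```

Because the machine only knows phases, not modes, any mode can push a gap forward.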
Platform profiles for technology intelligence. Instead of letting each organization describe their tech stack as free text, I built a platform_profiles table with twenty-six seeded entries — canonical records for platforms like Salesforce, HubSpot, Jira, and the rest. When MAP mode classifies a gap, it references specific platform capabilities rather than guessing what “we use Salesforce” might mean.
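The SECURITY DEFINER helper behind the tenancy model could look roughly like this (a sketch against the schema described above; the actual function name and policies in FlowState IQ may differ):

```sql
-- Resolve the calling user's organization once, at the database level.
-- SECURITY DEFINER lets the function read user_profiles without tripping
-- that table's own RLS policies.
create or replace function current_org_id()
returns uuid
language sql
security definer
stable
as $$
  select organization_id from user_profiles where id = auth.uid();
$$;

alter table identified_gaps enable row level security;

-- Rows are visible only when they belong to a user in the caller's org.
create policy "org members read their gaps"
  on identified_gaps
  for select
  using (
    user_id in (
      select id from user_profiles
      where organization_id = current_org_id()
    )
  );
```

With the helper in place, every table's policy reduces to the same one-line membership check, and the application layer never filters by tenant.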
AI Pipeline
Every AI interaction in FlowState IQ goes through a layered prompt architecture. The prompt builder assembles context in a specific order, and that order is intentional.
┌─────────────────────────────────────┐
│ Layer 1: System Identity            │ ← Who the AI is, behavioral rules
├─────────────────────────────────────┤
│ Layer 2: Mode Instructions          │ ← RECON / MAP / FLOW / AUDIT specific
├─────────────────────────────────────┤
│ Layer 3: Engagement Context         │ ← Org data, tech stack, prior gaps
├─────────────────────────────────────┤
│ Layer 4: Session State              │ ← Conversation history, current phase
├─────────────────────────────────────┤
│ Layer 5: User Input                 │ ← The actual message
└─────────────────────────────────────┘
The layering matters because it creates consistent behavior with contextual variation. The system identity layer establishes the voice and constraints. The mode layer defines the specific task. The context layer grounds the AI in the organization's reality. The session state prevents the AI from asking questions that have already been answered.
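A stripped-down sketch of the builder (the API shape is illustrative; only the layer order is taken from the design described here):

```typescript
interface PromptLayers {
  systemIdentity: string;    // Layer 1: voice and behavioral constraints
  modeInstructions: string;  // Layer 2: RECON / MAP / FLOW / AUDIT task
  engagementContext: string; // Layer 3: org data, tech stack, prior gaps
  sessionState: string;      // Layer 4: history and current phase
  userInput: string;         // Layer 5: the actual message
}

// Layers 1–4 assemble into the system prompt in a fixed order;
// layer 5 travels as the user message.
function buildPrompt(l: PromptLayers): { system: string; user: string } {
  return {
    system: [
      l.systemIdentity,
      l.modeInstructions,
      l.engagementContext,
      l.sessionState,
    ].join('\n\n'),
    user: l.userInput,
  };
}
```

Fixing the order in one function is what makes behavior consistent across modes: a mode can only change what goes into its layer, never where the layer sits.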
All AI responses stream. The API routes use the Anthropic SDK's streaming interface and return chunks as they arrive. For workshop facilitation especially, this matters — a consultant watching a gap classification appear in real time is a fundamentally different experience than waiting for a full response to load.
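The shape of that streaming path, reduced to its core (a sketch: in the real routes the async iterable would come from the Anthropic SDK's streaming interface):

```typescript
// Forward text deltas to the client as they arrive instead of buffering
// the full response. The result can serve as a Response body in a route handler.
function deltasToStream(deltas: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for await (const delta of deltas) {
        controller.enqueue(encoder.encode(delta)); // flush each chunk immediately
      }
      controller.close();
    },
  });
}
```

Nothing downstream has to know whether the chunks came from a model or a fixture, which also makes the route easy to test.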
I also discovered the hard way that token limits require defensive architecture. Long workshop sessions accumulate substantial context. The prompt builder now applies a context-window strategy that keeps recent conversation turns verbatim and summarizes older ones. Without it, sessions would silently degrade as they exceeded the model's context window.
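That defensive step can be sketched like this (the summarizer is a stand-in; in practice it would be another model call):

```typescript
interface Turn {
  role: 'user' | 'assistant';
  content: string;
}

// Keep the most recent turns verbatim; collapse everything older into
// a single summary turn so long sessions stay inside the context window.
function trimContext(
  turns: Turn[],
  keepRecent: number,
  summarize: (older: Turn[]) => string,
): Turn[] {
  if (turns.length <= keepRecent) return turns;
  const older = turns.slice(0, -keepRecent);
  const recent = turns.slice(-keepRecent);
  return [
    { role: 'assistant', content: `Summary of earlier turns: ${summarize(older)}` },
    ...recent,
  ];
}
```

The important property is that the trim is deterministic and happens in the builder, not ad hoc per route.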
What I Learned
Building an AI-native application is not the same as adding AI to an application. That distinction shaped almost every decision.
The hard part isn't getting the AI to generate useful output. The hard part is building the data model that makes AI output actionable. A beautifully worded gap analysis is worthless if it doesn't connect to a resolution workflow.
Structured outputs beat free text. Early versions let the AI generate narrative reports. They read well but couldn't be queried, filtered, tracked, or connected to anything. The shift to structured JSON outputs — with severity enums, category classifications, and typed recommendations — transformed the platform from a chatbot into an intelligence system.
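The shift looks something like this in practice (a sketch; the field names and severity set are illustrative):

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';
const SEVERITIES: readonly Severity[] = ['low', 'medium', 'high', 'critical'];

interface GapFinding {
  description: string;
  severity: Severity;
  category: string;
}

// Reject model output that doesn't match the expected shape, so only
// queryable, typed findings enter the system.
function parseFinding(raw: string): GapFinding {
  const data = JSON.parse(raw);
  if (typeof data?.description !== 'string' || typeof data?.category !== 'string') {
    throw new Error('malformed finding');
  }
  if (!SEVERITIES.includes(data.severity)) {
    throw new Error(`unknown severity: ${data.severity}`);
  }
  return data as GapFinding;
}
```

Once findings survive this gate, everything downstream (filtering, tracking, lifecycle transitions) can trust the types.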
RLS is worth the friction. Row-Level Security in Postgres is genuinely painful to set up correctly. Every table needs policies. Every new feature means new policy considerations. But the alternative — application-layer authorization scattered across server actions — is a security model that degrades as the codebase grows. I'd choose the upfront pain again.
The gap lifecycle is the product. I thought the AI-driven workshops were the product. They're not. The product is what happens after a gap is discovered — how it gets classified, planned, assigned, tracked, and verified. The five-phase gap lifecycle became the spine that every feature connects to. Getting that model right mattered more than any individual mode.
Build for the consultant, not around them. The first instinct when building AI tooling is to automate the human out of the loop. That's backwards for consulting. The best outcomes happen when a skilled facilitator has structured intelligence at their fingertips. FlowState IQ handles the pattern detection, classification, and data plumbing. The consultant handles the room.
What's Next
FlowState IQ is the foundation. The broader vision is Elevate IQ — a platform where the forcing question methodology extends beyond operational gap analysis into strategic planning, technology evaluation, and organizational design.
The architecture was designed for this. The mode system is extensible — adding a new intelligence mode means defining its prompt layers, its data model, and its UI, not rearchitecting the platform. The gap lifecycle is mode-agnostic. The tenancy model supports consultant networks, not just individual practitioners.
The honest assessment: this platform is early. The methodology is proven in workshops. The AI pipeline works. The architecture holds up. But the real test is whether the intelligence compounds over time — whether each engagement makes the platform smarter for the next one. That's the bet.
If you're building something similar — an AI-native application where the AI isn't the product but the infrastructure — I'd emphasize one thing: invest in the data model before the prompt engineering. The prompts will change. The model will change. The structured data that flows between your features is the architecture that actually compounds.
The questions are universal. The technology is the delivery mechanism. Get the questions right and the architecture will follow.
Read the origin story
How a single workshop with twenty-eight gap discoveries became a platform.