CASE / DOSSIER № 0001 · FILE 2026.05 · CH.06 · WORK

PG. SUB · abac-rag

SHIPPED · CASE / 2025.07

/Platform-Org/

Need to Know

ABAC-gated RAG for an internal knowledge fabric

PLATFORM · DEVOPS

/ 01

Stakes

A platform-engineering org had a knowledge problem hiding inside an access problem. Runbooks, postmortems, internal API specs, contracts, board materials, comp bands: six functions, four sensitivity tiers, one sprawling corpus. They tried two SaaS RAG tools; procurement killed both because the corpus couldn't leave their cloud. Engineers were grepping across six windows of Confluence, GitHub, and the VDR every time they needed to remember why a service was decommissioned in 2023. Studio built bespoke: self-hosted hybrid retrieval (dense plus BM25 plus reranker), every chunk tier-tagged at ingest, every query carrying the asker's tier and attributes, every answer carrying only citations the asker is authorized to see. The audit log is immutable (WORM). Across 14 months: zero RBAC leaks in audit findings, 87% query satisfaction, and about 40 hours per quarter of compliance review time recovered.

/ 02

Approach

Ingest-time tier tagging on every chunk

Tiers are not assigned at query time; they are baked into the chunk record the moment it enters the index. Apache Tika and Unstructured.io normalize across PDFs, Confluence pages, GitHub markdown, and VDR exports; Presidio scans for PII and applies redactions with named-entity recovery so phrasing stays natural; the tier tag (T1 to T4) is written alongside the embedding. The tag travels with the chunk for its lifetime in the store.
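A minimal sketch of what a tier-tagged chunk record could look like. The field names, the `Tier` enum, and the `tag_chunk` helper are illustrative assumptions, not the production schema; the sketch also assumes T1 is the least sensitive tier and T4 the most.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Assumption: T1 is the least sensitive tier, T4 the most sensitive.
    T1 = 1
    T2 = 2
    T3 = 3
    T4 = 4


@dataclass(frozen=True)
class ChunkRecord:
    chunk_id: str
    source: str                   # e.g. "confluence", "github", "vdr"
    text: str                     # normalized and already redacted
    embedding: tuple              # dense vector stored next to the tag
    tier: Tier                    # written once at ingest
    redacted: bool = False        # redaction state travels with the chunk


def tag_chunk(chunk_id, source, text, embedding, tier, redacted=False):
    """Build the index record; refuse to create a chunk without a valid tier."""
    if not isinstance(tier, Tier):
        raise ValueError(f"chunk {chunk_id} has no valid tier tag")
    return ChunkRecord(chunk_id, source, text, tuple(embedding), tier, redacted)
```

Freezing the record is one way to express "the tag travels with the chunk": a change of tier means writing a new record through re-ingest, not mutating the old one.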

Hybrid retrieval with ACL filter at gate 1

Every query carries the asker's JWT-bound tier and attribute claims through the Kong gateway. Before either retriever sees a vector, the ACL filter narrows the candidate set to chunks the asker is allowed to consider. Dense retrieval (Qdrant) and lexical retrieval (OpenSearch BM25) both fire against the filtered set; the cross-encoder reranker (BGE) scores the merged candidates against the original query.
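The ordering above (filter first, retrieve second, merge third) can be sketched in a few lines. Everything here is a toy stand-in: `lexical_hits` is a term-overlap ranker in place of BM25, the `Chunk` and `Asker` shapes are hypothetical, and the tier comparison assumes tiers ascend in sensitivity. In a real deployment the filter would more likely be pushed down into the stores as a query-time filter rather than applied in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tier: int          # 1..4, assumed ascending in sensitivity
    function: str      # owning function, e.g. "platform"
    text: str


@dataclass(frozen=True)
class Asker:
    tier: int
    functions: frozenset


def gate1_filter(chunks, asker):
    """Keep only chunks the asker may consider, before any retrieval runs."""
    return [c for c in chunks
            if c.tier <= asker.tier and c.function in asker.functions]


def lexical_hits(query, chunks, k=5):
    """Toy stand-in for BM25: rank by number of shared lowercase terms."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]


def merge_candidates(*hit_lists):
    """Union the retrievers' hits, de-duplicated by chunk_id, order preserved."""
    seen, merged = set(), []
    for hits in hit_lists:
        for c in hits:
            if c.chunk_id not in seen:
                seen.add(c.chunk_id)
                merged.append(c)
    return merged
```

Because both retrievers only ever receive the output of `gate1_filter`, a chunk above the asker's tier cannot appear in the merged set handed to the reranker, whatever its similarity score.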

Post-retrieval validation at gate 2

The JWT tier is a 15-minute snapshot. ACLs can change inside that window: a revoke, a role-down, a project rotation. Gate 2 re-fetches the asker's current ACLs (Postgres source of truth, 60-second Redis cache) and drops any candidate that no longer passes. The JWT TTL is the safety net; the fresh lookup is the source of truth.
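A sketch of the gate 2 pattern, with an in-memory read-through cache standing in for Redis and a plain callable standing in for the Postgres lookup. `AclCache`, `load_tier`, and the `(chunk_id, tier)` candidate shape are assumptions made for illustration.

```python
import time


class AclCache:
    """Read-through cache with a fixed TTL in front of a source-of-truth loader."""

    def __init__(self, load_tier, ttl_seconds=60.0, clock=time.monotonic):
        self._load_tier = load_tier    # hypothetical loader: user_id -> current tier
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # user_id -> (expires_at, tier)

    def current_tier(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is None or entry[0] <= now:
            # Miss or expired entry: go back to the source of truth.
            entry = (now + self._ttl, self._load_tier(user_id))
            self._entries[user_id] = entry
        return entry[1]


def gate2_validate(candidates, user_id, cache):
    """Drop (chunk_id, tier) candidates the asker's current tier no longer permits."""
    tier = cache.current_tier(user_id)
    return [(cid, t) for cid, t in candidates if t <= tier]
```

With this shape, a revocation becomes visible at most one cache TTL after it lands in the source of truth, rather than at JWT expiry.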

Post-LLM citation validation at gate 3

Once the LLM (Claude Sonnet 4.6, with a model-portable runtime and a Groq fallback path) produces an answer, gate 3 walks every emitted citation and confirms it sits in the asker's allow-list. If the model cites a source outside the asker's tier or fabricates a reference, the response is withheld and a refusal is returned with the trace id. Three independent gates, no shared trust.
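The citation walk can be sketched as a set difference. The inline `[DOC-ID]` citation format, the `Response` shape, and the refusal wording are assumptions; the write-up does not specify how citations are encoded in the model output.

```python
import re
from dataclasses import dataclass

# Assumed inline citation format: [DOC-ID]
CITATION = re.compile(r"\[([A-Za-z0-9._-]+)\]")


@dataclass(frozen=True)
class Response:
    ok: bool
    body: str
    trace_id: str


def gate3_validate(answer, allow_list, trace_id):
    """Withhold the whole answer if any cited id is outside the allow-list."""
    cited = set(CITATION.findall(answer))
    outside = cited - set(allow_list)
    if outside:
        # Neither the answer text nor the offending ids are echoed back.
        return Response(False, "Refused: citation check failed.", trace_id)
    return Response(True, answer, trace_id)
```

A fabricated reference fails the same check as an out-of-tier one, because an id that does not exist cannot be in the allow-list. Whether an answer with no citations at all should pass is a policy choice this sketch leaves open (it passes here).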

Immutable WORM audit ledger

Every request, allowed or refused, writes a signed entry to S3 Object Lock in compliance mode with a 7-year retention. Hash-chained for tamper evidence. In compliance mode even the AWS root account cannot delete an object during retention. Auditors export directly; the GRC team stopped reconstructing trails by hand.
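Hash chaining for tamper evidence can be illustrated with the standard library alone. This is a sketch under assumptions: the entry layout, the SHA-256 choice, and the all-zero genesis value are illustrative, and entry signing is left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger, payload):
    """Append an entry that commits to everything written before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload,
                   "hash": entry_hash(prev, payload)})
    return ledger[-1]


def verify(ledger):
    """Recompute the chain; any edited, dropped, or reordered entry breaks it."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Object Lock prevents deletion of stored objects; the chain adds a second property, that an exported ledger can be checked for gaps or edits without trusting the exporter.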

/ 03

Architecture

Three lanes. The ingest lane (top) parses, scans for PII, tags tier, embeds, writes to both stores. The query lane (middle) runs the request through Kong, JWT validation, fresh ACL resolve, hybrid retrieval, reranker, LLM, and guardrails. The audit lane (bottom) accepts a write from every stage that touches the request. Three padlocks mark the three-stage ACL gating: pre-retrieval filter, post-retrieval validation, post-LLM citation validation. The gates are the load-bearing idea; everything else is plumbing.

/ 04

Challenges

Challenge · 01

Stale-permission window: ACLs change inside the 15-minute JWT TTL; a revoke or role-down can land mid-session.

Resolution

Fresh ACL lookup on every request against the Postgres source of truth, fronted by a 60-second Redis cache to stay within the latency budget. JWT TTL stays as the failure-mode fallback. The fresh lookup is the gate; the JWT is the floor.

Challenge · 02

Source-system identity reconciliation: Confluence groups, GitHub teams, VDR rooms, and HRIS roles all model permissions differently.

Resolution

Ingest-time mapping to a single tier model with provenance kept per source. Conflicts are surfaced for human review at ingest, never at query, so the runtime gate sees a clean tag. Re-ingest reruns the mapping when source ACLs shift.
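One way to picture the reconciliation step: map each source grant to a tier, keep the provenance, and hold anything ambiguous for review. The mapping table below is entirely hypothetical (the real group names and tier assignments are not given here), as are `TierDecision` and `reconcile`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping table: (source system, group) -> tier.
SOURCE_TO_TIER = {
    ("confluence", "eng-all"): 1,
    ("github", "platform-team"): 2,
    ("vdr", "board-room"): 4,
    ("hris", "comp-admins"): 4,
}


@dataclass(frozen=True)
class TierDecision:
    tier: Optional[int]   # None means "held for human review"
    provenance: tuple     # (source, group, mapped tier or None) per grant
    conflict: bool


def reconcile(grants):
    """Map a document's source grants to one tier, or flag it for review."""
    provenance = tuple((s, g, SOURCE_TO_TIER.get((s, g))) for s, g in grants)
    tiers = {t for _, _, t in provenance}
    if len(tiers) == 1 and None not in tiers:
        return TierDecision(next(iter(tiers)), provenance, False)
    # Disagreeing sources, an unmapped group, or no grants at all.
    return TierDecision(None, provenance, True)
```

Holding conflicted documents out of the index until a person resolves them is what lets the query-time gates treat the tier tag as settled.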

Challenge · 03

PII inside legitimately-accessible docs (a runbook that pastes a customer ID, an incident transcript with employee names).

Resolution

Ingest-time Presidio scan with redaction and named-entity recovery so the surrounding phrasing stays natural: no jagged [REDACTED] in the middle of a sentence. Redaction state travels on the chunk and renders as a badge on the citation.

Challenge · 04

Auditor trust required immutability that even root could not break; soft-delete and IAM policies were not enough.

Resolution

S3 Object Lock in compliance mode with 7-year retention. In compliance mode (as distinct from governance mode) no principal, including the root account, can delete a locked object or shorten its retention during the lock period. The auditor signed off on this in week one.
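As an illustration of the write path, the sketch below builds the arguments for an S3 `put_object` call with a compliance-mode lock and a 7-year retain-until date. The bucket name, key layout, and helper names are assumptions; the code only constructs the arguments, so it runs without AWS credentials.

```python
from datetime import datetime, timezone

RETENTION_YEARS = 7


def retain_until(written_at, years=RETENTION_YEARS):
    """Return the same calendar date `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return written_at.replace(year=written_at.year + years)
    except ValueError:
        return written_at.replace(year=written_at.year + years, day=28)


def put_object_kwargs(bucket, key, body, written_at):
    """Arguments for an S3 put_object call that locks the entry at write time."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": retain_until(written_at),
    }


# With boto3 installed and a bucket created with Object Lock enabled:
#   boto3.client("s3").put_object(**put_object_kwargs(...))
kwargs = put_object_kwargs(
    "audit-ledger-example",               # hypothetical bucket name
    "2025/07/01/trace-0001.json",         # hypothetical key layout
    b"{}",
    datetime(2025, 7, 1, tzinfo=timezone.utc),
)
```

Setting the mode and date per object makes the retention explicit in each write; a bucket-level default retention is an alternative that this sketch does not show.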

Challenge · 05

Refusal correctness vs over-refusal: a model that refuses everything is technically correct and operationally useless.

Resolution

Budget of 5% or fewer over-refusals, tracked weekly. Refusals are sampled by the GRC team and replayed, and tier mappings or prompt-side scoping are adjusted where a refusal was unwarranted. None missed in audit across 14 months; the over-refusal rate has stayed under budget for 11 of those months.
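The weekly budget check is simple arithmetic. The sketch assumes the rate is over-refusals divided by total queries for the week; the denominator is not stated here, so treat that as an assumption, and the weekly counts in the usage lines are made-up illustrative numbers.

```python
BUDGET = 0.05  # "5% or fewer" over-refusals


def over_refusal_rate(total_queries, over_refusals):
    """Share of a week's queries refused although the asker was authorized."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return over_refusals / total_queries


def weeks_over_budget(weekly_counts, budget=BUDGET):
    """Return the week labels whose rate exceeds the budget (equal is within)."""
    return [week for week, (total, over) in weekly_counts.items()
            if over_refusal_rate(total, over) > budget]


# Illustrative counts only: (total queries, over-refusals) per week.
counts = {"W01": (400, 12), "W02": (380, 23), "W03": (100, 5)}
flagged = weeks_over_budget(counts)
```

Here W01 sits at 3%, W02 at roughly 6.1%, and W03 at exactly 5%, so only W02 is flagged.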

/ 05

Tech stack

Auth + identity

·OIDC via Okta (Azure AD path supported)
·JWT RS256 with 15-min TTL + rotating refresh
·Attribute claims: tier, function, geo, on-call

Retrieval

·Qdrant, self-hosted (dense vectors)
·OpenSearch, self-hosted (BM25 lexical)
·BGE-large embedder + BGE cross-encoder reranker

LLM

·Claude Sonnet 4.6 (primary)
·Model-portable runtime (swap without rewrite)
·Groq fallback path for cost + capacity

Ingest

·Apache Tika + Unstructured.io (multi-format)
·Microsoft Presidio (PII scan + redaction)
·Tier tag stamped per chunk at write

Audit

·S3 Object Lock, compliance mode (WORM)
·Hash-chained ledger, 7-year retention
·Direct auditor export, no intermediate format

Gateway + ops

·Kong (gateway, rate limiting, JWT validation)
·FastAPI orchestrator, OpenTelemetry traces
·Single-tenant deploy, customer-controlled cloud

/ 06

Try it

Persona-Gated RAG

Four personas across the four sensitivity tiers, five sample queries against the corpus. Pick a persona, pick a query, submit. The response panel renders the answer the system would have produced for that asker, with the citations they are authorized to see, or a refusal stamped with the tier requirement that blocked it. The audit-log block shows what was written to the WORM ledger for the request. The same query against different personas swings between authorized and refused, which is the entire point.

1 · choose persona

2 · pick a query

corpus · postmortems · runbooks

Asker

platform engineer T1

“What was the 2024-08 outage RCA?”

3.8s

AUTHORIZED · T1 attributes match

Cascade failure on 2024-08-14. The public summary attributes it to a config rollout in the edge proxy that bypassed the staged-rollout gate. Mitigation was a full rollback in 22 minutes. The exec summary you can see does not include customer names or the redacted incident-channel transcript.

Citations

T1 · POST-2024-0814-exec · Outage 2024-08-14 · executive summary
T1 · RB-edge-rollout-v3 · Edge config rollout runbook v3

Operator note: RBAC binds permissions to roles. ABAC binds them to attributes the asker carries at request time: tier, function, geo, on-call status, contract scope. Production ACLs are evaluated fresh against Postgres with a 60s Redis cache; the JWT TTL is the safety net, not the source of truth. Tier tagging happens at ingest, not at query, so a chunk's sensitivity travels with it; the ACL filter at gate 1 narrows the candidate set before either retriever sees a vector, gate 2 revalidates after retrieval against fresh ACLs (catches the stale-permission window between revocation and JWT expiry), and gate 3 confirms every citation the model emits is one the asker is authorized to see. Three checks, independent failures, zero shared trust.
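The persona swing in the demo can be sketched as an attribute check that returns either an authorization stamp or a refusal naming the tier that blocked it. The `Persona` and `Document` shapes, the two attributes used, and the refusal wording are assumptions; a fuller policy would also weigh geo, on-call status, and contract scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    tier: int          # 1..4, assumed ascending in sensitivity
    function: str


@dataclass(frozen=True)
class Document:
    doc_id: str
    tier: int
    functions: frozenset   # functions allowed to read the document


def decide(persona, doc):
    """Evaluate the asker's attributes against the document's requirements."""
    if persona.tier < doc.tier:
        return f"REFUSED · requires T{doc.tier}"
    if persona.function not in doc.functions:
        return f"REFUSED · function {persona.function} not in scope"
    return f"AUTHORIZED · T{persona.tier} attributes match"


engineer = Persona("platform engineer", 1, "platform")
summary = Document("POST-2024-0814-exec", 1, frozenset({"platform"}))
transcript = Document("hypothetical-transcript", 3, frozenset({"platform"}))

print(decide(engineer, summary))
print(decide(engineer, transcript))
```

The same persona gets an answer for the executive summary and a tier-stamped refusal for the higher-tier document, without any role name appearing in the decision.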

/ 07

Outcomes

RBAC leaks · 14 months: 0

P95 query latency: 4.2s

Query satisfaction: 87%

Review time recovered / qtr: 40h

→ Zero RBAC leaks across 14 months of audit findings
→ P95 query latency 4.2s end-to-end including three ACL gates
→ 87% query satisfaction (4-or-5-star) across six stakeholder functions
→ Compliance review time recovered, ~40 hours per quarter

Footprint

Deployed single-tenant in the [REDACTED] platform-engineering org's customer-controlled cloud. Six stakeholder functions onboarded across the first three quarters; new functions ship through the same ingest pipeline. SOC 2 Type II audited.

Evidence

Deployed on customer-controlled cloud (single tenant). SOC 2 Type II audited. Sovereign deployment pattern available for regulated tenants. Audit ledger has been exported directly by external auditors across two full cycles.
