Definition
LLM governance extends traditional ML governance with unique challenges: prompt injection, RAG provenance, generated text attribution, and logging prompts with PII—requiring new audit patterns.
PROMPT AND OUTPUT LOGGING
Raw prompts may contain sensitive data. Option 1: Log templates and parameter hashes, rehydrate from vault on demand. Option 2: PII detection and redaction before logging. At 5K QPS with 2KB prompts, expect 864GB/day—use tiered storage.
SAFETY CLASSIFIERS
Input classifiers: Check prompts for policy violations. Output classifiers: Scan completions before returning. For high-risk domains, require human review. Log classifier decisions (version, score, threshold) for audit.
💡 Insight: Before deployment, run red team suites: prompt injection, jailbreaks, toxic outputs, factual consistency. Store results in model card as release gate.
RAG PROVENANCE
Log retrieved chunk provenance: document ID, offset, score. Enables content takedown—if source is later inappropriate, lineage identifies all outputs that referenced it.
ONLINE LEARNING CONTROLS
Fine-tuning on feedback risks poisoning. Use rate limits (max 1% change/day), canary buffers for review, staging before production. Maintain immutable update records.
⚠️ Trade-off: Full prompt logging aids debugging but raises privacy risks. Selective redaction balances auditability and protection.
✓Large Language Model (LLM) governance logs prompt templates with parameter hashes (not raw Personally Identifiable Information or PII), retrieved context provenance (document ID, chunk offset, score), safety classifier decisions (score, threshold, pass or block), and final outputs to enable auditability without General Data Protection Regulation (GDPR) violations
✓At 5,000 Queries Per Second (QPS) with 2 kilobyte average prompt plus completion, systems generate 864 gigabytes per day, requiring same tiered storage (30 days hot, 7 years cold) as traditional Machine Learning (ML) prediction journals
✓Safety classifiers act as input and output guardrails, log every decision (prompt_id, classifier_v2.1, toxicity_score=0.12, threshold=0.15, decision=allow), A/B test classifier updates to ensure false positive rates do not increase more than 10 percent and degrade user experience
✓Red team test suites run before deployment covering prompt injection, jailbreaks, toxic generation, factual consistency, store results in model card as evidence of safety evaluation, Microsoft and OpenAI require this as a release gate
✓Retrieval Augmented Generation (RAG) provenance logging enables content takedown, if source document deemed inappropriate after deployment, lineage query identifies all outputs that referenced it for notification or retraction
✓Online learning from user feedback requires poisoning defenses: update rate limits (maximum 1 percent parameter change per day), canary buffers with manual review, staging environments for testing updates, immutable log of feedback deltas for audit and rollback
1Healthcare chatbot logs {"prompt_template": "symptom_query", "params_hash": "hmac:abc123", "retrieved_docs": ["mayo_clinic_id:4521", "webmd_id:9912"], "safety_input": "pass", "safety_output": {"medical_advice_score": 0.85, "threshold": 0.9, "decision": "allow"}, "completion_hash": "sha256:def456"}, redacts patient name from prompt before logging
2Microsoft Responsible AI review for customer service Large Language Model (LLM) includes red team testing with 5,000 adversarial prompts (jailbreaks, bias probes), documents results showing 99.2 percent safety classifier recall on toxic outputs, includes this in model card before production approval
3Retrieval Augmented Generation (RAG) system for legal research logs retrieved case citations (case_id, jurisdiction, relevance_score), when a precedent is later overturned, lineage identifies 1,247 outputs that cited it, system sends notifications to users who received those outputs for review