
FGAC Failure Modes and Edge Cases

Policy Gaps Across Systems: The most common and dangerous failure mode is enforcing row-level security in the warehouse but forgetting the other paths to the same data. Imagine you enforce strict FGAC in your data warehouse: customer support agents only see tickets from their assigned region. Then you build a vector search index in OpenSearch from those same tickets to power semantic search. If you load the embeddings without carrying over the row-level policies, agents can search the index and see tickets they are explicitly denied in the warehouse. This bypass is subtle because the vector store and the warehouse look like separate systems, yet they expose the same underlying data.
❗ Remember: AWS explicitly warns about this when discussing generative AI and stresses applying FGAC at the vector store level, not just the source warehouse.
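A minimal sketch of enforcing the warehouse's region policy inside the vector search path, assuming each indexed ticket carries its `region` copied from the warehouse row as metadata. `IndexedTicket` and `RegionFilteredIndex` are hypothetical names, and the in-memory list stands in for OpenSearch; it illustrates the idea of a policy filter on the vector query, not any real OpenSearch API.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexedTicket:
    ticket_id: str
    region: str        # copied from the warehouse row at indexing time (assumption)
    embedding: tuple   # the ticket's vector

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class RegionFilteredIndex:
    """Hypothetical toy vector index that applies the caller's row-level policy."""

    def __init__(self):
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)

    def search(self, query, allowed_regions, k=3):
        # Apply the same region policy the warehouse enforces, before ranking,
        # so denied tickets never reach the caller or take up top-k slots.
        visible = [d for d in self._docs if d.region in allowed_regions]
        visible.sort(key=lambda d: cosine(query, d.embedding), reverse=True)
        return [d.ticket_id for d in visible[:k]]

index = RegionFilteredIndex()
index.add(IndexedTicket("T1", "EU", (1.0, 0.0)))
index.add(IndexedTicket("T2", "US", (0.9, 0.1)))
index.add(IndexedTicket("T3", "EU", (0.0, 1.0)))
print(index.search((1.0, 0.0), allowed_regions={"EU"}))  # ['T1', 'T3']
```

Without the region filter, the US ticket T2 would rank second for an EU agent even though the warehouse denies it. Filtering before ranking, rather than trimming results afterwards, also keeps denied documents from consuming top-k slots.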
Similarly, if users can read the underlying object storage, such as S3 or Google Cloud Storage (GCS), directly through misconfigured Identity and Access Management (IAM) policies, they bypass compute-layer controls entirely. Virtual Private Cloud (VPC) service perimeters and bucket-level policies close these holes, but misconfiguration is common. Backups and offline exports create additional bypass paths.

Predicate Explosion and Performance Collapse: In predicate-based row-level security, a user who belongs to many groups can generate extremely long predicates. Imagine a user in 200 security groups where each group grants access to certain customers. A naive implementation expands this into WHERE customer_id IN (1,2,3...10000) with thousands of terms. In many engines this defeats index use and forces full table scans. At scale, it can push cluster Central Processing Unit (CPU) utilization to 100 percent and violate latency Service Level Objectives (SLOs) for all tenants. Some systems cap the maximum number of terms or require precomputing a permission table to avoid this failure mode.
Performance degradation: normal query 5 sec; with predicate explosion 80 sec.
Stale Policy Cache Race Conditions: Caching policy decisions for performance creates a window in which revoked access still works. With a 5 minute cache Time To Live (TTL), a fired employee retains access for up to 5 minutes after termination. At an export rate of 10,000 rows per second, that is up to 3 million rows (300 seconds × 10,000 rows per second) potentially exfiltrated. The mitigation is immediate cache invalidation on critical events such as termination, but this requires tight integration between human resources systems, identity providers, and policy engines, and any failure in that chain reopens the security gap.

Multi-Tenant Filter Bugs: In Software as a Service (SaaS) scenarios, a single bug in tenant filters can leak data across customers, which is catastrophic. The worst case is a missing AND tenant_id = current_tenant predicate on a 100 billion row fact table, exposing millions of customer records. The defense is exhaustive testing. Unit tests validate predicates for synthetic tenants. Integration tests verify isolation under concurrent load. Contract tests ensure that schema changes do not accidentally drop filter columns. Production bugs still occur despite all this, which is why audit logging and anomaly detection are critical backstops.

Ambiguous or Conflicting Policies: Combining multiple policy sources creates unexpected interactions. Global policies, team policies, and ad hoc exceptions can conflict. The standard principle is that deny overrides allow, but misordered or overlapping rules can either leak data or break legitimate use cases. Strong tooling for policy simulation therefore becomes essential: before deploying a new policy, you replay it against historical query logs to see which decisions would change. An "explain why this query was denied" feature helps debug false denials.
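To make the stale-cache window concrete, here is a minimal sketch of a TTL-based policy decision cache with an explicit revocation hook. `PolicyDecisionCache`, `invalidate_user`, and `max_rows_exposed` are hypothetical names; the call to `invalidate_user` stands in for the assumed HR, identity provider, and policy engine integration, and the injectable clock exists only so the example is deterministic.

```python
import time

class PolicyDecisionCache:
    """Hypothetical TTL cache of (user, resource) -> decision, with explicit revocation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # (user, resource) -> (decision, expires_at)

    def get(self, user, resource):
        entry = self._entries.get((user, resource))
        if entry is None:
            return None  # cache miss: caller must evaluate the policy afresh
        decision, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[(user, resource)]
            return None
        return decision

    def put(self, user, resource, decision):
        self._entries[(user, resource)] = (decision, self.clock() + self.ttl)

    def invalidate_user(self, user):
        """Drop every cached decision for a user on a critical event such as termination."""
        for key in [k for k in self._entries if k[0] == user]:
            del self._entries[key]

def max_rows_exposed(ttl_seconds, export_rows_per_second):
    # Worst case: the revoked user keeps exporting until the cached allow expires.
    return ttl_seconds * export_rows_per_second

now = [0.0]
cache = PolicyDecisionCache(ttl_seconds=300, clock=lambda: now[0])
cache.put("bob", "sales.orders", True)
now[0] = 120.0                            # Bob was terminated at t=60
print(cache.get("bob", "sales.orders"))   # True: the stale allow is still served
cache.invalidate_user("bob")              # termination event reaches the policy engine
print(cache.get("bob", "sales.orders"))   # None: forces a fresh, denying evaluation
print(max_rows_exposed(300, 10_000))      # 3000000
```

The TTL bounds exposure only if nothing else fires; the revocation hook is what actually closes the window, which is why a broken link in the HR-to-policy-engine chain matters so much.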
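The deny-overrides principle, the "explain why denied" message, and policy simulation against historical logs can be sketched together. The `Rule` shape (role, table, effect) and the rule names are hypothetical simplifications of what a real policy engine evaluates; the point is the combining logic and the replay loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str    # hypothetical naming scheme: "<source>:<purpose>"
    effect: str  # "allow" or "deny"
    role: str    # "*" matches any role
    table: str   # "*" matches any table

    def matches(self, role, table):
        return self.role in ("*", role) and self.table in ("*", table)

def evaluate(rules, role, table):
    """Deny overrides allow; with no matching allow, the default is deny.
    Returns (allowed, explanation) so false denials can be debugged."""
    matched = [r for r in rules if r.matches(role, table)]
    denies = [r.name for r in matched if r.effect == "deny"]
    allows = [r.name for r in matched if r.effect == "allow"]
    if denies:
        return False, f"denied by {', '.join(denies)}; overrides allow from {', '.join(allows) or 'none'}"
    if allows:
        return True, f"allowed by {', '.join(allows)}"
    return False, "denied: no matching allow rule (default deny)"

def simulate(current, proposed, query_log):
    """Replay historical (role, table) requests; report decisions that would change."""
    changes = []
    for role, table in query_log:
        before, _ = evaluate(current, role, table)
        after, why = evaluate(proposed, role, table)
        if before != after:
            changes.append((role, table, before, after, why))
    return changes

current = [
    Rule("team:analyst-sales", "allow", "analyst", "sales"),
    Rule("team:hr-partner-hr", "allow", "hr_partner", "hr"),
]
proposed = current + [Rule("global:hr-lockdown", "deny", "*", "hr")]
history = [("analyst", "sales"), ("hr_partner", "hr"), ("analyst", "hr")]

for change in simulate(current, proposed, history):
    print(change)
# ('hr_partner', 'hr', True, False, 'denied by global:hr-lockdown; overrides allow from team:hr-partner-hr')
```

Here the simulation surfaces that the new global deny would silently break a legitimate team exception before it ships, and the explanation names both the winning deny and the allow it overrode.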
💡 Key Takeaways
Policy gaps across systems are the most common failure: enforcing FGAC in the warehouse but not in vector search, backups, or object storage creates bypass paths
Users in many groups can generate predicates with thousands of terms, which can prevent index use and cause full table scans that collapse performance
A policy cache with a 5 minute TTL gives a terminated employee enough time to exfiltrate 3 million rows at an export rate of 10,000 rows per second
Multi-tenant filter bugs on large fact tables can expose millions of records across customers; exhaustive testing is mandatory
Conflicting policies from multiple sources require policy simulation and "explain why denied" tooling to debug safely
📌 Examples
1. Support tickets are protected by row-level security in Snowflake, but their vector embeddings in OpenSearch lack filters, allowing agents to search restricted tickets
2. A user in 200 security groups generates a WHERE clause with 5,000 customer IDs, and the query degrades from 5 seconds to 80 seconds due to a full scan
3. A developer forgets the tenant filter on an analytics query, exposing competitor sales data across SaaS customers for 2 hours before detection
FGAC Failure Modes and Edge Cases | Fine-grained Access Control & Policies - System Overflow