[Recap] Himeji: A Scalable Centralized System for Authorization at Airbnb
How Himeji efficiently manages billions of relations and serves nearly a million entity authorizations per second.
The Problem
Airbnb engineering transitioned from a monolithic Ruby on Rails architecture to a service-oriented architecture (SOA) over the past few years.
In the Rails setup, each resource had its own API with authorization checks to safeguard sensitive data, which was straightforward to manage due to the single access point for each resource's data.
However, the shift to SOA introduced a layered structure: data services wrap the databases, and presentation services draw from multiple data services. Initially, the permission checks were moved from the monolith to the presentation services, which led to two problems:
Duplicate and difficult to manage authorization checks
Multiple presentation services accessing the same data often had duplicate authorization check code. This sometimes resulted in checks becoming out of sync and challenging to manage.
Fan out to multiple services
Most authorization checks necessitated calling into other services. This was slow, the load was difficult to maintain, and it impacted overall performance and reliability.
Solution
To address the issues above, Airbnb engineering implemented two major changes:
Relocation of authorization checks:
Authorization checks were moved into the data services instead of being performed only in presentation services. This eliminated the duplicate and inconsistent checks.
Introduction of Himeji:
A centralized authorization system called Himeji was created, based on Zanzibar.
It operates from the data layer, storing permission data and executing checks as a central source of truth.
API
Himeji exposes a check API:
// Can the principal do relation on entity?
boolean check(entity, relation, principal)
Example:
// Can user 123 write to listing 10's description?
check(entity: "LISTING : 10 : DESCRIPTION",
      relation: WRITE,
      principal: User(123))
Storage
Similar to Zanzibar, the basic unit of storage for Himeji is a tuple in the form entity # relation @ principal.
Entity is a triple (entity_type : entity_id : entity_part), for example LISTING : 10 : DESCRIPTION
Relation describes the relationship
Principal is either a user identity or another entity
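The tuple format above can be modelled with a few small value types. This is a minimal sketch, not Himeji's actual schema: the class names `Entity`, `User`, `Reference`, and `RelationTuple` are assumptions made for illustration, and the string rendering simply follows the entity # relation @ principal notation used in this recap.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Entity:
    entity_type: str                   # e.g. "LISTING"
    entity_id: str                     # e.g. "10"
    entity_part: Optional[str] = None  # e.g. "DESCRIPTION"; None means the whole entity

    def __str__(self) -> str:
        parts = [self.entity_type, self.entity_id]
        if self.entity_part is not None:
            parts.append(self.entity_part)
        return " : ".join(parts)


@dataclass(frozen=True)
class User:
    user_id: int

    def __str__(self) -> str:
        return f"User({self.user_id})"


@dataclass(frozen=True)
class Reference:
    entity: Entity  # a principal that points at another entity

    def __str__(self) -> str:
        return f"Reference({self.entity})"


# A principal is either a user identity or another entity.
Principal = Union[User, Reference]


@dataclass(frozen=True)
class RelationTuple:
    entity: Entity
    relation: str
    principal: Principal

    def __str__(self) -> str:
        return f"{self.entity} # {self.relation} @ {self.principal}"


owner = RelationTuple(Entity("LISTING", "10"), "OWNER", User(123))
print(owner)  # LISTING : 10 # OWNER @ User(123)
```

Frozen dataclasses are hashable, so tuples like these can be stored in sets or used as dictionary keys in a toy store.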
Configuration
Airbnb engineering adopted a YAML-based configuration language, based on the Zanzibar setup, which enables the resolution of permission checks through set algebra. This system allows a developer to map a check to a set operation.
LISTING:
  '#WRITE':
    union:
      - '#WRITE'
      - '#OWNER'
  '#READ':
    union:
      - '#READ'
      - '#WRITE'
Example of query
Database
Query:
check(entity: "LISTING : 10", relation: WRITE, principal: User(123))
Himeji interprets
relation: WRITE = UNION (WRITE, OWNER)
Himeji makes two queries
Query LISTING : 10 # WRITE @ User(123) => Empty
Query LISTING : 10 # OWNER @ User(123) => Match User(123)
The result is True
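The resolution steps above can be sketched in a few lines. This is an illustrative toy, not Himeji's implementation: the in-memory `TUPLES` set stands in for the database, `CONFIG` is a hand-written dictionary mirroring the YAML, and the recursive expansion of union members (so that READ also picks up OWNER through WRITE) is an assumption about how nested rules resolve.

```python
# Stored tuples as (entity, relation, principal). Contents are assumed for illustration.
TUPLES = {
    ("LISTING : 10", "OWNER", "User(123)"),
}

# Mirrors the YAML configuration: each relation expands to a union of relations.
CONFIG = {
    "LISTING": {
        "WRITE": {"union": ["WRITE", "OWNER"]},
        "READ": {"union": ["READ", "WRITE"]},
    }
}


def check(entity, relation, principal, _seen=None):
    """Return True if principal holds relation on entity, directly or via a union."""
    seen = _seen if _seen is not None else set()
    if relation in seen:
        return False  # already expanded for this request; avoids infinite recursion
    seen.add(relation)

    # Direct lookup: entity # relation @ principal
    if (entity, relation, principal) in TUPLES:
        return True

    # Expand the configured union, if any, and try each member relation.
    entity_type = entity.split(" : ")[0]
    rule = CONFIG.get(entity_type, {}).get(relation, {})
    return any(check(entity, member, principal, seen) for member in rule.get("union", []))


print(check("LISTING : 10", "WRITE", "User(123)"))  # True: no WRITE tuple, but OWNER matches
print(check("LISTING : 10", "WRITE", "User(456)"))  # False: neither WRITE nor OWNER matches
```

The `_seen` set is only there because each rule lists its own relation as a union member; without it, WRITE would expand to itself indefinitely.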
References
Airbnb often sees scenarios where one entity grants access to another simply by existing.
For example, a guest of a reservation gains access to a listing’s location, along with other pieces of the listing’s information.
LISTING:
  LOCATION:
    '#READ':
      union:
        - '#OWNER'
        - 'LISTING : $id # RESERVATION @ Reference(RESERVATION : $reservationId # GUEST)'
Query:
check(LISTING : 10 : LOCATION # READ, User(456))
Himeji makes three queries
Query LISTING : 10 # OWNER @ User(456) => Empty
Query LISTING : 10 # RESERVATION => Match Reference(RESERVATION:500)
Query RESERVATION : 500 # GUEST @ User(456) => Match User(456)
The result is True
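The reference lookup above can be sketched as a two-hop check. Everything here is assumed for illustration: the `Ref` type, the in-memory `TUPLES` set, and the hard-coded `can_read_location` function, which follows the three queries by hand rather than interpreting the YAML rule.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    entity: str  # the referenced entity, e.g. "RESERVATION : 500"


# Stored tuples as (entity, relation, principal). Contents are assumed for illustration.
TUPLES = {
    ("LISTING : 10", "RESERVATION", Ref("RESERVATION : 500")),
    ("RESERVATION : 500", "GUEST", "User(456)"),
}


def principals(entity, relation):
    """All principals stored for entity # relation."""
    return {p for (e, r, p) in TUPLES if e == entity and r == relation}


def can_read_location(listing, user):
    # Query 1: LISTING : id # OWNER @ user
    if user in principals(listing, "OWNER"):
        return True
    # Query 2: LISTING : id # RESERVATION => references to reservations
    for principal in principals(listing, "RESERVATION"):
        if isinstance(principal, Ref):
            # Query 3: RESERVATION : reservationId # GUEST @ user
            if user in principals(principal.entity, "GUEST"):
                return True
    return False


print(can_read_location("LISTING : 10", "User(456)"))  # True: guest of reservation 500
print(can_read_location("LISTING : 10", "User(789)"))  # False: neither owner nor guest
```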
Architecture & Performance
Himeji is split into three layers:
Orchestration layer:
This layer receives client requests, issues fetches for data based on configuration logic, and parses the results.
It uses consistent hashing to route to the caching layer.
Caching layer:
This sharded and replicated layer (one instance per Availability Zone per shard) is responsible for in-memory filtering and deduplication of database loads on misses.
Each shard owns a set of data assigned via consistent hashing, aiming for a ~98% cache hit rate.
Data layer:
This layer consists of logically sharded databases.
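To make the routing between the orchestration and caching layers concrete, here is a minimal consistent-hash ring. The source does not say which hashing scheme Himeji uses, so the MD5-based hash, the virtual-node count, the shard names, and the `HashRing` class itself are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """A toy consistent-hash ring with virtual nodes (hypothetical, not Himeji's code)."""

    def __init__(self, shards, vnodes=64):
        # Each shard is placed on the ring at `vnodes` pseudo-random positions.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer; any stable hash would do here.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def shard_for(self, key):
        # The owner is the first ring position clockwise from the key's hash.
        index = bisect.bisect_right(self._positions, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["cache-0", "cache-1", "cache-2"])
bigger_ring = HashRing(["cache-0", "cache-1", "cache-2", "cache-3"])

keys = [f"LISTING : {i}" for i in range(1000)]
moved = [k for k in keys if ring.shard_for(k) != bigger_ring.shard_for(k)]
print(f"{len(moved)} of {len(keys)} keys change owner when a shard is added")
```

The property that matters for a cache tier is that adding a shard only moves keys onto the new shard; every other key keeps its owner, so most cached entries stay warm.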
Significant modifications were made to Himeji over Zanzibar's setup to:
Separate the request orchestration and caching tiers: This allows for easier updates to the orchestration tier without needing to restart the cache.
Invalidate cache shards based on database mutations: The cache shards are updated following any changes in the databases.
Himeji uses SpinalTap, a scalable, performant, reliable, lossless Change Data Capture service that detects data mutations with low latency across different data source types and propagates them as standardized events to downstream consumers.
Use Amazon Aurora for database storage: This is part of Airbnb's transition to cloud-based solutions, in contrast to Zanzibar's use of Spanner. Himeji keeps the same reliability (hedging, tiered caching) and load-shedding features as Zanzibar for availability.
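The mutation-driven invalidation described above can be sketched as a cache shard that drops an entry whenever a change event for it arrives. This is a toy under stated assumptions: the `CacheShard` class, the `on_mutation` handler, and the event shape (a dict with `entity` and `relation` keys) are hypothetical and do not reflect SpinalTap's real event format or Himeji's cache internals.

```python
class CacheShard:
    """A toy read-through cache keyed by (entity, relation). Hypothetical, for illustration."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable: (entity, relation) -> set of principals
        self._entries = {}
        self.db_loads = 0          # counts cache misses that reached the database

    def get(self, key):
        if key not in self._entries:
            self.db_loads += 1
            self._entries[key] = set(self._load(key))
        return self._entries[key]

    def on_mutation(self, event):
        # Assumed event shape: {"entity": ..., "relation": ...}.
        # Dropping the entry forces the next read to reload from the database.
        self._entries.pop((event["entity"], event["relation"]), None)


database = {("LISTING : 10", "OWNER"): {"User(123)"}}
shard = CacheShard(lambda key: database.get(key, set()))
key = ("LISTING : 10", "OWNER")

first_read = shard.get(key)                # miss: loaded from the database
database[key] = {"User(999)"}              # the row changes underneath the cache
stale_read = shard.get(key)                # still the old value: no event seen yet
shard.on_mutation({"entity": "LISTING : 10", "relation": "OWNER"})
fresh_read = shard.get(key)                # miss again: reloads the new value

print(first_read, stale_read, fresh_read)  # {'User(123)'} {'User(123)'} {'User(999)'}
```

The stale read in the middle shows why the latency of the change feed matters: the cache is only as fresh as the last event it has processed.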
Himeji has been operational in production for about a year. Its throughput reached 850k entities per second in March 2021.
Availability: 99.9990%
P50 latency: 1.8 ms
P95 latency: 7 ms
P99 latency: 12 ms
Tooling
To reduce integration time and encourage developer adoption, several tools were developed:
Configuration-based backfill: To migrate existing permission checks into Himeji, a generic solution was created using Apache Airflow and Apache Spark for backfilling permission tuples for existing entities. Service owners only need to provide a small configuration, indicating how their tuple should be formed from their database exports.
Automatic code generation: Scripts were provided to auto-generate Java and Scala code, simplifying the onboarding process.
Thick client: A thick HTTP client was provided, equipped with logging, metrics, and migration rollout controls.
UI tool for debugging and one-off tasks: A user interface was developed for data analysis and fixing permissions issues, making it easier to investigate one-off permission problems by checking permission data written in the system.
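As a rough picture of what a configuration-based backfill involves, the sketch below turns exported rows into permission tuples using a small mapping. It is plain Python and deliberately leaves out Apache Airflow and Apache Spark; the config keys, column names, and the `rows_to_tuples` helper are all hypothetical and are not the format Airbnb's service owners actually write.

```python
# Hypothetical backfill configuration: which columns of a database export
# become the entity id and the principal, and which relation to write.
BACKFILL_CONFIG = {
    "entity_type": "LISTING",
    "entity_id_column": "listing_id",
    "relation": "OWNER",
    "principal_column": "host_user_id",
}


def rows_to_tuples(rows, config):
    """Yield one 'entity # relation @ principal' string per exported row."""
    for row in rows:
        entity = f'{config["entity_type"]} : {row[config["entity_id_column"]]}'
        principal = f'User({row[config["principal_column"]]})'
        yield f'{entity} # {config["relation"]} @ {principal}'


# Two made-up rows standing in for a database export.
exported_rows = [
    {"listing_id": 10, "host_user_id": 123},
    {"listing_id": 11, "host_user_id": 456},
]

backfilled = list(rows_to_tuples(exported_rows, BACKFILL_CONFIG))
for line in backfilled:
    print(line)
# LISTING : 10 # OWNER @ User(123)
# LISTING : 11 # OWNER @ User(456)
```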
Conclusion
Himeji, Airbnb's Zanzibar-based authorization system, centralizes all product and data authorization, addressing previous challenges of maintaining consistency and performance.
It uses a simple data model and flexible logic configuration, enhancing Zanzibar's scalability and performance.
Himeji's tiered distributed cache lowers latencies, enabling it to store tens of billions of relations and serve nearly a million entity authorizations per second with high availability and low latency.