If you are familiar with messaging systems like Kafka and RocketMQ, you may know that message serving and storage are typically tightly coupled in their architectures. Apache Pulsar, by contrast, uses a two-layer architecture that separates storage from compute. Compute runs on stateless brokers, while persistent storage is handled by Apache BookKeeper servers, also known as bookies. This blog covers the basics of BookKeeper and illustrates how it achieves high availability for the data it handles.
What is Apache BookKeeper
Originally developed at Yahoo, BookKeeper is a reliable, high-performance storage system. It provides distributed, scalable storage with low latency and strong fault tolerance, which is why it is well suited to serve as Pulsar’s storage layer. BookKeeper stores data in ledgers, which are append-only and immutable. Using a dedicated replication protocol, BookKeeper writes log entries to multiple nodes concurrently, so the data is stored durably and remains highly available.
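To make the ledger idea more concrete, here is a minimal in-memory sketch of an append-only log whose entries are copied to several nodes. It is a toy model, not the BookKeeper client API or its actual replication protocol: the `ToyLedger` class, its methods, the default of three bookies, and the explicit `close` step are all hypothetical names and simplifications introduced for illustration.

```python
class ToyLedger:
    """Toy append-only log replicated to in-memory 'bookies' (illustrative only)."""

    def __init__(self, ledger_id, num_bookies=3):
        self.ledger_id = ledger_id
        # Each toy bookie is a dict mapping entry id -> payload.
        self.bookies = [dict() for _ in range(num_bookies)]
        self.next_entry_id = 0
        self.closed = False

    def add_entry(self, payload):
        """Append a payload and copy it to every toy bookie."""
        if self.closed:
            raise RuntimeError("ledger is closed and can no longer be written")
        entry_id = self.next_entry_id
        for bookie in self.bookies:
            bookie[entry_id] = payload
        self.next_entry_id += 1
        return entry_id

    def close(self):
        """Seal the ledger; this sketch models immutability as 'no writes after close'."""
        self.closed = True

    def read_entry(self, entry_id, failed=()):
        """Read from the first bookie that is not listed in `failed`."""
        for index, bookie in enumerate(self.bookies):
            if index in failed:
                continue
            if entry_id in bookie:
                return bookie[entry_id]
        raise KeyError(entry_id)


ledger = ToyLedger(ledger_id=1)
first = ledger.add_entry(b"hello")
second = ledger.add_entry(b"world")
ledger.close()

# The entry is still readable when two of the three toy bookies are treated as down.
recovered = ledger.read_entry(second, failed={0, 1})
```

The sketch only captures the two properties described above: entries can be appended but never rewritten, and each entry lives on more than one node, so a read can succeed even when some nodes are unavailable.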