Beyond Bytes: Understanding the Four Types of Kafka Messages

Kafka's flexibility is a double-edged sword. This guide introduces the four primary message types: States, Deltas, Events, and Commands. It shows you how to design your data streams for purpose, leading to robust and scalable architectures.

Apache Kafka offers immense flexibility. It can handle any type of data, from JSON objects to Avro or even custom binary formats. But with this flexibility comes a crucial architectural challenge: you, the developer or architect, must define what kind of data is flowing through your topics.

Without a clear strategy, your data streams can quickly become a chaotic mess of unstructured bytes, leading to inconsistent data, fragile systems, and endless debugging. This guide provides a framework for defining your Kafka messages based on their purpose. We will explore the four primary message types (States, Deltas, Events, and Commands) and show how a conscious choice leads to more robust, scalable, and understandable data architectures.

What Is a Kafka Message, Really?

At its core, a Kafka message, or record, is nothing more than a simple byte array. Kafka is built for performance and doesn’t care about the content of your message. It’s up to you to interpret it. The message itself consists of an optional key, a value (the payload), a timestamp (which the producer may set explicitly), and optional custom headers for technical metadata.
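To make the anatomy concrete, here is a minimal sketch in plain Python. The `Record` class and the `to_record` helper are hypothetical stand-ins written for this article, not part of any Kafka client library, and JSON is just one possible serialization.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Kafka record; not a client-library class."""
    value: Optional[bytes]              # the payload; Kafka never inspects it
    key: Optional[bytes] = None         # optional key
    timestamp_ms: Optional[int] = None  # None here means "not set by the producer"
    headers: tuple = ()                 # (name, bytes) pairs for technical metadata


def to_record(user: dict) -> Record:
    """Serialize a user dict to bytes; choosing JSON is our decision, not Kafka's."""
    return Record(
        key=user["userId"].encode("utf-8"),
        value=json.dumps(user).encode("utf-8"),
        headers=(("content-type", b"application/json"),),
    )


record = to_record({"userId": "user-12345", "name": "Jane Doe"})

# The consumer must know the format to turn the bytes back into data.
decoded = json.loads(record.value.decode("utf-8"))
```

The point of the sketch is the last line: nothing in the record itself says that the value is JSON, so producer and consumer have to agree on that out of band.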

While Kafka can handle any byte array, it’s optimized for a high volume of small messages, typically staying under 1 MB each. Trying to push large files through Kafka can lead to poor performance and should be avoided.
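One way to keep an eye on message size is to check the serialized payload before it is handed to a producer. The following is a rough sketch: `MAX_MESSAGE_BYTES` and `serialize_checked` are illustrative names, and the 1 MB budget simply mirrors the rule of thumb above rather than any particular cluster's configuration.

```python
import json

# Assumed budget mirroring the rule of thumb above; actual limits depend on
# how the brokers, topics, and producers are configured.
MAX_MESSAGE_BYTES = 1_000_000


def serialize_checked(payload: dict, limit: int = MAX_MESSAGE_BYTES) -> bytes:
    """Serialize to JSON bytes and refuse payloads that exceed the size budget."""
    data = json.dumps(payload).encode("utf-8")
    if len(data) > limit:
        raise ValueError(f"message is {len(data)} bytes, limit is {limit}")
    return data


small = serialize_checked({"userId": "user-12345"})
```

Failing early in the producing application makes an oversized payload a visible design problem instead of a runtime surprise.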

Now, let’s explore the four ways you can use this byte array to design your data streams.

The Four Pillars of Message Design

The following types are not mutually exclusive; a single system often uses a mix of them to achieve its goals. Let’s use a practical scenario to illustrate each type: a user updates their residential address in a master data service.

1. States: The "What Is" Message

A State message represents the complete, current snapshot of an entity. It’s a full picture of the object at a specific point in time, containing all of its attributes.

  • Scenario: When a user’s address changes, a user.state topic might receive a full user object with the new address.

  • Purpose: This type is invaluable when a consumer only needs the latest state of an entity, for example to replicate the current user record into another service’s database.

{
  "userId": "user-12345",
  "name": "Jane Doe",
  "email": "jane.doe@example.com",
  "address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "zip": "12345"
  }
}

  • Pros: Simple to process, as each message is self-contained. New consumers can quickly get the full picture.

  • Cons: High data volume, especially for large objects with frequent, small changes. It’s difficult to understand what exactly changed and why, because the message only contains the final state.
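Consuming State messages can be sketched as a simple "last snapshot wins" upsert. This is a minimal illustration, assuming JSON-encoded values and an in-memory dict standing in for the other service's database; the old address is invented for the example.

```python
import json

# Local view keyed by userId; a stand-in for the consuming service's database.
users = {}


def apply_state(raw: bytes) -> None:
    """Replace whatever we knew about the user with the full snapshot."""
    state = json.loads(raw)
    users[state["userId"]] = state


old = {"userId": "user-12345", "name": "Jane Doe", "email": "jane.doe@example.com",
       "address": {"street": "1 Old Road", "city": "Oldtown", "zip": "54321"}}
new = {"userId": "user-12345", "name": "Jane Doe", "email": "jane.doe@example.com",
       "address": {"street": "123 Main Street", "city": "Anytown", "zip": "12345"}}

for message in (old, new):
    apply_state(json.dumps(message).encode("utf-8"))
```

Because every snapshot is complete, a consumer that only ever saw the second message would end up with exactly the same local view. What it cannot tell from either message is which field changed or why.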

2. Deltas: The "What Has Changed" Message

A Delta message contains only the attributes of an entity that have changed. It is a partial update, representing a difference between the old state and the new state.

  • Scenario: Instead of sending the entire user object, the master data service sends a user.address.changes message containing just the user ID and the updated address.

  • Purpose: Delta messages are highly efficient for frequent updates where only a few fields change. A consuming service instantly knows what has changed and can act accordingly. If it has the previous state of the user, it can easily update it.

{
  "userId": "user-12345",
  "address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "zip": "12345"
  }
}

  • Pros: Higher efficiency due to minimal data volume. A clear and concise representation of what has changed.

  • Cons: Requires the consumer to maintain a local state to build a full picture. If a message is missed, the local state will be incorrect. It also provides no insight into why the change happened. Was the user correcting a typo, or did they actually move?

Deltas can lead to data inconsistencies if a consuming application misses a message.
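The dependency on local state is easy to see in code. The sketch below assumes that nested objects are merged recursively; the article does not prescribe merge semantics, so `merge` and `apply_delta` are illustrative choices, and the old address is invented for the example.

```python
def merge(state: dict, delta: dict) -> dict:
    """Return a copy of state with the changed attributes from delta applied."""
    merged = dict(state)
    for field, value in delta.items():
        if isinstance(value, dict) and isinstance(merged.get(field), dict):
            merged[field] = merge(merged[field], value)  # merge nested objects
        else:
            merged[field] = value
    return merged


def apply_delta(store: dict, delta: dict) -> None:
    """Apply a delta to the locally held state of the user it refers to."""
    user_id = delta["userId"]
    store[user_id] = merge(store.get(user_id, {}), delta)


store = {"user-12345": {"userId": "user-12345", "name": "Jane Doe",
                        "email": "jane.doe@example.com",
                        "address": {"street": "1 Old Road", "city": "Oldtown",
                                    "zip": "54321"}}}
delta = {"userId": "user-12345",
         "address": {"street": "123 Main Street", "city": "Anytown", "zip": "12345"}}
apply_delta(store, delta)

# A consumer that never saw the earlier messages ends up with a partial entity.
late_store = {}
apply_delta(late_store, delta)
```

After the update, `store` holds the full user with the new address, while `late_store` contains only the ID and address: the name and email were never delivered to it.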

3. Events: The "What Happened" Message

An Event message describes a specific action or occurrence relevant for the business. It often includes contextual information beyond just the data that changed.

  • Scenario: When a user actually moves, the master data service emits a user.moved event to an auditing topic. If your back office merely fixes a typo, it might emit a user.corrected event instead. The event message includes the ID of the changed entity, the event type, and a payload, which is usually either a state message or a delta message.

  • Purpose: Events are the foundation of event-driven architectures. They provide an immutable audit log of business actions. Other services can react to this event without being tightly coupled to the producer. For instance, an audit system can log the event, or the relocation-postcard service can send a postcard to the user when the actual move happens.

{
  "event": "user.moved",
  "userId": "user-12345",
  "payload": {
    "address": {
      "street": "123 Main Street",
      "city": "Anytown",
      "zip": "12345"
    }
  }
}

  • Pros: Provides semantic meaning. You know not only what changed but also why it happened, which is crucial for complex business logic.

  • Cons: Can lead to "event-overloading" if you create a separate event for every minor action. A consuming application needs its own logic to process the event and update its state, which is more complex than just applying a State message.
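The consumer-side logic mentioned above often boils down to dispatching on the event type. Here is a minimal sketch, assuming JSON-encoded events shaped like the example; for brevity it folds the audit system and the postcard trigger into one process, whereas in practice they would be separate consumers. All handler names are hypothetical.

```python
import json

audit_log = []            # stand-in for the audit system
postcards_requested = []  # stand-in for the relocation-postcard trigger


def on_user_moved(event: dict) -> None:
    postcards_requested.append(event["userId"])


def on_user_corrected(event: dict) -> None:
    pass  # a typo fix needs no postcard


HANDLERS = {
    "user.moved": on_user_moved,
    "user.corrected": on_user_corrected,
}


def handle(raw: bytes) -> None:
    event = json.loads(raw)
    audit_log.append(event["event"])        # auditing records every event
    handler = HANDLERS.get(event["event"])
    if handler is not None:                 # unknown event types are ignored
        handler(event)


events = [
    {"event": "user.moved", "userId": "user-12345",
     "payload": {"address": {"street": "123 Main Street", "city": "Anytown",
                             "zip": "12345"}}},
    {"event": "user.corrected", "userId": "user-12345",
     "payload": {"address": {"zip": "12346"}}},
]
for e in events:
    handle(json.dumps(e).encode("utf-8"))
```

Both events carry an address change, but only the move triggers a postcard. That distinction is exactly what a State or Delta message cannot express.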

4. Commands: The "What to Do" Message

A Command message is an explicit instruction for another service to perform a specific action. Unlike an event, it is future-oriented and directed at a single recipient.

  • Scenario: After consuming the user.moved event, a "Relocation Postcard Service" processes the change and produces a command to a postcard.sender topic. The send.postcard command instructs the receiving service to send a postcard with the user’s new address and a personalized message.

  • Purpose: Commands are used for direct, explicit communication between services. The target service can focus on its job without needing to understand the underlying business reason ("the why") behind the action. It simply executes the command.

{
  "command": "send.postcard",
  "name": "Jane Doe",
  "address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "zip": "12345"
  },
  "message": "Welcome to your new home!",
  "image-url": "https://example.com/postcard.jpg"
}

  • Pros: Explicit and direct communication. The purpose is clear, and the receiving service’s logic stays straightforward because it does not need to know the business reason ("the why") behind the action.

  • Cons: Creates a tighter coupling between producer and consumer. A new consumer cannot simply reuse a command for a different purpose, making it less flexible than an event.
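The step from event to command can be sketched as a small translation function. This is an illustration under assumptions: `to_postcard_command` is a hypothetical helper, and because the example event does not carry the user's name, the sketch takes it as an argument (for instance from a local lookup).

```python
def to_postcard_command(event: dict, name: str) -> dict:
    """Build a send.postcard command from a user.moved event.

    The name is passed in separately because the example event only
    contains the user ID and the new address.
    """
    if event["event"] != "user.moved":
        raise ValueError("only user.moved events trigger a postcard")
    return {
        "command": "send.postcard",
        "name": name,
        "address": event["payload"]["address"],
        "message": "Welcome to your new home!",
        "image-url": "https://example.com/postcard.jpg",
    }


event = {
    "event": "user.moved",
    "userId": "user-12345",
    "payload": {"address": {"street": "123 Main Street", "city": "Anytown",
                            "zip": "12345"}},
}
command = to_postcard_command(event, name="Jane Doe")
```

Note what the command leaves out: neither the user ID nor the fact that a move happened is included. The receiver gets everything it needs to do its job and nothing about the business reason.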

Conclusion

The choice of message type is a foundational architectural decision, not a mere technical detail. As we have seen with our user address change example, a single business action can be represented in multiple ways, each serving a different purpose.

  • States are perfect for capturing the complete current picture of an entity.

  • Deltas are ideal for efficient, partial updates to keep data fresh.

  • Events are the bedrock of decoupled, event-driven systems, telling you what happened.

  • Commands are for explicit, directed instructions, telling a service what to do.

By consciously choosing the right message type for each business process, you can build systems that are not only performant and scalable but also clear, maintainable, and robust.

About Anatoly Zelenin
Hi, I’m Anatoly! I love to spark that twinkle in people’s eyes. As an Apache Kafka expert and book author, I’ve been bringing IT to life for over a decade—with passion instead of boredom, with real experiences instead of endless slides.
