Change Data Capture with Debezium and Kafka: 2025 Production Guide

📚 Modern CDC Architecture

Greetings, noble seekers of software wisdom! Today we embark on an enchanting journey through the realm of Change Data Capture (CDC), guided by the valiant Debezium and the swift messenger Kafka, all within the kingdom of SQL Server.

🚀 What's New in 2025

Debezium 2.x Improvements:

  • Incremental snapshots (re-snapshot tables without pausing streaming)
  • Multi-tenant database support
  • CloudEvents format native support
  • Vitess and YugabyteDB connectors

Kafka 3.x Features:

  • KRaft mode (no more ZooKeeper!)
  • Idempotent producers enabled by default, easing exactly-once pipelines
  • Tiered storage for cost optimization
  • Substantially improved broker throughput and failover times

Cloud-Native Alternatives:

  • AWS DMS with Kinesis Data Streams
  • Azure Event Hubs with Change Feed
  • Google Datastream
  • Confluent Cloud with managed connectors

Alrighty, buckle up and hold onto your hats, folks! As the grand wizard of software architecture, I shall be your guide through the five acts that follow, so let's set off, shall we?

Act 1: Modern Kafka Setup (2025 Edition)

๐Ÿณ Docker Compose for Development

In 2025, we run Kafka in KRaft mode, without ZooKeeper! Here's a modern single-broker setup for development:

# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    hostname: kafka
    container_name: kafka
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:29093'
      KAFKA_LISTENERS: 'PLAINTEXT://kafka:29092,CONTROLLER://kafka:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      # Single-broker dev setup: internal topics need replication factor 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - kafka-data:/var/lib/kafka/data

  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'kafka:29092'

volumes:
  kafka-data:
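Bring the stack up and give the broker a quick poke before moving on; listing topics is an easy smoke test:

docker compose up -d
docker exec -it kafka /usr/bin/kafka-topics --bootstrap-server localhost:9092 --list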

Act 2: Bringing Debezium to Life

Next, we summon Debezium, our faithful spy who watches our SQL Server database with the eye of a hawk. Once again we call upon Docker, this time with a Debezium 2.x image. Note there is no ZooKeeper to link against in our KRaft world; the container simply joins the Compose network (Compose names it <project>_default, assumed here to be kafka_default) and points at the broker:

docker run -it --rm --name debezium -p 8083:8083 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=my_connect_configs \
  -e OFFSET_STORAGE_TOPIC=my_connect_offsets \
  -e STATUS_STORAGE_TOPIC=my_connect_statuses \
  -e BOOTSTRAP_SERVERS=kafka:29092 \
  --network kafka_default \
  quay.io/debezium/connect:2.6
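Once the container is up, a quick sanity check against Kafka Connect's REST API confirms it is alive and has the SQL Server connector plugin on board (jq is optional, purely for pretty-printing):

curl -s localhost:8083/ | jq
curl -s localhost:8083/connector-plugins | jq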

Act 3: Briefing Debezium

Debezium is a faithful servant but needs to be told exactly what to watch. So we prepare a magical scroll (also known as a JSON file), using the Debezium 2.x property names: topic.prefix and schema.history.internal.* superseded the older database.server.name and database.history.* spellings:

{
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "tasks.max": "1",
        "database.hostname": "sqlserver",
        "database.port": "1433",
        "database.user": "sa",
        "database.password": "Password!",
        "database.names": "testDB",
        "database.encrypt": "false",
        "topic.prefix": "fulldb",
        "table.include.list": "dbo.customers",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
        "schema.history.internal.kafka.topic": "dbhistory.full",
        "include.schema.changes": "true"
    }
}

And deliver it using a magical raven (or a REST API call, if you will):

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d @connector.json
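A 201 Created means the raven delivered the scroll. To confirm the connector and its task are actually RUNNING:

curl -s localhost:8083/connectors/inventory-connector/status | jq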

Act 4: Wreaking Havoc (Testing!)

Let's see our Debezium-Kafka combo in action! We'll change some rows in the dbo.customers table that Debezium is watching.
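One prerequisite the fireworks depend on: SQL Server ships with CDC disabled, so it must be enabled once for the database and once per captured table. A minimal sketch, assuming dbo.customers already exists:

USE testDB;
GO

-- Enable CDC at the database level (requires sysadmin)
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for the customers table; @role_name = NULL skips role-based gating
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'customers',
    @role_name     = NULL;
GO

With CDC enabled, let's wreak the promised havoc: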

USE testDB;
GO

UPDATE dbo.customers SET email = 'j.doe@example.com' WHERE id = 1;
GO

INSERT INTO dbo.customers (id, first_name, last_name, email)
VALUES (3, 'Jimmy', 'Doe', 'jimmy.doe@example.com');
GO

Act 5: Reading the Tea Leaves

Finally, we'll use Kafka's consumer to read the magical scrolls (i.e., change data):

docker exec -it kafka /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic fulldb.testDB.dbo.customers --from-beginning

Voila! Debezium has spied the changes and Kafka has carried the news. If all's gone well, you'll see something like this:

{
    "schema": { ... },
    "payload": {
        "before": {
            "id": 1,
            "first_name": "John",
            "last_name": "Doe",
            "email": "john.doe@example.com"
        },
        "after": {
            "id": 1,
            "first_name": "John",
            "last_name": "Doe",
            "email": "j.doe@example.com"
        },
        "source": { ... },
        "op": "u",
        "ts_ms": 1234567890123,
        "transaction": null
    }
}

And for the new entry, where "op" is "c" (create) rather than "u" (update):

{
    "schema": { ... },
    "payload": {
        "before": null,
        "after": {
            "id": 3,
            "first_name": "Jimmy",
            "last_name": "Doe",
            "email": "jimmy.doe@example.com"
        },
        "source": { ... },
        "op": "c",
        "ts_ms": 1234567890124,
        "transaction": null
    }
}

Bask in the glory of your magical prowess! You've successfully implemented a CDC solution with Debezium and Kafka on SQL Server.


🚀 Production Best Practices (2025)

1. Schema Evolution Strategy

{
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}

2. Exactly-Once Semantics

{
  "exactly.once.support": "required",
  "transaction.boundary": "poll",
  "producer.override.enable.idempotence": "true"
}

3. Kubernetes Deployment

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: debezium-sqlserver-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.sqlserver.SqlServerConnector
  tasksMax: 3
  config:
    database.hostname: sqlserver.database.svc.cluster.local
    database.port: 1433
    database.user: ${file:/opt/kafka/external-configuration/connector-config/connector.properties:database.user}
    database.password: ${file:/opt/kafka/external-configuration/connector-config/connector.properties:database.password}
    snapshot.mode: initial
    signal.data.collection: testDB.dbo.debezium_signal
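Incremental snapshots are triggered on demand through the signal table rather than by a snapshot.mode setting. A sketch, assuming the three-column signal table shape Debezium expects (the table must itself be CDC-enabled and included in table.include.list):

-- One-time: create the signaling table that Debezium watches
CREATE TABLE testDB.dbo.debezium_signal (
    id   VARCHAR(64)   PRIMARY KEY,
    type VARCHAR(32)   NOT NULL,
    data VARCHAR(2048) NULL
);

-- Ask the connector to incrementally re-snapshot dbo.customers
INSERT INTO testDB.dbo.debezium_signal (id, type, data)
VALUES ('ad-hoc-1', 'execute-snapshot',
        '{"data-collections": ["testDB.dbo.customers"], "type": "incremental"}');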

4. Monitoring & Observability

  • Metrics: JMX + Prometheus + Grafana
  • Tracing: OpenTelemetry integration
  • Alerting: PagerDuty when source lag exceeds 5 minutes (example rule below)
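For the lag alert specifically, Debezium's streaming metrics expose MilliSecondsBehindSource via JMX. Assuming your JMX-exporter mapping publishes it as debezium_metrics_MilliSecondsBehindSource (names vary with your exporter config), a Prometheus rule could look like:

groups:
  - name: cdc-alerts
    rules:
      - alert: DebeziumSourceLagHigh
        # 300000 ms = 5 minutes behind the source database
        expr: debezium_metrics_MilliSecondsBehindSource > 300000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "CDC pipeline is more than 5 minutes behind the source"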

5. Common Pitfalls & Solutions

| Problem | Solution |
|---------|----------|
| Large initial snapshots | Use incremental snapshots with signaling |
| Schema changes break consumers | Implement schema registry with compatibility modes |
| Connector failures | Use distributed mode with task rebalancing |
| Data inconsistency | Enable read-only replica for CDC |
| High latency | Optimize batch sizes and parallelism |


🌟 Modern Alternatives & When to Use Them

Cloud-Native CDC Solutions

AWS DMS + Kinesis

  • Best for: AWS-native architectures
  • Pros: Managed service, automatic scaling
  • Cons: Vendor lock-in, limited transformations

Azure Event Hubs Change Feed

  • Best for: Cosmos DB and Azure SQL
  • Pros: Integrated with Azure ecosystem
  • Cons: Limited to Azure databases

Google Datastream

  • Best for: BigQuery real-time analytics
  • Pros: Zero-downtime migrations
  • Cons: Limited source database support

When to Still Choose Debezium (2025)

  • Multi-cloud or on-premises deployments
  • Need for complex transformations
  • Open-source requirements
  • Multiple heterogeneous databases
  • Fine-grained control over CDC process

🚀 Next Steps in Your CDC Journey

Advanced Topics:

  • Event Sourcing: Build audit logs from CDC events
  • CQRS Implementation: Separate read/write models
  • Outbox Pattern: Ensure transactional consistency (see the sketch after this list)
  • Change Stream Processing: Real-time analytics
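As a taste of the outbox pattern: Debezium ships an EventRouter single message transform that fans outbox rows out to per-aggregate topics. A minimal connector-config fragment, assuming a dbo.outbox table with the conventional aggregatetype / aggregateid / payload columns:

{
  "table.include.list": "dbo.outbox",
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.route.by.field": "aggregatetype"
}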

Now, go forth and use your newfound powers wisely, my apprentice. And remember, "with great power comes great responsibility." 🧙‍♂️

Updated for 2025: This guide now includes KRaft mode, incremental snapshots, cloud alternatives, and production lessons from real-world deployments.