Change Data Capture with Debezium and Kafka: 2025 Production Guide

📚 Modern CDC Architecture

Greetings, noble seekers of software wisdom! Today we embark on an enchanting journey through the realm of Change Data Capture (CDC), guided by the valiant Debezium and the swift messenger Kafka, all within the kingdom of SQL Server.

🚀 What's New in 2025

Debezium 2.x Improvements:

  • Incremental snapshots (re-snapshot tables without pausing streaming)
  • Multi-tenant database support
  • CloudEvents format native support
  • Vitess and YugabyteDB connectors

Kafka 3.x Features:

  • KRaft mode (no more ZooKeeper!)
  • Idempotent producers enabled by default, easing exactly-once pipelines
  • Tiered storage for cost optimization
  • Substantially improved broker throughput and failover times

Cloud-Native Alternatives:

  • AWS DMS with Kinesis Data Streams
  • Azure Event Hubs with Change Feed
  • Google Datastream
  • Confluent Cloud with managed connectors

Alrighty, buckle up and hold onto your hats, folks! As the grand wizard of software architecture, I shall be your guide through the five acts that follow, so let's set off, shall we?

Act 1: Modern Kafka Setup (2025 Edition)

๐Ÿณ Docker Compose for Development

In 2025, we run Kafka in KRaft mode, without ZooKeeper! Here's a modern single-broker setup for development:

# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    hostname: kafka
    container_name: kafka
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:29093'
      KAFKA_LISTENERS: 'PLAINTEXT://kafka:29092,CONTROLLER://kafka:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      # Single-broker dev setup: internal topics need replication factor 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - kafka-data:/var/lib/kafka/data

  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'kafka:29092'

volumes:
  kafka-data:
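Bring the stack up and give the broker a quick poke before moving on; listing topics is an easy smoke test:

docker compose up -d
docker exec -it kafka /usr/bin/kafka-topics --bootstrap-server localhost:9092 --list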

Act 2: Bringing Debezium to Life

Next, we summon Debezium, our faithful spy who watches our SQL Server database with the eye of a hawk. Once again we call upon Docker, this time with a Debezium 2.x image. Note there is no ZooKeeper to link against in our KRaft world; the container simply joins the Compose network (Compose names it <project>_default, assumed here to be kafka_default) and points at the broker:

docker run -it --rm --name debezium -p 8083:8083 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=my_connect_configs \
  -e OFFSET_STORAGE_TOPIC=my_connect_offsets \
  -e STATUS_STORAGE_TOPIC=my_connect_statuses \
  -e BOOTSTRAP_SERVERS=kafka:29092 \
  --network kafka_default \
  quay.io/debezium/connect:2.6
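Once the container is up, a quick sanity check against Kafka Connect's REST API confirms it is alive and has the SQL Server connector plugin on board (jq is optional, purely for pretty-printing):

curl -s localhost:8083/ | jq
curl -s localhost:8083/connector-plugins | jq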

Act 3: Briefing Debezium

Debezium is a faithful servant but needs to be told exactly what to watch. So we prepare a magical scroll (also known as a JSON file), using the Debezium 2.x property names: topic.prefix and schema.history.internal.* superseded the older database.server.name and database.history.* spellings:

{
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "tasks.max": "1",
        "database.hostname": "sqlserver",
        "database.port": "1433",
        "database.user": "sa",
        "database.password": "Password!",
        "database.names": "testDB",
        "database.encrypt": "false",
        "topic.prefix": "fulldb",
        "table.include.list": "dbo.customers",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
        "schema.history.internal.kafka.topic": "dbhistory.full",
        "include.schema.changes": "true"
    }
}

And deliver it using a magical raven (or a REST API call, if you will):

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d @connector.json
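A 201 Created means the raven delivered the scroll. To confirm the connector and its task are actually RUNNING:

curl -s localhost:8083/connectors/inventory-connector/status | jq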

Act 4: Wreaking Havoc (Testing!)

Let's see our Debezium-Kafka combo in action! We'll change some rows in the dbo.customers table that Debezium is watching.
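One prerequisite the fireworks depend on: SQL Server ships with CDC disabled, so it must be enabled once for the database and once per captured table. A minimal sketch, assuming dbo.customers already exists:

USE testDB;
GO

-- Enable CDC at the database level (requires sysadmin)
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for the customers table; @role_name = NULL skips role-based gating
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'customers',
    @role_name     = NULL;
GO

With CDC enabled, let's wreak the promised havoc: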

USE testDB;
GO

UPDATE dbo.customers SET email = 'j.doe@example.com' WHERE id = 1;
GO

INSERT INTO dbo.customers (id, first_name, last_name, email)
VALUES (3, 'Jimmy', 'Doe', 'jimmy.doe@example.com');
GO

Act 5: Reading the Tea Leaves

Finally, we'll use Kafka's consumer to read the magical scrolls (i.e., change data):

docker exec -it kafka /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic fulldb.testDB.dbo.customers --from-beginning

Voila! Debezium has spied the changes and Kafka has carried the news. If all's gone well, you'll see something like this:

{
    "schema": { ... },
    "payload": {
        "before": {
            "id": 1,
            "first_name": "John",
            "last_name": "Doe",
            "email": "john.doe@example.com"
        },
        "after": {
            "id": 1,
            "first_name": "John",
            "last_name": "Doe",
            "email": "j.doe@example.com"
        },
        "source": { ... },
        "op": "u",
        "ts_ms": 1234567890123,
        "transaction": null
    }
}

And for the new entry, where "op" is "c" (create) rather than "u" (update):

{
    "schema": { ... },
    "payload": {
        "before": null,
        "after": {
            "id": 3,
            "first_name": "Jimmy",
            "last_name": "Doe",
            "email": "jimmy.doe@example.com"
        },
        "source": { ... },
        "op": "c",
        "ts_ms": 1234567890124,
        "transaction": null
    }
}

Bask in the glory of your magical prowess! You've successfully implemented a CDC solution with Debezium and Kafka on SQL Server.


🚀 Production Best Practices (2025)

1. Schema Evolution Strategy

{
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}

2. Exactly-Once Semantics

{
  "exactly.once.support": "required",
  "transaction.boundary": "poll",
  "producer.override.enable.idempotence": "true"
}

3. Kubernetes Deployment

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: debezium-sqlserver-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.sqlserver.SqlServerConnector
  tasksMax: 3
  config:
    database.hostname: sqlserver.database.svc.cluster.local
    database.port: 1433
    database.user: ${file:/opt/kafka/external-configuration/connector-config/connector.properties:database.user}
    database.password: ${file:/opt/kafka/external-configuration/connector-config/connector.properties:database.password}
    snapshot.mode: initial
    signal.data.collection: testDB.dbo.debezium_signal
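Incremental snapshots are triggered on demand through the signal table rather than by a snapshot.mode setting. A sketch, assuming the three-column signal table shape Debezium expects (the table must itself be CDC-enabled and included in table.include.list):

-- One-time: create the signaling table that Debezium watches
CREATE TABLE testDB.dbo.debezium_signal (
    id   VARCHAR(64)   PRIMARY KEY,
    type VARCHAR(32)   NOT NULL,
    data VARCHAR(2048) NULL
);

-- Ask the connector to incrementally re-snapshot dbo.customers
INSERT INTO testDB.dbo.debezium_signal (id, type, data)
VALUES ('ad-hoc-1', 'execute-snapshot',
        '{"data-collections": ["testDB.dbo.customers"], "type": "incremental"}');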

4. Monitoring & Observability

  • Metrics: JMX + Prometheus + Grafana
  • Tracing: OpenTelemetry integration
  • Alerting: PagerDuty when source lag exceeds 5 minutes (example rule below)
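For the lag alert specifically, Debezium's streaming metrics expose MilliSecondsBehindSource via JMX. Assuming your JMX-exporter mapping publishes it as debezium_metrics_MilliSecondsBehindSource (names vary with your exporter config), a Prometheus rule could look like:

groups:
  - name: cdc-alerts
    rules:
      - alert: DebeziumSourceLagHigh
        # 300000 ms = 5 minutes behind the source database
        expr: debezium_metrics_MilliSecondsBehindSource > 300000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "CDC pipeline is more than 5 minutes behind the source"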

5. Common Pitfalls & Solutions

| Problem | Solution |
|---------|----------|
| Large initial snapshots | Use incremental snapshots with signaling |
| Schema changes break consumers | Implement schema registry with compatibility modes |
| Connector failures | Use distributed mode with task rebalancing |
| Data inconsistency | Enable read-only replica for CDC |
| High latency | Optimize batch sizes and parallelism |


🌟 Modern Alternatives & When to Use Them

Cloud-Native CDC Solutions

AWS DMS + Kinesis

  • Best for: AWS-native architectures
  • Pros: Managed service, automatic scaling
  • Cons: Vendor lock-in, limited transformations

Azure Event Hubs Change Feed

  • Best for: Cosmos DB and Azure SQL
  • Pros: Integrated with Azure ecosystem
  • Cons: Limited to Azure databases

Google Datastream

  • Best for: BigQuery real-time analytics
  • Pros: Zero-downtime migrations
  • Cons: Limited source database support

When to Still Choose Debezium (2025)

  • Multi-cloud or on-premises deployments
  • Need for complex transformations
  • Open-source requirements
  • Multiple heterogeneous databases
  • Fine-grained control over CDC process

🚀 Next Steps in Your CDC Journey

Advanced Topics:

  • Event Sourcing: Build audit logs from CDC events
  • CQRS Implementation: Separate read/write models
  • Outbox Pattern: Ensure transactional consistency (see the sketch after this list)
  • Change Stream Processing: Real-time analytics
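As a taste of the outbox pattern: Debezium ships an EventRouter single message transform that fans outbox rows out to per-aggregate topics. A minimal connector-config fragment, assuming a dbo.outbox table with the conventional aggregatetype / aggregateid / payload columns:

{
  "table.include.list": "dbo.outbox",
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.route.by.field": "aggregatetype"
}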

Now, go forth and use your newfound powers wisely, my apprentice. And remember, "with great power comes great responsibility." 🧙‍♂️

Updated for 2025: This guide now includes KRaft mode, incremental snapshots, cloud alternatives, and production lessons from real-world deployments.