Change Data Capture with Debezium and Kafka: 2025 Production Guide
- Author: Gary Huynh (@gary_atruedev)
Modern CDC Architecture
Greetings, noble seekers of software wisdom! Today, we're about to embark on an enchanting journey through the realm of Change Data Capture (CDC), guided by the valiant Debezium and the swift messenger Kafka, all within the kingdom of SQL Server Database.
Essential Prerequisites
Before diving into CDC, strengthen your foundation:
- Microservices Architecture - Where CDC shines
- Real-time Event Processing - Kafka fundamentals
- Clean Event Design - Event schema best practices
- Secure Your Streams - Production security
What's New in 2025
Debezium 2.x Improvements:
- Incremental snapshots for zero-downtime deployment
- Multi-tenant database support
- CloudEvents format native support
- Vitess and YugabyteDB connectors
Kafka 3.x Features:
- KRaft mode (no more ZooKeeper!)
- Exactly-once semantics by default
- Tiered storage for cost optimization
- 50% better performance
Cloud-Native Alternatives:
- AWS DMS with Kinesis Data Streams
- Azure Event Hubs with Change Feed
- Google Datastream
- Confluent Cloud with managed connectors
Act 1: Modern Kafka Setup (2025 Edition)
Docker Compose for Development
In 2025, we use Kafka without ZooKeeper! Here's a modern setup:
# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    hostname: kafka
    container_name: kafka
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:29093'
      KAFKA_LISTENERS: 'PLAINTEXT://kafka:29092,CONTROLLER://kafka:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      # Single-broker dev cluster: internal topics can't replicate 3 ways
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - kafka-data:/var/lib/kafka/data
  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'kafka:29092'
volumes:
  kafka-data:
Act 2: Bringing Debezium to Life
Next, we summon Debezium, our faithful spy who watches our SQL Server Database with the eye of a hawk. Once again, we call upon Docker:
docker run -it --rm --name debezium -p 8083:8083 \
  -e BOOTSTRAP_SERVERS=kafka:29092 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=my_connect_configs \
  -e OFFSET_STORAGE_TOPIC=my_connect_offsets \
  -e STATUS_STORAGE_TOPIC=my_connect_statuses \
  --link kafka:kafka \
  debezium/connect:2.6
Act 3: Briefing Debezium
Debezium is a faithful servant but needs to be told exactly what to watch. So we prepare a magical scroll (also known as a JSON file):
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "database.hostname": "sqlserver",
    "database.port": "1433",
    "database.user": "sa",
    "database.password": "Password!",
    "database.names": "testDB",
    "database.encrypt": "false",
    "topic.prefix": "fulldb",
    "table.include.list": "dbo.customers",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
    "schema.history.internal.kafka.topic": "dbhistory.full",
    "include.schema.changes": "true"
  }
}
And deliver it using a magical raven (or a REST API call, if you will):
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d @connector.json
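If you'd rather do the registration from code, the same call can be sketched with Python's standard library. This is only an illustrative helper (the `build_register_request` name and the trimmed-down connector payload are mine); `urllib.request.urlopen` would submit the request once Kafka Connect is up on port 8083.

```python
import json
import urllib.request

def build_register_request(connect_url: str, connector: dict) -> urllib.request.Request:
    """Build the POST that registers a connector with the Kafka Connect REST API."""
    body = json.dumps(connector).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors/",
        data=body,
        method="POST",
        headers={"Accept": "application/json", "Content-Type": "application/json"},
    )

# Build (but don't yet send) the request for the connector defined above;
# urllib.request.urlopen(req) would deliver it to a running Kafka Connect.
connector = {"name": "inventory-connector", "config": {"tasks.max": "1"}}
req = build_register_request("http://localhost:8083", connector)
```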
Act 4: Wreaking Havoc (Testing!)
Let's see our Debezium-Kafka combo in action! Assuming CDC has already been enabled on testDB and on dbo.customers (SQL Server's sys.sp_cdc_enable_db and sys.sp_cdc_enable_table procedures), we'll change some rows in our SQL Server Database:
USE testDB;
GO
UPDATE dbo.customers SET email = 'j.doe@example.com' WHERE id = 1;
GO
INSERT INTO dbo.customers
VALUES (3, 'Jimmy', 'Doe', 'jimmy.doe@example.com');
GO
Act 5: Reading the Tea Leaves
Finally, we'll use Kafka's consumer
to read the magical scrolls (i.e., change data):
docker exec -it kafka /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic fulldb.testDB.dbo.customers --from-beginning
Voila! Debezium has spied the changes and Kafka has carried the news. If all's gone well, you'll see something like this:
{
  "schema": { ... },
  "payload": {
    "before": {
      "id": 1,
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com"
    },
    "after": {
      "id": 1,
      "first_name": "John",
      "last_name": "Doe",
      "email": "j.doe@example.com"
    },
    "source": { ... },
    "op": "u",
    "ts_ms": 1234567890123,
    "transaction": null
  }
}
And for the new entry:
{
  "schema": { ... },
  "payload": {
    "before": null,
    "after": {
      "id": 3,
      "first_name": "Jimmy",
      "last_name": "Doe",
      "email": "jimmy.doe@example.com"
    },
    "source": { ... },
    "op": "c",
    "ts_ms": 1234567890124,
    "transaction": null
  }
}
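To make the envelope concrete, here's a minimal sketch (plain Python, no Kafka client, all names mine) of how a consumer might fold these `c`/`u`/`d` payloads into an in-memory copy of the table:

```python
def apply_change(table: dict, payload: dict) -> None:
    """Apply one Debezium change-event payload to a row store keyed by id."""
    op = payload["op"]
    if op in ("c", "u", "r"):      # create, update, snapshot read: upsert the 'after' image
        row = payload["after"]
        table[row["id"]] = row
    elif op == "d":                # delete: remove the 'before' image's key
        del table[payload["before"]["id"]]

# Replay a create followed by an update, as in the examples above.
table = {}
apply_change(table, {"op": "c", "before": None,
                     "after": {"id": 3, "first_name": "Jimmy", "last_name": "Doe",
                               "email": "jimmy.doe@example.com"}})
apply_change(table, {"op": "u",
                     "before": {"id": 3, "first_name": "Jimmy", "last_name": "Doe",
                                "email": "jimmy.doe@example.com"},
                     "after": {"id": 3, "first_name": "Jimmy", "last_name": "Doe",
                               "email": "jim.doe@example.com"}})
print(table[3]["email"])  # jim.doe@example.com
```

This upsert-or-delete loop is the core of most CDC sinks, whether the target is a cache, a search index, or another database.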
Bask in the glory of your magical prowess! You've successfully implemented a CDC solution with Debezium and Kafka on SQL Server.
Production Best Practices (2025)
1. Schema Evolution Strategy
{
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter.schema.compatibility": "BACKWARD"
}
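The essence of the BACKWARD mode is that new readers must still cope with old data, which in Avro terms means every newly added field needs a default. A toy checker conveying the idea (real registries check far more, and the field-dict shape here is my own simplification):

```python
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """Toy BACKWARD check: every field the new schema adds must carry a default,
    so a new reader can fill it in when decoding records written with the old schema."""
    added = set(new_fields) - set(old_fields)
    return all(new_fields[f].get("default") is not None for f in added)

old = {"id": {"type": "int"}, "email": {"type": "string"}}
new_ok = {**old, "phone": {"type": "string", "default": ""}}   # safe evolution
new_bad = {**old, "phone": {"type": "string"}}                 # breaks old data
print(is_backward_compatible(old, new_ok))   # True
print(is_backward_compatible(old, new_bad))  # False
```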
2. Exactly-Once Semantics
{
  "exactly.once.support": "required",
  "transaction.boundary": "poll",
  "producer.enable.idempotence": "true"
}
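Exactly-once on the Connect side is only half the story: downstream consumers usually add idempotent processing so a redelivered record has no second effect. The (topic, partition, offset) triple is Kafka's standard record coordinate; everything else in this sketch (class name, the in-memory `applied` list standing in for a real side effect) is illustrative:

```python
class IdempotentProcessor:
    """Skip records whose (topic, partition, offset) coordinate was already processed."""

    def __init__(self):
        self._seen = set()
        self.applied = []  # stand-in for the real side effect (DB write, cache update...)

    def process(self, topic: str, partition: int, offset: int, value: str) -> bool:
        key = (topic, partition, offset)
        if key in self._seen:
            return False           # duplicate delivery: ignore
        self._seen.add(key)
        self.applied.append(value)
        return True

proc = IdempotentProcessor()
proc.process("fulldb.dbo.customers", 0, 42, "update-1")
proc.process("fulldb.dbo.customers", 0, 42, "update-1")  # redelivery after a retry
print(len(proc.applied))  # 1
```

In production the `_seen` set would live in the sink store itself (e.g. an offsets column updated in the same transaction as the data), not in memory.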
3. Kubernetes Deployment
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: debezium-sqlserver-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.sqlserver.SqlServerConnector
  tasksMax: 3
  config:
    database.hostname: sqlserver.database.svc.cluster.local
    database.port: 1433
    database.user: ${file:/opt/kafka/external-configuration/connector-config/connector.properties:database.user}
    database.password: ${file:/opt/kafka/external-configuration/connector-config/connector.properties:database.password}
    snapshot.mode: initial
    # Incremental snapshots are triggered at runtime via this signal table
    signal.data.collection: testDB.debezium_signal
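The `${file:<path>:<key>}` placeholders are Kafka's FileConfigProvider syntax, which Strimzi wires up so credentials stay on disk rather than in the custom resource. The real resolution happens inside Kafka Connect at startup; this pure-Python toy just shows the substitution rule (the `files` dict stands in for `.properties` files on disk):

```python
import re

def resolve_placeholders(value: str, files: dict) -> str:
    """Resolve ${file:<path>:<key>} placeholders against an in-memory
    mapping of path -> {key: secret}."""
    def repl(match):
        path, key = match.group(1), match.group(2)
        return files[path][key]
    return re.sub(r"\$\{file:([^:}]+):([^}]+)\}", repl, value)

files = {"/opt/kafka/external-configuration/connector-config/connector.properties":
         {"database.user": "cdc_reader"}}
resolved = resolve_placeholders(
    "${file:/opt/kafka/external-configuration/connector-config/connector.properties:database.user}",
    files)
print(resolved)  # cdc_reader
```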
4. Monitoring & Observability
- Metrics: JMX + Prometheus + Grafana
- Tracing: OpenTelemetry integration
- Alerting: PagerDuty for lag > 5 minutes
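The "lag > 5 minutes" alert can be driven by the `ts_ms` field every Debezium event carries: compare it to the consumer's clock when the event arrives. A toy version of that check (the threshold constant and function name are mine; wiring the result to PagerDuty is left out):

```python
LAG_THRESHOLD_MS = 5 * 60 * 1000  # alert when events arrive more than 5 minutes late

def is_lagging(event_ts_ms: int, now_ms: int, threshold_ms: int = LAG_THRESHOLD_MS) -> bool:
    """True if the change event was produced more than threshold_ms before 'now'."""
    return (now_ms - event_ts_ms) > threshold_ms

# A 6-minute-old event trips the alert; a 1-second-old one does not.
now = 1_234_568_250_123
print(is_lagging(1_234_567_890_123, now))  # True
print(is_lagging(now - 1_000, now))        # False
```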
5. Common Pitfalls & Solutions
| Problem | Solution |
|---------|----------|
| Large initial snapshots | Use incremental snapshots with signaling |
| Schema changes break consumers | Implement schema registry with compatibility modes |
| Connector failures | Use distributed mode with task rebalancing |
| Data inconsistency | Enable read-only replica for CDC |
| High latency | Optimize batch sizes and parallelism |
Modern Alternatives & When to Use Them
Cloud-Native CDC Solutions
AWS DMS + Kinesis
- Best for: AWS-native architectures
- Pros: Managed service, automatic scaling
- Cons: Vendor lock-in, limited transformations
Azure Event Hubs Change Feed
- Best for: Cosmos DB and Azure SQL
- Pros: Integrated with Azure ecosystem
- Cons: Limited to Azure databases
Google Datastream
- Best for: BigQuery real-time analytics
- Pros: Zero-downtime migrations
- Cons: Limited source database support
When to Still Choose Debezium (2025)
- Multi-cloud or on-premises deployments
- Need for complex transformations
- Open-source requirements
- Multiple heterogeneous databases
- Fine-grained control over CDC process
Next Steps in Your CDC Journey
Continue Learning:
- Build Event-Driven Microservices
- Implement Real-time Notifications
- Secure Your Event Streams
- Test Event-Driven Systems
Advanced Topics:
- Event Sourcing: Build audit logs from CDC events
- CQRS Implementation: Separate read/write models
- Outbox Pattern: Ensure transactional consistency
- Change Stream Processing: Real-time analytics
Resources:
- Debezium Documentation
- Kafka Streams in Action
- CDC Community Forum
Now, go forth and use your newfound powers wisely, my apprentice. And remember: "with great power comes great responsibility."
Updated for 2025: This guide now includes KRaft mode, incremental snapshots, cloud alternatives, and production lessons from real-world deployments.