Change Data Capture (CDC)

Index

  1. What is CDC?

  2. How It Works (High-Level Architecture)

  3. What Entities Are Streamed

  4. Change Types We Capture

  5. Data Format / Examples

  6. Security & Access Control

  7. What You (the Partner) Need to Do

  8. Benefits for Your Business

  9. Limitations / Considerations

  10. Diagram


1. What is CDC?

Change Data Capture (CDC) is a technique for tracking every row-level change to database tables (inserts, updates, and deletes) as it happens and delivering those changes to downstream systems in near real time.

In our implementation, we use Debezium as the CDC engine.


2. How It Works (High-Level Architecture)

Here is a simplified flow of how our CDC system works:

  1. Source Database: Our production MySQL Server is configured with binary logging (binlog).

  2. Debezium Server: We run Debezium Server, which connects to the MySQL binlog and reads all row-level changes (inserts, updates, deletes). This is the core CDC component.

  3. Publishing to Pub/Sub: Debezium publishes the change events directly to Google Cloud Pub/Sub.

  4. Cloud Run Function: A Cloud Run function is triggered by the Pub/Sub messages. This function transforms and wraps each event into the delivery format (JSON) and writes it to a Google Cloud Storage bucket.

  5. Partner’s Infrastructure: The bucket resides in your own Google Cloud project. Our Cloud Run function writes the JSON files to it using a service account that we provide.
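As an illustration of step 4, the sketch below decodes one Pub/Sub push-style message and prepares the JSON object that would be written to the bucket. It is a minimal sketch under stated assumptions, not the production function: the object naming scheme, the wrapper fields (`entity`, `received_at`, `change`), and the function names are hypothetical, and the event is assumed to be a schema-less Debezium envelope with `op`, `ts_ms`, and `source.table`. The actual upload to Cloud Storage is left out.

```python
import base64
import json
from datetime import datetime, timezone


def decode_pubsub_message(envelope: dict) -> dict:
    """Decode the base64 `data` field of a Pub/Sub push-style envelope."""
    raw = base64.b64decode(envelope["message"]["data"])
    return json.loads(raw)


def build_object(event: dict, received_at: datetime) -> tuple:
    """Return (object_name, json_body) for one change event.

    The path layout and wrapper fields are hypothetical, chosen only
    to show the transform-and-wrap step.
    """
    table = event.get("source", {}).get("table", "unknown")
    ts_ms = event.get("ts_ms", 0)
    object_name = f"cdc/{table}/{received_at:%Y/%m/%d}/{ts_ms}.json"
    body = json.dumps(
        {
            "entity": table,
            "received_at": received_at.isoformat(),
            "change": event,
        },
        sort_keys=True,
    )
    return object_name, body


if __name__ == "__main__":
    sample = {"op": "c", "ts_ms": 1700000000000,
              "source": {"table": "account"}, "after": {"id": 1}}
    envelope = {"message": {
        "data": base64.b64encode(json.dumps(sample).encode()).decode()}}
    name, body = build_object(decode_pubsub_message(envelope),
                              datetime.now(timezone.utc))
    print(name)
    print(body)
```

Keeping the transform free of any storage client makes it easy to test locally; the upload call can then be a thin layer on top.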


3. What Entities Are Streamed

We currently capture CDC for the following business entities:

  • Account

  • Investment

  • Transaction

These represent the core objects in your system where changes matter most.


4. Change Types We Capture

For each of those entities, we stream all three types of changes:

  • Create (new records)

  • Update (changes to existing records)

  • Delete (records being removed)
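Debezium marks each change event with an `op` code: `c` for create, `u` for update, and `d` for delete (plus `r` for rows read during an initial snapshot). The sketch below maps those codes to the three change types above. It assumes the delivered JSON still exposes Debezium's `op`, `before`, and `after` fields, which may not hold after our transformation step; the function names are illustrative.

```python
from typing import Optional

# Debezium `op` codes for the three change types listed above.
CHANGE_TYPES = {"c": "create", "u": "update", "d": "delete"}


def classify(event: dict) -> Optional[str]:
    """Return 'create', 'update', or 'delete'; None for any other op (e.g. 'r')."""
    return CHANGE_TYPES.get(event.get("op"))


def row_image(event: dict) -> Optional[dict]:
    """Return the row that describes the record.

    Deletes carry the last known state in `before`; creates and
    updates carry the new state in `after`.
    """
    if event.get("op") == "d":
        return event.get("before")
    return event.get("after")


if __name__ == "__main__":
    deleted = {"op": "d", "before": {"id": 7}, "after": None}
    print(classify(deleted), row_image(deleted))
```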


5. Data Format / Examples

Entity          Example Payload
Account         (example payload not included)
Investment      (example payload not included)
Transaction     (example payload not included)


6. Security & Access Control

Security is a top priority. Here’s how it’s handled:

  • We will provide you with a Google Cloud service account.

  • You need to grant write access to your GCP bucket for that service account, so our Cloud Run function can upload the CDC JSON files into your bucket.

  • Permission is limited: the service account needs only the right to create objects in the bucket (for example, the Storage Object Creator role, roles/storage.objectCreator), not full admin rights.


7. What You (the Partner) Need to Do

To make this work on your side, you will need to:

  1. Create a GCP bucket in your own Google Cloud project (or identify an existing one).

  2. Grant our service account write access to that bucket.
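The grant in step 2 amounts to adding one binding to the bucket's IAM policy. The sketch below shows that change on a policy represented as a plain dictionary in the same shape as the JSON IAM policy of a bucket (`bindings`, each with a `role` and `members`). It is only an illustration: the service account email is a placeholder, the function name is hypothetical, and the choice of `roles/storage.objectCreator` is an assumption about the narrowest suitable role. Applying the updated policy is done with whichever tool you normally use (console, CLI, or infrastructure-as-code).

```python
import copy

# Assumed narrowest role that allows uploading objects.
ROLE = "roles/storage.objectCreator"


def grant_object_create(policy: dict, service_account_email: str) -> dict:
    """Return a copy of `policy` with the service account bound to ROLE.

    The input policy is not modified, and calling this twice with the
    same email does not add a duplicate member.
    """
    updated = copy.deepcopy(policy)
    member = f"serviceAccount:{service_account_email}"
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == ROLE and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": ROLE, "members": [member]})
    return updated


if __name__ == "__main__":
    current = {"bindings": [
        {"role": "roles/storage.admin", "members": ["user:owner@example.com"]}]}
    # Placeholder email; use the service account we provide.
    print(grant_object_create(
        current, "cdc-writer@example-project.iam.gserviceaccount.com"))
```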


8. Benefits for Your Business

  • Near real-time data: You get up-to-date views of Accounts, Investments, and Transactions as they change.

  • Loose coupling: You don’t need to poll our database; we push changes to you.

  • Scalable: Built on GCP’s serverless and managed infrastructure (Pub/Sub, Cloud Run, Storage).

  • Resilient: Debezium tracks its position in the binlog (the offset) in its own storage, so after a restart it resumes where it left off rather than skipping changes.

  • Secure: Access is limited and auditable via GCP IAM.


9. Limitations / Considerations

  • Schema changes: If we change the schema of our tables (e.g., add or remove columns), that could affect the downstream JSON structure.

  • Latency: Delivery is near real-time, not instantaneous; expect a small delay that depends on Pub/Sub delivery and Cloud Run processing.

  • Costs: You bear the cost for the GCP bucket storage, access, and any downstream compute you run on the JSON data.
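To soften the impact of schema changes, a consumer can read only the fields it knows and tolerate the rest. The sketch below is one possible approach on the partner side, not a description of our pipeline: the field names (`id`, `status`) and the function name are hypothetical, and the input is assumed to be a flat JSON object representing one row.

```python
import json

# Hypothetical fields the consumer depends on.
KNOWN_FIELDS = ("id", "status")


def read_record(raw_json: str, known_fields=KNOWN_FIELDS) -> dict:
    """Parse one row, tolerating added or removed columns.

    Missing known fields default to None; unknown columns are kept
    under 'extra' instead of causing a failure.
    """
    row = json.loads(raw_json)
    record = {name: row.get(name) for name in known_fields}
    record["extra"] = {k: v for k, v in row.items() if k not in known_fields}
    return record


if __name__ == "__main__":
    # A new column ("nickname") appears and "status" is absent.
    print(read_record('{"id": 1, "nickname": "x"}'))
```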


10. Diagram

Here is a minimal architecture diagram:

MySQL (binlog) → Debezium Server → Google Cloud Pub/Sub (topic) → Cloud Run Function (subscriber) → GCP Storage Bucket (in your infra)