TiCDC Overview

TiCDC is a tool for replicating the incremental data of TiDB. This tool is implemented by pulling TiKV change logs. It can restore data to a consistent state with any upstream TSO, and provides TiCDC Open Protocol to support other systems to subscribe to data changes.

TiCDC Architecture

When TiCDC is running, it is a stateless node that achieves high availability through etcd in PD. The TiCDC cluster supports creating multiple replication tasks to replicate data to multiple different downstream platforms.

The architecture of TiCDC is shown in the following figure:

TiCDC architecture

System roles

  • TiKV CDC component: Only outputs key-value (KV) change logs.

    • Assembles KV change logs in the internal logic.
    • Provides the interface to output KV change logs. The data sent includes real-time change logs and incremental scan change logs.
  • capture: The operating process of TiCDC. Multiple captures form a TiCDC cluster that replicates KV change logs.

    • Each capture pulls a part of the KV change logs.
    • Sorts the pulled KV change logs.
    • Restores the transaction to downstream or outputs the log based on the TiCDC Open Protocol.
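
The sort-then-restore steps of a capture can be pictured with a small sketch. This is only an illustration under stated assumptions, not TiCDC's actual implementation: the `KVChange` type and its fields are hypothetical, and it assumes entries of one transaction share the same start and commit timestamps.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class KVChange:
    """A hypothetical, simplified KV change log entry."""
    start_ts: int   # start timestamp of the originating transaction
    commit_ts: int  # commit timestamp of the originating transaction
    key: str
    value: str


def restore_transactions(changes):
    """Sort change logs by commit_ts, then group entries that share
    (commit_ts, start_ts) back into one transaction."""
    def ordering(change):
        return (change.commit_ts, change.start_ts)

    ordered = sorted(changes, key=ordering)
    return [list(group) for _, group in groupby(ordered, key=ordering)]


# Change logs as they might arrive, out of commit order.
pulled = [
    KVChange(start_ts=5, commit_ts=9, key="b", value="2"),
    KVChange(start_ts=1, commit_ts=4, key="a", value="1"),
    KVChange(start_ts=5, commit_ts=9, key="c", value="3"),
]
transactions = restore_transactions(pulled)
```

Here `transactions` holds two groups: the entry committed at timestamp 4, followed by the two entries committed together at timestamp 9.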

Replication features

This section introduces the replication features of TiCDC.

Sink support

Currently, the TiCDC sink component supports replicating data to the following downstream platforms:

  • Databases compatible with the MySQL protocol. The sink component provides eventual consistency.
  • Kafka, based on the TiCDC Open Protocol. The sink component ensures row-level order, eventual consistency, or strict transactional consistency.

Ensure replication order and consistency

Replication order

  • For all DDL or DML statements, TiCDC outputs them at least once.

  • When the TiKV or TiCDC cluster encounters failure, TiCDC might send the same DDL/DML statement repeatedly. For duplicated DDL/DML statements:

    • MySQL sink can execute DDL statements repeatedly. For DDL statements that can be executed repeatedly in the downstream, such as truncate table, the statement is executed successfully. For those that cannot be executed repeatedly, such as create table, the execution fails, and TiCDC ignores the error and continues the replication.
    • Kafka sink sends messages repeatedly, but the duplicate messages do not affect the constraints of Resolved Ts. Users can filter out the duplicated messages in Kafka consumers.
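
Filtering duplicates on the consumer side could look roughly like the sketch below. The message fields (`table`, `row_key`, `commit_ts`) are hypothetical names chosen for illustration; a real consumer would derive an equivalent identity from the decoded TiCDC Open Protocol message.

```python
def deduplicate(messages):
    """Yield each message once, identified by (table, row key, commit_ts).

    The field names are hypothetical and stand in for whatever a real
    consumer extracts from the decoded message.
    """
    seen = set()
    for msg in messages:
        identity = (msg["table"], msg["row_key"], msg["commit_ts"])
        if identity in seen:
            continue  # duplicate delivery, skip it
        seen.add(identity)
        yield msg


received = [
    {"table": "t1", "row_key": 1, "commit_ts": 100, "value": "a"},
    {"table": "t1", "row_key": 1, "commit_ts": 100, "value": "a"},  # resent
    {"table": "t1", "row_key": 1, "commit_ts": 105, "value": "b"},
]
unique = list(deduplicate(received))
```

The `seen` set grows without bound in this sketch, so a long-running consumer would need some way to discard old entries.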

Replication consistency

  • MySQL sink

    • TiCDC does not split single-table transactions and ensures the atomicity of single-table transactions.
    • TiCDC does not ensure that the execution order of downstream transactions is the same as that of upstream transactions.
    • TiCDC splits cross-table transactions in the unit of table and does not ensure the atomicity of cross-table transactions.
    • TiCDC ensures that the order of single-row updates is consistent with that in the upstream.
  • Kafka sink

    • TiCDC provides different strategies for data distribution. You can distribute data to different Kafka partitions based on the table, primary key, or timestamp.
    • For different distribution strategies, the different consumer implementations can achieve different levels of consistency, including row-level consistency, eventual consistency, or cross-table transactional consistency.
    • TiCDC does not have an implementation of Kafka consumers, but only provides TiCDC Open Protocol. You can implement the Kafka consumer according to this protocol.
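
To make the distribution strategies more concrete, the following sketch maps a row change to a partition by table, primary key, or timestamp. It is a simplified illustration, not TiCDC's dispatcher: the strategy names, the event fields, and the use of CRC32 as the hash are all assumptions.

```python
import zlib


def choose_partition(event, strategy, num_partitions):
    """Pick a Kafka partition for a row change event.

    `strategy` is one of "table", "primary_key", or "timestamp"; these
    names and the event fields are hypothetical.
    """
    if strategy == "table":
        basis = event["table"]
    elif strategy == "primary_key":
        basis = f'{event["table"]}:{event["primary_key"]}'
    elif strategy == "timestamp":
        basis = str(event["commit_ts"])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # crc32 is deterministic across processes, unlike the built-in hash().
    return zlib.crc32(basis.encode("utf-8")) % num_partitions


e1 = {"table": "orders", "primary_key": 1, "commit_ts": 100}
e2 = {"table": "orders", "primary_key": 2, "commit_ts": 101}
same_table = choose_partition(e1, "table", 4) == choose_partition(e2, "table", 4)
```

In this sketch, dispatching by table sends every change of a table to one partition, while dispatching by primary key keeps changes to the same row together but may spread a table across partitions. That difference is one reason the achievable consistency level depends on the strategy.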

Restrictions

TiCDC only replicates the table that has at least one valid index. A valid index is defined as follows:

  • The primary key (PRIMARY KEY) is a valid index.
  • The unique index (UNIQUE INDEX) that meets the following conditions at the same time is a valid index:
    • Every column of the index is explicitly defined as non-nullable (NOT NULL).
    • The index does not have the virtual generated column (VIRTUAL GENERATED COLUMNS).
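
The definition above can be restated as a small predicate. This is a sketch over hypothetical `Column` and `Index` types invented for illustration; it does not read real TiDB schema metadata.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    nullable: bool = True            # False when declared NOT NULL
    virtual_generated: bool = False  # True for a virtual generated column


@dataclass
class Index:
    columns: list
    primary: bool = False
    unique: bool = False


def is_valid_index(index):
    """Apply the valid-index rules: a primary key, or a unique index whose
    columns are all NOT NULL and none of which is virtual generated."""
    if index.primary:
        return True
    if not index.unique:
        return False
    return all(
        not col.nullable and not col.virtual_generated
        for col in index.columns
    )


def has_valid_index(indexes):
    """A table qualifies when at least one of its indexes is valid."""
    return any(is_valid_index(ix) for ix in indexes)


pk = Index(columns=[Column("id", nullable=False)], primary=True)
unique_nullable = Index(columns=[Column("email")], unique=True)
unique_not_null = Index(columns=[Column("email", nullable=False)], unique=True)
```

With these examples, `pk` and `unique_not_null` are valid indexes, while `unique_nullable` is not because its column allows NULL.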

Since v4.0.8, TiCDC supports replicating tables without a valid index by modifying the task configuration. However, this compromises the guarantee of data consistency to some extent. For more details, see Replicate tables without a valid index.

Unsupported scenarios

Currently, the following scenarios are not supported:

  • The TiKV cluster that uses RawKV alone.
  • The DDL operation CREATE SEQUENCE and the SEQUENCE function in TiDB v4.0. When the upstream TiDB uses SEQUENCE, TiCDC ignores SEQUENCE DDL operations/functions performed upstream. However, DML operations using SEQUENCE functions can be correctly replicated.
  • The TiKV Hibernate Region. TiCDC prevents the Region from entering the hibernated state.

TiCDC only provides partial support for scenarios of large transactions in the upstream. For details, refer to FAQ: Does TiCDC support replicating large transactions? Is there any risk?

Install and deploy TiCDC

You can either deploy TiCDC along with a new TiDB cluster or add the TiCDC component to an existing TiDB cluster. For details, see Deploy TiCDC.

Manage TiCDC Cluster and Replication Tasks

Currently, you can use the cdc cli tool to manage the status of a TiCDC cluster and data replication tasks. For details, see:

Troubleshoot TiCDC

For details, refer to Troubleshoot TiCDC.

TiCDC Open Protocol

TiCDC Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. TiCDC complies with TiCDC Open Protocol and replicates data changes of TiDB to third-party data media such as MQ (Message Queue). For more information, see TiCDC Open Protocol.

Compatibility notes for sort-dir and data-dir

The sort-dir configuration is used to specify the temporary file directory for the TiCDC sorter. Its functionalities might vary in different versions. The following table lists sort-dir's compatibility changes across versions.

Version: v4.0.11 or an earlier v4.0 version, v5.0.0-rc

  • sort-engine functionality: It is a changefeed configuration item and specifies the temporary file directory for the file sorter and unified sorter.
  • Note: In these versions, file sorter and unified sorter are experimental features and NOT recommended for the production environment. If multiple changefeeds use the unified sorter as their sort-engine, the actual temporary file directory might be the sort-dir configuration of any changefeed, and the directory used for each TiCDC node might be different.
  • Recommendation: It is not recommended to use unified sorter in the production environment.

Version: v4.0.12, v4.0.13, v5.0.0, and v5.0.1

  • sort-engine functionality: It is a configuration item of changefeed or of cdc server.
  • Note: By default, the sort-dir configuration of a changefeed does not take effect, and the sort-dir configuration of cdc server defaults to /tmp/cdc_sort. It is recommended to only configure cdc server in the production environment. If you use TiUP to deploy TiCDC, it is recommended to use the latest TiUP version and set sorter.sort-dir in the TiCDC server configuration. The unified sorter is enabled by default in v4.0.13, v5.0.0, and v5.0.1. If you want to upgrade your cluster to these versions, make sure that you have correctly configured sorter.sort-dir in the TiCDC server configuration.
  • Recommendation: You need to configure sort-dir using the cdc server command-line parameter (or TiUP).

Version: v4.0.14 and later v4.0 versions, v5.0.3 and later v5.0 versions, later TiDB versions

  • sort-engine functionality: sort-dir is deprecated. It is recommended to configure data-dir.
  • Note: You can configure data-dir using the latest version of TiUP. In these TiDB versions, unified sorter is enabled by default. Make sure that data-dir has been configured correctly when you upgrade your cluster. Otherwise, /tmp/cdc_data will be used by default as the temporary file directory. If the storage capacity of the device where the directory is located is insufficient, the problem of insufficient hard disk space might occur. In this situation, the previous sort-dir configuration of changefeed will become invalid.
  • Recommendation: You need to configure data-dir using the cdc server command-line parameter (or TiUP).
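
As a quick reference, the version ranges above can be encoded in a lookup helper. This is an illustrative sketch only: the function names and return strings are hypothetical, and versions that the ranges do not mention (for example v5.0.2) deliberately return None rather than a guess.

```python
import re


def parse_version(version):
    """Split a version such as 'v4.0.12' or 'v5.0.0-rc' into a tuple of
    three integers and an optional suffix ('' when there is none)."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)(?:-(.+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version}")
    major, minor, patch, suffix = match.groups()
    return (int(major), int(minor), int(patch)), suffix or ""


def temp_dir_setting(version):
    """Return which setting controls the sorter's temporary directory,
    or None when the version is not covered by the listed ranges."""
    numbers, suffix = parse_version(version)
    series, patch = numbers[:2], numbers[2]
    if (series == (4, 0) and patch <= 11) or (
        numbers == (5, 0, 0) and suffix == "rc"
    ):
        return "changefeed sort-dir"
    if (series == (4, 0) and patch in (12, 13)) or (
        series == (5, 0) and patch in (0, 1) and not suffix
    ):
        return "cdc server sort-dir"
    if (
        (series == (4, 0) and patch >= 14)
        or (series == (5, 0) and patch >= 3)
        or series > (5, 0)
    ):
        return "cdc server data-dir"
    return None  # not covered by the listed ranges
```

For example, `temp_dir_setting("v4.0.13")` yields "cdc server sort-dir", while `temp_dir_setting("v5.1.0")` yields "cdc server data-dir".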