Sign In Try Free

Use TiFlash

After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated.

You can either use TiDB to read TiFlash replicas for medium-scale analytical processing, or use TiSpark to read TiFlash replicas for large-scale analytical processing, which is based on your own needs. See the following sections for details:

Create TiFlash replicas for tables

After TiFlash is connected to the TiKV cluster, data replication by default does not begin. You can send a DDL statement to TiDB through a MySQL client to create a TiFlash replica for a specific table:


              
ALTER TABLEtable_nameSETTIFLASH REPLICA count;

The parameter of the above command is described as follows:

  • countindicates the number of replicas. When the value is0, the replica is deleted.

If you execute multiple DDL statements on the same table, only the last statement is ensured to take effect. In the following example, two DDL statements are executed on the tabletpch50, but only the second statement (to delete the replica) takes effect.

Create two replicas for the table:


              
ALTER TABLE`tpch50`.`lineitem`SETTIFLASH REPLICA2;

Delete the replica:


              
ALTER TABLE`tpch50`.`lineitem`SETTIFLASH REPLICA0;

Notes:

  • If the tabletis replicated to TiFlash through the above DDL statements, the table created using the following statement will also be automatically replicated to TiFlash:

    
                    
    CREATE TABLEtable_nameliket;
  • For versions earlier than v4.0.6, if you create the TiFlash replica before using TiDB Lightning to import the data, the data import will fail. You must import data to the table before creating the TiFlash replica for the table.

  • If TiDB and TiDB Lightning are both v4.0.6 or later, no matter a table has TiFlash replica(s) or not, you can import data to that table using TiDB Lightning. Note that this might slow the TiDB Lightning procedure, which depends on the NIC bandwidth on the TiDB Lightning host, the CPU and disk load of the TiFlash node, and the number of TiFlash replicas.

  • It is recommended that you do not replicate more than 1,000 tables because this lowers the PD scheduling performance. This limit will be removed in later versions.

检查复制进度

You can check the status of the TiFlash replicas of a specific table using the following statement. The table is specified using theWHEREclause. If you remove theWHEREclause, you will check the replica status of all tables.


              
SELECT * FROMinformation_schema.tiflash_replicaWHERETABLE_SCHEMA= '' andTABLE_NAME= '';

In the result of above statement:

  • AVAILABLEindicates whether the TiFlash replicas of this table are available or not.1means available and0means unavailable. Once the replicas become available, this status does not change. If you use DDL statements to modify the number of replicas, the replication status will be recalculated.
  • PROGRESSmeans the progress of the replication. The value is between0.0and1.0.1means at least one replica is replicated.

Use TiDB to read TiFlash replicas

TiDB provides three ways to read TiFlash replicas. If you have added a TiFlash replica without any engine configuration, the CBO (cost-based optimization) mode is used by default.

Smart selection

For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. You can use thedescorexplain analyzestatement to check whether or not a TiFlash replica is selected. For example:


              
desc select count(*)fromtest.t;

              
+--------------------------+---------+--------------+---------------+--------------------------------+ | id | estRows | task | access object | operator info | +--------------------------+---------+--------------+---------------+--------------------------------+ | StreamAgg_9 | 1.00 | root | | funcs:count(1)->Column#4 | | └─TableReader_17 | 1.00 | root | | data:TableFullScan_16 | | └─TableFullScan_16 | 1.00 | cop[tiflash] | table:t | keep order:false, stats:pseudo | +--------------------------+---------+--------------+---------------+--------------------------------+ 3 rows in set (0.00 sec)

              
explain analyzeselect count(*)fromtest.t;

              
+--------------------------+---------+---------+--------------+---------------+----------------------------------------------------------------------+--------------------------------+-----------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +--------------------------+---------+---------+--------------+---------------+----------------------------------------------------------------------+--------------------------------+-----------+------+ | StreamAgg_9 | 1.00 | 1 | root | | time:83.8372ms, loops:2 | funcs:count(1)->Column#4 | 372 Bytes | N/A | | └─TableReader_17 | 1.00 | 1 | root | | time:83.7776ms, loops:2, rpc num: 1, rpc time:83.5701ms, proc keys:0 | data:TableFullScan_16 | 152 Bytes | N/A | | └─TableFullScan_16 | 1.00 | 1 | cop[tiflash] | table:t | time:43ms, loops:1 | keep order:false, stats:pseudo | N/A | N/A | +--------------------------+---------+---------+--------------+---------------+----------------------------------------------------------------------+--------------------------------+-----------+------+

cop[tiflash]means that the task will be sent to TiFlash for processing. If you have not selected a TiFlash replica, you can try to update the statistics using theanalyze tablestatement, and then check the result using theexplain analyzestatement.

Note that if a table has only a single TiFlash replica and the related node cannot provide service, queries in the CBO mode will repeatedly retry. In this situation, you need to specify the engine or use the manual hint to read data from the TiKV replica.

Engine isolation

Engine isolation is to specify that all queries use a replica of the specified engine by configuring the corresponding variable. The optional engines are "tikv", "tidb" (indicates the internal memory table area of TiDB, which stores some TiDB system tables and cannot be actively used by users), and "tiflash", with the following two configuration levels:

  • TiDB instance-level, namely, INSTANCE level. Add the following configuration item in the TiDB configuration file:

    
                    
    [隔离-read] engines = ["tikv", "tidb", "tiflash"]

    The INSTANCE-level default configuration is["tikv", "tidb", "tiflash"].

  • SESSION level. Use the following statement to configure:

    
                    
    set@@session.tidb_isolation_read_engines="engine list separated by commas";

    or

    
                    
    setSESSION tidb_isolation_read_engines="engine list separated by commas";

    The default configuration of the SESSION level inherits from the configuration of the TiDB INSTANCE level.

The final engine configuration is the session-level configuration, that is, the session-level configuration overrides the instance-level configuration. For example, if you have configured "tikv" in the INSTANCE level and "tiflash" in the SESSION level, then the TiFlash replicas are read. If the final engine configuration is "tikv" and "tiflash", then the TiKV and TiFlash replicas are both read, and the optimizer automatically selects a better engine to execute.

If the queried table does not have a replica of the specified engine (for example, the engine is configured as "tiflash" but the table does not have a TiFlash replica), the query returns an error.

Manual hint

Manual hint can force TiDB to use specified replicas for specific table(s) on the premise of satisfying engine isolation. Here is an example of using the manual hint:


              
select /*+ read_from_storage(tiflash[table_name]) */...fromtable_name;

If you set an alias to a table in a query statement, you must use the alias in the statement that includes a hint for the hint to take effect. For example:


              
select /*+ read_from_storage(tiflash[alias_a,alias_b]) */...fromtable_name_1asalias_a, table_name_2asalias_bwherealias_a.column_1=alias_b.column_2;

In the above statements,tiflash[]prompts the optimizer to read the TiFlash replicas. You can also usetikv[]to prompt the optimizer to read the TiKV replicas as needed. For hint syntax details, refer toREAD_FROM_STORAGE.

If the table specified by a hint does not have a replica of the specified engine, the hint is ignored and a warning is reported. In addition, a hint only takes effect on the premise of engine isolation. If the engine specified in a hint is not in the engine isolation list, the hint is also ignored and a warning is reported.

The relationship of smart selection, engine isolation, and manual hint

In the above three ways of reading TiFlash replicas, engine isolation specifies the overall range of available replicas of engines; within this range, manual hint provides statement-level and table-level engine selection that is more fine-grained; finally, CBO makes the decision and selects a replica of an engine based on cost estimation within the specified engine list.

Use TiSpark to read TiFlash replicas

Currently, you can use TiSpark to read TiFlash replicas in a method similar to the engine isolation in TiDB. This method is to configure thespark.tispark.isolation_read_enginesparameter. The parameter value defaults totikv,tiflash, which means that TiDB reads data from TiFlash or from TiKV according to CBO's selection. If you set the parameter value totiflash, it means that TiDB forcibly reads data from TiFlash.

Notes

When this parameter is set totiflash, only the TiFlash replicas of all tables involved in the query are read and these tables must have TiFlash replicas; for tables that do not have TiFlash replicas, an error is reported. When this parameter is set totikv, only the TiKV replica is read.

You can configure this parameter in one of the following ways:

  • Add the following item in thespark-defaults.conffile:

    
                    
    spark.tispark.isolation_read_engines tiflash
  • Add--conf spark.tispark.isolation_read_engines=tiflashin the initialization command when initializing Spark shell or Thrift server.

  • Setspark.conf.set("spark.tispark.isolation_read_engines", "tiflash")in Spark shell in a real-time manner.

  • Set设置spark.tispark.isolation_read_engines = tiflashin Thrift server after the server is connected via beeline.

Supported push-down calculations

TiFlashsupports predicate, aggregate push-down calculations, and table joins. Push-down calculations can help TiDB perform distributed acceleration. Currently,Full Outer JoinandDISTINCT COUNTare not the supported calculation types, which will be optimized in later versions.

You can enable the push-down ofjoin使用下面的会话变量(Full Outer Joinis currently not supported):


              
set @@session.tidb_opt_broadcast_join=1

Currently, TiFlash supports pushing down a limited number of expressions, including:


              
+, -, /, *, >=, <=, =, !=, <, >, ifnull, isnull, bitor, in, bitand, or, and, like, not, case when, month, substr, timestampdiff, date_format, from_unixtime, json_length, if, bitneg, bitxor, round without fraction, cast(int as decimal), min, max, sum, count, avg, approx_count_distinct

TiFlashdoes not support push-down calculations in the following situations:

  • Expressions that contain theTimetype cannot be pushed down.
  • If an aggregate function or aWHEREclause contains expressions that are not included in the list above, the aggregate or related predicate filtering cannot be pushed down.

If a query encounters unsupported push-down calculations, TiDB needs to complete the remaining calculations, which might greatly affect the TiFlash acceleration effect.

Download PDF Request docs changes Ask questions on Discord
Playground
New
One-stop & interactive experience of TiDB's capabilities WITHOUT registration.
Was this page helpful?
Products
TiDB
TiDB Dedicated
TiDB Serverless
Pricing
Get Demo
Get Started
©2023PingCAP. All Rights Reserved.