Sign In Try Free

快速启动Guide for TiDB HTAP

This guide walks you through the quickest way to get started with TiDB's one-stop solution of Hybrid Transactional and Analytical Processing (HTAP).

Basic concepts

Before using TiDB HTAP, you need to have some basic knowledge aboutTiKV, a row-based storage engine for TiDB Online Transactional Processing (OLTP), andTiFlash, a columnar storage engine for TiDB Online Analytical Processing (OLAP).

  • Storage engines of HTAP: The row-based storage engine and the columnar storage engine co-exist for HTAP. Both storage engines can replicate data automatically and keep strong consistency. The row-based storage engine optimizes OLTP performance, and the columnar storage engine optimizes OLAP performance.
  • Data consistency of HTAP: As a distributed and transactional key-value database, TiKV provides transactional interfaces with ACID compliance, and guarantees data consistency between multiple replicas and high availability with the implementation of theRaft consensus algorithm. As a columnar storage extension of TiKV, TiFlash replicates data from TiKV in real time according to the Raft Learner consensus algorithm, which ensures that data is strongly consistent between TiKV and TiFlash.
  • Data isolation of HTAP: TiKV and TiFlash can be deployed on different machines as needed to solve the problem of HTAP resource isolation.
  • MPP computing engine:MPPis a distributed computing framework provided by the TiFlash engine since TiDB 5.0, which allows data exchange between nodes and provides high-performance, high-throughput SQL algorithms. In the MPP mode, the run time of the analytic queries can be significantly reduced.

Steps

In this document, you can experience the convenience and high performance of TiDB HTAP by querying an example table in aTPC-H达taset. TPC-H is a popular decision support benchmark that consists of a suite of business oriented ad-hoc queries with large volumes of data and a high degree of complexity. To experience 22 complete SQL queries using TPC-H, visittidb-bench repoorTPC-Hfor instructions on how to generate query statements and data.

Step 1. Deploy a local test environment

Before using TiDB HTAP, follow the steps in the快速启动Guide for the TiDB Database Platformto prepare a local test environment, and run the following command to deploy a TiDB cluster:


              
tiup playground

Step 2. Prepare test data

In the following steps, you can create aTPC-H达taset as the test data to use TiDB HTAP. If you are interested in TPC-H, seeGeneral Implementation Guidelines.

  1. Install the test data generation tool by running the following command:

    
                    
    tiup install bench
  2. Generate the test data by running the following command:

    
                    
    tiup bench tpch --sf=1 prepare

    If the output of this command showsFinished, it indicates that the data is created.

  3. Execute the following SQL statement to view the generated data:

    
                    
    SELECTCONCAT(table_schema,'.',table_name)AS 'Table Name', table_rowsAS 'Number of Rows', FORMAT_BYTES(data_length)AS 'Data Size', FORMAT_BYTES(index_length)AS “索引大小”, FORMAT_BYTES(data_length+index_length)AS 'Total' FROMinformation_schema.TABLESWHEREtable_schema= 'test';

    As you can see from the output, eight tables are created in total, and the largest table has 6.5 million rows (the number of rows created by the tool depends on the actual SQL query result because the data is randomly generated).

    
                    
    + ---------------+----------------+-----------+------------+-----------+ | TableName|Numberof Rows |Data Size|Index Size|Total| + ---------------+----------------+-----------+------------+-----------+ |test.nation| 25 | 2.44KiB| 0bytes| 2.44KiB| |test.region| 5 | 416bytes| 0bytes| 416bytes| |test.part| 200000 | 25.07MiB| 0bytes| 25.07MiB| |test.supplier| 10000 | 1.45MiB| 0bytes| 1.45MiB| |test.partsupp| 800000 | 120.17MiB| 12.21MiB| 132.38MiB| |test.customer| 150000 | 24.77MiB| 0bytes| 24.77MiB| |test.orders| 1527648 | 174.40MiB| 0bytes| 174.40MiB| |test.lineitem| 6491711 | 849.07MiB| 99.06MiB| 948.13MiB| + ---------------+----------------+-----------+------------+-----------+ 8 rows in set(0.06sec)

    This is a database of a commercial ordering system. In which, thetest.nationtable indicates the information about countries, thetest.regiontable indicates the information about regions, thetest.parttable indicates the information about parts, thetest.suppliertable indicates the information about suppliers, thetest.partsupptable indicates the information about parts of suppliers, thetest.customertable indicates the information about customers, thetest.customertable indicates the information about orders, and thetest.lineitemtable indicates the information about online items.

Step 3. Query data with the row-based storage engine

To know the performance of TiDB with only the row-based storage engine, execute the following SQL statements:


              
SELECTl_orderkey,SUM( l_extendedprice*(1 -l_discount) )ASrevenue, o_orderdate, o_shippriorityFROMcustomer, orders, lineitemWHEREc_mktsegment= 'BUILDING' ANDc_custkey=o_custkeyANDl_orderkey=o_orderkeyANDo_orderdate< DATE '1996-01-01' ANDl_shipdate> DATE '1996-02-01' GROUP BYl_orderkey, o_orderdate, o_shippriorityORDER BYrevenueDESC, o_orderdate limit10;

This is a shipping priority query, which provides the priority and potential revenue of the highest-revenue order that has not been shipped before a specified date. The potential revenue is defined as the sum ofl_extendedprice * (1-l_discount). The orders are listed in the descending order of revenue. In this example, this query lists the unshipped orders with potential query revenue in the top 10.

Step 4. Replicate the test data to the columnar storage engine

After TiFlash is deployed, TiKV does not replicate data to TiFlash immediately. You need to execute the following DDL statements in a MySQL client of TiDB to specify which tables need to be replicated. After that, TiDB will create the specified replicas in TiFlash accordingly.


              
ALTER TABLEtest.customerSETTIFLASH REPLICA1;ALTER TABLEtest.ordersSETTIFLASH REPLICA1;ALTER TABLEtest.lineitemSETTIFLASH REPLICA1;

To check the replication status of the specific tables, execute the following statements:


              
SELECT * FROMinformation_schema.tiflash_replicaWHERETABLE_SCHEMA= 'test' andTABLE_NAME= 'customer';SELECT * FROMinformation_schema.tiflash_replicaWHERETABLE_SCHEMA= 'test' andTABLE_NAME= 'orders';SELECT * FROMinformation_schema.tiflash_replicaWHERETABLE_SCHEMA= 'test' andTABLE_NAME= 'lineitem';

In the result of the above statements:

  • AVAILABLEindicates whether the TiFlash replica of a specific table is available or not.1means available and0means unavailable. Once theAVAILABLEfield becomes1, this status does not change anymore.
  • PROGRESSmeans the progress of the replication. The value is between 0.0 and 1.0. 1 means that the replication progress of the TiFlash replica is complete.

Step 5. Analyze data faster using HTAP

Execute the SQL statements inStep 3again, and you can see the performance of TiDB HTAP.

For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. To check whether or not a TiFlash replica is selected, you can use thedescorexplain analyzestatement. For example:


              
explain analyzeSELECTl_orderkey,SUM( l_extendedprice*(1 -l_discount) )ASrevenue, o_orderdate, o_shippriorityFROMcustomer, orders, lineitemWHEREc_mktsegment= 'BUILDING' ANDc_custkey=o_custkeyANDl_orderkey=o_orderkeyANDo_orderdate< DATE '1996-01-01' ANDl_shipdate> DATE '1996-02-01' GROUP BYl_orderkey, o_orderdate, o_shippriorityORDER BYrevenueDESC, o_orderdate limit10;

If the result of theEXPLAINstatement showsExchangeSenderandExchangeReceiveroperators, it indicates that the MPP mode has taken effect.

In addition, you can specify that each part of the entire query is computed using only the TiFlash engine. For detailed information, seeUse TiDB to read TiFlash replicas.

You can compare query results and query performance of these two methods.

What's next

Download PDF Request docs changes Ask questions on Discord
Playground
New
One-stop & interactive experience of TiDB's capabilities WITHOUT registration.
Was this page helpful?
Products
TiDB
TiDB Dedicated
TiDB Serverless
Pricing
Get Demo
Get Started
©2023PingCAP. All Rights Reserved.