Import Data

This document describes how to import data into a TiDB cluster on Kubernetes using TiDB Lightning.

TiDB Lightning contains two components: tidb-lightning and tikv-importer. In Kubernetes, tikv-importer is part of the Helm chart of the TiDB cluster and is deployed as a StatefulSet with replicas=1, while tidb-lightning is in a separate Helm chart and deployed as a Job.

TiDB Lightning supports three backends: Importer-backend, Local-backend, and TiDB-backend. For the differences between these backends and how to choose one, see TiDB Lightning Backends.

  • For Importer-backend, both tikv-importer and tidb-lightning need to be deployed.

  • For Local-backend, only tidb-lightning needs to be deployed.

  • For TiDB-backend, only tidb-lightning needs to be deployed. It is recommended to import data using CustomResourceDefinition (CRD) in TiDB Operator v1.1 and later versions. For details, refer to Restore Data from GCS Using TiDB Lightning or Restore Data from S3-Compatible Storage Using TiDB Lightning.

Deploy TiDB Lightning

Step 1. Configure TiDB Lightning

Use the following command to save the default configuration of TiDB Lightning to the tidb-lightning-values.yaml file:


              
helm inspect values pingcap/tidb-lightning --version=${chart_version} > tidb-lightning-values.yaml

Configure the backend field in the configuration file depending on your needs. The optional values are local and tidb.


              
# The delivery backend used to import data (valid options include `local` and `tidb`).
# If set to `local`, then the following `sortedKV` should be set.
backend: local

If you use the local backend, you must set sortedKV in values.yaml to create the corresponding PVC. The PVC is used for local KV sorting.


              
# For `local` backend, an extra PV is needed for local KV sorting.
sortedKV:
  storageClassName: local-storage
  storage: 100Gi

Configure checkpoint

Starting from v1.1.10, the tidb-lightning Helm chart saves the TiDB Lightning checkpoint information in the directory of the source data. When a new tidb-lightning job is running, it can resume the data import according to the checkpoint information.

For versions earlier than v1.1.10, you can modify config in values.yaml to save the checkpoint information in the target TiDB cluster, other MySQL-compatible databases, or a shared storage directory. For more information, refer to TiDB Lightning checkpoint.
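
For example, a minimal sketch of persisting the checkpoint in a MySQL-compatible database through the config field might look like the following. The [checkpoint] keys come from the TiDB Lightning configuration file; the dsn value is a placeholder that you need to adjust for your environment.

config: |
  [checkpoint]
  enable = true
  driver = "mysql"
  dsn = "${user}:${password}@tcp(${mysql_host}:4000)/"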

Configure TLS

If TLS between components has been enabled on the target TiDB cluster (spec.tlsCluster.enabled: true), refer to Generate certificates for components of the TiDB cluster to generate a server-side certificate for TiDB Lightning, and configure tlsCluster.enabled: true in values.yaml to enable TLS between components.

If the target TiDB cluster has enabled TLS for the MySQL client (spec.tidb.tlsClient.enabled: true), and the corresponding client-side certificate is configured (the Kubernetes Secret object is ${cluster_name}-tidb-client-secret), you can configure tlsClient.enabled: true in values.yaml to enable TiDB Lightning to connect to the TiDB server using TLS.

To use different client certificates to connect to the TiDB server, refer to Issue two sets of certificates for the TiDB cluster to generate the client-side certificate for TiDB Lightning, and configure the corresponding Kubernetes secret object in tlsCluster.tlsClientSecretName in values.yaml.
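
Putting the switches above together, a minimal values.yaml sketch might look as follows. Only the fields named in this section are shown; adjust them to the fields that your chart version actually exposes.

tlsCluster:
  enabled: true
  # tlsClientSecretName: ${lightning_client_secret}   # only when using a dedicated client certificate
tlsClient:
  enabled: true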

Step 2. Configure the data source

The tidb-lightning Helm chart supports both local and remote data sources. The data sources correspond to three modes: local, remote, and ad hoc. The three modes cannot be used together; you can configure only one mode.

Local

In the local mode, tidb-lightning reads the backup data from a directory on one of the Kubernetes nodes.


              
dataSource:
  local:
    nodeName: kind-worker3
    hostPath: /data/export-20190820

The descriptions of the related fields are as follows:

  • dataSource.local.nodeName: the name of the node on which the backup data directory is located.
  • dataSource.local.hostPath: the path of the backup data. The path must contain a file named metadata.

Remote

Unlike the local mode, the remote mode uses rclone to download the backup tarball file or the backup directory from a network storage to a PV. Any cloud storage supported by rclone should work, but currently only the following have been tested: Google Cloud Storage (GCS), Amazon S3, and Ceph Object Storage.

To restore backup data from the remote source, take the following steps:

  1. Grant permissions to the remote storage.

    If you use Amazon S3 as the storage, refer to AWS account permissions. The configuration varies with different methods.

    If you use Ceph as the storage, you can only grant permissions by importing AccessKey and SecretKey. See Grant permissions by AccessKey and SecretKey.

    If you use GCS as the storage, refer to GCS account permissions.

    • Grant permissions by importing AccessKey and SecretKey

      1. Create a Secret configuration file secret.yaml containing the rclone configuration. A sample configuration is listed below. Only one cloud storage configuration is required.

        
                            
        apiVersion: v1
        kind: Secret
        metadata:
          name: cloud-storage-secret
        type: Opaque
        stringData:
          rclone.conf: |
            [s3]
            type = s3
            provider = AWS
            env_auth = false
            access_key_id = ${access_key}
            secret_access_key = ${secret_key}
            region = us-east-1
            [ceph]
            type = s3
            provider = Ceph
            env_auth = false
            access_key_id = ${access_key}
            secret_access_key = ${secret_key}
            endpoint = ${endpoint}
            region = :default-placement
            [gcs]
            type = google cloud storage
            # The service account must include Storage Object Viewer role
            # The content can be retrieved by `cat ${service-account-file} | jq -c .`
            service_account_credentials = ${service_account_json_file_content}
      2. Execute the following command to create the Secret:

        
                            
        kubectl apply -f secret.yaml -n ${namespace}
    • Grant permissions by associating IAM with Pod or with ServiceAccount

      If you use Amazon S3 as the storage, you can grant permissions by associating IAM with Pod or with ServiceAccount, in which case s3.access_key_id and s3.secret_access_key can be ignored.

      1. Save the following configurations as secret.yaml.

        
                            
        apiVersion: v1
        kind: Secret
        metadata:
          name: cloud-storage-secret
        type: Opaque
        stringData:
          rclone.conf: |
            [s3]
            type = s3
            provider = AWS
            env_auth = true
            access_key_id =
            secret_access_key =
            region = us-east-1
      2. Execute the following command to create the Secret:

        
                            
        kubectl apply -f secret.yaml -n ${namespace}
  2. Configure the dataSource field. For example:

    
                    
    dataSource:
      remote:
        rcloneImage: rclone/rclone:1.55.1
        storageClassName: local-storage
        storage: 100Gi
        secretName: cloud-storage-secret
        path: s3:bench-data-us/sysbench/sbtest_16_1e7.tar.gz
        # directory: s3:bench-data-us

    The descriptions of the related fields are as follows:

    • dataSource.remote.storageClassName: the name of the StorageClass used to create the PV.
    • dataSource.remote.secretName: the name of the Secret created in the previous step.
    • dataSource.remote.path: If the backup data is packaged as a tarball file, use this field to indicate the path to the tarball file.
    • dataSource.remote.directory: If the backup data is in a directory, use this field to specify the path to the directory.

Ad hoc

When restoring data from remote storage, sometimes the restore process is interrupted due to an exception. In such cases, if you do not want to download the backup data from the network storage repeatedly, you can use the ad hoc mode to directly restore the data that has already been downloaded and decompressed into the PV in the remote mode.

For example:


              
dataSource:
  adhoc:
    pvcName: tidb-cluster-scheduled-backup
    backupName: scheduled-backup-20190822-041004

The descriptions of the related fields are as follows:

  • dataSource.adhoc.pvcName: the PVC name used in restoring data from remote storage. The PVC must be deployed in the same namespace as tidb-lightning.
  • dataSource.adhoc.backupName: the name of the original backup data, such as backup-2020-12-17T10:12:51Z (it does not contain the .tgz suffix of the compressed file name on the network storage).
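
If you are unsure which PVC was created by an earlier remote-mode run, you can list the PVCs in the namespace and pick the one that belongs to the tidb-lightning release (a generic check; the exact PVC name depends on the Helm release name):

kubectl get pvc -n ${namespace}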

Step 3. Deploy TiDB Lightning

The method of deploying TiDB Lightning varies depending on how permissions are granted and on the storage used.

  • For Local Mode, Ad hoc Mode, and Remote Mode (only for remote modes that meet one of the following three requirements: granting permissions using Amazon S3 AccessKey and SecretKey, using Ceph as the storage backend, or using GCS as the storage backend), run the following command to deploy TiDB Lightning.

    
                    
    helm install ${release_name} pingcap/tidb-lightning --namespace=${namespace} --set failFast=true -f tidb-lightning-values.yaml --version=${chart_version}
  • For Remote Mode, if you grant permissions by associating Amazon S3 IAM with Pod, take the following steps:

    1. Create the IAM role:

      Create an IAM role for the account, and grant the required permission to the role. The IAM role requires the AmazonS3FullAccess permission because TiDB Lightning needs to access Amazon S3 storage.

    2. Modify tidb-lightning-values.yaml, and add the iam.amazonaws.com/role: arn:aws:iam::123456789012:role/user annotation in the annotations field, as shown in the sketch after this list.

    3. Deploy TiDB Lightning:

      
                        
      helm install ${release_name} pingcap/tidb-lightning --namespace=${namespace} --set failFast=true -f tidb-lightning-values.yaml --version=${chart_version}
  • For Remote Mode, if you grant permissions by associating Amazon S3 with ServiceAccount, take the following steps:

    1. Enable the IAM role for the service account on the cluster:

      To enable the IAM role permission on the EKS cluster, see AWS Documentation.

    2. Create the IAM role:

      Create an IAM role. Grant the AmazonS3FullAccess permission to the role, and edit the Trust relationships of the role.

    3. Associate IAM with the ServiceAccount resources:

      
                        
      kubectl annotate sa ${serviceaccount} -n ${namespace} eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/user
    4. Deploy TiDB Lightning:

      
                        
      helm install ${release_name} pingcap/tidb-lightning --namespace=${namespace} --set-string failFast=true,serviceAccount=${serviceaccount} -f tidb-lightning-values.yaml --version=${chart_version}
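
For reference, a minimal sketch of the annotation described in step 2 of the IAM-with-Pod method above, as it might appear in tidb-lightning-values.yaml (assuming the chart exposes a top-level annotations field, as that step states):

annotations:
  iam.amazonaws.com/role: arn:aws:iam::123456789012:role/user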

Destroy TiDB Lightning

Currently, TiDB Lightning only supports restoring data offline. After the restore, if the TiDB cluster needs to serve external applications, you can destroy TiDB Lightning to save costs.

To destroy tidb-lightning, execute the following command:


              
helm uninstall ${release_name} -n ${namespace}

Troubleshoot TiDB Lightning

When TiDB Lightning fails to restore data, you cannot simply restart it. Manual intervention is required. Therefore, the restart policy of TiDB Lightning's Job is set to Never.

If TiDB Lightning fails to restore data, and you have configured the checkpoint information to be persisted in the target TiDB cluster, other MySQL-compatible databases, or a shared storage directory, follow the steps below for manual intervention:

  1. View the log by executing the following command:

    
                    
    kubectl logs -n ${namespace} ${pod_name}
    • If you restore data using the remote data source, and the error occurs when TiDB Lightning downloads data from remote storage:

      1. Address the problem according to the log.
      2. Deploy tidb-lightning again and retry the data restore.
    • For other cases, refer to the following steps.

  2. Refer to TiDB Lightning Troubleshooting and learn the solutions to different issues.

  3. Address the issues accordingly:

    • If tidb-lightning-ctl is required:

      1. Configure dataSource in values.yaml. Make sure the new Job uses the data source and checkpoint information of the failed Job:

        • In the local or ad hoc mode, you do not need to modify dataSource.
        • In the remote mode, modify dataSource to the ad hoc mode. dataSource.adhoc.pvcName is the PVC name created by the original Helm chart. dataSource.adhoc.backupName is the backup name of the data to be restored.
      2. Modify failFast in values.yaml to false, and create a Job used for tidb-lightning-ctl.

        • Based on the checkpoint information, TiDB Lightning checks whether the last data restore encountered an error. If yes, TiDB Lightning pauses the restore automatically.
        • TiDB Lightning uses the checkpoint information to avoid repeatedly restoring the same data. Therefore, creating the Job does not affect data correctness.
      3. After the Pod corresponding to the new Job is running, view the log by running kubectl logs -n ${namespace} ${pod_name} and confirm that tidb-lightning in the new Job has stopped the data restore. If the log has either of the following messages, the data restore is stopped:

        • tidb lightning encountered error
        • tidb lightning exit
      4. Enter the container by running kubectl exec -it -n ${namespace} ${pod_name} -- sh.

      5. Obtain the starting script by running cat /proc/1/cmdline.

      6. Get the command-line parameters from the starting script. Refer to TiDB Lightning Troubleshooting and troubleshoot using tidb-lightning-ctl.

      7. After the troubleshooting, modify failFast in values.yaml to true and create a new Job to resume the data restore.

    • If tidb-lightning-ctl is not required:

      1. Troubleshoot TiDB Lightning.

      2. Configure dataSource in values.yaml. Make sure the new Job uses the data source and checkpoint information of the failed Job:

        • In the local or ad hoc mode, you do not need to modify dataSource.
        • In the remote mode, modify dataSource to the ad hoc mode. dataSource.adhoc.pvcName is the PVC name created by the original Helm chart. dataSource.adhoc.backupName is the backup name of the data to be restored.
      3. Create a new Job using the modified values.yaml file and resume the data restore.

  4. After the troubleshooting and data restore are completed, delete the Jobs for data restore and troubleshooting.
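
As a sketch, the leftover Jobs can be removed with kubectl; ${job_name} below is a placeholder for the name of each Job created for data restore or troubleshooting:

kubectl delete job -n ${namespace} ${job_name}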
