From 234d80a41e441d23f3b0d86570e3fec50b4f8539 Mon Sep 17 00:00:00 2001
From: zeekling
Date: Sun, 27 Aug 2023 23:00:44 +0800
Subject: [PATCH] checkpoint related
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 basic/checkpoint.md | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/basic/checkpoint.md b/basic/checkpoint.md
index 755ef35..375cbdb 100644
--- a/basic/checkpoint.md
+++ b/basic/checkpoint.md
@@ -1,13 +1,46 @@
-# checkpoint configuration
+
+# State backend and checkpoint
+
+- The state backend keeps the job's state locally.
+- A checkpoint periodically backs up that state to third-party storage such as HDFS or OBS, so the data can be restored when the job is restarted.
+
+![pic](https://pan.zeekling.cn//flink/basic/flink_backent_0001.png)
+
+
+# State backend
 
 | Option | Default | Description |
 |---|---|---|
-| execution.checkpointing.interval | - | Interval at which checkpoints are triggered periodically. 1-10 min is generally recommended |
-| execution.checkpointing.mode | EXACTLY_ONCE | EXACTLY_ONCE: guarantees exactly-once; AT_LEAST_ONCE: at-least-once. EXACTLY_ONCE is recommended |
-| execution.checkpointing.timeout | 10min| Checkpoint timeout; a longer value of around 30 min is recommended |
-| execution.checkpointing.unaligned.enabled | false | Whether to enable unaligned checkpoints; leaving it disabled is recommended |
-| | | |
+| **state.backend** | - | Recommended setting: rocksdb |
+| state.backend.latency-track.keyed-state-enabled | false | Whether to track the latency of keyed state operations; enabling it is not recommended |
+| state.backend.latency-track.sample-interval | 100 | Sampling interval for latency tracking: latency is measured once every 100 state accesses |
+| state.backend.latency-track.history-size | 128 | Number of latency measurements kept for each state access operation |
+| table.exec.state.ttl | - | State TTL; typically used in join scenarios to keep state from growing so large that the job fails |
+
+
+# Common checkpoint configuration
+
+| Option | Default | Description |
+|---|---|---|
+| **execution.checkpointing.interval** | - | Interval at which checkpoints are triggered periodically. 1-10 min is generally recommended |
+| **execution.checkpointing.mode** | EXACTLY_ONCE | EXACTLY_ONCE: guarantees exactly-once; AT_LEAST_ONCE: at-least-once. EXACTLY_ONCE is recommended |
+| **state.backend.incremental** | false | Whether to enable incremental checkpoints; enabling it is recommended |
+| **execution.checkpointing.timeout** | 10min| Checkpoint timeout; a longer value of around 30 min is recommended |
+| **execution.checkpointing.unaligned.enabled** | false | Whether to enable unaligned checkpoints; leaving it disabled is recommended |
+| execution.checkpointing.unaligned.forced | false | Whether to force unaligned checkpoints |
+| execution.checkpointing.max-concurrent-checkpoints | 1 | Maximum number of checkpoints that may be in progress at the same time |
+| execution.checkpointing.min-pause | 0 | Minimum pause between two checkpoints |
+| execution.checkpointing.tolerable-failed-checkpoints | - | Number of consecutive checkpoint failures that is tolerated |
+| execution.checkpointing.aligned-checkpoint-timeout | 0 | Timeout for checkpoint alignment |
+| execution.checkpointing.alignment-timeout | 0 | See execution.checkpointing.aligned-checkpoint-timeout (deprecated) |
+| execution.checkpointing.force | false | Whether to force checkpoints (deprecated) |
+| state.checkpoints.num-retained | 1 | Number of checkpoints to retain |
+| state.backend.async | true | Whether to enable asynchronous checkpoints (deprecated) |
+| state.savepoints.dir | - | Directory where savepoints are stored |
+| state.checkpoints.dir | - | Directory where checkpoints are stored |
+| state.storage.fs.memory-threshold | 20kb | Minimum size of state files; smaller state is stored inline in the checkpoint metadata file |
+| state.storage.fs.write-buffer-size | 4 * 1024 | Default size of the write buffer for checkpoint streams written to the file system |
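
For reference, the recommendations in the tables above could be combined into a `flink-conf.yaml` fragment roughly like the one below. This is only a sketch: the option keys come from the tables, the 5min interval is one arbitrary pick from the suggested 1-10 min range, and both storage paths are hypothetical placeholders that would need to match the actual cluster.

```yaml
# Sketch of a flink-conf.yaml fragment based on the recommendations above.
# Keep state locally in RocksDB and upload only the changes at each checkpoint.
state.backend: rocksdb
state.backend.incremental: true

# Trigger a checkpoint every 5 minutes (any value in the 1-10 min range fits the advice).
execution.checkpointing.interval: 5min
execution.checkpointing.mode: EXACTLY_ONCE
# Longer than the 10min default, as recommended above.
execution.checkpointing.timeout: 30min
execution.checkpointing.unaligned.enabled: false

# Hypothetical storage locations; replace with real HDFS/OBS paths.
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints
```

The options left out here (for example the deprecated `state.backend.async` and `execution.checkpointing.force`) simply keep their defaults.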