## JobHistory job cache
The JobHistoryServer keeps information about a subset of jobs in memory. When a job is queried, the in-memory caches are consulted first; only on a miss does it fall back to scanning the history files on disk.
The cache has two levels. The first level is a Guava cache that holds 5 loaded jobs by default; its size is controlled by the configuration property `mapreduce.jobhistory.loadedjobs.cache.size`.
When a job is missing from the first-level Guava cache it has to be reloaded. The load rule is defined by a `CacheLoader`:
```java
CacheLoader<JobId, Job> loader;
loader = new CacheLoader<JobId, Job>() {
  @Override
  public Job load(JobId key) throws Exception {
    return loadJob(key);
  }
};
```
`loadJob` is implemented as follows; `hsManager` is the component that performs the actual loading from the history files:
```java
private Job loadJob(JobId jobId) throws RuntimeException, IOException {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Looking for Job " + jobId);
  }
  HistoryFileInfo fileInfo;
  fileInfo = hsManager.getFileInfo(jobId);
  if (fileInfo == null) {
    throw new HSFileRuntimeException("Unable to find job " + jobId);
  }
  fileInfo.waitUntilMoved();
  if (fileInfo.isDeleted()) {
    throw new HSFileRuntimeException("Cannot load deleted job " + jobId);
  } else {
    return fileInfo.loadJob();
  }
}
```
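Putting the pieces together, here is a minimal sketch of how this loader can be wrapped in a size-bounded Guava `LoadingCache`. It assumes `conf` is the server's Hadoop `Configuration` and `loader` is the `CacheLoader` shown above; the variable names are illustrative and not necessarily those used in the JobHistoryServer source:
```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.LoadingCache;

// Bound the first-level cache by the configured size
// (mapreduce.jobhistory.loadedjobs.cache.size, default 5) and attach the
// CacheLoader shown above so cache misses fall through to loadJob().
int loadedJobCacheSize =
    conf.getInt("mapreduce.jobhistory.loadedjobs.cache.size", 5);
LoadingCache<JobId, Job> loadedJobCache = CacheBuilder.newBuilder()
    .maximumSize(loadedJobCacheSize)
    .softValues()
    .build(loader);

// A lookup either hits the cache or triggers loadJob(jobId) via the loader.
Job job = loadedJobCache.getUnchecked(jobId);
```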
`hsManager` holds JobHistory's second-level cache, `jobListCache`. Its size is controlled by the configuration property `mapreduce.jobhistory.joblist.cache.size` and defaults to 20000 entries. Cached entries are also cleaned up once they exceed a configured age, controlled by `mapreduce.jobhistory.max-age-ms` and defaulting to one week.
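A hedged example of reading these two settings from a Hadoop `Configuration`, using the defaults mentioned above (the variable names are illustrative):
```java
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Capacity of the second-level cache (number of HistoryFileInfo entries).
int jobListCacheSize =
    conf.getInt("mapreduce.jobhistory.joblist.cache.size", 20000);
// Maximum age before a cached/archived job is cleaned up (default: one week).
long maxHistoryAgeMs =
    conf.getLong("mapreduce.jobhistory.max-age-ms", 7 * 24 * 60 * 60 * 1000L);
```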
The lookup order is:
- Memory first, i.e. the second-level cache `jobListCache`.
- On a miss, scan the directory holding recently finished jobs (controlled by `mapreduce.jobhistory.intermediate-done-dir`); this scan also refreshes `jobListCache`.
- If the job is still not found, scan the directory of completed jobs, controlled by `mapreduce.jobhistory.done-dir`.

`getFileInfo` implements this order:
```java
public HistoryFileInfo getFileInfo(JobId jobId) throws IOException {
  // Look in memory first (the second-level cache).
  HistoryFileInfo fileInfo = jobListCache.get(jobId);
  if (fileInfo != null) {
    return fileInfo;
  }
  // Not cached: scan the directory of recently finished jobs,
  // controlled by mapreduce.jobhistory.intermediate-done-dir.
  scanIntermediateDirectory();
  fileInfo = jobListCache.get(jobId);
  if (fileInfo != null) {
    return fileInfo;
  }
  // Still not found: scan the completed-jobs directory,
  // controlled by mapreduce.jobhistory.done-dir.
  fileInfo = scanOldDirsForJob(jobId);
  if (fileInfo != null) {
    return fileInfo;
  }
  return null;
}
```
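For intuition, below is a simplified, illustrative sketch of a job-list cache that enforces both limits described above (the size cap and the maximum age). It is not Hadoop's actual `JobListCache` implementation, just a compact model of the behavior:
```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative only: a cache bounded by entry count and entry age.
class SimpleJobListCache<K extends Comparable<K>, V> {

  private static class Entry<V> {
    final V value;
    final long addedTimeMs;
    Entry(V value, long addedTimeMs) {
      this.value = value;
      this.addedTimeMs = addedTimeMs;
    }
  }

  private final ConcurrentSkipListMap<K, Entry<V>> cache =
      new ConcurrentSkipListMap<>();
  private final int maxSize;   // cf. mapreduce.jobhistory.joblist.cache.size
  private final long maxAgeMs; // cf. mapreduce.jobhistory.max-age-ms

  SimpleJobListCache(int maxSize, long maxAgeMs) {
    this.maxSize = maxSize;
    this.maxAgeMs = maxAgeMs;
  }

  V get(K key) {
    Entry<V> e = cache.get(key);
    return e == null ? null : e.value;
  }

  void add(K key, V value) {
    cache.put(key, new Entry<>(value, System.currentTimeMillis()));
    // Evict the smallest keys once the size cap is exceeded.
    while (cache.size() > maxSize) {
      cache.pollFirstEntry();
    }
  }

  // Called periodically: drop entries older than the configured maximum age.
  void cleanExpired() {
    long cutoff = System.currentTimeMillis() - maxAgeMs;
    Iterator<Map.Entry<K, Entry<V>>> it = cache.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue().addedTimeMs < cutoff) {
        it.remove();
      }
    }
  }
}
```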