hadoop/hadoop-ozone/docs/content/Hdds.md
Anu Engineer 8af8453589 HDDS-435. Enhance the existing ozone documentation.
Contributed by Elek, Marton.
2018-09-17 10:46:28 -07:00


---
title: "Hadoop Distributed Data Store"
date: "2017-09-14"
menu:
  main:
    parent: Architecture
weight: 10
---
SCM Overview
------------
Storage Container Manager (SCM) is a core component of Ozone. SCM
offers block and container-based services to Ozone Manager (OM). A container is a
collection of unrelated blocks under Ozone. SCM and data nodes work together
to maintain the replication levels needed by the cluster.

The putKey operation is a convenient way to understand the role that SCM plays.
To put a key, a client calls Ozone Manager (OM) with the following arguments:

`putKey(keyName, data, pipeline type, replication count)`

1. keyName - The file name.
2. data - The data that the client wants to write.
3. pipeline type - Lets the client select the pipeline type. A pipeline
   refers to the replication strategy used for replicating a block. Ozone
   currently supports two pipeline types: Stand Alone and Ratis.
4. replication count - Specifies how many replicas of the block should be maintained.

In most cases, the client does not specify the pipeline type or the replication
count, and the default pipeline type and replication count are used.
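
The argument list and its defaults can be sketched as a small request builder. This is an illustrative model, not the Ozone client API: `PutKeyRequest`, `make_put_key_request`, and both default values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PipelineType(Enum):
    STAND_ALONE = "STAND_ALONE"
    RATIS = "RATIS"


# Hypothetical defaults for illustration only; a real cluster takes these
# from its configuration.
DEFAULT_PIPELINE_TYPE = PipelineType.STAND_ALONE
DEFAULT_REPLICATION_COUNT = 1


@dataclass(frozen=True)
class PutKeyRequest:
    key_name: str
    data: bytes
    pipeline_type: PipelineType
    replication_count: int


def make_put_key_request(
    key_name: str,
    data: bytes,
    pipeline_type: Optional[PipelineType] = None,
    replication_count: Optional[int] = None,
) -> PutKeyRequest:
    """Build a request, falling back to the defaults for omitted arguments."""
    if pipeline_type is None:
        pipeline_type = DEFAULT_PIPELINE_TYPE
    if replication_count is None:
        replication_count = DEFAULT_REPLICATION_COUNT
    if replication_count < 1:
        raise ValueError("replication count must be at least 1")
    return PutKeyRequest(key_name, data, pipeline_type, replication_count)
```

A caller that only supplies `keyName` and `data` gets the defaults, while a caller that wants Ratis with three replicas passes both optional arguments explicitly.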

When Ozone Manager receives the putKey call, it asks SCM for a pipeline
instance with the specified properties. For example, if the client asked
for the Ratis replication strategy and a replication count of three, OM
requests SCM to return a set of data nodes that meets this requirement.

If SCM can find a pipeline (that is, a set of data nodes) that meets
the client's requirement, those nodes are returned to OM. OM
persists this information and returns a tuple consisting of {BlockID, ContainerName, Pipeline}.

If SCM cannot find such a pipeline, it creates a logical pipeline and returns it.
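
The find-or-create flow described above can be modelled in a few lines. This is a toy sketch under stated assumptions, not Ozone's implementation: `ToyScm`, `ToyOzoneManager`, the block and container naming scheme, and the rule of picking the first N data nodes are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Pipeline:
    pipeline_type: str        # "STAND_ALONE" or "RATIS"
    nodes: Tuple[str, ...]    # replication count is len(nodes)


class ToyScm:
    """Hands out pipelines, creating one when no existing pipeline matches."""

    def __init__(self, data_nodes: List[str]) -> None:
        self.data_nodes = list(data_nodes)
        self.pipelines: List[Pipeline] = []

    def get_pipeline(self, pipeline_type: str, replication_count: int) -> Pipeline:
        # First try to reuse a pipeline with the requested properties.
        for pipeline in self.pipelines:
            if (pipeline.pipeline_type == pipeline_type
                    and len(pipeline.nodes) == replication_count):
                return pipeline
        # Otherwise create a new logical pipeline (assumed placement rule:
        # take the first N known data nodes).
        if len(self.data_nodes) < replication_count:
            raise RuntimeError("not enough data nodes for requested replication")
        pipeline = Pipeline(pipeline_type, tuple(self.data_nodes[:replication_count]))
        self.pipelines.append(pipeline)
        return pipeline


class ToyOzoneManager:
    """Asks SCM for a pipeline, persists the result, and returns the tuple."""

    def __init__(self, scm: ToyScm) -> None:
        self.scm = scm
        self.key_table: Dict[str, Tuple[str, str, Pipeline]] = {}
        self._block_ids = itertools.count(1)

    def allocate(self, key_name: str, pipeline_type: str,
                 replication_count: int) -> Tuple[str, str, Pipeline]:
        pipeline = self.scm.get_pipeline(pipeline_type, replication_count)
        block_id = f"block-{next(self._block_ids)}"
        # Hypothetical naming: one container per pipeline in this toy model.
        container_name = f"container-{self.scm.pipelines.index(pipeline) + 1}"
        location = (block_id, container_name, pipeline)
        self.key_table[key_name] = location  # stands in for "OM persists this"
        return location
```

In this model, two keys requested with the same pipeline type and replication count share one pipeline but receive distinct block IDs, which mirrors the reuse-before-create behaviour in the text.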

SCM manages blocks, containers, and pipelines. To return healthy pipelines,
SCM also needs to understand the node health. So SCM listens to heartbeats
from data nodes and acts as the node manager too.
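
One way to picture the node manager role is a table of last-heartbeat timestamps with a staleness cutoff. The sketch below is an assumption-laden illustration: `ToyNodeManager` and the timeout value are hypothetical, and the real SCM tracks more node states than healthy or not.

```python
import time
from typing import Dict, List, Optional


class ToyNodeManager:
    """Tracks the last heartbeat per data node and reports healthy nodes."""

    def __init__(self, stale_after_seconds: float = 90.0) -> None:
        # Hypothetical cutoff: nodes silent for longer than this are unhealthy.
        self.stale_after = stale_after_seconds
        self.last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, node: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; `now` can be injected to make tests deterministic."""
        self.last_heartbeat[node] = time.monotonic() if now is None else now

    def healthy_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return nodes whose last heartbeat is within the cutoff, sorted by name."""
        current = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self.last_heartbeat.items()
            if current - seen <= self.stale_after
        )
```

A pipeline allocator could restrict its candidate data nodes to `healthy_nodes()`, so that only nodes with recent heartbeats are placed into new pipelines.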