diff --git a/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html b/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html index 6773103ef9..0f52e1ae49 100644 --- a/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html +++ b/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html @@ -15,11 +15,11 @@
If an RM admin command is issued using CLI, I get something like following: - -13/07/24 17:19:40 INFO client.RMProxy: Connecting to ResourceManager at xxxx.com/1.2.3.4:1234 -refreshQueues: Unknown protocol: org.apache.hadoop.yarn.api.ResourceManagerAdministrationProtocolPB - +If an RM admin command is issued using CLI, I get something like following: + +13/07/24 17:19:40 INFO client.RMProxy: Connecting to ResourceManager at xxxx.com/1.2.3.4:1234 +refreshQueues: Unknown protocol: org.apache.hadoop.yarn.api.ResourceManagerAdministrationProtocolPB +
Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945. - +Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945. + Making it a blocker until full impact of the issue is scoped.
509 2013-07-19 15:53:55,569 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54313: readAndProcess from client 127.0.0.1 threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]] -510 org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN] -511 at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1531) -512 at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1482) -513 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:788) -514 at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:587) -515 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:562) +509 2013-07-19 15:53:55,569 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54313: readAndProcess from client 127.0.0.1 threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]] +510 org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN] +511 at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1531) +512 at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1482) +513 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:788) +514 at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:587) +515 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:562)
AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses. - +AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses. + The client libraries could expose both the single and multi-container requests.
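A rough sketch of the batching shape being proposed, using made-up request/response types (these are not the real ContainerManager protocol records); it also shows how a client library could keep the single-container call and layer the multi-container form on top of it:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchStartSketch {
  // Illustrative stand-ins only; not the real YARN protocol types.
  static final class StartRequest {
    final String containerId;
    StartRequest(String containerId) { this.containerId = containerId; }
  }
  static final class StartResponse {
    final String containerId;
    final boolean started;
    StartResponse(String containerId, boolean started) { this.containerId = containerId; this.started = started; }
  }

  interface SingleStarter {
    StartResponse start(StartRequest request);
  }

  // The multi-container form takes a list of requests and returns a list of responses,
  // while the single-container call remains available underneath it.
  static List<StartResponse> startAll(SingleStarter starter, List<StartRequest> requests) {
    List<StartResponse> responses = new ArrayList<>(requests.size());
    for (StartRequest request : requests) {
      responses.add(starter.start(request));
    }
    return responses;
  }

  public static void main(String[] args) {
    SingleStarter fake = request -> new StartResponse(request.containerId, true);
    for (StartResponse response : startAll(fake, Arrays.asList(new StartRequest("c1"), new StartRequest("c2")))) {
      System.out.println(response.containerId + " started=" + response.started);
    }
  }
}
{code}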
Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if a user wants to override them, the user has to go to the documentation, find the variables, and change the script. - +Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if a user wants to override them, the user has to go to the documentation, find the variables, and change the script. + There is no straightforward way to change them in the script. Just updating the variables with defaults.
The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. +The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.
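A self-contained illustration (not the CapacityScheduler code itself) of why an entry whose sort key changes must be removed and re-inserted rather than mutated in place for a TreeSet to stay consistent; the Queue class and values below are made up:

{code}
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetReorderSketch {
  static final class Queue {
    final String name;
    float usedCapacity;
    Queue(String name, float usedCapacity) { this.name = name; this.usedCapacity = usedCapacity; }
  }

  public static void main(String[] args) {
    Comparator<Queue> byUsedCapacity =
        Comparator.comparingDouble((Queue q) -> q.usedCapacity).thenComparing(q -> q.name);

    TreeSet<Queue> childQueues = new TreeSet<>(byUsedCapacity);
    Queue a = new Queue("a", 0.9f);
    Queue b = new Queue("b", 0.5f);
    childQueues.add(a);
    childQueues.add(b);

    // Wrong: mutating the sort key in place leaves the set ordered on the stale value.
    a.usedCapacity = 0.1f;
    System.out.println(childQueues.first().name);  // still "b", although "a" now has less used capacity
    System.out.println(childQueues.contains(b));   // may even print false: lookups follow the wrong branch

    // Correct pattern: remove the entry, update the key, then re-insert so the order is recomputed.
    TreeSet<Queue> reordered = new TreeSet<>(byUsedCapacity);
    Queue x = new Queue("x", 0.9f);
    Queue y = new Queue("y", 0.5f);
    reordered.add(x);
    reordered.add(y);
    reordered.remove(x);
    x.usedCapacity = 0.1f;
    reordered.add(x);
    System.out.println(reordered.first().name);    // "x"
  }
}
{code}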
In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. - -Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. - -We have the following execution sequence in Shell: -1) In main thread, schedule a delayed timer task that will kill the original process upon timeout. -2) In main thread, open a buffered reader and feed in the process's standard input stream. -3) When timeout happens, the timer task will call {{Process#destroy()}} - to kill the main process. - -On Linux, when the timeout happens and the process is killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. - -On Windows, we don't have the IOException. Only "-1" is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. - - +In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. + +Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. + +We have the following execution sequence in Shell: +1) In main thread, schedule a delayed timer task that will kill the original process upon timeout. +2) In main thread, open a buffered reader and feed in the process's standard input stream. +3) When timeout happens, the timer task will call {{Process#destroy()}} + to kill the main process. + +On Linux, when the timeout happens and the process is killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. + +On Windows, we don't have the IOException. Only "-1" is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. + +
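A minimal, stand-alone sketch of the more portable pattern this points at: have the timer task record the timeout in a flag and decide "timed out" from that flag after the read loop, rather than from whether an IOException was thrown. The command and timeout value are placeholders, and this is not the actual Shell/NodeHealthScriptRunner code.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutFlagSketch {
  public static void main(String[] args) throws IOException {
    // Placeholder long-running command; on Windows substitute an equivalent.
    Process process = new ProcessBuilder("sleep", "30").start();
    AtomicBoolean timedOut = new AtomicBoolean(false);

    Timer timer = new Timer(true);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        timedOut.set(true);   // record the timeout before killing the process
        process.destroy();
      }
    }, 2000L);

    StringBuilder output = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        output.append(line).append('\n');
      }
    } catch (IOException e) {
      // On Linux, destroying the process typically surfaces here ("Stream closed");
      // on Windows the loop may simply end with end-of-stream instead.
    } finally {
      timer.cancel();
    }

    // Portable decision: use the flag, not the presence of an exception.
    System.out.println(timedOut.get() ? "health script timed out" : "health script completed");
  }
}
{code}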
https://builds.apache.org/job/Hadoop-Yarn-trunk/246/ - -{code:xml} -Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager -Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec <<< FAILURE! -testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager) Time elapsed: 286 sec <<< FAILURE! -junit.framework.ComparisonFailure: expected:<[asf009.sp2.ygridcore.ne]t> but was:<[localhos]t> - at junit.framework.Assert.assertEquals(Assert.java:85) - +https://builds.apache.org/job/Hadoop-Yarn-trunk/246/ + +{code:xml} +Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager +Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec <<< FAILURE! +testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager) Time elapsed: 286 sec <<< FAILURE! +junit.framework.ComparisonFailure: expected:<[asf009.sp2.ygridcore.ne]t> but was:<[localhos]t> + at junit.framework.Assert.assertEquals(Assert.java:85) + {code}
App submission on secure cluster fails with the following exception: - -{noformat} -INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0 -main : user is qa_user -javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.] - at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) - at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) - at java.lang.reflect.Constructor.newInstance(Constructor.java:513) - at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) - at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) - at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348) -Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response. - at org.apache.hadoop.ipc.Client.call(Client.java:1298) - at org.apache.hadoop.ipc.Client.call(Client.java:1250) - at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204) - at $Proxy7.heartbeat(Unknown Source) - at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) - ... 3 more - -.Failing this attempt.. Failing the application. - +App submission on secure cluster fails with the following exception: + +{noformat} +INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0 +main : user is qa_user +javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.] 
+ at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) + at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) + at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) + at java.lang.reflect.Constructor.newInstance(Constructor.java:513) + at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) + at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) + at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348) +Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response. + at org.apache.hadoop.ipc.Client.call(Client.java:1298) + at org.apache.hadoop.ipc.Client.call(Client.java:1250) + at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204) + at $Proxy7.heartbeat(Unknown Source) + at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) + ... 3 more + +.Failing this attempt.. Failing the application. + {noformat}
If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly. - +If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly. + Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations.
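For illustration only (this is not the NM's actual registration code), the difference can be seen with the two JDK lookups below; registering with the canonical form is what avoids handing the RM, and HDFS locality matching, a short name that DNS cannot resolve:

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameSketch {
  public static void main(String[] args) throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    // On a box where `hostname` returns "foo" but `hostname -f` returns "foo.bar.xyz",
    // these two values can differ in exactly that way.
    System.out.println("host name:      " + local.getHostName());
    System.out.println("canonical name: " + local.getCanonicalHostName());
  }
}
{code}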
the following stack trace is generated in rm - -{code} -n, service: 68.142.246.147:45454 }, ] resource=<memory:1536, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:44544, vCores:29>usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48> -2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48> -2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_000001 released container container_1371448527090_0844_01_000005 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED -2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454 -2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation <memory:6144, vCores:4> -2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate... -2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler -java.lang.NullPointerException - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) - at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) - at java.lang.Thread.run(Thread.java:662) -2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye.. 
-2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088 -2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted -2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system... -2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped. -2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete. -2013-06-17 12:43:53,768 WARN amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning. -2013-06-17 12:43:53,768 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8033 -2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8033 -2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8032 -2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder -2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8032 -2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder -2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8030 -2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8030 -2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8031 -2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder -2013-06-17 12:43:53,774 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8031 -2013-06-17 12:43:53,775 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder +the following stack trace is generated in rm + +{code} +n, service: 68.142.246.147:45454 }, ] resource=<memory:1536, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:44544, vCores:29>usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48> +2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48> +2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_000001 released container container_1371448527090_0844_01_000005 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED +2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454 +2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at 
priority 20; currentReservation <memory:6144, vCores:4> +2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate... +2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler +java.lang.NullPointerException + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) + at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) + at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) + at java.lang.Thread.run(Thread.java:662) +2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye.. +2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088 +2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted +2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system... +2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped. +2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete. +2013-06-17 12:43:53,768 WARN amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning. 
+2013-06-17 12:43:53,768 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8033 +2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8033 +2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8032 +2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder +2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8032 +2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder +2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8030 +2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8030 +2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8031 +2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder +2013-06-17 12:43:53,774 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8031 +2013-06-17 12:43:53,775 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder {code}
The unit test case fails on Windows due to job id or container id was not printed out as part of the container script. Later, the test tries to read the pid from output of the file, and fails. - -Exception in trunk: -{noformat} -Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch -Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec <<< FAILURE! -testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 1307 sec <<< ERROR! -java.lang.NullPointerException - at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) - at java.lang.reflect.Method.invoke(Method.java:597) - at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) - at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) - at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) - at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) - at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) +The unit test case fails on Windows due to job id or container id was not printed out as part of the container script. Later, the test tries to read the pid from output of the file, and fails. + +Exception in trunk: +{noformat} +Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch +Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec <<< FAILURE! +testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 1307 sec <<< ERROR! +java.lang.NullPointerException + at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) + at java.lang.reflect.Method.invoke(Method.java:597) + at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) + at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) + at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) + at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) + at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) {noformat}
Do this for AMRMClient, NMClient, and YarnClient, and annotate their impls as private. +Do this for AMRMClient, NMClient, and YarnClient, and annotate their impls as private. The purpose is not to expose the impls.
The container's launch script sets up environment variables, symlinks etc. - -If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. - +The container's launch script sets up environment variables, symlinks etc. + +If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. + To reproduce, set an env var where the value contains characters that throw syntax errors in bash.
RM app summary logs have been enabled as per the default config: - -{noformat} -# -# Yarn ResourceManager Application Summary Log -# -# Set the ResourceManager summary log filename -yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log -# Set the ResourceManager summary log level and appender -yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY - -# Appender for ResourceManager Application Summary Log -# Requires the following properties to be set -# - hadoop.log.dir (Hadoop Log directory) -# - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename) -# - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender) - -log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger} -log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false -log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender -log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file} -log4j.appender.RMSUMMARY.MaxFileSize=256MB -log4j.appender.RMSUMMARY.MaxBackupIndex=20 -log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout -log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n -{noformat} - -This however, throws errors while running commands as non-superuser: -{noformat} --bash-4.1$ hadoop dfs -ls / -DEPRECATED: Use of this script to execute hdfs command is deprecated. -Instead use the hdfs command for it. - -log4j:ERROR setFile(null,true) call failed. -java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory) - at java.io.FileOutputStream.openAppend(Native Method) - at java.io.FileOutputStream.<init>(FileOutputStream.java:192) - at java.io.FileOutputStream.<init>(FileOutputStream.java:116) - at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) - at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207) - at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) - at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) - at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) - at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) - at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) - at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) - at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672) - at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516) - at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) - at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) - at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) - at org.apache.log4j.Logger.getLogger(Logger.java:104) - at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289) - at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109) - at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) - at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) - at java.lang.reflect.Constructor.newInstance(Constructor.java:513) - at 
org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116) - at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858) - at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604) - at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336) - at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310) - at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685) - at org.apache.hadoop.fs.FsShell.<clinit>(FsShell.java:41) -Found 1 items -drwxr-xr-x - hadoop hadoop 0 2013-06-12 21:28 /user +RM app summary logs have been enabled as per the default config: + +{noformat} +# +# Yarn ResourceManager Application Summary Log +# +# Set the ResourceManager summary log filename +yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log +# Set the ResourceManager summary log level and appender +yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY + +# Appender for ResourceManager Application Summary Log +# Requires the following properties to be set +# - hadoop.log.dir (Hadoop Log directory) +# - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename) +# - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender) + +log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger} +log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false +log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender +log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file} +log4j.appender.RMSUMMARY.MaxFileSize=256MB +log4j.appender.RMSUMMARY.MaxBackupIndex=20 +log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout +log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n +{noformat} + +This however, throws errors while running commands as non-superuser: +{noformat} +-bash-4.1$ hadoop dfs -ls / +DEPRECATED: Use of this script to execute hdfs command is deprecated. +Instead use the hdfs command for it. + +log4j:ERROR setFile(null,true) call failed. 
+java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory) + at java.io.FileOutputStream.openAppend(Native Method) + at java.io.FileOutputStream.<init>(FileOutputStream.java:192) + at java.io.FileOutputStream.<init>(FileOutputStream.java:116) + at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) + at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207) + at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) + at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) + at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) + at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) + at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) + at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) + at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672) + at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516) + at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) + at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) + at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) + at org.apache.log4j.Logger.getLogger(Logger.java:104) + at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289) + at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109) + at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) + at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) + at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) + at java.lang.reflect.Constructor.newInstance(Constructor.java:513) + at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116) + at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858) + at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604) + at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336) + at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310) + at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685) + at org.apache.hadoop.fs.FsShell.<clinit>(FsShell.java:41) +Found 1 items +drwxr-xr-x - hadoop hadoop 0 2013-06-12 21:28 /user {noformat}
The implementation of - -bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java - -Tells the container-executor to write PIDs to cgroup.procs: - -{code} - public String getResourcesOption(ContainerId containerId) { - String containerName = containerId.toString(); - StringBuilder sb = new StringBuilder("cgroups="); - - if (isCpuWeightEnabled()) { - sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs"); - sb.append(","); - } - - if (sb.charAt(sb.length() - 1) == ',') { - sb.deleteCharAt(sb.length() - 1); - } - return sb.toString(); - } -{code} - -Apparently, this file has not always been writeable: - -https://patchwork.kernel.org/patch/116146/ -http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html -https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html - -The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file. - -{quote} -$ uname -a -Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux -{quote} - -As a result, when the container-executor tries to run, it fails with this error message: - -bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n", - -This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable: - -{quote} -$ pwd -/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001 -$ ls -l -total 0 --r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs --rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us --rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us --rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares --rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release --rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks -{quote} - -I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. - -I can think of several potential resolutions to this ticket: - -1. Ignore the problem, and make people patch YARN when they hit this issue. -2. Write to /tasks instead of /cgroup.procs for everyone -3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks. -4. Add a config to yarn-site that lets admins specify which file to write to. - +The implementation of + +bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java + +Tells the container-executor to write PIDs to cgroup.procs: + +{code} + public String getResourcesOption(ContainerId containerId) { + String containerName = containerId.toString(); + StringBuilder sb = new StringBuilder("cgroups="); + + if (isCpuWeightEnabled()) { + sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs"); + sb.append(","); + } + + if (sb.charAt(sb.length() - 1) == ',') { + sb.deleteCharAt(sb.length() - 1); + } + return sb.toString(); + } +{code} + +Apparently, this file has not always been writeable: + +https://patchwork.kernel.org/patch/116146/ +http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html +https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html + +The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file. 
+ +{quote} +$ uname -a +Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux +{quote} + +As a result, when the container-executor tries to run, it fails with this error message: + +bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n", + +This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable: + +{quote} +$ pwd +/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001 +$ ls -l +total 0 +-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs +-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us +-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us +-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares +-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release +-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks +{quote} + +I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. + +I can think of several potential resolutions to this ticket: + +1. Ignore the problem, and make people patch YARN when they hit this issue. +2. Write to /tasks instead of /cgroup.procs for everyone +3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks. +4. Add a config to yarn-site that lets admins specify which file to write to. + Thoughts?
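A small sketch of what option 3 above could look like, simplified and separate from the real CgroupsLCEResourcesHandler (the path handling and method name are illustrative): probe whether cgroup.procs is writeable and fall back to the tasks file when it is not.

{code}
import java.io.File;

public class CgroupPidFileSketch {
  // Prefer cgroup.procs, but fall back to tasks when cgroup.procs is not writeable
  // (as on the older RHEL kernel shown above). A real check would also need to
  // consider the user the container-executor actually runs as, not just the JVM user.
  static String pidFileForContainer(String cgroupDir) {
    File procs = new File(cgroupDir, "cgroup.procs");
    if (procs.canWrite()) {
      return procs.getAbsolutePath();
    }
    return new File(cgroupDir, "tasks").getAbsolutePath();
  }

  public static void main(String[] args) {
    System.out.println(pidFileForContainer("/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001"));
  }
}
{code}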
The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, so the available vCores returned are incorrect. +The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, so the available vCores returned are incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
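In miniature, and independent of the actual QueueMetrics code, the fix being described is to subtract both dimensions when computing what is available, not just memory; the Resource class below is a simplified stand-in:

{code}
public class AvailableResourceSketch {
  static final class Resource {
    final long memoryMB;
    final int vCores;
    Resource(long memoryMB, int vCores) { this.memoryMB = memoryMB; this.vCores = vCores; }
    @Override
    public String toString() { return "{memory: " + memoryMB + ", vCores: " + vCores + "}"; }
  }

  // Subtract allocated from total for memory *and* vCores.
  static Resource available(Resource total, Resource allocated) {
    return new Resource(total.memoryMB - allocated.memoryMB, total.vCores - allocated.vCores);
  }

  public static void main(String[] args) {
    Resource cluster = new Resource(49152, 48);
    Resource allocated = new Resource(44544, 29);
    System.out.println(available(cluster, allocated));  // {memory: 4608, vCores: 19}
  }
}
{code}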
Per discussion in YARN-689, reposting updated use case: - -1. I have a set of services co-existing with a Yarn cluster. - -2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing. - -3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa. -By using YARN as RM for these services I'm able share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tab of all the resources. - -These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resources utilization. By doing this, Yarn and these service correctly share the cluster resources, being Yarn RM the only one that does the overall resource bookkeeping. - -The services AM, not to break the lifecycle of containers, start containers in the corresponding NMs. These container processes do basically a sleep forever (i.e. sleep 10000d). They are almost not using any CPU nor memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory). - -The current limitation is that the increment is also the minimum. - -If we set the memory increment to 1MB. When doing a pure CPU request, we would have to specify 1MB of memory. That would work. However it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc). - -If we set the CPU increment to 1CPU. When doing a pure memory request, we would have to specify 1CPU. CPU amounts a much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests will be wasting 1 CPU thus reducing the overall utilization of the cluster. - -Finally, on hard enforcement. - -* For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we ensure there is enough CPU cycles to run the sleep process. This absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the shares for 1 CPU are 1024. - +Per discussion in YARN-689, reposting updated use case: + +1. I have a set of services co-existing with a Yarn cluster. + +2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing. + +3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa. +By using YARN as RM for these services I'm able share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tab of all the resources. + +These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. 
If the Yarn allocations are released/preempted, the services back off on their resources utilization. By doing this, Yarn and these service correctly share the cluster resources, being Yarn RM the only one that does the overall resource bookkeeping. + +The services AM, not to break the lifecycle of containers, start containers in the corresponding NMs. These container processes do basically a sleep forever (i.e. sleep 10000d). They are almost not using any CPU nor memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory). + +The current limitation is that the increment is also the minimum. + +If we set the memory increment to 1MB. When doing a pure CPU request, we would have to specify 1MB of memory. That would work. However it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc). + +If we set the CPU increment to 1CPU. When doing a pure memory request, we would have to specify 1CPU. CPU amounts a much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests will be wasting 1 CPU thus reducing the overall utilization of the cluster. + +Finally, on hard enforcement. + +* For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we ensure there is enough CPU cycles to run the sleep process. This absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the shares for 1 CPU are 1024. + * For Memory. Hard enforcement is currently done by the ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would take care of zero memory resources. And again, this absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the increment memory is in several MBs if not 1GB.
Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing. +Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing.
The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not. - -If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory. But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions. - -The lack of consistency will exacerbate the already difficult problem of resource configuration. +The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not. + +If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory. But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions. + +The lack of consistency will exacerbate the already difficult problem of resource configuration.
Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed +Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed For now these metrics are created only when they are needed; we want them to be registered and visible as soon as QueueMetrics is initialized.
Even when there are jobs running, used resources is empty on the Capacity Scheduler page for a leaf queue. (I use Google Chrome on Windows 7.) +Even when there are jobs running, used resources is empty on the Capacity Scheduler page for a leaf queue. (I use Google Chrome on Windows 7.) After changing Resource.java's toString method by replacing "<>" with "{}", this bug gets fixed.
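The reporter's workaround, and an alternative, sketched outside the real Resource class: either print the value with braces so the browser has nothing to parse as a tag, or HTML-escape the existing angle-bracket form in the web view.

{code}
public class ResourceDisplaySketch {
  // Workaround from the description: braces instead of angle brackets.
  static String braceForm(long memoryMB, int vCores) {
    return "{memory:" + memoryMB + ", vCores:" + vCores + "}";
  }

  // Alternative: keep toString as-is and escape it before it reaches the HTML page.
  static String htmlEscaped(long memoryMB, int vCores) {
    String raw = "<memory:" + memoryMB + ", vCores:" + vCores + ">";
    return raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  public static void main(String[] args) {
    System.out.println(braceForm(1024, 1));    // {memory:1024, vCores:1}
    System.out.println(htmlEscaped(1024, 1));  // &lt;memory:1024, vCores:1&gt;
  }
}
{code}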
See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/. - +See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/. + It passed on my machine though.
YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources. - +YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources. + This jira is a companion to allow for black-listing (in CS).
Make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} & {{unregisterServiceListener()}} respectively. - +Make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} & {{unregisterServiceListener()}} respectively. + This only affects a couple of production classes that use {{Service.register()}}, which is used in some of the lifecycle tests of YARN-530. There are no tests of {{Service.unregister()}}, which is something that could be corrected.
In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/<user>/logs. Also mkdirs calls are made before this. +In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/<user>/logs. Also mkdirs calls are made before this.
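A sketch of the kind of guard implied here, kept deliberately generic (it is not the actual log-aggregation code path): only issue mkdirs/setPermission RPCs to the NameNode when the directory is missing or its permission actually differs.

{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class EnsureLogDirSketch {
  static void ensureDir(FileSystem fs, Path dir, FsPermission perm) throws IOException {
    try {
      FileStatus status = fs.getFileStatus(dir);
      if (!status.getPermission().equals(perm)) {
        fs.setPermission(dir, perm);  // one RPC, and only when the permission is actually wrong
      }
      // Otherwise: no mkdirs and no setPermission at all for the common case.
    } catch (FileNotFoundException e) {
      fs.mkdirs(dir);
      fs.setPermission(dir, perm);
    }
  }
}
{code}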
Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions. - +Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions. + With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state.
The problem happens at: -{code} - // getContainerStatus can be called after stopContainer - try { - ContainerStatus status = nmClient.getContainerStatus( - container.getId(), container.getNodeId(), - container.getContainerToken()); - assertEquals(container.getId(), status.getContainerId()); - assertEquals(ContainerState.RUNNING, status.getState()); - assertTrue("" + i, status.getDiagnostics().contains( - "Container killed by the ApplicationMaster.")); - assertEquals(-1000, status.getExitStatus()); - } catch (YarnRemoteException e) { - fail("Exception is not expected"); - } -{code} - -NMClientImpl#stopContainer returns, but container hasn't been stopped immediately. ContainerManangerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. - -There will be the similar problem wrt NMClientImpl#startContainer. +The problem happens at: +{code} + // getContainerStatus can be called after stopContainer + try { + ContainerStatus status = nmClient.getContainerStatus( + container.getId(), container.getNodeId(), + container.getContainerToken()); + assertEquals(container.getId(), status.getContainerId()); + assertEquals(ContainerState.RUNNING, status.getState()); + assertTrue("" + i, status.getDiagnostics().contains( + "Container killed by the ApplicationMaster.")); + assertEquals(-1000, status.getExitStatus()); + } catch (YarnRemoteException e) { + fail("Exception is not expected"); + } +{code} + +NMClientImpl#stopContainer returns, but container hasn't been stopped immediately. ContainerManangerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. + +There will be the similar problem wrt NMClientImpl#startContainer.
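One way to make such a test robust to the asynchronous stop is to poll the status until it reaches the expected state or a deadline passes, instead of asserting on the first getContainerStatus result. The sketch below is generic; the Supplier stands in for the real status call and the names are illustrative.

{code}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class WaitForStateSketch {
  static <T> boolean waitFor(Supplier<T> current, T expected, long timeoutMs, long pollMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (System.nanoTime() < deadline) {
      if (expected.equals(current.get())) {
        return true;
      }
      Thread.sleep(pollMs);
    }
    return expected.equals(current.get());  // one last check at the deadline
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // Toy "container" that reports RUNNING for the first second, then COMPLETE.
    Supplier<String> state =
        () -> (System.currentTimeMillis() - start < 1000) ? "RUNNING" : "COMPLETE";
    System.out.println(waitFor(state, "COMPLETE", 5000, 100));  // true
  }
}
{code}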
The queue shows up as "Invalid Date" +The queue shows up as "Invalid Date" Finish Time shows up as a Long value.
This is separated from YARN-711, as after changing yarn.api.token from an interface to an abstract class, e.g. ClientTokenPBImpl has to extend two classes: both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java. - +This is separated from YARN-711, as after changing yarn.api.token from an interface to an abstract class, e.g. ClientTokenPBImpl has to extend two classes: both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java. + We may remove the ClientToken/ContainerToken/DelegationToken interfaces and just use the common Token interface.
Tests are timing out. Looks like this is related to YARN-617. -{code} -2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container. -Expected containerId: user Found: container_1369183214008_0001_01_000001 -2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado -Expected containerId: user Found: container_1369183214008_0001_01_000001 -2013-05-21 17:40:23,695 INFO [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10. -Expected containerId: user Found: container_1369183214008_0001_01_000001 -org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container. -Expected containerId: user Found: container_1369183214008_0001_01_000001 - at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440) - at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72) - at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83) - at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) +Tests are timing out. Looks like this is related to YARN-617. +{code} +2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container. +Expected containerId: user Found: container_1369183214008_0001_01_000001 +2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado +Expected containerId: user Found: container_1369183214008_0001_01_000001 +2013-05-21 17:40:23,695 INFO [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10. +Expected containerId: user Found: container_1369183214008_0001_01_000001 +org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container. +Expected containerId: user Found: container_1369183214008_0001_01_000001 + at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440) + at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72) + at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83) + at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) {code}
An NMToken will be sent to the AM on the allocate call if:
1) The AM doesn't already have an NMToken for the underlying NM, or
2) The key was rolled over on the RM and the AM gets a new container on the same NM.
On the allocate call the RM will send a consolidated list of all required NMTokens; a sketch of how an AM might consume them follows.
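A minimal sketch of the AM-side handling this describes, assuming the allocate response exposes the consolidated token list via a getNMTokens() accessor, with resourceManager standing for the AM's RM protocol handle and the cache kept as a plain map owned by the AM (all of these are illustrative, not a prescribed API):
{code}
// Sketch only: cache the per-NM tokens handed back on allocate so they
// can be reused for all later calls to that NodeManager.
Map<String, Token> nmTokens = new ConcurrentHashMap<String, Token>();

AllocateResponse response = resourceManager.allocate(allocateRequest);
for (NMToken nmToken : response.getNMTokens()) {
  // A newer token for the same NM simply replaces the older one.
  nmTokens.put(nmToken.getNodeId().toString(), nmToken.getToken());
}
{code}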
BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record, so that users can easily create them.

As a first step, we should just copy all the factory methods into the individual record classes, deprecate BuilderUtils, and then slowly move all code off BuilderUtils.
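A rough illustration of the before/after shape of the API (the exact factory signatures are an assumption, shown only to make the intent concrete):
{code}
// Today: the caller has to know about one giant utility class.
Resource r1 = BuilderUtils.newResource(1024, 1);

// Proposed: a static factory on the record itself.
Resource r2 = Resource.newInstance(1024, 1);
{code}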
This is required for additional changes in YARN-528. Some of the interfaces could use some cleanup as well - they shouldn't be declaring YarnException (Runtime) in their signatures.
See the test failure in YARN-695 - +See the test failure in YARN-695 + https://builds.apache.org/job/PreCommit-YARN-Build/957//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPatternJar/
- A single code path for secure and non-secure cases is useful for testing and coverage.
- Having this in non-secure mode will help us avoid accidental bugs in AMs DDoS'ing and bringing down the RM.
Exception: -{noformat} -Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock -Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec <<< FAILURE! -testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock) Time elapsed: 873 sec <<< FAILURE! -java.lang.AssertionError: - at org.junit.Assert.fail(Assert.java:91) - at org.junit.Assert.assertTrue(Assert.java:43) - at org.junit.Assert.assertTrue(Assert.java:54) - at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) - at java.lang.reflect.Method.invoke(Method.java:597) - at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) - at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) - at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) - at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) - at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) +Exception: +{noformat} +Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock +Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec <<< FAILURE! +testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock) Time elapsed: 873 sec <<< FAILURE! +java.lang.AssertionError: + at org.junit.Assert.fail(Assert.java:91) + at org.junit.Assert.assertTrue(Assert.java:43) + at org.junit.Assert.assertTrue(Assert.java:54) + at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) + at java.lang.reflect.Method.invoke(Method.java:597) + at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) + at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) + at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) + at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) + at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) {noformat}
The AM uses the NMToken to authenticate all AM-NM communication.
The NM will validate an NMToken in the following manner (see the sketch after this list):
* If the NMToken is using the current or previous master key then the NMToken is valid. In this case the NM will update its cache with this key for the corresponding appId.
* If the NMToken is using the master key present in the NM's cache for the AM's appId then it will be validated against that.
* If the NMToken is invalid then the NM will reject the AM's calls.

Modifications for ContainerToken:
* At present RPC validates AM-NM communication based on the ContainerToken. It will be replaced with the NMToken. From now on the AM will use one NMToken per NM (replacing the earlier behavior of one ContainerToken per container per NM).
* In a secure environment startContainer currently takes the ContainerToken from the UGI (YARN-617); after this change it will take it from the payload (Container).
* The ContainerToken will still exist and will only be used to validate the AM's container start request.
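A pseudo-code sketch of the validation order described above; the field and helper names (currentKey, previousKey, appKeyCache, keyForId) are hypothetical and only mirror the bullets, they are not the actual NM classes:
{code}
boolean isValidNMToken(NMTokenIdentifier tokenId, ApplicationId appId) {
  int keyId = tokenId.getKeyId();
  // 1. A token signed with the current or previous master key is valid;
  //    remember the key used for this application.
  if (keyId == currentKey.getKeyId() || keyId == previousKey.getKeyId()) {
    appKeyCache.put(appId, keyForId(keyId));
    return true;
  }
  // 2. Otherwise fall back to the key cached for this application.
  MasterKey cached = appKeyCache.get(appId);
  return cached != null && cached.getKeyId() == keyId;
  // 3. The caller rejects the AM call if this returns false.
}
{code}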
This is part of YARN-613.
As per the updated design, the AM will receive a per-NM NMToken in the following scenarios:
* The AM is receiving its first container on the underlying NM.
* The AM is receiving a container on the underlying NM after either the NM or the RM rebooted.
** After an RM reboot, as the RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens if the AM gets a new container on the underlying NM. However, the NM will still retain the older token until it receives a new one, to support long-running jobs (in a work-preserving environment).
** After an NM reboot, the RM will delete the token information corresponding to that NM for all AMs.
* The AM is receiving a container on the underlying NM after the NMToken master key is rolled over on the RM side.
In all these cases, if the AM receives a new NMToken then it is supposed to store it for future NM communication until it receives a newer one.

AMRMClient should expose these NMTokens to the client.
The DelegationTokenRenewer thread is critical to the RM. When a non-IOException occurs, the thread calls System.exit to prevent the RM from running without the thread. It should be exiting only on non-RuntimeExceptions.

The problem is especially bad in 23 because the yarn protobuf layer converts IOExceptions into UndeclaredThrowableExceptions (RuntimeException), which causes the renewer to abort the process. An UnknownHostException takes down the RM...
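A sketch of the intended catch behaviour (the renewToken helper and the logger are placeholders, not the actual renewer code): recoverable IOExceptions and RuntimeExceptions wrapping them are logged and retried, and only truly fatal errors bring the process down.
{code}
try {
  renewToken(dttr);
} catch (IOException ioe) {
  LOG.warn("Failed to renew token, will retry later", ioe);
} catch (RuntimeException re) {
  // e.g. an UndeclaredThrowableException wrapping an UnknownHostException
  // from the protobuf layer -- log it instead of killing the RM.
  LOG.error("Unexpected exception while renewing token", re);
} catch (Error e) {
  LOG.fatal("Fatal error in DelegationTokenRenewer, shutting down", e);
  System.exit(-1);
}
{code}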
Currently, both the SHUTDOWN event from nodeStatusUpdater and the CleanupContainers event happen to be on the same dispatcher thread, so the CleanupContainers event will not be processed until the SHUTDOWN event is processed. See the similar problem in YARN-495.
On a normal NM shutdown this is not a problem, since the normal stop happens on the shutdownHook thread.
YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. +YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
Issues found in the doc page for the Fair Scheduler, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
1. In the section “Configuration”, it contains two properties named “yarn.scheduler.fair.minimum-allocation-mb”; the second one should be “yarn.scheduler.fair.maximum-allocation-mb”.
2. In the section “Allocation file format”, the document says “The format contains three types of elements”, but it lists four types of elements following that.
Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel.

At a minimum, this will avoid accidental bugs in AMs in unsecure mode.
Failed tests: testNode(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 - testNodeSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 - testNodeDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 - testNodeInfo(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 - testNodeInfoSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 - testNodeInfoDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 +Failed tests: testNode(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 + testNodeSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 
expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 + testNodeDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 + testNodeInfo(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 + testNodeInfoSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 + testNodeInfoDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 testSingleNodesXML(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like it schedules an APP_SUBMIT event.

In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after the RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on the min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission.

Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: it is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation while the other progresses to the completion of putting its RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, with the current code flow the exception will cause the RMApp instance already in rmContext (which belongs to the faster submission) to be rejected. A sketch of one way to close this race follows.
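A hypothetical sketch of making the duplicate-ID check and the insertion into rmContext a single atomic step, so the second submission is rejected before it can affect the first one (this assumes rmContext.getRMApps() is the concurrent map the RM already keeps):
{code}
RMApp duplicate = rmContext.getRMApps().putIfAbsent(applicationId, application);
if (duplicate != null) {
  // Reject only the duplicate submission; the first RMApp stays registered.
  throw RPCUtil.getRemoteException("Application with id " + applicationId
      + " is already present, cannot add a duplicate!");
}
{code}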
{{testDownloadArchive}}, {{testDownloadPatternJar}} and {{testDownloadArchiveZip}} fail with the similar Shell ExitCodeException: - -{code} -testDownloadArchiveZip(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 480 sec <<< ERROR! -org.apache.hadoop.util.Shell$ExitCodeException: bash: line 0: cd: /D:/svn/t/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/TestFSDownload: No such file or directory -gzip: 1: No such file or directory - - at org.apache.hadoop.util.Shell.runCommand(Shell.java:377) - at org.apache.hadoop.util.Shell.run(Shell.java:292) - at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:497) - at org.apache.hadoop.yarn.util.TestFSDownload.createZipFile(TestFSDownload.java:225) - at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadArchiveZip(TestFSDownload.java:503) +{{testDownloadArchive}}, {{testDownloadPatternJar}} and {{testDownloadArchiveZip}} fail with the similar Shell ExitCodeException: + +{code} +testDownloadArchiveZip(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 480 sec <<< ERROR! +org.apache.hadoop.util.Shell$ExitCodeException: bash: line 0: cd: /D:/svn/t/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/TestFSDownload: No such file or directory +gzip: 1: No such file or directory + + at org.apache.hadoop.util.Shell.runCommand(Shell.java:377) + at org.apache.hadoop.util.Shell.run(Shell.java:292) + at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:497) + at org.apache.hadoop.yarn.util.TestFSDownload.createZipFile(TestFSDownload.java:225) + at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadArchiveZip(TestFSDownload.java:503) {code}
Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense, as the user can be / has been identified by the RM as part of the RPC layer.

The solution would be to move the user information into either the Container object or directly into the ContainerToken, which can then be used by the NM to launch the container. This user information would be set into the container by the RM.
There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to application resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. For this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can be run optionally as a separate service (much like the NMLivelinessMonitor).

The capacity monitor (similarly to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing

The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations.


------------- Preemption policy (ProportionalCapacityPreemptionPolicy): -------------

Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first), again until no more are needed or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application
# containers that have been asked to preempt are tracked across executions. If a container is among the ones to be preempted for more than a certain time, the container is moved into the list of containers to be forcibly killed.

Notes:
(*) at the moment, in order to avoid double-counting of the requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY.
(**) The ideal balanced state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point.

Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned)
# deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small % we ignore it)
# overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)

In our current experiments this set of tunables seems to be a good start to shape the preemption action properly. More sophisticated preemption policies could take into account different types of applications running, job priorities, cost of preemption, integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply.

Generality:
The monitor-based scheduler edits and the preemption mechanisms we introduce here are designed to be more general than enforcing capacity/fairness; in fact, we are considering other monitors that leverage the same idea of "schedule edits" to target different global properties (e.g., allocating enough resources to guarantee deadlines for important jobs, data-locality optimizations, IO-balancing among nodes, etc.).

Note that by default the preemption policy we describe is disabled in the patch.

Depends on YARN-45 and YARN-567, is related to YARN-568.
In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checks more often but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45, is related to YARN-569.
A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. The FairScheduler and the CapacityScheduler take opposite stances on how to achieve this.

The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity.

By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can answer preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569).

The changes we track with this JIRA are common to the FairScheduler and the CapacityScheduler, and are mostly propagation of preemption decisions through the ApplicationMastersService.
This field is needed to distinguish different types of applications (app master implementations). For example, we may run applications of type XYZ in a cluster alongside MR and would like to filter applications by type. +This field is needed to distinguish different types of applications (app master implementations). For example, we may run applications of type XYZ in a cluster alongside MR and would like to filter applications by type.
Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. - -For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. - +Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. + +For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. + At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy.
Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse. - -{code} - GetNewApplicationResponse newApp = yarnClient.getNewApplication(); - ApplicationId appId = newApp.getApplicationId(); - - ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class); - - appContext.setApplicationId(appId); -{code} - -A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like: - -{code} -GetNewApplicationResponse newApp = yarnClient.getNewApplication(); -ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext(); -{code} - +Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse. + +{code} + GetNewApplicationResponse newApp = yarnClient.getNewApplication(); + ApplicationId appId = newApp.getApplicationId(); + + ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class); + + appContext.setApplicationId(appId); +{code} + +A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like: + +{code} +GetNewApplicationResponse newApp = yarnClient.getNewApplication(); +ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext(); +{code} + [The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside of just associating the application ID automatically.]
Public localizer:
At present, when multiple containers try to request a localized resource:
* If the resource is not present then it is first created and resource localization starts (the LocalizedResource is in the DOWNLOADING state).
* If, in this state, multiple ResourceRequestEvents arrive, then ResourceLocalizationEvents are sent for all of them.

Most of the time this does not result in a duplicate resource download, but there is a race condition present. Inside ResourceLocalization (for public downloads) all the requests are added to a local attempts map. If a new request comes in, it is first checked against this map before a new download starts for the same resource. While the download is in progress the request is in the map, so a duplicate request for the same resource is rejected (i.e. the resource is already being downloaded). However, when the current download completes the request is removed from this local map. If a LocalizerRequestEvent arrives after this removal, it is no longer present in the local map and the resource will be downloaded again.

Private localizer:
Here a different but similar race condition is present.
* Inside the findNextResource method call, each LocalizerRunner tries to grab a lock on the LocalizedResource. If the lock is not acquired, it keeps trying until the resource state changes to LOCALIZED. The lock is released by the LocalizerRunner when the download completes.
* If another ContainerLocalizer grabs the lock on a resource before the LocalizedResource state changes to LOCALIZED, the resource will be downloaded again.

In both places the root cause is that all the threads try to acquire the lock on the resource without taking the current state of the LocalizedResource into consideration. A sketch of a state-aware check follows.
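A hypothetical sketch of consulting the resource state before starting another download, rather than relying only on the attempts map or on winning the lock (the ResourceState values mirror the states mentioned above; addWaiter, startDownload and getLocalPath are placeholder methods, not the actual NM code):
{code}
synchronized (rsrc) {
  if (rsrc.getState() == ResourceState.LOCALIZED) {
    return rsrc.getLocalPath();      // already on disk, just reuse it
  }
  if (rsrc.getState() == ResourceState.DOWNLOADING) {
    rsrc.addWaiter(container);       // piggyback on the in-flight download
    return null;
  }
  rsrc.startDownload(container);     // only the first requester downloads
  return null;
}
{code}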
Today, the global AM max-attempts is set to 1, which is a bad choice. AM max-attempts accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retries.

I propose we change it to at least two. We could change it to 4 to match the other retry configs.
I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. - -My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. - -Thanks, -Kishore - +I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. + +My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. + +Thanks, +Kishore +
If resource localization fails then the resource remains in memory and is:
1) either cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache it will remain in memory), or
2) reused if a LocalizationRequest comes in again for the same resource.

I think when resource localization fails, that event should be sent to the LocalResourceTracker, which will then remove it from its cache.
When I run the job history server locally, every page load takes tens of seconds. I profiled the process and discovered that all the extra time was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 to a hostname. When I changed my yarn.resourcemanager.address to localhost, the page load times decreased drastically.

There's no reason we need to perform this resolution on every page load.
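One possible shape of a fix, sketched here only to illustrate memoizing the resolved URL so the reverse-DNS lookup happens at most once (resolveRMWebAppURL stands for the existing resolution logic, and the single static cache deliberately ignores the corner case of differing Configuration objects):
{code}
private static volatile String cachedRmWebAppUrl;

public static String getRMWebAppURL(Configuration conf) {
  String url = cachedRmWebAppUrl;
  if (url == null) {
    url = resolveRMWebAppURL(conf);   // the existing (slow) resolution path
    cachedRmWebAppUrl = url;
  }
  return url;
}
{code}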
# Extend the YARN {{Service}} interface as discussed in YARN-117 -# Implement the changes in {{AbstractService}} and {{FilterService}}. -# Migrate all services in yarn-common to the more robust service model, test. - +# Extend the YARN {{Service}} interface as discussed in YARN-117 +# Implement the changes in {{AbstractService}} and {{FilterService}}. +# Migrate all services in yarn-common to the more robust service model, test. +
Currently the doc page for the Fair Scheduler looks good, and it’s here: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
It would be better to add the document link to the YARN section in the Hadoop 2.x main doc page, so that users can easily find the doc and try out the Fair Scheduler, just as they can the Capacity Scheduler.
On branch-2 the latest version I see the following on a secure cluster. - -{noformat} -2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now -2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of <me -mory:12288, vCores:16> -2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started. -2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started. -2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater -java.lang.NullPointerException - at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121) - at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407) -{noformat} - +On branch-2 the latest version I see the following on a secure cluster. + +{noformat} +2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now +2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of <me +mory:12288, vCores:16> +2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started. +2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started. +2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater +java.lang.NullPointerException + at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121) + at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407) +{noformat} + The Null pointer exception just keeps repeating and all of the nodes end up being lost. It looks like it never gets the secret key when it registers.
The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call. That effectively stats the file twice. It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case. - +The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call. That effectively stats the file twice. It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case. + In addition we may consider caching the presence of the directory rather than checking it each time a node aggregates logs for an application.
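For example, the single-stat version of the check could look roughly like the following (remoteFS and remoteRootLogDir are placeholder names for the aggregation file system and root log dir):
{code}
FileStatus status;
try {
  // One RPC instead of exists() followed by getFileStatus().
  status = remoteFS.getFileStatus(remoteRootLogDir);
} catch (FileNotFoundException e) {
  status = null;   // directory does not exist; create it / warn as before
}
{code}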
When FairScheduler#reinitialize is called, some of the scheduler-wide configs are refreshed and others aren't. They should all be refreshed. - -Ones that are refreshed: userAsDefaultQueue, nodeLocalityThreshold, rackLocalityThreshold, preemptionEnabled - +When FairScheduler#reinitialize is called, some of the scheduler-wide configs are refreshed and others aren't. They should all be refreshed. + +Ones that are refreshed: userAsDefaultQueue, nodeLocalityThreshold, rackLocalityThreshold, preemptionEnabled + Ones that aren't: minimumAllocation, maximumAllocation, assignMultiple, maxAssign
TestProcfsProcessTree#testProcessTree fails occasionally with the following stack trace - -{noformat} -Stack Trace: -junit.framework.AssertionFailedError: expected:<false> but was:<true> - at org.apache.hadoop.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java) -{noformat} - +TestProcfsProcessTree#testProcessTree fails occasionally with the following stack trace + +{noformat} +Stack Trace: +junit.framework.AssertionFailedError: expected:<false> but was:<true> + at org.apache.hadoop.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java) +{noformat} + kill -9 is executed asynchronously, the signal is delivered when the process comes out of the kernel (sys call). Checking if the process died immediately after can fail at times.
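A possible test-side mitigation, sketched under the assumption of a hypothetical isAlive(pid) helper (e.g. backed by "kill -0") and a test method that declares InterruptedException: poll for a bounded time instead of asserting immediately after the asynchronous kill -9.
{code}
boolean dead = false;
for (int i = 0; i < 20 && !dead; i++) {
  dead = !isAlive(pid);
  if (!dead) {
    Thread.sleep(100);   // give the signal time to be delivered
  }
}
assertTrue("process " + pid + " is still alive", dead);
{code}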
Hey Guys, - -I noticed that the ApplicationCLI is just randomly not printing some of the values in the ApplicationReport. I've added the getHost and getRpcPort. These are useful for me, since I want to make an RPC call to the AM (not the tracker call). - -Thanks! +Hey Guys, + +I noticed that the ApplicationCLI is just randomly not printing some of the values in the ApplicationReport. I've added the getHost and getRpcPort. These are useful for me, since I want to make an RPC call to the AM (not the tracker call). + +Thanks! Chris
ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following: - -{noformat} -2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim. -2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim. -2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim. -{noformat} - -As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. - +ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following: + +{noformat} +2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim. +2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim. +2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim. +{noformat} + +As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. + We should either make this DEBUG or remove it entirely.
Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. - +Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. + The 2 applications not yet in running state do not get launched even though limits are increased.
Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action. - +Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action. + Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case).
coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter - +coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter + patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23
If we have multiple jobs which use the distributed cache with small files, the directory limit is reached before the cache size limit is, and it becomes impossible to create any more directories in the file cache (PUBLIC). The jobs start failing with the below exception.

java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed
    at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
    at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
    at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
    at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
    at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

We need a mechanism whereby we can create a directory hierarchy and limit the number of files per directory. One possible shape of such a mapping is sketched below.
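A hypothetical sketch of such a mapping: each localized resource id is spread over a small directory tree so that no single directory ever accumulates more than a bounded number of entries (the layout and the base-36 encoding are illustrative choices, not the actual fix):
{code}
// Returns a relative path such as "x/i/9" for a resource id, so that each
// directory level introduces at most filesPerDir entries.
String relativePathForResource(long resourceId, int filesPerDir) {
  StringBuilder path = new StringBuilder();
  long id = resourceId;
  while (id >= filesPerDir) {
    path.append(Long.toString(id % filesPerDir, 36)).append('/');
    id /= filesPerDir;
  }
  return path.append(Long.toString(id, 36)).toString();
}
{code}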
We have seen a user get left in the queues list of active users even though the application was removed. This can cause everyone else in the queue to get less resources if using the minimum user limit percent config. - +We have seen a user get left in the queues list of active users even though the application was removed. This can cause everyone else in the queue to get less resources if using the minimum user limit percent config. +
The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address - -A new user trying to configure a cluster needs to know the names of all these four configs. - -The same issue exists for nodemanagers. - +The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address. + +A new user trying to configure a cluster needs to know the names of all four of these configs. + +The same issue exists for nodemanagers. + It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname, and default ports for the others would kick in.
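A rough model of the proposed defaulting behaviour (plain Java, not the real YarnConfiguration code; the port numbers are the commonly documented YARN defaults but should be treated as assumptions):
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative model: if an explicit per-service address is absent, derive it
// from a single hostname key plus a well-known default port.
public class AddressDefaults {
    static String resolve(Map<String, String> conf, String addressKey,
                          String hostnameKey, int defaultPort) {
        String explicit = conf.get(addressKey);
        if (explicit != null) {
            return explicit;
        }
        return conf.getOrDefault(hostnameKey, "0.0.0.0") + ":" + defaultPort;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.resourcemanager.hostname", "rm.example.com");
        System.out.println(resolve(conf, "yarn.resourcemanager.address",
                "yarn.resourcemanager.hostname", 8032));   // rm.example.com:8032
        System.out.println(resolve(conf, "yarn.resourcemanager.scheduler.address",
                "yarn.resourcemanager.hostname", 8030));   // rm.example.com:8030
    }
}
{code}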
Now the compare code is : -return a1.getApplicationId().getId() - a2.getApplicationId().getId(); - -Will be replaced with : -return a1.getApplicationId().compareTo(a2.getApplicationId()); - -This will bring some benefits: -1,leave applicationId compare logic to ApplicationId class; +Currently the compare code is: +return a1.getApplicationId().getId() - a2.getApplicationId().getId(); + +It will be replaced with: +return a1.getApplicationId().compareTo(a2.getApplicationId()); + +This will bring some benefits: +1. It leaves ApplicationId comparison logic to the ApplicationId class; 2. In a future HA mode, the cluster timestamp may change, and the ApplicationId class already takes care of this condition.
YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101. - -These are not really not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants. - +YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101. + +These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants. + Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.
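A hedged sketch of what such a user-facing class might look like, together with how an AM writer could branch on the codes. The class and method names are illustrative only, not the actual YARN class:
{code}
// Hypothetical user-facing holder for the special exit codes listed above.
public final class ContainerExitCodes {
    public static final int INVALID = -1000;
    public static final int ABORTED = -100;
    public static final int DISKS_FAILED = -101;

    private ContainerExitCodes() {}

    // An AM writer might choose not to count these against task failures.
    public static boolean isRetriable(int exitStatus) {
        return exitStatus == ABORTED || exitStatus == DISKS_FAILED;
    }

    public static void main(String[] args) {
        System.out.println(isRetriable(-100));   // true
        System.out.println(isRetriable(1));      // false
    }
}
{code}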
There's a bunch of unused methods like getAskCount() and getAsk(index) in AllocateRequest, and other interfaces. These should be removed. - -In YARN, found them in. MR will have it's own set. -AllocateRequest +There are a bunch of unused methods, like getAskCount() and getAsk(index), in AllocateRequest and other interfaces. These should be removed. + +In YARN, they were found in the classes below; MR will have its own set. +AllocateRequest StartContainerResponse
In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: - -application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); - -Requests are formated by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) - -In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 - -application.getResourceRequest(priority, node.getHostName()); - +In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect, as it should be checking the hostname instead. The offending line of code is 455: + +application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); + +Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses are a concatenation of hostname and port (e.g. host1.foo.com:1234). + +In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: + +application.getResourceRequest(priority, node.getHostName()); + Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because, even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
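A minimal illustration of the key mismatch (plain Java, made-up data): a lookup table keyed by hostname never matches a query made with the node address.
{code}
import java.util.HashMap;
import java.util.Map;

public class HostKeyMismatch {
    public static void main(String[] args) {
        Map<String, Integer> outstandingRequests = new HashMap<>();
        outstandingRequests.put("host1.foo.com", 3);        // requests keyed by hostname

        System.out.println(outstandingRequests.get("host1.foo.com:1234")); // null -> node-local match missed
        System.out.println(outstandingRequests.get("host1.foo.com"));      // 3    -> match found
    }
}
{code}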
We need to fix the following issues on YARN web-UI: - - Remove the "Note" column from the application list. When a failure happens, this "Note" spoils the table layout. - - When the Application is still not running, the Tracking UI should be title "UNASSIGNED", for some reason it is titled "ApplicationMaster" but (correctly) links to "#". - - The per-application page has all the RM related information like version, start-time etc. Must be some accidental change by one of the patches. +We need to fix the following issues on the YARN web-UI: + - Remove the "Note" column from the application list. When a failure happens, this "Note" spoils the table layout. + - When the Application is still not running, the Tracking UI should be titled "UNASSIGNED"; for some reason it is titled "ApplicationMaster" but (correctly) links to "#". + - The per-application page has all the RM-related information like version, start-time, etc. This must be an accidental change by one of the patches. - The diagnostics for a failed app on the per-application page don't retain new lines and wrap them around, which makes them hard to read.
{{ApplicationCLI}}, {{NodeCLI}}, and the corresponding test {{TestYarnCLI}} all use a hard-coded '\n' as the line separator. This causes test failures on Windows. +{{ApplicationCLI}}, {{NodeCLI}}, and the corresponding test {{TestYarnCLI}} all use a hard-coded '\n' as the line separator. This causes test failures on Windows.
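A small sketch of the usual fix, assuming the goal is simply to stop hard-coding '\n': let PrintWriter (or System.lineSeparator()) supply the platform separator on both the production and the test side. The field names below are illustrative.
{code}
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class LineSeparatorDemo {
    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(out, true);
        // println() emits the platform line separator, not a literal '\n'.
        writer.println("Application-Id : application_1234_0001");
        writer.println("State : RUNNING");

        // The test builds its expected string the same way.
        String expected = "Application-Id : application_1234_0001" + System.lineSeparator()
                + "State : RUNNING" + System.lineSeparator();
        System.out.println(expected.equals(out.toString()));  // true on any platform
    }
}
{code}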
We now have different and inconsistent naming schemes for various protocols. It was hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings, with such naming. - +We now have different and inconsistent naming schemes for various protocols. Such naming has been hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings. + We should fix these before we go beta.
2013-02-06 09:31:33,813 INFO [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl -org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null - at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605) - at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150) - at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) - at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) +2013-02-06 09:31:33,813 INFO [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl +org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null + at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605) + at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150) + at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) + at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
In YARN-370, we changed it from setting the capability to directly setting memory and cores: - -- ask.setCapability(normalized); -+ ask.getCapability().setMemory(normalized.getMemory()); -+ ask.getCapability().setVirtualCores(normalized.getVirtualCores()); - -We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. - +In YARN-370, we changed it from setting the capability to directly setting memory and cores: + +- ask.setCapability(normalized); ++ ask.getCapability().setMemory(normalized.getMemory()); ++ ask.getCapability().setVirtualCores(normalized.getVirtualCores()); + +We did this because it directly sets the values in the original resource object passed in when the AM gets allocated; without it, the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. + I think we should find a better way of doing this long term: first, so we don't have to keep adding things there when new resources are added; second, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it again in the future. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in.
The MR2 FS docs could use some improvements. - -Configuration: -- sizebasedweight - what is the "size" here? Total memory usage? - -Pool properties: -- minResources - what does min amount of aggregate memory mean given that this is not a reservation? -- maxResources - is this a hard limit? -- weight: How is this ratio configured? Eg base is 1 and all weights are relative to that? -- schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job? - -There's no mention of ACLs, even though they're supported. See the CS docs for comparison. - -Also there are a couple typos worth fixing while we're at it, eg "finish. apps to run" - +The MR2 FS docs could use some improvements. + +Configuration: +- sizebasedweight - what is the "size" here? Total memory usage? + +Pool properties: +- minResources - what does min amount of aggregate memory mean given that this is not a reservation? +- maxResources - is this a hard limit? +- weight: How is this ratio configured? Eg base is 1 and all weights are relative to that? +- schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job? + +There's no mention of ACLs, even though they're supported. See the CS docs for comparison. + +Also there are a couple typos worth fixing while we're at it, eg "finish. apps to run" + Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable.
I assume the Last-Last-Health-Update is a typo and it should just be Last-Health-Update. - - -$ yarn node -status foo.com:8041 -Node Report : - Node-Id : foo.com:8041 - Rack : /10.10.10.0 - Node-State : RUNNING - Node-Http-Address : foo.com:8042 - Health-Status(isNodeHealthy) : true - Last-Last-Health-Update : 1360118400219 - Health-Report : - Containers : 0 - Memory-Used : 0M +I assume the Last-Last-Health-Update is a typo and it should just be Last-Health-Update. + + +$ yarn node -status foo.com:8041 +Node Report : + Node-Id : foo.com:8041 + Rack : /10.10.10.0 + Node-State : RUNNING + Node-Http-Address : foo.com:8042 + Health-Status(isNodeHealthy) : true + Last-Last-Health-Update : 1360118400219 + Health-Report : + Containers : 0 + Memory-Used : 0M Memory-Capacity : 24576
HADOOP-9252 slightly changed the format of some StringUtils outputs. It caused TestContainersMonitor to fail. - +HADOOP-9252 slightly changed the format of some StringUtils outputs. It caused TestContainersMonitor to fail. + Also, some methods were deprecated by HADOOP-9252. The use of them should be replaced with the new methods.
Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. - -org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED - at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) - at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) - at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) - at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) - at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) - at java.lang.Thread.run(Thread.java:680) - - +Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. + +org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED + at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) + at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) + at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) + at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) + at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) + at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) + at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) + at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) + at java.lang.Thread.run(Thread.java:680) + + ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases.
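A simplified model (not the real ApplicationMasterService) of the guard being asked for, sketched with plain Java types: fail an allocate() from an attempt that never registered instead of silently dropping a state-machine event.
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AllocateGuard {
    private final Set<String> registeredAttempts = ConcurrentHashMap.newKeySet();

    public void registerApplicationMaster(String attemptId) {
        registeredAttempts.add(attemptId);
    }

    public String allocate(String attemptId) {
        if (!registeredAttempts.contains(attemptId)) {
            // Surface an explicit error to the caller instead of a dropped event.
            throw new IllegalStateException(
                "AM for attempt " + attemptId + " called allocate() before registering");
        }
        return "allocation for " + attemptId;
    }

    public static void main(String[] args) {
        AllocateGuard guard = new AllocateGuard();
        try {
            guard.allocate("appattempt_0001_01");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());   // error is visible to the AM
        }
    }
}
{code}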
Noticed the following in an error log output while doing some experiements - -./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle - -"defiend" should be "defined" +Noticed the following in an error log output while doing some experiments + +./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle + +"defiend" should be "defined"
Follow up from YARN-275 +Follow up from YARN-275 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt
Starting up the proxy server fails with this error: - -{noformat} -2013-01-29 17:37:41,357 FATAL webproxy.WebAppProxy (WebAppProxy.java:start(99)) - Could not start proxy web server -java.io.FileNotFoundException: webapps/proxy not found in CLASSPATH - at org.apache.hadoop.http.HttpServer.getWebAppsPath(HttpServer.java:533) - at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:225) - at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:164) - at org.apache.hadoop.yarn.server.webproxy.WebAppProxy.start(WebAppProxy.java:90) - at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) - at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.main(WebAppProxyServer.java:94) -{noformat} +Starting up the proxy server fails with this error: + +{noformat} +2013-01-29 17:37:41,357 FATAL webproxy.WebAppProxy (WebAppProxy.java:start(99)) - Could not start proxy web server +java.io.FileNotFoundException: webapps/proxy not found in CLASSPATH + at org.apache.hadoop.http.HttpServer.getWebAppsPath(HttpServer.java:533) + at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:225) + at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:164) + at org.apache.hadoop.yarn.server.webproxy.WebAppProxy.start(WebAppProxy.java:90) + at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) + at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.main(WebAppProxyServer.java:94) +{noformat}
When using the search box on the web UI to search for a specific task number (e.g.: "0831"), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results. - +When using the search box on the web UI to search for a specific task number (e.g.: "0831"), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results. + It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box.
{code:xml} -org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED - at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) - at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) - at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) - at java.lang.Thread.run(Thread.java:662) -{code} -{code:xml} -2013-01-17 04:03:46,726 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state -org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP - at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) - at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) - at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) - at java.lang.Thread.run(Thread.java:662) -{code} -{code:xml} -2013-01-17 00:01:11,006 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state -org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT - at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) - at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) - at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) - at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) - at java.lang.Thread.run(Thread.java:662) -{code} -{code:xml} - -2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358385982671_1304_01_000001 transitioned from NEW to DONE -2013-01-17 10:56:36,975 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state -org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at FINISHED - at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) - at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) - at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) - at java.lang.Thread.run(Thread.java:662) -2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null -{code} -{code:xml} - -2013-01-17 10:56:36,026 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state -org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at FINISHED - at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) - at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) - at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) - at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) 
- at java.lang.Thread.run(Thread.java:662) -2013-01-17 10:56:36,026 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null -{code} +{code:xml} +org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED + at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) + at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) + at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) + at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) + at java.lang.Thread.run(Thread.java:662) +{code} +{code:xml} +2013-01-17 04:03:46,726 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state +org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP + at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) + at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) + at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) + at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) + at java.lang.Thread.run(Thread.java:662) +{code} +{code:xml} +2013-01-17 00:01:11,006 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state +org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT + at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) + at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) + at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) + at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) + at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) + at java.lang.Thread.run(Thread.java:662) +{code} +{code:xml} + +2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358385982671_1304_01_000001 transitioned from NEW to DONE +2013-01-17 10:56:36,975 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state +org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at FINISHED + at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) + at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) + at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) + at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) + at java.lang.Thread.run(Thread.java:662) +2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null +{code} +{code:xml} + +2013-01-17 10:56:36,026 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state +org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at FINISHED + at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) + at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) + at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) + at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) + at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) + at java.lang.Thread.run(Thread.java:662) +2013-01-17 10:56:36,026 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null +{code}
Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to "default". - +Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to "default". + A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name.
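A hedged sketch of the placement rule described above, as a simplified model rather than the real FairScheduler code: when no queue is given (or the placeholder "default" arrives from RMAppManager) and user-as-default-queue is enabled, place the app in a queue named after the user.
{code}
public class QueuePlacement {
    static String assignQueue(String requestedQueue, String user, boolean userAsDefaultQueue) {
        if (requestedQueue == null || requestedQueue.isEmpty()
                || "default".equals(requestedQueue)) {
            return userAsDefaultQueue ? user : "default";
        }
        return requestedQueue;
    }

    public static void main(String[] args) {
        System.out.println(assignQueue(null, "alice", true));        // alice
        System.out.println(assignQueue(null, "alice", false));       // default
        System.out.println(assignQueue("analytics", "alice", true)); // analytics
    }
}
{code}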
With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. - +With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. + More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled.
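For reference, a small illustration of how a dominant resource share is computed under DRF (the numbers are made up): a schedulable's share is the larger of its memory share and its CPU share of the cluster, and DRF favors whichever schedulable has the smaller dominant share.
{code}
public class DominantShare {
    static double dominantShare(double memUsed, double vcoresUsed,
                                double clusterMem, double clusterVcores) {
        return Math.max(memUsed / clusterMem, vcoresUsed / clusterVcores);
    }

    public static void main(String[] args) {
        // App A is memory-heavy; App B is CPU-heavy.
        double a = dominantShare(4096, 1, 16384, 16);   // max(0.25, 0.0625) = 0.25
        double b = dominantShare(1024, 8, 16384, 16);   // max(0.0625, 0.5)  = 0.5
        System.out.println(a < b);  // true: A has the smaller dominant share, so DRF favors A next
    }
}
{code}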
RM use fairScheduler, when client submit a job to a queue, but the queue do not allow the user to submit job it, in this case, client will hold forever. +The RM uses the FairScheduler; when a client submits a job to a queue that does not allow the user to submit jobs to it, the client will hang forever.
{code:xml} -2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state -org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED - at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) - at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) - at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) - at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) - at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) - at java.lang.Thread.run(Thread.java:662) +{code:xml} +2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state +org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED + at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) + at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) + at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) + at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) + at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) + at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) + at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) + at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) + at java.lang.Thread.run(Thread.java:662) {code}
If I choose a 100 rows, and then refresh the page, DataTables goes back to showing me 20 rows. +If I choose 100 rows and then refresh the page, DataTables goes back to showing me 20 rows. This user preference should be stored in a cookie.
Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL". - +Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL". + Usually the diagnostic string on the RM app page has something useful, so we might as well point there.
Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever. +Say application A is submitted, but at that time it does not meet the bar for activation because of resource limit settings for applications. If more hardware is later added to the system and the application becomes valid, it still remains in the pending state, likely forever. This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to reproduce. In RM restart scenarios, this will likely hit more often if restart is implemented by re-playing events and re-submitting applications to the scheduler before the RPC to the NMs is activated.
yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. - -Also the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tool like grep. - +yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. + +Also, the output currently includes several binary characters. This is fine for machine readability, but difficult for human readability, or even for using standard tools like grep. + The help message could also be made more useful to users.
If we are navigating to Nodemanager by clicking on the node link in RM,there is no link provided on the NM to navigate back to RM. +If we navigate to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM. It would be good to have a link to navigate back to the RM.
If NM is started before starting the RM ,NM is shutting down with the following error -{code} -ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager -org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException - at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149) - at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) - at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167) - at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242) -Caused by: java.lang.reflect.UndeclaredThrowableException - at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66) - at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182) - at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145) - ... 3 more -Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused - at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131) - at $Proxy23.registerNodeManager(Unknown Source) - at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) - ... 5 more -Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused - at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857) - at org.apache.hadoop.ipc.Client.call(Client.java:1141) - at org.apache.hadoop.ipc.Client.call(Client.java:1100) - at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128) - ... 7 more -Caused by: java.net.ConnectException: Connection refused - at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) - at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) - at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) - at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659) - at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469) - at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563) - at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211) - at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247) - at org.apache.hadoop.ipc.Client.call(Client.java:1117) - ... 
9 more -2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted -java.lang.InterruptedException - at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) - at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) - at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76) - at java.lang.Thread.run(Thread.java:619) -2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. -2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999 -2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped. -2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290 -2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290 -2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder -2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped. -2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted -java.lang.InterruptedException - at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) - at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) - at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) - at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76) - at java.lang.Thread.run(Thread.java:619) +If NM is started before starting the RM ,NM is shutting down with the following error +{code} +ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager +org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException + at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149) + at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) + at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167) + at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242) +Caused by: java.lang.reflect.UndeclaredThrowableException + at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66) + at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182) + at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145) + ... 
3 more +Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused + at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131) + at $Proxy23.registerNodeManager(Unknown Source) + at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) + ... 5 more +Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused + at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857) + at org.apache.hadoop.ipc.Client.call(Client.java:1141) + at org.apache.hadoop.ipc.Client.call(Client.java:1100) + at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128) + ... 7 more +Caused by: java.net.ConnectException: Connection refused + at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) + at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) + at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) + at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659) + at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469) + at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563) + at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211) + at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247) + at org.apache.hadoop.ipc.Client.call(Client.java:1117) + ... 9 more +2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted +java.lang.InterruptedException + at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) + at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) + at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76) + at java.lang.Thread.run(Thread.java:619) +2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. +2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999 +2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped. +2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290 +2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290 +2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder +2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped. 
+2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted +java.lang.InterruptedException + at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) + at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) + at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) + at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76) + at java.lang.Thread.run(Thread.java:619) {code}
Ref: MAPREDUCE-4067 - -All YARN APIs currently throw YarnRemoteException. -1) This cannot be extended in it's current form. +Ref: MAPREDUCE-4067 + +All YARN APIs currently throw YarnRemoteException. +1) This cannot be extended in its current form. 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions.
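A hedged sketch of one possible direction: declare IOException on the API alongside a YARN-level exception, so RPC-layer IO failures no longer surface as UndeclaredThrowableException. The interface and exception names below are illustrative, not the real API.
{code}
import java.io.IOException;

public class ExceptionContractDemo {
    // Hypothetical extensible, YARN-level exception.
    static class YarnApiException extends Exception {
        YarnApiException(String msg) { super(msg); }
    }

    // Hypothetical API method declaring both exception types.
    interface ApplicationClient {
        String getApplicationReport(String appId) throws YarnApiException, IOException;
    }

    public static void main(String[] args) {
        ApplicationClient client = appId -> {
            throw new IOException("connection refused");   // RPC-level failure
        };
        try {
            client.getApplicationReport("application_1234_0001");
        } catch (IOException | YarnApiException e) {
            // Both failure modes are declared, so callers handle them explicitly.
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }
}
{code}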
Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager services. This is done by checking for fields being non-null before shutting down/closing etc, and setting the fields to null afterwards -to be resilient against re-entrancy. - +Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager services. This is done by checking for fields being non-null before shutting down/closing etc, and setting the fields to null afterwards -to be resilient against re-entrancy. + No tests other than manual review.
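A minimal sketch of the "check for null, then null out" pattern described above, assuming a toy service with a single worker thread, so that stop() is safe to call before start() or more than once.
{code}
public class ResilientService {
    private Thread worker;

    public synchronized void start() {
        worker = new Thread(() -> { /* service loop would go here */ });
        worker.start();
    }

    public synchronized void stop() {
        if (worker != null) {          // tolerate stop-before-start
            worker.interrupt();
            worker = null;             // tolerate a second stop()
        }
    }

    public static void main(String[] args) {
        ResilientService s = new ResilientService();
        s.stop();   // never started: no NPE
        s.start();
        s.stop();
        s.stop();   // double stop: no-op
        System.out.println("shutdown is idempotent");
    }
}
{code}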
Split MAPREDUCE-3502 patches to make the RM code more resilient to being stopped more than once, or before started. - +Split MAPREDUCE-3502 patches to make the RM code more resilient to being stopped more than once, or before started. + This depends on MAPREDUCE-4014.
Having played the YARN service model, there are some issues -that I've identified based on past work and initial use. - -This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. - -h2. state model prevents stopped state being entered if you could not successfully start the service. - -In the current lifecycle you cannot stop a service unless it was successfully started, but -* {{init()}} may acquire resources that need to be explicitly released -* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. - -*Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. - -Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than "stopped". - -MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. -h2. AbstractService doesn't prevent duplicate state change requests. - -The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} & {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. - -This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers. - -h2. AbstractService state change doesn't defend against race conditions. - -There's no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. - -h2. Static methods to choreograph of lifecycle operations - -Helper methods to move things through lifecycles. init->start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. - -h2. state transition failures are something that registered service listeners may wish to be informed of. - -When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. 
- -*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing implementations of the interface. - -h2. Service listener failures not handled - -Is this an error an error or not? Log and ignore may not be what is desired. - -*Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes. - -h2. Support static listeners for all AbstractServices - -Add support to {{AbstractService}} that allow callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events. - -The static listeners would be invoked for all state changes except creation (base class shouldn't be handing out references to itself at this point). - -These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the listeners invoked. - -h2. Add some example listeners for management/diagnostics -* event to commons log for humans. -* events for machines hooked up to the JSON logger. -* for testing: something that be told to fail. - -h2. Services should support signal interruptibility - +Having played the YARN service model, there are some issues +that I've identified based on past work and initial use. + +This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. + +h2. state model prevents stopped state being entered if you could not successfully start the service. + +In the current lifecycle you cannot stop a service unless it was successfully started, but +* {{init()}} may acquire resources that need to be explicitly released +* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. + +*Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. + +Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than "stopped". + +MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. +h2. AbstractService doesn't prevent duplicate state change requests. + +The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} & {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. + +This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods {{final}}. 
These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). + +h2. AbstractService state change doesn't defend against race conditions. + +There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. + +h2. Static methods to choreograph lifecycle operations + +Helper methods to move things through lifecycles. init->start is common; stop-if-service!=null is another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. + +h2. state transition failures are something that registered service listeners may wish to be informed of. + +When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. + +*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targetedState, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing implementations of the interface. + +h2. Service listener failures not handled + +Is this an error or not? Log and ignore may not be what is desired. + +*Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes. + +h2. Support static listeners for all AbstractServices + +Add support to {{AbstractService}} that allows callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events. + +The static listeners would be invoked for all state changes except creation (base class shouldn't be handing out references to itself at this point). + +These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the listeners invoked. + +h2. Add some example listeners for management/diagnostics +* event to commons log for humans. +* events for machines hooked up to the JSON logger. +* for testing: something that can be told to fail. + +h2. Services should support signal interruptibility + The services would benefit from a way of shutting them down on a kill signal; this can be done via a runtime hook. It should not be automatic though, as composite services will get into a very complex state during shutdown. Better to provide a hook that lets you register/unregister services to terminate, and have the relevant {{main()}} entry points tell their root services to register themselves.
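To make the "static methods to choreograph lifecycle operations" item above concrete, here is a minimal sketch of what such a helper could look like. It assumes a {{Service}} interface exposing {{init(Configuration)}}, {{start()}} and {{stop()}}; the class and method names ({{ServiceOps}}, {{deploy}}, {{stopQuietly}}) are illustrative placeholders, not an agreed API.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.service.Service; // assumed location of the Service interface

/** Illustrative sketch of the proposed ServiceOps lifecycle helpers. */
public final class ServiceOps {

  private ServiceOps() {
  }

  /** init() then start() a service, stopping it again if either step fails. */
  public static void deploy(Service service, Configuration conf) {
    try {
      service.init(conf);
      service.start();
    } catch (RuntimeException e) {
      // best-effort cleanup so a half-started service does not leak resources
      stopQuietly(service);
      throw e;
    }
  }

  /** The "stop-if-service!=null" helper: stop a service, swallowing failures. */
  public static void stopQuietly(Service service) {
    if (service == null) {
      return;
    }
    try {
      service.stop();
    } catch (RuntimeException e) {
      // deliberately ignored; a real implementation would log at warn level
    }
  }
}
{code}

Composite services that wrap child services could call {{ServiceOps.deploy()}} from their own {{start()}} and {{ServiceOps.stopQuietly()}} from their {{stop()}}, which is the "more robust shutdown" the description asks for.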
see the red color: - -org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java - - protected void startStatusUpdater() { - - new Thread("Node Status Updater") { - @Override - @SuppressWarnings("unchecked") - public void run() { - int lastHeartBeatID = 0; - while (!isStopped) { - // Send heartbeat - try { - synchronized (heartbeatMonitor) { - heartbeatMonitor.wait(heartBeatInterval); - } - {color:red} - // Before we send the heartbeat, we get the NodeStatus, - // whose method removes completed containers. - NodeStatus nodeStatus = getNodeStatus(); - {color} - nodeStatus.setResponseId(lastHeartBeatID); - - NodeHeartbeatRequest request = recordFactory - .newRecordInstance(NodeHeartbeatRequest.class); - request.setNodeStatus(nodeStatus); - {color:red} - - // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. - HeartbeatResponse response = - resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); - {color} - - if (response.getNodeAction() == NodeAction.SHUTDOWN) { - LOG - .info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," + - " hence shutting down."); - NodeStatusUpdaterImpl.this.stop(); - break; - } - if (response.getNodeAction() == NodeAction.REBOOT) { - LOG.info("Node is out of sync with ResourceManager," - + " hence rebooting."); - NodeStatusUpdaterImpl.this.reboot(); - break; - } - - lastHeartBeatID = response.getResponseId(); - List<ContainerId> containersToCleanup = response - .getContainersToCleanupList(); - if (containersToCleanup.size() != 0) { - dispatcher.getEventHandler().handle( - new CMgrCompletedContainersEvent(containersToCleanup)); - } - List<ApplicationId> appsToCleanup = - response.getApplicationsToCleanupList(); - //Only start tracking for keepAlive on FINISH_APP - trackAppsForKeepAlive(appsToCleanup); - if (appsToCleanup.size() != 0) { - dispatcher.getEventHandler().handle( - new CMgrCompletedAppsEvent(appsToCleanup)); - } - } catch (Throwable e) { - // TODO Better error handling. Thread can die with the rest of the - // NM still running. - LOG.error("Caught exception in status-updater", e); - } - } - } - }.start(); - } - - - - private NodeStatus getNodeStatus() { - - NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); - nodeStatus.setNodeId(this.nodeId); - - int numActiveContainers = 0; - List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>(); - for (Iterator<Entry<ContainerId, Container>> i = - this.context.getContainers().entrySet().iterator(); i.hasNext();) { - Entry<ContainerId, Container> e = i.next(); - ContainerId containerId = e.getKey(); - Container container = e.getValue(); - - // Clone the container to send it to the RM - org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = - container.cloneAndGetContainerStatus(); - containersStatuses.add(containerStatus); - ++numActiveContainers; - LOG.info("Sending out status for container: " + containerStatus); - {color:red} - - // Here is the part that removes the completed containers. 
- if (containerStatus.getState() == ContainerState.COMPLETE) { - // Remove - i.remove(); - {color} - - LOG.info("Removed completed container " + containerId); - } - } - nodeStatus.setContainersStatuses(containersStatuses); - - LOG.debug(this.nodeId + " sending out status for " - + numActiveContainers + " containers"); - - NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus(); - nodeHealthStatus.setHealthReport(healthChecker.getHealthReport()); - nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy()); - nodeHealthStatus.setLastHealthReportTime( - healthChecker.getLastHealthReportTime()); - if (LOG.isDebugEnabled()) { - LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy() - + ", " + nodeHealthStatus.getHealthReport()); - } - nodeStatus.setNodeHealthStatus(nodeHealthStatus); - - List<ApplicationId> keepAliveAppIds = createKeepAliveApplicationList(); - nodeStatus.setKeepAliveApplications(keepAliveAppIds); - - return nodeStatus; - } +see the red color: + +org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java + + protected void startStatusUpdater() { + + new Thread("Node Status Updater") { + @Override + @SuppressWarnings("unchecked") + public void run() { + int lastHeartBeatID = 0; + while (!isStopped) { + // Send heartbeat + try { + synchronized (heartbeatMonitor) { + heartbeatMonitor.wait(heartBeatInterval); + } + {color:red} + // Before we send the heartbeat, we get the NodeStatus, + // whose method removes completed containers. + NodeStatus nodeStatus = getNodeStatus(); + {color} + nodeStatus.setResponseId(lastHeartBeatID); + + NodeHeartbeatRequest request = recordFactory + .newRecordInstance(NodeHeartbeatRequest.class); + request.setNodeStatus(nodeStatus); + {color:red} + + // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. + HeartbeatResponse response = + resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); + {color} + + if (response.getNodeAction() == NodeAction.SHUTDOWN) { + LOG + .info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," + + " hence shutting down."); + NodeStatusUpdaterImpl.this.stop(); + break; + } + if (response.getNodeAction() == NodeAction.REBOOT) { + LOG.info("Node is out of sync with ResourceManager," + + " hence rebooting."); + NodeStatusUpdaterImpl.this.reboot(); + break; + } + + lastHeartBeatID = response.getResponseId(); + List<ContainerId> containersToCleanup = response + .getContainersToCleanupList(); + if (containersToCleanup.size() != 0) { + dispatcher.getEventHandler().handle( + new CMgrCompletedContainersEvent(containersToCleanup)); + } + List<ApplicationId> appsToCleanup = + response.getApplicationsToCleanupList(); + //Only start tracking for keepAlive on FINISH_APP + trackAppsForKeepAlive(appsToCleanup); + if (appsToCleanup.size() != 0) { + dispatcher.getEventHandler().handle( + new CMgrCompletedAppsEvent(appsToCleanup)); + } + } catch (Throwable e) { + // TODO Better error handling. Thread can die with the rest of the + // NM still running. 
+ LOG.error("Caught exception in status-updater", e); + } + } + } + }.start(); + } + + + + private NodeStatus getNodeStatus() { + + NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); + nodeStatus.setNodeId(this.nodeId); + + int numActiveContainers = 0; + List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>(); + for (Iterator<Entry<ContainerId, Container>> i = + this.context.getContainers().entrySet().iterator(); i.hasNext();) { + Entry<ContainerId, Container> e = i.next(); + ContainerId containerId = e.getKey(); + Container container = e.getValue(); + + // Clone the container to send it to the RM + org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = + container.cloneAndGetContainerStatus(); + containersStatuses.add(containerStatus); + ++numActiveContainers; + LOG.info("Sending out status for container: " + containerStatus); + {color:red} + + // Here is the part that removes the completed containers. + if (containerStatus.getState() == ContainerState.COMPLETE) { + // Remove + i.remove(); + {color} + + LOG.info("Removed completed container " + containerId); + } + } + nodeStatus.setContainersStatuses(containersStatuses); + + LOG.debug(this.nodeId + " sending out status for " + + numActiveContainers + " containers"); + + NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus(); + nodeHealthStatus.setHealthReport(healthChecker.getHealthReport()); + nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy()); + nodeHealthStatus.setLastHealthReportTime( + healthChecker.getLastHealthReportTime()); + if (LOG.isDebugEnabled()) { + LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy() + + ", " + nodeHealthStatus.getHealthReport()); + } + nodeStatus.setNodeHealthStatus(nodeHealthStatus); + + List<ApplicationId> keepAliveAppIds = createKeepAliveApplicationList(); + nodeStatus.setKeepAliveApplications(keepAliveAppIds); + + return nodeStatus; + }
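The annotated code above shows the problem: {{getNodeStatus()}} removes completed containers from the NM context before the heartbeat is known to have succeeded, so if {{nodeHeartbeat()}} fails those container statuses are lost. One possible shape of a fix, sketched here purely for illustration (this is not the actual patch, and the class below is hypothetical), is to park completed statuses in a pending set and only drop them once a heartbeat has been acknowledged:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

/**
 * Hypothetical helper: remembers completed-container statuses until the RM
 * has acknowledged a heartbeat that carried them, so a failed heartbeat
 * does not silently discard them.
 */
public class CompletedContainerTracker {

  private final Map<ContainerId, ContainerStatus> pendingAck =
      new HashMap<ContainerId, ContainerStatus>();

  /** Called where the current code does i.remove(): record instead of drop. */
  public synchronized void reportCompleted(ContainerStatus status) {
    pendingAck.put(status.getContainerId(), status);
  }

  /** Completed statuses to include in the next heartbeat. */
  public synchronized List<ContainerStatus> statusesForHeartbeat() {
    return new ArrayList<ContainerStatus>(pendingAck.values());
  }

  /** Only after a successful nodeHeartbeat() is it safe to forget them. */
  public synchronized void heartbeatAcknowledged(List<ContainerId> acked) {
    for (ContainerId id : acked) {
      pendingAck.remove(id);
    }
  }
}
{code}

With something like this in place, the {{catch (Throwable e)}} block in the status updater no longer loses state: the same completed statuses are simply re-sent on the next successful heartbeat.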
If we have multiple jobs which use the distributed cache with many small files, the directory limit is reached before the cache size limit, and the node fails to create any further directories in the file cache. The jobs then start failing with the exception below. - - -{code:xml} -java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed - at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) - at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) - at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) - at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) - at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) - at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) - at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) - at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) - at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) - at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) - at java.util.concurrent.FutureTask.run(FutureTask.java:138) - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) - at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) - at java.util.concurrent.FutureTask.run(FutureTask.java:138) - at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) - at java.lang.Thread.run(Thread.java:662) -{code} - +If we have multiple jobs which use the distributed cache with many small files, the directory limit is reached before the cache size limit, and the node fails to create any further directories in the file cache. The jobs then start failing with the exception below. + + +{code:xml} +java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed + at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) + at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) + at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) + at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) + at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) + at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) + at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) + at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) + at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) + at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) + at java.util.concurrent.FutureTask.run(FutureTask.java:138) + at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) + at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) + at java.util.concurrent.FutureTask.run(FutureTask.java:138) + at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) + at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) + at java.lang.Thread.run(Thread.java:662) +{code} + We should have a mechanism to clean the cache files once the number of directories crosses a specified limit, just as we do for the cache size.
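One way to avoid a per-directory child limit is to spread localized files across a shallow hierarchy of subdirectories rather than creating every cache entry directly under {{filecache/}}. The sketch below is only meant to illustrate the idea; the class name, the per-directory cap of 8192 and the layout are assumptions, not the scheme YARN actually ships.

{code:java}
/**
 * Illustrative sketch: maps a monotonically increasing resource sequence
 * number to a nested relative path such as "1/0", so that no single
 * directory ever holds more than ENTRIES_PER_DIR children.
 */
public final class CacheDirAllocator {

  // Assumed cap; a real implementation would take this from configuration.
  private static final int ENTRIES_PER_DIR = 8192;

  private CacheDirAllocator() {
  }

  /** Relative directory path under filecache/ for the given sequence number. */
  public static String relativePathFor(long sequenceNumber) {
    StringBuilder path =
        new StringBuilder(Long.toString(sequenceNumber % ENTRIES_PER_DIR));
    long remaining = sequenceNumber / ENTRIES_PER_DIR;
    while (remaining > 0) {
      path.insert(0, remaining % ENTRIES_PER_DIR + "/");
      remaining /= ENTRIES_PER_DIR;
    }
    return path.toString();
  }
}
{code}

With such a layout the number of cache entries is bounded by the cache size rather than by the file system's per-directory limit, which is the behaviour the description asks for.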
We have to make sure that NodeManagers clean up their local files on restart. - +We have to make sure that NodeManagers clean up their local files on restart. + It may already work like that, in which case we should have tests validating this.
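As a rough illustration of the cleanup being asked for (plain JDK code, with a hypothetical class name and no claim about where in NM startup it would be wired in):

{code:java}
import java.io.File;
import java.util.List;

/** Hypothetical sketch: wipe leftover contents of the NM local dirs at startup. */
public final class LocalDirCleaner {

  private LocalDirCleaner() {
  }

  /** Delete the contents of each configured local dir, keeping the dir itself. */
  public static void cleanOnStartup(List<File> localDirs) {
    for (File dir : localDirs) {
      File[] children = dir.listFiles();
      if (children == null) {
        continue; // the dir does not exist yet or is not a directory
      }
      for (File child : children) {
        deleteRecursively(child);
      }
    }
  }

  private static void deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File child : children) {
        deleteRecursively(child);
      }
    }
    if (!f.delete()) {
      System.err.println("Failed to delete " + f);
    }
  }
}
{code}

A test validating the behaviour would create files under the local dirs, restart the NodeManager, and assert that the directories are empty again.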
Clone of YARN-51. - +Clone of YARN-51. + An ApplicationMaster should not be able to store container tokens and reuse the same set of tokens for repeated container launches. The possibility of such abuse exists in the current code for a duration of 1d+10mins; we need to fix this.
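The 1d+10mins window mentioned here is the container-token validity period; the direction of the fix is for the NodeManager to refuse launch requests whose token has expired or has already been used. A self-contained sketch of that check, using hypothetical types rather than the real token classes:

{code:java}
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: reject expired or previously used container tokens. */
public class ContainerTokenChecker {

  /** What a verified token is assumed to tell us (illustrative only). */
  public static final class TokenInfo {
    final String containerId;
    final long expiryTimeMillis;

    public TokenInfo(String containerId, long expiryTimeMillis) {
      this.containerId = containerId;
      this.expiryTimeMillis = expiryTimeMillis;
    }
  }

  private final Set<String> usedContainerIds = new HashSet<String>();

  /** Throws if the token is stale or is being replayed for a second launch. */
  public synchronized void checkLaunchAllowed(TokenInfo token) {
    if (System.currentTimeMillis() > token.expiryTimeMillis) {
      throw new SecurityException(
          "Container token for " + token.containerId + " has expired");
    }
    if (!usedContainerIds.add(token.containerId)) {
      throw new SecurityException("Container token for " + token.containerId
          + " has already been used to launch a container");
    }
  }
}
{code}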
The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. - +The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. + [1] http://research.yahoo.com/files/yl-2012-003.pdf
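To make the proposed protocol concrete, here is one possible shape of the message the RM could attach to an AM heartbeat response, asking the AM to give back either specific containers or an amount of resource. The interface is an illustrative placeholder, not the API that the protocol work would necessarily define.

{code:java}
import java.util.List;

/**
 * Illustrative placeholder for an RM -> AM "please release resources" message.
 * Containers listed as strict must be released; alternatively the AM may
 * satisfy the request by returning the stated amount of resources from
 * containers of its own choosing before the deadline.
 */
public interface PreemptionRequest {

  /** Containers the RM will reclaim itself if the AM does not release them. */
  List<String> getStrictContainerIds();

  /** Memory, in MB, the AM is asked to give back. */
  int getMemoryMb();

  /** Virtual cores the AM is asked to give back. */
  int getVirtualCores();

  /** Time, in ms since the epoch, after which the RM may preempt forcibly. */
  long getDeadline();
}
{code}

An AM that can checkpoint its work (as in MAPREDUCE-4584) would use the deadline to save task state before the containers are reclaimed.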
With this improvement the following options are available in release 1.2.0 and later on the 1.x release stream: -1. jsvc location can be overridden by setting the environment variable JSVC_HOME. Defaults to the jsvc binary packaged within the Hadoop distro. -2. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out. -3. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err. - -With this improvement the following options are available in release 2.0.4 and later on the 2.x release stream: -1. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out. -2. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err. - -For overriding the jsvc location on 2.x releases, here are the release notes from HDFS-2303: -To run secure Datanodes, users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment. +With this improvement the following options are available in release 1.2.0 and later on the 1.x release stream: +1. jsvc location can be overridden by setting the environment variable JSVC_HOME. Defaults to the jsvc binary packaged within the Hadoop distro. +2. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out. +3. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err. + +With this improvement the following options are available in release 2.0.4 and later on the 2.x release stream: +1. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out. +2. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err. + +For overriding the jsvc location on 2.x releases, here are the release notes from HDFS-2303: +To run secure Datanodes, users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.
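As an illustration of how a launcher could sanity-check this environment before starting a secure daemon (the class is hypothetical; only the JSVC_HOME, JSVC_OUTFILE and JSVC_ERRFILE variables come from the notes above):

{code:java}
import java.io.File;

/** Hypothetical pre-flight check for the jsvc-related environment variables. */
public final class JsvcEnvCheck {

  private JsvcEnvCheck() {
  }

  public static void main(String[] args) {
    String jsvcHome = System.getenv("JSVC_HOME");
    if (jsvcHome == null || !new File(jsvcHome, "jsvc").canExecute()) {
      System.err.println(
          "JSVC_HOME is unset or does not contain an executable jsvc binary");
      System.exit(1);
    }
    // JSVC_OUTFILE and JSVC_ERRFILE are optional; when unset they default to
    // $HADOOP_LOG_DIR/jsvc.out and $HADOOP_LOG_DIR/jsvc.err respectively.
    System.out.println("jsvc found at " + new File(jsvcHome, "jsvc"));
  }
}
{code}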
Raw SASL protocol now uses protobufs wrapped with RPC headers. -The negotiation sequence incorporates the state of the exchange. +Raw SASL protocol now uses protobufs wrapped with RPC headers. +The negotiation sequence incorporates the state of the exchange. The server now has the ability to advertise its supported auth types.
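As a rough sketch of what "the negotiation sequence incorporates the state of the exchange" means in practice, each protobuf-wrapped SASL message can carry a state marker along the lines of the enum below; this is illustrative and not a copy of the actual protobuf definition.

{code:java}
/** Illustrative states a protobuf-wrapped SASL negotiation message may carry. */
public enum SaslNegotiationState {
  NEGOTIATE,  // server advertises the auth types it supports
  INITIATE,   // client picks a mechanism and sends its first token
  CHALLENGE,  // server challenge for the chosen mechanism
  RESPONSE,   // client response to a challenge
  SUCCESS     // server accepts; the connection proceeds
}
{code}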