HADOOP-18125. Utility to identify git commit / Jira fixVersion discrepancies for RC preparation (#3991)

Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
Viraj Jasani 2022-02-22 08:30:38 +05:30 committed by GitHub
parent 589695c6a9
commit 697e5d4636
3 changed files with 270 additions and 0 deletions

@ -0,0 +1,134 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Apache Hadoop Git/Jira FixVersion validation
============================================================
Git commits in Apache Hadoop contain a Jira number in the format
HADOOP-XXXX, HDFS-XXXX, YARN-XXXX or MAPREDUCE-XXXX.
While creating a release candidate, we also include a changelist,
which can be identified from Fixed/Closed Jiras
with the correct fix versions. However, we sometimes find
inconsistencies between a fixed Jira and the Git commit message.
The git_jira_fix_version_check.py script identifies
all git commits whose commit
messages have any of these issues:
1. commit is reverted as per commit message
2. commit does not contain Jira number format in message
3. Jira does not have expected fixVersion
4. Jira has expected fixVersion, but it is not yet resolved
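The four checks above can be sketched as one classification function. This is a minimal illustration rather than the script's own code: `classify_commit` and `lookup` are hypothetical names, and `lookup` stands in for a real Jira query so that the example runs offline.

```python
import re

# Key prefixes for the HADOOP case, as listed at the top of this README.
JIRA_KEY_PREFIXES = ("HADOOP-", "HDFS-", "YARN-", "MAPREDUCE-")
JIRA_KEY_PATTERN = re.compile(r"(?:%s)\d+" % "|".join(JIRA_KEY_PREFIXES))


def classify_commit(oneline, fix_version, lookup):
    """Return a short label for one `git log --pretty=oneline` line.

    `lookup` is a hypothetical callable: given a Jira key, it returns a
    (fix_versions, status_name) tuple. It stands in for a Jira query.
    """
    # 1. The commit message mentions a revert.
    if re.search("revert", oneline, re.IGNORECASE):
        return "reverted"
    # 2. The commit message carries no recognizable Jira key.
    match = JIRA_KEY_PATTERN.search(oneline)
    if match is None:
        return "no-jira"
    fix_versions, status_name = lookup(match.group(0))
    # 3. The Jira does not have the expected fixVersion.
    if fix_version not in fix_versions:
        return "wrong-fix-version"
    # 4. The Jira has the expected fixVersion but is not resolved.
    if status_name not in ("Resolved", "Closed"):
        return "unresolved"
    return "ok"
```

Passing the Jira query in as a callable keeps the classification logic testable without network access; in real use it would wrap a call to the Jira server.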
Moreover, this script also finds any resolved Jira that has the expected
fixVersion but no corresponding commit.
This should be useful as part of RC preparation.
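The reverse check (resolved Jiras with no matching commit) amounts to a set difference. The sketch below assumes both sides have already been collected as Jira keys; `resolved_without_commit` is a hypothetical helper name and the sample keys are illustrative only.

```python
def resolved_without_commit(resolved_keys, keys_seen_in_commits):
    """Return Jira keys resolved with the fixVersion that no commit mentions."""
    return sorted(set(resolved_keys) - set(keys_seen_in_commits))


# Illustrative keys only: two resolved Jiras have no commit on the branch.
resolved = ["HADOOP-18066", "HADOOP-17936", "HADOOP-11111"]
seen_in_commits = {"HADOOP-11111"}
for key in resolved_without_commit(resolved, seen_in_commits):
    print(key + " has no corresponding commit")
```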
git_jira_fix_version_check.py runs on Python 3 and requires
the jira package to be installed:
```
$ python3 --version
Python 3.9.7
$ python3 -m venv ./venv
$ ./venv/bin/pip install -r dev-support/git-jira-validation/requirements.txt
$ ./venv/bin/python dev-support/git-jira-validation/git_jira_fix_version_check.py
```
The script also requires the following inputs:
```
1. First commit hash to start excluding commits from history:
   Usually we can provide the latest commit hash from the last tagged release
   so that the script only loops through the commits that git log lists
   before this commit hash (that is, the newer commits). e.g. for the 3.3.2
   release, we can provide
   git hash: fa4915fdbbbec434ab41786cb17b82938a613f16
   because this commit bumps up hadoop pom versions to 3.3.2:
   https://github.com/apache/hadoop/commit/fa4915fdbbbec434ab41786cb17b82938a613f16
2. Fix Version:
   The exact fixVersion against which each Jira's fixVersions are
   compared. e.g. for the 3.3.2 release, it should be 3.3.2.
3. JIRA Project Name:
   The exact, case-sensitive name of the project, e.g. HADOOP / OZONE
4. Path of project's working dir with release branch checked-in:
   Path of the project whose git hashes we want to compare. The local fork
   of the project should be up to date with upstream, and the expected release
   branch should be checked out.
5. Jira server url (default url: https://issues.apache.org/jira):
   The default server value points to the ASF Jira, but this script can
   also be used with other Jira servers.
```
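The role of the first input can be shown with a small generator that stops at the excluded hash. This is a sketch under assumptions: `commits_to_check` is a hypothetical name, the log lines are made up, and the empty-hash guard is an addition for illustration (an empty string is a prefix of every line, so it would end the walk at the first commit).

```python
def commits_to_check(oneline_log, first_exclude_commit_hash):
    """Yield `git log --pretty=oneline` lines until the excluded hash appears."""
    if not first_exclude_commit_hash:
        # An empty prefix matches every line, so nothing would be checked.
        raise ValueError("first_exclude_commit_hash must not be empty")
    for line in oneline_log:
        if line.startswith(first_exclude_commit_hash):
            return  # everything from here on belongs to the previous release
        yield line


# Made-up log, newest commit first, as git log prints it.
sample_log = [
    "ccc333 HDFS-3. Newest change",
    "bbb222 HDFS-2. Older change",
    "aaa111 Preparing for the previous release",
    "999000 HDFS-1. Already released",
]
print(list(commits_to_check(sample_log, "aaa111")))
```

Only the two commits listed before `aaa111` are yielded; the excluded commit and everything older are skipped.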
Example of script execution:
```
JIRA Project Name (e.g HADOOP / OZONE etc): HADOOP
First commit hash to start excluding commits from history: fa4915fdbbbec434ab41786cb17b82938a613f16
Fix Version: 3.3.2
Jira server url (default: https://issues.apache.org/jira):
Path of project's working dir with release branch checked-in: /Users/vjasani/Documents/src/hadoop-3.3/hadoop
Check git status output and verify expected branch
On branch branch-3.3.2
Your branch is up to date with 'origin/branch-3.3.2'.
nothing to commit, working tree clean
Jira/Git commit message diff starting: ##############################################
Jira not present with version: 3.3.2. Commit: 8cd8e435fb43a251467ca74fadcb14f21a3e8163 HADOOP-17198. Support S3 Access Points (#3260) (branch-3.3.2) (#3955)
WARN: Jira not found. Commit: 8af28b7cca5c6020de94e739e5373afc69f399e5 Updated the index as per 3.3.2 release
WARN: Jira not found. Commit: e42e483d0085aa46543ebcb1196dd155ddb447d0 Make upstream aware of 3.3.1 release
Commit seems reverted. Commit: 6db1165380cd308fb74c9d17a35c1e57174d1e09 Revert "HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)"
Commit seems reverted. Commit: 1e3f94fa3c3d4a951d4f7438bc13e6f008f228f4 Revert "HDFS-16333. fix balancer bug when transfer an EC block (#3679)"
Jira not present with version: 3.3.2. Commit: ce0bc7b473a62a580c1227a4de6b10b64b045d3a HDFS-16344. Improve DirectoryScanner.Stats#toString (#3695)
Jira not present with version: 3.3.2. Commit: 30f0629d6e6f735c9f4808022f1a1827c5531f75 HDFS-16339. Show the threshold when mover threads quota is exceeded (#3689)
Jira not present with version: 3.3.2. Commit: e449daccf486219e3050254d667b74f92e8fc476 YARN-11007. Correct words in YARN documents (#3680)
Commit seems reverted. Commit: 5c189797828e60a3329fd920ecfb99bcbccfd82d Revert "HDFS-16336. Addendum: De-flake TestRollingUpgrade#testRollback (#3686)"
Jira not present with version: 3.3.2. Commit: 544dffd179ed756bc163e4899e899a05b93d9234 HDFS-16171. De-flake testDecommissionStatus (#3280)
Jira not present with version: 3.3.2. Commit: c6914b1cb6e4cab8263cd3ae5cc00bc7a8de25de HDFS-16350. Datanode start time should be set after RPC server starts successfully (#3711)
Jira not present with version: 3.3.2. Commit: 328d3b84dfda9399021ccd1e3b7afd707e98912d HDFS-16336. Addendum: De-flake TestRollingUpgrade#testRollback (#3686)
Jira not present with version: 3.3.2. Commit: 3ae8d4ccb911c9ababd871824a2fafbb0272c016 HDFS-16336. De-flake TestRollingUpgrade#testRollback (#3686)
Jira not present with version: 3.3.2. Commit: 15d3448e25c797b7d0d401afdec54683055d4bb5 HADOOP-17975. Fallback to simple auth does not work for a secondary DistributedFileSystem instance. (#3579)
Jira not present with version: 3.3.2. Commit: dd50261219de71eaa0a1ad28529953e12dfb92e0 YARN-10991. Fix to ignore the grouping "[]" for resourcesStr in parseResourcesString method (#3592)
Jira not present with version: 3.3.2. Commit: ef462b21bf03b10361d2f9ea7b47d0f7360e517f HDFS-16332. Handle invalid token exception in sasl handshake (#3677)
WARN: Jira not found. Commit: b55edde7071419410ea5bea4ce6462b980e48f5b Also update hadoop.version to 3.3.2
...
...
...
Found first commit hash after which git history is redundant. commit: fa4915fdbbbec434ab41786cb17b82938a613f16
Exiting successfully
Jira/Git commit message diff completed: ##############################################
Any resolved Jira with fixVersion 3.3.2 but corresponding commit not present
Starting diff: ##############################################
HADOOP-18066 is marked resolved with fixVersion 3.3.2 but no corresponding commit found
HADOOP-17936 is marked resolved with fixVersion 3.3.2 but no corresponding commit found
Completed diff: ##############################################
```
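For a long report, it can help to count the findings per category. The following is a hypothetical post-processing sketch, not part of the script: `summarize` and the short labels are made-up names, and the only assumption about the script is that its messages start with the prefixes visible in the example output above.

```python
from collections import Counter

# Message prefixes printed by the script, mapped to hypothetical short labels.
PREFIXES = {
    "Commit seems reverted.": "reverted",
    "WARN: Jira not found.": "no-jira",
    "Jira not present with version:": "wrong-fix-version",
    "Jira is not resolved yet?": "unresolved",
}


def summarize(report_lines):
    """Count report lines per finding category; unrelated lines are ignored."""
    counts = Counter()
    for line in report_lines:
        for prefix, label in PREFIXES.items():
            if line.startswith(prefix):
                counts[label] += 1
                break
        else:
            # Lines from the final section of the report.
            if "but no corresponding commit found" in line:
                counts["no-commit"] += 1
    return counts
```

Feeding it the saved lines of a run gives a quick overview of how many commits need attention in each category before the RC is cut.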

@ -0,0 +1,118 @@
#!/usr/bin/env python3
############################################################################
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
############################################################################
"""An application to assist Release Managers with ensuring that histories in
Git and fixVersions in JIRA are in agreement. See README.md for a detailed
explanation.
"""
import os
import re
import subprocess
from jira import JIRA
jira_project_name = input("JIRA Project Name (e.g HADOOP / OZONE etc): ") \
or "HADOOP"
# Define project_jira_keys with - appended. e.g for HADOOP Jiras,
# project_jira_keys should include HADOOP-, HDFS-, YARN-, MAPREDUCE-
project_jira_keys = [jira_project_name + '-']
if jira_project_name == 'HADOOP':
    project_jira_keys.append('HDFS-')
    project_jira_keys.append('YARN-')
    project_jira_keys.append('MAPREDUCE-')
first_exclude_commit_hash = input("First commit hash to start excluding commits from history: ")
fix_version = input("Fix Version: ")
jira_server_url = input(
"Jira server url (default: https://issues.apache.org/jira): ") \
or "https://issues.apache.org/jira"
jira = JIRA(server=jira_server_url)
local_project_dir = input("Path of project's working dir with release branch checked-in: ")
os.chdir(local_project_dir)
GIT_STATUS_MSG = subprocess.check_output(['git', 'status']).decode("utf-8")
print('\nCheck git status output and verify expected branch\n')
print(GIT_STATUS_MSG)
print('\nJira/Git commit message diff starting: ##############################################')
issue_set_from_commit_msg = set()
for commit in subprocess.check_output(['git', 'log', '--pretty=oneline']).decode(
        "utf-8").splitlines():
    if commit.startswith(first_exclude_commit_hash):
        print("Found first commit hash after which git history is redundant. commit: "
              + first_exclude_commit_hash)
        print("Exiting successfully")
        break
    if re.search('revert', commit, re.IGNORECASE):
        print("Commit seems reverted. \t\t\t Commit: " + commit)
        continue
    ACTUAL_PROJECT_JIRA = None
    for project_jira in project_jira_keys:
        if project_jira in commit:
            ACTUAL_PROJECT_JIRA = project_jira
            break
    if not ACTUAL_PROJECT_JIRA:
        print("WARN: Jira not found. \t\t\t Commit: " + commit)
        continue
    JIRA_NUM = ''
    for c in commit.split(ACTUAL_PROJECT_JIRA)[1]:
        if c.isdigit():
            JIRA_NUM = JIRA_NUM + c
        else:
            break
    issue = jira.issue(ACTUAL_PROJECT_JIRA + JIRA_NUM)
    EXPECTED_FIX_VERSION = False
    for version in issue.fields.fixVersions:
        if version.name == fix_version:
            EXPECTED_FIX_VERSION = True
            break
    if not EXPECTED_FIX_VERSION:
        print("Jira not present with version: " + fix_version + ". \t Commit: " + commit)
        continue
    if issue.fields.status is None or issue.fields.status.name not in ('Resolved', 'Closed'):
        print("Jira is not resolved yet? \t\t Commit: " + commit)
    else:
        # This means Jira corresponding to current commit message is resolved with expected
        # fixVersion.
        # This is no-op by default, if needed, convert to print statement.
        issue_set_from_commit_msg.add(ACTUAL_PROJECT_JIRA + JIRA_NUM)
print('Jira/Git commit message diff completed: ##############################################')
print('\nAny resolved Jira with fixVersion ' + fix_version
+ ' but corresponding commit not present')
print('Starting diff: ##############################################')
all_issues_with_fix_version = jira.search_issues(
'project=' + jira_project_name + ' and status in (Resolved,Closed) and fixVersion='
+ fix_version)
for issue in all_issues_with_fix_version:
    if issue.key not in issue_set_from_commit_msg:
        print(issue.key + ' is marked resolved with fixVersion ' + fix_version
              + ' but no corresponding commit found')
print('Completed diff: ##############################################')

@ -0,0 +1,18 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
jira==3.1.1