[Release-7.1] Fix checkall Command Blocking Issue #11146

Merged
jzhou77 merged 2 commits into
apple:release-7.1 from
kakaiu:fix-check-all-block-issue
Jan 26, 2024

Conversation

@kakaiu
Member

@kakaiu kakaiu commented Jan 24, 2024

Consider two storage servers:
SS1: 1,2,3,4
SS2: 1
The output of checkall should be 2,3,4, the keys that exist only on SS1.
However, checkall gets stuck in this case.
Without loss of generality, suppose checkall reads a batch of 2 keys at a time.
In the first step, the beginKey is 1; keys 1 and 2 are read from SS1, and key 1 is read from SS2. The checker sets 1 as the nextBeginKey.
In the second step, the beginKey is still 1, so the checker gets stuck, repeatedly reading from beginKey 1.
To avoid this issue, if the checkall algorithm finds that the nextBeginKey is the same as the beginKey of the current round, it splits the remaining range into two subranges, using the maximum endKey of the current round as the pivot. In this example:
In the second step, checkall finds beginKey == nextBeginKey. It then spawns a child checkall to check the inputRange 1~2.
In the third step, the beginKey is 2; SS1 replies 2, 3, and SS2 replies [empty]. The nextBeginKey is 3...

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 9aec703
  • Duration 0:07:46
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 9aec703
  • Duration 0:08:14
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 9aec703
  • Duration 0:08:48
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 9aec703
  • Duration 0:15:00
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 9aec703
  • Duration 0:46:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Comment thread fdbcli/DebugCommands.actor.cpp
Comment thread fdbcli/DebugCommands.actor.cpp
Comment thread fdbcli/DebugCommands.actor.cpp
@flowguru
Contributor

I am having a hard time understanding how spawning the new range works. Will discuss offline.

Comment thread fdbcli/DebugCommands.actor.cpp Outdated
}

printf("Checking complete.\n");
// The command is used to check the inconsistency in a keyspace, default is \xff\x02/blog/ keyspace.
Contributor

We can remove ", default is \xff\x02/blog/ keyspace" now.

Member Author

Yes.

printable(beginKeyToCheck).c_str(),
printable(claimEndKey).c_str(),
hasMore);
if (claimEndKey.empty()) {
Contributor

@jzhou77 jzhou77 Jan 25, 2024

If the key range is indeed empty, claimEndKey will be empty. In this case, this function should simply return true.

Member Author

We cannot return, but we can skip the shard.

@kakaiu
Member Author

kakaiu commented Jan 25, 2024

> I am having a hard time understanding how spawning the new range works. Will discuss offline.

We have discussed all concerns offline and will add more comments to the code.

@kakaiu kakaiu requested review from flowguru and jzhou77 January 25, 2024 22:02
@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:07:41
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:08:35
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 2a2adf3
  • Duration 0:09:07
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:47:04
  • Result: ❌ FAILED
  • Error: Error while executing command: if [ ! -f build_output/junit.xml ]; then touch build_output/junit.xml; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@kakaiu kakaiu closed this Jan 25, 2024
@kakaiu kakaiu reopened this Jan 25, 2024
@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:07:57
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:08:10
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 2a2adf3
  • Duration 0:15:24
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:44:01
  • Result: ❌ FAILED
  • Error: Error while executing command: if [ ! -f build_output/junit.xml ]; then touch build_output/junit.xml; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@kakaiu kakaiu closed this Jan 26, 2024
@kakaiu kakaiu reopened this Jan 26, 2024
@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:07:34
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:08:24
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 2a2adf3
  • Duration 0:16:13
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 2a2adf3
  • Duration 0:19:36
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 2a2adf3
  • Duration 0:46:41
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jzhou77 jzhou77 merged commit f19a57d into apple:release-7.1 Jan 26, 2024
@kakaiu kakaiu changed the title from Fix checkall Command Blocking Issue to [Release-7.1] Fix checkall Command Blocking Issue Jan 30, 2024

4 participants