Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP 2775: kubectl delete interactivity and delay #2777

Closed
wants to merge 1 commit into from

Conversation

eddiezane
Copy link
Member

ref: #2775

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 2, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eddiezane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 2, 2021
@eddiezane eddiezane force-pushed the ez/2775-init branch 5 times, most recently from e5d7318 to 0f32113 Compare June 7, 2021 23:34
@eddiezane
Copy link
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 7, 2021
@eddiezane eddiezane marked this pull request as ready for review June 7, 2021 23:35
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2021
@eddiezane
Copy link
Member Author

eddiezane commented Jun 7, 2021

/cc @liggitt @deads2k @soltysh @thockin

Copy link

@briantopping briantopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! One request, the KEP is currently oriented toward kubectl when I believe it ought to be a system-wide initiative to stop existentially destructive behaviors. kubeadm reset is in this realm, but not all resets are existential. When such a situation is detected, the same behavior ought to be used. Are there other systems of this nature that should also be covered?


## Proposal

1. `kubectl delete [--all | --all-namespaces]` will warn about the destructive action that will be performed and artificially delay for x seconds allowing users a chance to abort.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the same behavior with kubeadm reset when the node being acted upon would remove etcd quorum.

Copy link
Member

@neolit123 neolit123 Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubeadm reset shows a confirmation prompt, unless the user passes -f.

$ kubeadm reset 
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: 

its phases are documented, can be executed on demand or skipped:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the resolution to this is after the user presses 'y' and the cluster is determined to go unstable after such a request, a second warning is displayed alerting to such.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to entertain the idea of API-side delete-inhibition. Do you think that should be here or a different KEP?


```
kubectl delete namespace prod -i
Are you sure? (y/n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest default to N. e.g. Are you sure? (y/N)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Changed and explicitly called out.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is y/N is the standard across Kubernetes ? In some places, we do use Yes/No.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check for precedent.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried locally, how initial development looks like (pardon for the verbose/warning :) )
kubectl_delete_interactive1

@thockin @liggitt @eddiezane - If the interactive flag is not given, it suppose to go for delay. we can delay it using ticker but every second is expect to log as a line like ( Ignore the Order ascending or descending )

->resources will be deleted in 9 sec
->resources will be deleted in 8 sec... ??

Not sure if it is possible to do like state changes/clock in node/UI - resources will be delete in (9..8..7.....) ?

Signed-off-by: Eddie Zaneski <eddiezane@gmail.com>
@eddiezane
Copy link
Member Author

I'd like to entertain the idea of API-side delete-inhibition. Do you think that should be here or a different KEP?

I think that should be in a different KEP. I liked what @brendandburns shared about the locking Azure does.

We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.

We could also support those annotations at a namespace level if we wanted to.

This is similar to Management Locks that we introduced in Azure (https://docs.microsoft.com/en-us/rest/api/resources/managementlocks) for similar reasons to prevent accidental deletes and force an explicit action (remove the lock) for a delete to proceed.

https://groups.google.com/g/kubernetes-dev/c/y4Q20V3dyOk/m/LOe7Id1DBgAJ
https://groups.google.com/g/kubernetes-dev/c/y4Q20V3dyOk/m/W0y2fD-NAAAJ

During RFC discussion several ideas were shared. One of them was a locking mechanism for resources that would prevent deletion. This is considered orthogonal to this KEP because users don't intend to delete the resources they are accidentally deleting - not that there are things that should never be deleted.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the statement - intent not to delete resource but accidentally being deleted. However, there are things that cannot be deleted are at different levels in Kubernetes like Kube-* namespaces, default namespaces if you try to delete it using the same kubectl command, it is 403 forbidden.

Do you think is it worth re-phrasing not that there are things that should never be deleted ?


1. `kubectl delete [--all | --all-namespaces]` will warn about the destructive action that will be performed and artificially delay for x seconds allowing users a chance to abort.

2. Add a new `--interactive | -i` flag to `kubectl delete` that will require confirmation before deleting resources imperatively (i.e. not `-f`). This flag will be false by default. If this flag is explicitly set to false it will not prompt or artificially delay.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this interact with other consumers of stdin in kubectl (e.g. client-go credential plugins that make use of stdin, or manifest reading from stdin like -f -)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will matter where in the call stack we read stdin. The client-go credential plugins probably happen early the flow? I see this living near the end right inside the delete command.

Will need to play around with it with. We can have some integration tests for these.

What would you like to see about it in the KEP?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client-go credential plugins probably happen early the flow?

I think it depends on where the client config is constructed.

What would you like to see about it in the KEP?

Describing specific places we use stdin in conjunction with kubectl delete calls today (-f -, exec plugins, etc), and that this --interactive option cannot be used in combination with those (and exercising the behavior when both are attempted with a testcase)


## Proposal

1. `kubectl delete [--all | --all-namespaces]` will warn about the destructive action that will be performed and artificially delay for x seconds allowing users a chance to abort.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this being proposed for delete invocations for:

  • all --all invocations
  • cluster-scoped resources
  • namespacesd

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this being proposed for delete invocations for:

  • all --all invocations

Yes.

  • cluster-scoped resources
  • namespacesd

Not specifically. Only when deleted with --all or --all-namespaces.


2. Add a new `--interactive | -i` flag to `kubectl delete` that will require confirmation before deleting resources imperatively (i.e. not `-f`). This flag will be false by default. If this flag is explicitly set to false it will not prompt or artificially delay.

These two changes introduce significant protections for users and do not break backwards compatibility. Scripts will see an x second delay unless they explicitly set `-i=false`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scripts will see an x second delay unless they explicitly set -i=false.

I expected us to attempt to detect interactive invocations and only delay in those cases. That would limit the delay for scripts to environments which mimic interactive terminals, which seems better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were @thockin's thoughts.

I think I'd rather they set -i false explicitly
that is obvious AT THE CALL SITE as opposed to looking for ${CI} being set somewhere else
it's a big declaration of "I meant to do that"
When it comes to scary stuff I'll take "annoyingly explicit" every time
e.g. there's not a ${UNSAFE_RM} variable that implies -f

I did digging on detecting a TTY and it's a pretty mixed bag. A handful of CI/CD providers spoof a TTY to get nicer color output. I don't think we can reliably go that route.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A handful of CI/CD providers spoof a TTY to get nicer color output. I don't think we can reliably go that route.

If the issue is false positive detection of interactive invocations, introducing delays to those false positives seems ok to me

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2021
@mkimuram
Copy link
Contributor

Could you check this comment and give feedback?
I think that this KEP shares some use cases and motivations with KEP-2839 (In-use protection KEP), so I would appreciate it if the people involved to this KEP also feedback to KEP-2839.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 22, 2021
@MadhavJivrajani
Copy link
Contributor

/remove-lifecycle rotten
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 26, 2021
@k8s-ci-robot
Copy link
Contributor

@MadhavJivrajani: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/remove-lifecycle rotten
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eddiezane
Copy link
Member Author

Going back to the drawing board on this.

@ardaguclu
Copy link
Member

ardaguclu commented Mar 2, 2023

For the people interested on this KEP, there was an another attempt in this proposed KEP #3896. That would be great, if you can give your insights on it too. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet