[RFC] Protecting users of kubectl delete


Eddie Zaneski

unread,
May 27, 2021, 3:35:23 PM5/27/21
to kuberne...@googlegroups.com, kubernete...@googlegroups.com

Hi Kuberfriendos,


We wanted to start a discussion about mitigating some of the potential footguns in kubectl.


Over the years we've heard stories from users who accidentally deleted resources in their clusters. This trend seems to be rising lately as newer folks venture into the Kubernetes/DevOps/Infra world.


First some background.


When a namespace is deleted it also deletes all of the resources under it. The deletion runs without further confirmation, and can be devastating if accidentally run against the wrong namespace (e.g. thanks to hasty tab completion use).


```

kubectl delete namespace prod-backup

```


When all namespaces are deleted essentially all resources are deleted. This deletion is trivial to do with the `--all` flag, and it also runs without further confirmation. It can effectively wipe out a whole cluster.


```

kubectl delete namespace --all

```


The difference between `--all` and `--all-namespaces` can be confusing.
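
For illustration, using a hypothetical `team-a` namespace:

```
# --all deletes every instance of the given resource type in one namespace
kubectl delete pods --all --namespace=team-a

# --all-namespaces (-A) widens the scope to every namespace;
# combined with --all this deletes every pod in the cluster
kubectl delete pods --all --all-namespaces
```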


There are certainly things cluster operators should be doing to help prevent this user error (like locking down permissions) but we'd like to explore what we can do to help end users as maintainers.


There are a few changes we'd like to propose to start discussion. We plan to introduce this as a KEP but wanted to gather early thoughts.


Change 1: Require confirmation when deleting with --all and --all-namespaces


Confirmation when deleting with `--all` and `--all-namespaces` is a long requested feature but we've historically determined this to be a breaking change and declined to implement. Existing scripts would require modification or break. While it is indeed breaking, we believe this change is necessary to protect users.


We propose moving towards requiring confirmation for deleting resources with `--all` and `--all-namespaces` over 3 releases (1 year). This gives us ample time to warn users and communicate the change through blogs and release notes.

  • Alpha

    • Introduce a flag like `--ask-for-confirmation | -i` that requires confirmation when deleting ANY resource. For example the `rm` command to delete files on a machine has this built in with `-i`. This provides a temporary safety mechanism for users to start using now.

    • Add a flag to enforce the current behavior and skip confirmation. `--force` is already used for removing stuck resources (see change 3 below) so we may want to use `--auto-approve` (inspired by Terraform). Usage of `--ask-for-confirmation` will always take precedence and ignore `--auto-approve`. We can see this behavior with `rm -rfi`; a usage sketch of both flags follows at the end of this change.

 -i          Request confirmation before attempting to remove each file, regardless of the file's permissions, or whether or not the standard input device is a terminal.  The -i option overrides any previous -f options.

    • Begin warning to stderr that by version x.x.x deleting with `--all` and `--all-namespaces` will require interactive confirmation or the `--auto-approve` flag.

    • Introduce a 10 second sleep when deleting with `--all` or `--all-namespaces` before proceeding to give the user a chance to react to the warning and interrupt their command.

  • Beta

    • Address user feedback from alpha.

  • GA

    • Deleting with `--all` or `--all-namespaces` now requires interactive confirmation as the default unless `--auto-approve` is passed.

    • Remove the 10-second deletion delay introduced in the alpha, and stop printing the deletion warning when interactive mode is disabled.
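
A rough sketch of how the flags proposed above might behave at the command line. Note that `--ask-for-confirmation | -i` and `--auto-approve` are proposals in this thread, not existing kubectl flags:

```
# proposed: prompt before deleting anything
kubectl delete namespace prod-backup --ask-for-confirmation
# You are about to delete namespace "prod-backup". Continue? (y/N):

# proposed: keep today's non-interactive behavior, explicitly
kubectl delete namespace --all --auto-approve

# proposed: as with `rm -rfi`, the interactive flag wins if both are given
kubectl delete namespace --all --ask-for-confirmation --auto-approve
```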


Change 2: Throw an error when --namespace provided to cluster-scoped resource deletion


Since namespaces are a cluster-scoped resource, using the `--namespace | -n` flag when deleting them should error. This flag has no effect on cluster-scoped resources and confuses users. We believe this to be an implementation bug that should be fixed for cluster-scoped resources. Although it is true that this may break scripts that are incorrectly including the flag on intentional mass-deletion operations, the inconvenience to those users of removing the misused flag must be weighed against the material harm this implementation mistake is currently causing to other users in production. This will follow a similar rollout to the one above.
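
For example, the flag is silently ignored today on cluster-scoped resources (shown here with a hypothetical `team-a` namespace), which is exactly the confusion this change targets:

```
# -n looks like it scopes the deletion, but namespaces are cluster-scoped,
# so this still deletes every namespace in the cluster
kubectl delete namespaces --all -n team-a

# under this proposal the command above would return an error instead
```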


Change 3: Rename related flags that commonly cause confusion


The `--all` flag should be renamed to `--all-instances`. This makes it entirely clear which "all" it refers to. This would follow a 3-release rollout as well, starting with the new flag and warning about deprecation.


The `--force` flag is also a frequent source of confusion, and users do not understand what exactly is being forced. Alongside the `--all` change (in the same releases), we should consider renaming `--force` to something like `--force-reference-removal`.


These are breaking changes that shouldn't be taken lightly. Scripts, docs, and applications will all need to be modified. Putting on our empathy hats we believe that the benefits and protections to users are worth the hassle. We will do all we can to inform users of these impending changes and follow our standard guidelines for deprecating a flag.


Please see the following for examples of users requesting or running into this. This is a sample from a 5 minute search.


From GitHub:

From StackOverflow:

Eddie Zaneski - on behalf of SIG CLI

Tim Hockin

unread,
May 27, 2021, 3:47:41 PM5/27/21
to Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
On Thu, May 27, 2021 at 12:35 PM Eddie Zaneski <eddi...@gmail.com> wrote:

Hi Kuberfriendos,


We wanted to start a discussion about mitigating some of the potential footguns in kubectl.


Over the years we've heard stories from users who accidentally deleted resources in their clusters. This trend seems to be rising lately as newer folks venture into the Kubernetes/DevOps/Infra world.


First some background.


When a namespace is deleted it also deletes all of the resources under it. The deletion runs without further confirmation, and can be devastating if accidentally run against the wrong namespace (e.g. thanks to hasty tab completion use).


```

kubectl delete namespace prod-backup

```


When all namespaces are deleted essentially all resources are deleted. This deletion is trivial to do with the `--all` flag, and it also runs without further confirmation. It can effectively wipe out a whole cluster.


```

kubectl delete namespace --all

```


The difference between `--all` and `--all-namespaces` can be confusing.


There are certainly things cluster operators should be doing to help prevent this user error (like locking down permissions) but we'd like to explore what we can do to help end users as maintainers.


There are a few changes we'd like to propose to start discussion. We plan to introduce this as a KEP but wanted to gather early thoughts.


Change 1: Require confirmation when deleting with --all and --all-namespaces


Confirmation when deleting with `--all` and `--all-namespaces` is a long requested feature but we've historically determined this to be a breaking change and declined to implement. Existing scripts would require modification or break. While it is indeed breaking, we believe this change is necessary to protect users.


We propose moving towards requiring confirmation for deleting resources with `--all` and `--all-namespaces` over 3 releases (1 year). This gives us ample time to warn users and communicate the change through blogs and release notes.


Can we start with a request for confirmation when the command is run interactively, plus a printed warning (and maybe the sleep)?
 

Change 2: Throw an error when --namespace provided to cluster-scoped resource deletion


Since namespaces are a cluster-scoped resource, using the `--namespace | -n` flag when deleting them should error. This flag has no effect on cluster-scoped resources and confuses users. We believe this to be an implementation bug that should be fixed for cluster-scoped resources. Although it is true that this may break scripts that are incorrectly including the flag on intentional mass-deletion operations, the inconvenience to those users of removing the misused flag must be weighed against the material harm this implementation mistake is currently causing to other users in production. This will follow a similar rollout to the one above.


The "material harm" here feels very low and I am not convinced it rises to the level of breaking users. 
 

Change 3: Rename related flags that commonly cause confusion


The `--all` flag should be renamed to `--all-instances`. This makes it entirely clear which "all" it refers to. This would follow a 3-release rollout as well, starting with the new flag and warning about deprecation.


I think 3 releases is too aggressive to break users.  We know that it takes months or quarters for releases to propagate into providers' stable-channels.  In the meantime, docs and examples all over the internet will be wrong.

If we're to undertake any such change I think it needs to be more gradual.  Consider 6 to 9 releases instead.  Start by adding new forms and warning on use of the old forms.  Then add small sleeps to the deprecated forms.  Then make the sleeps longer and the warnings louder.  By the time it starts hurting people there will be ample information all over the internet about how to fix it.  Even then, the old commands will still work (even if slowly) for a long time.  And in fact, maybe we should leave it in that state permanently.  Don't break users, just annoy them.
 
Tim

Brian Topping

unread,
May 27, 2021, 3:54:40 PM5/27/21
to Eddie Zaneski, kuberne...@googlegroups.com, kubernete...@googlegroups.com
Please also consider this issue

There are good examples of solving this issue in Rook and Gardener. My personal preference from those projects is requiring an annotation to be placed on critical resources before any deletion workflow is allowed to start. If these annotation requirements could be defined declaratively, projects and users could create the constraints on installation. The constraints could also be removed if they became onerous in dev/test environments.

Creating basic safeguards is not just about junior users: I have deleted massive amounts of infrastructure several times because I was in the wrong kubectl context. I can't judge whether I am an idiot or not.

I have started giving windows where critical resources are present obnoxious backgrounds that are unmistakable. Another idea in this genre is better kubectl support for PS1 resources: contexts and/or namespaces could contain API resources with PS1 sequences that are applied when a context is activated. Again, this would be easy to modify or remove when it isn't desired.
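
For reference, a rough client-side approximation of this that works today (a sketch assuming bash; the kube-ps1 project does this more completely) is to surface the current context and namespace in the shell prompt so destructive commands are typed with the target in plain sight:

```
# add to ~/.bashrc: show the current kubectl context and namespace in PS1
kube_prompt() {
  local ctx ns
  ctx=$(kubectl config current-context 2>/dev/null) || ctx="none"
  ns=$(kubectl config view --minify --output 'jsonpath={..namespace}' 2>/dev/null)
  printf '%s/%s' "$ctx" "${ns:-default}"
}
PS1='[$(kube_prompt)] \w \$ '
```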


Tim Hockin

unread,
May 27, 2021, 3:58:58 PM5/27/21
to Brian Topping, Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
Default context is a good point.

I'd like a way to set my kubeconfig to not have defaults, and to REQUIRE me to specify --context or --cluster and --namespace.  I have absolutely flubbed this many times.
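
One partial approximation today, sketched below (context and namespace names are made up, and there is no equivalent for forcing --namespace, which is part of the ask), is to clear the current context so nothing runs against an implicit default:

```
# with no current-context set, kubectl has no default cluster to talk to,
# so commands fail unless --context (or --kubeconfig) is given explicitly
kubectl config unset current-context

# every invocation now has to name its target
kubectl --context=prod-admin --namespace=team-a get pods
```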

Jordan Liggitt

unread,
May 27, 2021, 3:59:43 PM5/27/21
to Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
I appreciate the desire to help protect users, but I agree with Tim that rollouts take way longer than you expect, and that the bar for breaking existing users that are successful is very high.

The project's deprecation periods are the minimum required. For the core options of the core commands of a tool like kubectl which is used as a building block, I don't think we should ever break compatibility if we can possibly avoid it.


On Thu, May 27, 2021 at 3:47 PM 'Tim Hockin' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
On Thu, May 27, 2021 at 12:35 PM Eddie Zaneski <eddi...@gmail.com> wrote:

Change 1: Require confirmation when deleting with --all and --all-namespaces


Confirmation when deleting with `--all` and `--all-namespaces` is a long requested feature but we've historically determined this to be a breaking change and declined to implement. Existing scripts would require modification or break. While it is indeed breaking, we believe this change is necessary to protect users.


We propose moving towards requiring confirmation for deleting resources with `--all` and `--all-namespaces` over 3 releases (1 year). This gives us ample time to warn users and communicate the change through blogs and release notes.


Can we start with a request for confirmation when the command is run interactively, plus a printed warning (and maybe the sleep)?

+1 for limiting behavior changes to interactive runs, and starting with warnings and maybe sleeps.
 
 

Change 2: Throw an error when --namespace provided to cluster-scoped resource deletion


Since namespaces are a cluster-scoped resource, using the `--namespace | -n` flag when deleting them should error. This flag has no effect on cluster-scoped resources and confuses users. We believe this to be an implementation bug that should be fixed for cluster-scoped resources. Although it is true that this may break scripts that are incorrectly including the flag on intentional mass-deletion operations, the inconvenience to those users of removing the misused flag must be weighed against the material harm this implementation mistake is currently causing to other users in production. This will follow a similar rollout to the one above.


The "material harm" here feels very low and I am not convinced it rises to the level of breaking users. 

Setting the namespace context of an invocation is equivalent to putting a default namespace in your kubeconfig file. I don't think we should break compatibility with this option. It is likely to disrupt tools that wrap kubectl and set common options on all kubectl invocations.

 
 

Change 3: Rename related flags that commonly cause confusion


The `--all` flag should be renamed to `--all-instances`. This makes it entirely clear which "all" it refers to. This would follow a 3-release rollout as well, starting with the new flag and warning about deprecation.


I think 3 releases is too aggressive to break users.  We know that it takes months or quarters for releases to propagate into providers' stable-channels.  In the meantime, docs and examples all over the internet will be wrong.

If we're to undertake any such change I think it needs to be more gradual.  Consider 6 to 9 releases instead.  Start by adding new forms and warning on use of the old forms.  Then add small sleeps to the deprecated forms.  Then make the sleeps longer and the warnings louder.  By the time it starts hurting people there will be ample information all over the internet about how to fix it.  Even then, the old commands will still work (even if slowly) for a long time.  And in fact, maybe we should leave it in that state permanently.  Don't break users, just annoy them.

If we wanted to add parallel flag names controlling the same variables and hide the old flags, that could be ok, but we should never remove the old flags. Even adding parallel flags means the ecosystem gets fragmented between scripts written against the latest kubectl and ones written using previous flags.

Clayton Coleman

unread,
May 27, 2021, 4:07:02 PM5/27/21
to Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
This is somewhat terrifying to me from a backward compatibility perspective.  We have never changed important flags like this, and we have in fact explicitly stated we should not.  I might almost argue that if we were to do this, we'd create a new CLI that has different flags.
 



These are breaking changes that shouldn't be taken lightly. Scripts, docs, and applications will all need to be modified. Putting on our empathy hats we believe that the benefits and protections to users are worth the hassle. We will do all we can to inform users of these impending changes and follow our standard guidelines for deprecating a flag.


Please see the following for examples of users requesting or running into this. This is a sample from a 5 minute search.


From GitHub:

From StackOverflow:

Eddie Zaneski - on behalf of SIG CLI


Brendan Burns

unread,
May 27, 2021, 4:21:44 PM5/27/21
to ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.

We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.

We could also support those annotations at a namespace level if we wanted to.
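
To visualize the opt-in, a sketch of what usage might look like (the annotation keys are the proposal above, not something kubectl honors today, and the resource names are made up):

```
# proposed: opt a specific pod into delete confirmation
kubectl annotate pod payments-db k8s.io/confirm-delete=true

# proposed: a namespace-level lock that blocks deletion outright
kubectl annotate namespace prod k8s.io/lock=true
```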

This is similar to Management Locks that we introduced in Azure (https://docs.microsoft.com/en-us/rest/api/resources/managementlocks) for similar reasons to prevent accidental deletes and force an explicit action (remove the lock) for a delete to proceed.

--brendan




Jordan Liggitt

unread,
May 27, 2021, 4:23:08 PM5/27/21
to Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I like the "opt into deletion protection" approach. That got discussed a long time ago (e.g. https://github.com/kubernetes/kubernetes/pull/17740#issuecomment-217461024), but didn't get turned into a proposal/implementation

There's a variety of ways that could be done... server-side and enforced, client-side as a hint, etc.

Daniel Smith

unread,
May 27, 2021, 5:11:24 PM5/27/21
to Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I'm in favor of server-side enforced deletion protection, however it's not clear how that will protect a single "locked" item in a namespace if someone deletes the entire namespace.

The last deletion protection mechanism conversation that comes to mind got bogged down in, well what if multiple actors all want to lock an object, how do you know that they have all unlocked it? I can imagine a mechanism like Finalizers (Brian suggested this--"liens"), but I'm not convinced the extra complexity (and implied delay agreeing on / building something) is worth it.

I think I disagree with all those who don't want to make kubectl safer for fear of breaking users, because I think there's probably some middle ground, e.g. I can imagine something like: detect if a TTY is present; if so, give warnings / make them confirm destructive action; otherwise, assume it's a script that's already been tested and just execute it.




Benjamin Elder

unread,
May 27, 2021, 5:20:04 PM5/27/21
to Daniel Smith, Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
This is perhaps veering a bit off topic, but FWIW detecting actual interactivity can be tricky ... e.g. when operating in travis-ci you will detect a TTY, as it intentionally spoofs one to get output from tools that matches developers' terminals.

https://github.com/kubernetes-sigs/kind/pull/1479/files
https://github.com/travis-ci/travis-ci/issues/8193
https://github.com/travis-ci/travis-ci/issues/1337

I wouldn't recommend the TTY detection route in particular.

Antonio Ojea

unread,
May 27, 2021, 6:07:57 PM5/27/21
to Benjamin Elder, Daniel Smith, Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli

raghvenders raghvenders

unread,
May 28, 2021, 12:12:56 AM5/28/21
to Antonio Ojea, Benjamin Elder, Brendan Burns, Daniel Smith, Eddie Zaneski, Jordan Liggitt, ccoleman, kubernetes-dev, kubernetes-sig-cli
Is it worth considering something like old-school substitution, where a delete archives the resource so it can be restored later, or could the same be achieved through a scheduled eviction process for deletes?

Regards,
Raghvender

abhishek....@gmail.com

unread,
May 28, 2021, 6:56:53 AM5/28/21
to Kubernetes developer/contributor discussion
My +1 to this proposal.

As much as we care about giving utility to all users, it is also a basic need to provide some cover from accidental disasters. RBAC is a very wide topic, and I understand Kubernetes administrators have a responsibility to restrict access.
At the same time, there are cases where a cluster is very big, with many applications on it, and admin access to a namespace has to be given to different people to ease some work. In the end we are all human; a single "--all" or "-A | --all-namespaces" is all it takes to bring down an otherwise running cluster with one 'delete' call.
I would say it is very possible for anyone to make such a mistake, but the cost must not be the whole cluster going down.
That's the same reason "rm" in Linux has an "--interactive | -i" option: experts at any level sometimes make such mistakes.
I am totally in favor of having something like "--interactive | -i" or "--ask-for-confirmation" in place as an alpha feature with a warning at first, and then slowly graduating it to GA. That would give everyone a lot of time to change any automation scripts that would break.

Siyu Wang

unread,
May 28, 2021, 6:56:53 AM5/28/21
to Eddie Zaneski, kuberne...@googlegroups.com, kubernete...@googlegroups.com
Hi, you may look at the OpenKruise project. The latest v0.9.0 version provides a feature called Deletion Protection, which can protect not only namespaces from cascading deletion but also other resources like workloads and CRDs.

The webhook-based defense also protects against deletion operations coming from kubectl or any other API source.



Rory McCune

unread,
May 28, 2021, 6:56:53 AM5/28/21
to Kubernetes developer/contributor discussion
Hi All, 

Looking at this, and seeing that making changes to the operation of kubectl will take a while, would it make sense to start with some more guidance for cluster operators around least privilege RBAC designs and using things like impersonation to reduce the risk of mistakes being made?

If I relate this back to other setups like Windows domain admin, standard good practice is for them not to use their domain admin account for day to day administration but to have a separate account to use where destructive actions are needed. Then of course in Linux we have sudo.

If cluster operators made use of read-only accounts for standard troubleshooting and then had impersonation rights to an account with deletion rights, it may reduce the likelihood of accidents happening as an additional switch would need to be provided.
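
To sketch that flow with existing kubectl features (account and resource names here are made up; the read-only user needs RBAC permission for the `impersonate` verb on the privileged account):

```
# routine troubleshooting with a read-only account
kubectl get pods -n team-a

# destructive actions require deliberately switching hats via impersonation
kubectl delete deployment payments --as=breakglass-admin -n team-a
```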

Kind Regards

Rory

Douglas Schilling Landgraf

unread,
May 28, 2021, 7:48:57 AM5/28/21
to Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Thu, May 27, 2021 at 4:23 PM 'Jordan Liggitt' via
kubernetes-sig-cli <kubernete...@googlegroups.com> wrote:
>
> I like the "opt into deletion protection" approach. That got discussed a long time ago (e.g. https://github.com/kubernetes/kubernetes/pull/17740#issuecomment-217461024), but didn't get turned into a proposal/implementation
>

+1. Recently I talked with a coworker who was looking for such a feature.

> There's a variety of ways that could be done... server-side and enforced, client-side as a hint, etc.
>
> On Thu, May 27, 2021 at 4:21 PM 'Brendan Burns' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
>>
>> I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.
>>
>> We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also
> consider "k8s.io/lock" which actively blocks the delete.

Annotations seem a pretty straightforward approach IMO, if such a feature were enabled in the cluster by the user.

Tim Hockin

unread,
May 28, 2021, 10:31:16 AM5/28/21
to Douglas Schilling Landgraf, Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
There are lots of good ideas here.  I look forward to a solution that takes the best parts of each of them :)

Zizon Qiu

unread,
May 28, 2021, 10:58:23 AM5/28/21
to Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Fri, May 28, 2021 at 4:21 AM 'Brendan Burns' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.

We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.
 
Or abuse the existing finalizer mechanism.  

Daniel Smith

unread,
May 28, 2021, 11:35:14 AM5/28/21
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
Finalizers prevent a deletion from finishing, not from starting.

Tim Hockin

unread,
May 28, 2021, 12:14:40 PM5/28/21
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Fri, May 28, 2021 at 7:58 AM Zizon Qiu <zzd...@gmail.com> wrote:
On Fri, May 28, 2021 at 4:21 AM 'Brendan Burns' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.

We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.
 
Or abuse the existing finalizer mechanism.  

Finalizers are not "deletion inhibitors" just "deletion delayers".  Once you delete, the finalizer might stop it from happening YET but it *is* going to happen.  I'd rather see a notion of opt-in delete-inhibit.  It is not clear to me what happens if I have a delete-inhibit on something inside a namespace and then try to delete the namespace - we don't have transactions, so we can't abort the whole thing - it would be stuck in a weird partially-deleted state and I expect that to be a never-ending series of bug reports.

 

Tim Hockin

unread,
May 28, 2021, 12:55:32 PM5/28/21
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli


On Fri, May 28, 2021 at 9:21 AM Zizon Qiu <zzd...@gmail.com> wrote:
I'm thinking of finalizers as a kind of reference counter, like smart pointers in C++ or something like that.

Resources are deallocated when the counter drops to zero (no more finalizers).
And kept alive whenever the counter is > 0 (with any arbitrary finalizer).

That's correct, but there's a fundamental difference between "alive" and "waiting to die".  A delete operation moves an object, irrevocably from "alive" to "waiting to die".  That is a visible "state" (the deletionTimestamp is set) and there's no way to come back from it.  Let's not abuse that to mean something else.
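
To make that "waiting to die" state concrete, a small sketch using a throwaway ConfigMap and a made-up finalizer name:

```
kubectl create configmap demo

# add a finalizer; the object is still fully "alive"
kubectl patch configmap demo --type=merge \
  -p '{"metadata":{"finalizers":["example.com/inhibit"]}}'

# delete returns, but the object lingers with deletionTimestamp set;
# it is now irrevocably "waiting to die"
kubectl delete configmap demo --wait=false
kubectl get configmap demo -o jsonpath='{.metadata.deletionTimestamp}'

# removing the finalizer only lets the pending deletion finish;
# there is no way to bring the object back to "alive"
kubectl patch configmap demo --type=merge -p '{"metadata":{"finalizers":[]}}'
```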

Tabitha Sable

unread,
May 28, 2021, 1:54:48 PM5/28/21
to Rory McCune, Kubernetes developer/contributor discussion
I really love this suggestion, Rory. I've heard it come up in other contexts before and I think it's really smart.

WDYT about taking this idea to our friends at sig-security-docs?

Tabitha

Abhishek Tamrakar

unread,
May 29, 2021, 1:13:41 AM5/29/21
to Tim Hockin, Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
The current deletion strategy is easy but very risky without any gates; a deletion could put the whole cluster at risk, and this is where it needs some cover.
The reason I would still prefer the client-side approach mentioned in the original proposal is that the decision to delete a certain object or objects should remain in the end user's control, while at the same time giving them the safest way to operate the cluster.



Zizon Qiu

unread,
May 29, 2021, 1:13:46 AM5/29/21
to Tim Hockin, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I'm thinking of finalizers as a kind of reference counter, like smart pointers in C++ or something like that.

Resources are deallocated when the counter drops to zero (no more finalizers).
And kept alive whenever the counter is > 0 (with any arbitrary finalizer).

raghvenders raghvenders

unread,
May 29, 2021, 1:13:50 AM5/29/21
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
Or Finalizing through consensus :)

raghvenders raghvenders

unread,
Jun 1, 2021, 11:25:25 AM6/1/21
to Abhishek Tamrakar, Tim Hockin, Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
Since the client-side changes would potentially take about 6-9 releases, as mentioned by Tim, and are potentially breaking, a server-side solution would be a reasonable and worthwhile option to consider and finalize.

Quickly summarizing the options discussed so far (server-side):
  • Annotations and deletion inhibitors
  • Finalizers
  • RBAC and domain accounts / sudo-like separation
Please add anything I missed, or correct me if something is not actually an option.

In parallel, we would continue with the proposed kubectl client-side changes - Change 1 (interactive), Change 2, and Change 3 - on the targeted release timelines.


I would be curious to see what it would look like: choosing one of the three options (or combining them), then a WBS, stakeholder approvals, component changes, and release rollouts.

Regards,
Raghvender


Josh Berkus

unread,
Jun 1, 2021, 12:26:34 PM6/1/21
to Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
On 5/27/21 12:47 PM, 'Tim Hockin' via Kubernetes developer/contributor
discussion wrote:
>
> If we're to undertake any such change I think it needs to be more
> gradual.  Consider 6 to 9 releases instead.  Start by adding new forms
> and warning on use of the old forms.  Then add small sleeps to the
> deprecated forms.  Then make the sleeps longer and the warnings louder.
> By the time it starts hurting people there will be ample information all
> over the internet about how to fix it.  Even then, the old commands will
> still work (even if slowly) for a long time.  And in fact, maybe we
> should leave it in that state permanently.  Don't break users, just
> annoy them.

My experience is that taking more releases to roll out a breaking change
doesn't really make any difference ... users just ignore the change
until it goes GA, regardless.

Also consider that releases are currently 4 months, so 6 to 9 releases
means 2 to 3 years.

What I would rather see here is a switch that supports the old behavior
in the kubectl config. Then deprecate that over 3 releases or so. So:

Alpha: feature gate
Beta: feature gate, add config switch (on if not set)
GA: on by default, config switch (off if not set)
GA +3: drop config switch -- or not?

... although, now that I think about it, is it *ever* necessary to drop
the config switch? As a scriptwriter, I prefer things I can put into my
.kube config to new switches.

Also, of vital importance here is: how many current popular CI/CD
platforms rely on automated namespace deletion? If the answer is
"several" then that's gonna slow down rollout.

--
-- Josh Berkus
Kubernetes Community Architect
OSPO, OCTO

Tim Hockin

unread,
Jun 1, 2021, 12:52:52 PM6/1/21
to raghvenders raghvenders, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Tue, Jun 1, 2021 at 8:20 AM raghvenders raghvenders
<raghv...@gmail.com> wrote:
>
> Since client-side changes would potentially go about 6-9 releases as mentioned by Tim and potentially breaking changes, a server-side solution would be a reasonable and worthy option to consider and finalize.

To be clear - the distinction isn't really client vs. server. It's
about breaking changes without users EXPLICITLY opting in. You REALLY
can't make something that used to work suddenly stop working, whether
that is client or server implemented.

On the contrary, client-side changes like "ask for confirmation" and
"print stuff in color" are easier because they can distinguish between
interactive and non-interactive execution.

Adding a confirmation to interactive commands should not require any
particular delays in rollout.

Eddie Zaneski

unread,
Jun 1, 2021, 4:58:02 PM6/1/21
to kubernetes-dev, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, raghvenders raghvenders, Tim Hockin, kubernetes-sig-cli
Thanks to everyone for the great thoughts and discussion so far!

There are some good ideas throughout this thread (please keep them coming) that could probably stand alone as KEPs. I believe anything opt-in/server-side is orthogonal to what we're currently trying to achieve.

I think the big takeaway so far is that the flag and error changes should be separated from the warning/delay/confirmation changes.

We're thinking in the context of an imperative CLI that takes user input and executes administrative actions. Users don't intend to delete the resources they are accidentally deleting - it's not that there are things that should never be deleted. It doesn't matter how many mistakes have to pile up to create that perfect storm, because we're allowing a bad thing to happen without a confirmation gate.

With confirmation in place we significantly lower the chances of accidentally deleting everything in your cluster. This will most likely be the scope of our starting point.

If you want to join us for more we will be discussing during the SIG-CLI call tomorrow (Wednesday 9am PT).


Eddie Zaneski




Brian Topping

unread,
Jun 1, 2021, 5:48:40 PM6/1/21
to Eddie Zaneski, kubernetes-dev, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, raghvenders raghvenders, Tim Hockin, kubernetes-sig-cli
An especially dangerous situation is one where Ceph storage is managed by Rook. Rook itself is incredibly reliable, but hostStorage is used for the critical Placement Group (PG) maps on monitor nodes (stored in RocksDB). Loss of PG maps would result in loss of *all* PV data in the storage cluster! 

IMO this is more critical than loss of the API object store – assuming they are both backed up, restoring etcd and waiting for reconciliation is several orders of magnitude less downtime than restoring TB/PB/EB of distributed storage. Some resilient application architectures are designed not to need backup, but cannot tolerate a complete storage failure. 

Raising this observation in case it’s worth considering hierarchical confirmation gates with something basic like reference counting. It should be *even harder* to delete PV storage providers, cluster providers or other items that have multiple dependencies.

Maybe this indicates a “deletion provider interface” for pluggable tools. Default no-op implementations echo existing behavior, advanced implementations might be installed with Helm, use LDAP for decision processing and automatically archive deleted content. Let the community build these implementations instead of trying to crystal ball the best semantics. This also pushes tooling responsibility out to deployers. 

$0.02...


fillz...@gmail.com

unread,
Jun 2, 2021, 1:18:54 AM6/2/21
to Kubernetes developer/contributor discussion
A server-side solution is reasonable. However, finalizers only protect the resource itself from being deleted in etcd; the resources belonging to it will still be deleted.
A webhook might be a better way to extend this.

OpenKruise, one of the CNCF sandbox projects, already provides protection for cascading deletion.



Fury kerry

unread,
Jun 2, 2021, 1:19:06 AM6/2/21
to Eddie Zaneski, kubernetes-dev, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, raghvenders raghvenders, Tim Hockin, kubernetes-sig-cli
Server-side deletion protections are already implemented in OpenKruise (https://openkruise.io/en-us/docs/deletion_protection.html), which covers both namespace and workload cascading deletion.



--
Please consider the environment before you print this mail
Zhen Zhang
Zhejiang University
Yuquan Campus
MSN:Fury_...@hotmail.com

Eddie Zaneski

unread,
Jun 2, 2021, 5:02:43 PM6/2/21
to Kubernetes developer/contributor discussion
Thanks to everyone who joined the call today and provided valuable input!

If you'd like to watch the recording you can find it here.

In summary we want to balance protecting users with breaking users. We will propose KEP 2775 to add two coupled changes:
  • Add a new `--interactive | -i` flag to kubectl delete that will require confirmation before deleting resources. This flag will be false by default.
  • `kubectl delete [--all | --all-namespaces]` will warn about the destructive action that will be performed and artificially delay for x seconds allowing users a chance to abort.
These changes are not breaking and immediately provide users a way to mitigate accidental deletions.

An opt-in mechanism to default to the new interactive behavior through user config files will be a fast follow.

Once these measures are in place we will re-visit and address community feedback.

Tim Hockin

unread,
Jun 2, 2021, 7:15:35 PM6/2/21
to Eddie Zaneski, Kubernetes developer/contributor discussion
Was the idea of demanding interactive confirmation when the command is
executed interactively discarded?

Tim Hockin

unread,
Jun 2, 2021, 8:08:55 PM6/2/21
to Eddie Zaneski, Kubernetes developer/contributor discussion
Rephrasing for clarity:

Did we discard the idea of demanding interactive confirmation when a
"dangerous" command is executed in an interactive session? If so,
why? To me, that seems like the most approachable first vector and
likely to get a good return on investment.

Eddie Zaneski

unread,
Jun 3, 2021, 1:40:57 AM6/3/21
to Kubernetes developer/contributor discussion
On Wednesday, June 2, 2021 at 6:08:55 PM UTC-6 Tim Hockin wrote:
Rephrasing for clarity:

Did we discard the idea of demanding interactive confirmation when a
"dangerous" command is executed in an interactive session? If so,
why? To me, that seems like the most approachable first vector and
likely to get a good return on investment.

Wasn't discarded. I'll include it in the KEP and we can dig into more cases of platforms doing funny things with spoofing TTYs.

Eddie Zaneski

unread,
Jun 3, 2021, 7:32:28 PM6/3/21
to Kubernetes developer/contributor discussion
On Wednesday, June 2, 2021 at 11:40:57 PM UTC-6 Eddie Zaneski wrote:
On Wednesday, June 2, 2021 at 6:08:55 PM UTC-6 Tim Hockin wrote:
Rephrasing for clarity:

Did we discard the idea of demanding interactive confirmation when a
"dangerous" command is executed in an interactive session? If so,
why? To me, that seems like the most approachable first vector and
likely to get a good return on investment.

Wasn't discarded. I'll include in the KEP and we can dig in for more cases of platforms doing funny things with spoofing TTY's.

I've done a bit more digging into Ben's comments about TTY detection and think we may need to discard that route.

TravisCI is one provider we know of spoofing a TTY to trick tools into outputting things like color and status bars. It looks like CircleCI may do this as well.

kind hardcoded some vendor specific environment variables to get around this with Travis but I don't think we can/want to do that for all the vendors.
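
For reference, the rough shape of that workaround in shell form (a sketch only; kind's actual check is in Go, and the list of environment variables would never be complete):

```
# trust isatty() only when no known CI environment variable is set
if [ -t 1 ] && [ -z "${TRAVIS:-}" ] && [ -z "${CI:-}" ]; then
  interactive=true
else
  interactive=false
fi
```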

If we can't reliably detect TTY's inside these pipelines we will indeed break scripts.

Thoughts?

Tim Hockin

unread,
Jun 3, 2021, 8:11:39 PM6/3/21
to Eddie Zaneski, Kubernetes developer/contributor discussion
On Thu, Jun 3, 2021 at 4:32 PM Eddie Zaneski <eddi...@gmail.com> wrote:
>
> On Wednesday, June 2, 2021 at 11:40:57 PM UTC-6 Eddie Zaneski wrote:
>>
>> On Wednesday, June 2, 2021 at 6:08:55 PM UTC-6 Tim Hockin wrote:
>>>
>>> Rephrasing for clarity:
>>>
>>> Did we discard the idea of demanding interactive confirmation when a
>>> "dangerous" command is executed in an interactive session? If so,
>>> why? To me, that seems like the most approachable first vector and
>>> likely to get a good return on investment.
>>
>>
>> Wasn't discarded. I'll include in the KEP and we can dig in for more cases of platforms doing funny things with spoofing TTY's.
>
>
> I've done a bit more digging into Ben's comments about TTY detection and think we may need to discard that route.
>
> TravisCI is one provider we know of spoofing a TTY to trick tools into outputting things like color and status bars. It looks like CircleCI may do this as well.

Well. That's unfortunate.

> kind hardcoded some vendor specific environment variables to get around this with Travis but I don't think we can/want to do that for all the vendors.
>
> If we can't reliably detect TTY's inside these pipelines we will indeed break scripts.

Yes, that's the conclusion I come to, also. Harumph.

> Thoughts?
>

Jordan Liggitt

unread,
Jun 3, 2021, 10:48:32 PM6/3/21
to Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion
On Thu, Jun 3, 2021 at 8:11 PM 'Tim Hockin' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
On Thu, Jun 3, 2021 at 4:32 PM Eddie Zaneski <eddi...@gmail.com> wrote:
>
> On Wednesday, June 2, 2021 at 11:40:57 PM UTC-6 Eddie Zaneski wrote:
>>
>> On Wednesday, June 2, 2021 at 6:08:55 PM UTC-6 Tim Hockin wrote:
>>>
>>> Rephrasing for clarity:
>>>
>>> Did we discard the idea of demanding interactive confirmation when a
>>> "dangerous" command is executed in an interactive session? If so,
>>> why? To me, that seems like the most approachable first vector and
>>> likely to get a good return on investment.
>>
>>
>> Wasn't discarded. I'll include in the KEP and we can dig in for more cases of platforms doing funny things with spoofing TTY's.
>
>
> I've done a bit more digging into Ben's comments about TTY detection and think we may need to discard that route.
>
> TravisCI is one provider we know of spoofing a TTY to trick tools into outputting things like color and status bars. It looks like CircleCI may do this as well.

Well.  That's unfortunate.

> kind hardcoded some vendor specific environment variables to get around this with Travis but I don't think we can/want to do that for all the vendors.
>
> If we can't reliably detect TTY's inside these pipelines we will indeed break scripts.

Yes, that's the conclusion I come to, also.  Harumph.

If we insert a hard wait for confirmation for auto-detected TTYs, I agree that's potentially breaking (I have other questions about using stdin for confirmation when combined with other stdin uses like `-f -` or credential plugins that make use of stdin, but I'll save those questions for the KEP).

If instead of a hard wait for confirmation we insert a stderr warning plus a delay for specific super-destructive things on a detected TTY, to give time to abort, that seems potentially acceptable.
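
Roughly what I have in mind, as a sketch only (the 10-second figure is the one from the alpha proposal at the top of the thread; everything else is illustrative):

```
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// warnAndPause prints a warning to stderr and then pauses, giving an
// interactive user a window to hit Ctrl-C before a destructive delete
// proceeds. Illustrative only; not proposed wording or proposed code.
func warnAndPause(out io.Writer, what string, delay time.Duration) {
	fmt.Fprintf(out, "Warning: about to delete %s; press Ctrl-C within %s to abort.\n", what, delay)
	time.Sleep(delay)
}

func main() {
	warnAndPause(os.Stderr, "all namespaces (--all)", 10*time.Second)
	fmt.Println("...deletion would proceed here")
}
```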


Tim Hockin

unread,
Jun 4, 2021, 1:02:07 AM6/4/21
to Jordan Liggitt, Eddie Zaneski, Kubernetes developer/contributor discussion
Potentially acceptable... Anyone using kubectl in CI will slow down... But maybe that's OK if we limit the automatic interaction to just "very scary" things and not any old "apply".

Jordan Liggitt

unread,
Jun 4, 2021, 1:04:43 AM6/4/21
to Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion
Yeah, I'm envisioning scoping to commands like:

# delete all the things
kubectl delete namespaces --all

# ineffective namespace scoping
kubectl delete persistentvolumes --all --namespace=foo

Abhishek Tamrakar

unread,
Jun 4, 2021, 1:53:08 AM6/4/21
to Jordan Liggitt, Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion
Strongly agree. If we limit this to potential "purge everything" commands, it would be good to have.


Brian Topping

unread,
Jun 4, 2021, 2:10:53 AM6/4/21
to Abhishek Tamrakar, Jordan Liggitt, Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion
This is the same theme with `kubeadm reset` when the cluster only has a single node as well...


Paco俊杰Junjie 徐Xu

unread,
Jun 4, 2021, 4:31:26 AM6/4/21
to Kubernetes developer/contributor discussion
+1 for interactive confirmation. The deletion protection on Linux is `alias rm='rm -iv'`

  • This starts rm (remove files or directories) in interactive and verbose mode, to avoid deleting files by mistake.
  • Note that the -f (force) option ignores the interactive option.
alias krm='kubectl delete -i '

Bill WANG

unread,
Jun 5, 2021, 8:50:29 AM6/5/21
to Kubernetes developer/contributor discussion
Frankly, I'd prefer not to support the `--all` option at all. If we really need this function, I can run a simple `for` loop to go through all namespaces and delete them. `--all` with delete is evil.
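
(The loop I have in mind is roughly the following; a client-go sketch, with the actual Delete call left commented out.)

```
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the same way kubectl does.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	nsList, err := client.CoreV1().Namespaces().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, ns := range nsList.Items {
		// Each namespace is named explicitly, so a typo or a code review can
		// catch it before anything is removed, unlike a single `--all`.
		fmt.Println("would delete namespace:", ns.Name)
		// client.CoreV1().Namespaces().Delete(context.TODO(), ns.Name, metav1.DeleteOptions{})
	}
}
```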

What I need is that when delete runs with `--all`, it provides some warnings: list some of the resources that will be deleted, wait for confirmation, use a different color if possible, and so on.

I don't really care about changes 2 and 3.

Bill

raghvenders raghvenders

unread,
Jun 6, 2021, 8:56:16 PM6/6/21
to Bill WANG, Kubernetes developer/contributor discussion
I am fully on board with the interactive approach and the delete-all protections.

But I'd like to log my initial thoughts so we can consider them for future reference or scope.
My initial thought around delete is a consensus-based approach. Below is just an example; it doesn't have to look exactly like this.
For example, to delete a namespace, the same delete command would have to be executed twice: once by a cluster-admin and then by the namespace owner. Likewise for delete all.


The overall idea is sharing power in a decentralized way, unlike a delete performed against a single Linux machine or file system.
It may demand a state-managed store like etcd (or another) where we maintain the state for the delete consensus/rule.

Note: this is not just a client change; backward compatibility and other component changes come in, as do changes in operational practices for users,
and I don't know how this would behave in pipelines and other automation approaches.

If you feel this is a meaningful approach worth considering, I can log it as a future enhancement.


Regards,
Raghvender






Eddie Zaneski

unread,
Jun 7, 2021, 7:40:44 PM6/7/21
to Kubernetes developer/contributor discussion
The KEP is in a good place to get eyes and thoughts on it: https://github.com/kubernetes/enhancements/pull/2777 (rendered: https://github.com/kubernetes/enhancements/blob/0f321138775017d9bd7a44604de59cf399aa67fd/keps/sig-cli/2775-kubectl-delete-interactivity-delay/README.md). This should give us a starting place for getting some protections in place, and we can iterate from there.

--
Eddie

Ricardo Katz

unread,
Jun 22, 2021, 1:07:19 PM6/22/21
to Eddie Zaneski, Kubernetes developer/contributor discussion
Told Eddie in private I did this, and came here to shame myself :)

So it seems to me that we are assuming that only experienced users will have cluster admin, and that the problem is maybe limited to telling destructive actions apart from non-destructive ones.

Probably when concepts like PV and PVC were created we didn't consider how similar they look and how destructive a one-letter mistake could be for users, but well... I did a `kubectl delete pv --all -n mynamespace` (instead of `delete pvc`) yesterday and luckily I could interrupt it in time.

This might make a user story for Eddie's KEP (Ricardo, an SRE racing against time to deliver something...), but the thing is: from a user perspective, I prefer annoying actions that can be bypassed later (like `--interactive` being the default for dangerous actions, with `--interactive=false` to opt out) to:

* Having a sleep on the command and not noticing it because I entered the command, left it on my screen, and went to the restroom (and also making my CI slower, which in some cases can cost me money)
* Having to explain why a wrong letter caused an outage of all the disks in my cluster :)

From a user perspective, maybe 6 releases is too long. Maybe a warning is not effective. Adding annotations wouldn't be effective either (unless I add some mutation/OPA/etc. that adds the annotation to critical resources). I would be really glad if my CLI were updated enough to stop me from doing the wrong thing :)

Maybe we are asking in the wrong place? On the k/dev list we are concerned about breaking something, but what are the concerns of real-world users? (Not that we aren't concerned, but we might all be a bit biased.)

Best



Brendan Burns

unread,
Jun 22, 2021, 1:21:20 PM6/22/21
to Ricardo Katz, Eddie Zaneski, Kubernetes developer/contributor discussion
It's possible that the right thing to do here is to make the default (opt-in vs. opt-out) an api-server flag. That way the behavior can be decided as a per-cluster setting rather than a client-side setting.

It seems like a cluster creator/administrator is the right person to say: "this is prod, I don't want to accidentally delete stuff"

Making it namespace specific would be even better, but also more complicated to implement.

--brendan




Tim Hockin

unread,
Jun 28, 2021, 12:28:57 PM6/28/21
to Brendan Burns, Ricardo Katz, Eddie Zaneski, Kubernetes developer/contributor discussion
I'm in favor of a belt AND suspenders. If we are going to consider
admin-directed safety config, perhaps a proper API, rather than flags,
makes more sense.


Brendan Burns

unread,
Jun 28, 2021, 12:51:53 PM6/28/21
to Tim Hockin, Ricardo Katz, Eddie Zaneski, Kubernetes developer/contributor discussion
If we do go down the API route, I'd recommend checking out the Azure Management Lock API (https://docs.microsoft.com/en-us/rest/api/resources/managementlocks), which implements exactly this for Azure (with some more fine-grained semantics).

When you read the docs, you can substitute s/"resource group"/"namespace"/ in your head to make it parse for Kubernetes

--brendan



From: Tim Hockin <tho...@google.com>
Sent: Monday, June 28, 2021 9:28 AM
To: Brendan Burns <bbu...@microsoft.com>
Cc: Ricardo Katz <ricard...@gmail.com>; Eddie Zaneski <eddi...@gmail.com>; Kubernetes developer/contributor discussion <kuberne...@googlegroups.com>
> https://github.com/kubernetes/enhancements/pull/2777
> https://github.com/kubernetes/enhancements/blob/0f321138775017d9bd7a44604de59cf399aa67fd/keps/sig-cli/2775-kubectl-delete-interactivity-delay/README.md

>
> --
> Eddie

raghvenders raghvenders

unread,
Jul 6, 2021, 1:20:59 AM7/6/21
to Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
@Brendan Burns - As we know, RBAC provides some level of lock on resources. Where do you see this management-style lock for resources fitting in?

Is it overarching? Say, for example, deleting Pods is forbidden by RBAC for some users/groups and those Pods are also locked for deletion; what will the API server respond?

Regards,
Raghvender

On Thu, May 27, 2021 at 3:21 PM 'Brendan Burns' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.

We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.

We could also support those annotations at a namespace level if we wanted to.

This is similar to Management Locks that we introduced in Azure (https://docs.microsoft.com/en-us/rest/api/resources/managementlocks) for similar reasons to prevent accidental deletes and force an explicit action (remove the lock) for a delete to proceed.

--brendan




Eddie Zaneski

unread,
Jul 13, 2021, 2:15:05 PM7/13/21
to Kubernetes developer/contributor discussion
I like the idea of adding an API server flag to opt clients into confirmation (or even some of the other breaking changes), say via a header.

What are the thoughts on providers supporting that as an option for cluster creation?

Tim Hockin

unread,
Jul 13, 2021, 5:40:06 PM7/13/21
to Eddie Zaneski, Kubernetes developer/contributor discussion
Is this something that needs to be a flag, or should it be an actual API?

On Tue, Jul 13, 2021 at 11:15 AM Eddie Zaneski <eddi...@gmail.com> wrote:
>
> I like the idea of adding an API server flag to opt clients into confirmation (or even some of the other breaking changes), say via a header.
>
> What are the thoughts on providers supporting that as an option for cluster creation?
>

Brendan Burns

unread,
Jul 13, 2021, 5:55:42 PM7/13/21
to Eddie Zaneski, Tim Hockin, Kubernetes developer/contributor discussion
Responding to Raghvender's question:

If you don't have RBAC permissions, those are the first to respond, so you get a response that says "Unauthorized"

If you do have RBAC permissions, locks supersede RBAC, so even if I have RBAC to delete something, I can't delete if it is locked. I first have to remove the lock, and then I can delete it.
The message to the client is something like: "This resource can not be deleted because it is locked, please check with the resource owner to remove the lock before deleting it"

Locks turn the delete into a two-step process that is safer, since it requires an explicit gesture to unlock. Think of it like the write-protect tab on a USB drive (or Floppy disk if you're as old as I am...)
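
To make the mechanics concrete, the core of such a validating admission webhook could look something like the sketch below. Nothing here exists in Kubernetes today: the annotation key is the hypothetical `k8s.io/lock` from earlier in the thread, and TLS setup, webhook registration, and error handling are omitted.

```
package main

import (
	"encoding/json"
	"fmt"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const lockAnnotation = "k8s.io/lock" // hypothetical key, not a real Kubernetes annotation

// handle denies DELETE requests for objects that carry the lock annotation,
// regardless of the caller's RBAC permissions.
func handle(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	req := review.Request
	if req == nil {
		http.Error(w, "empty admission request", http.StatusBadRequest)
		return
	}
	resp := &admissionv1.AdmissionResponse{UID: req.UID, Allowed: true}

	if req.Operation == admissionv1.Delete {
		// For DELETE, the object being removed is delivered in OldObject.
		var obj metav1.PartialObjectMetadata
		if err := json.Unmarshal(req.OldObject.Raw, &obj); err == nil &&
			obj.Annotations[lockAnnotation] == "true" {
			resp.Allowed = false
			resp.Result = &metav1.Status{
				Message: fmt.Sprintf("%s/%s is locked; remove the %q annotation before deleting it",
					req.Namespace, req.Name, lockAnnotation),
			}
		}
	}

	review.Response = resp
	json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", handle)
	http.ListenAndServe(":8443", nil) // a real webhook must serve TLS
}
```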

--brendan



From: 'Tim Hockin' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com>
Sent: Tuesday, July 13, 2021 2:39 PM
To: Eddie Zaneski <eddi...@gmail.com>
Cc: Kubernetes developer/contributor discussion <kuberne...@googlegroups.com>
Subject: Re: [EXTERNAL] Re: [RFC] Protecting users of kubectl delete
 
Is this something that needs to be a flag, or should it be an actual API?

On Tue, Jul 13, 2021 at 11:15 AM Eddie Zaneski <eddi...@gmail.com> wrote:
>
> I like the idea of adding an API server flag to opt clients into confirmation (or even some of the other breaking changes), say via a header.
>
> What are the thoughts on providers supporting that as an option for cluster creation?
>

Tim Hockin

unread,
Jul 13, 2021, 6:06:16 PM7/13/21
to Brendan Burns, Eddie Zaneski, Kubernetes developer/contributor discussion
Are we again considering delete-inhibitors? I'm a fan, but they have
corner cases to be pinned down.

It is not clear to me what happens if I have a delete-inhibit on
something inside a namespace and then try to delete the namespace - we
don't have transactions, so we can't abort the whole thing. It would
be stuck in a weird partially-deleted state. We can't pre-check
delete-ability and then delete (TOC-TOU). We'd need a "delete
pending" indicator, so mutations would be blocked, or something like
that, which has yet more corner cases.

Eddie Zaneski

unread,
Jul 13, 2021, 6:18:42 PM7/13/21
to Kubernetes developer/contributor discussion
> Are we again considering delete-inhibitors?

I think delete inhibitors are a great idea and something we should implement but are out of scope of this KEP.

> Is this something that needs to be a flag, or should it be an actual API?

I am thinking a flag, but I'm curious what you think an API for this could look like. Create a cluster-level resource that enforces the new behavior?

Tim Hockin

unread,
Jul 13, 2021, 6:31:07 PM7/13/21
to Eddie Zaneski, Kubernetes developer/contributor discussion
On Tue, Jul 13, 2021 at 3:18 PM Eddie Zaneski <eddi...@gmail.com> wrote:
>
> > Are we again considering delete-inhibitors?
>
> I think delete inhibitors are a great idea and something we should implement but are out of scope of this KEP.

Agree

> > Is this something that needs to be a flag, or should it be an actual API?
>
> I am thinking flag but curious what you think an API for this could look like. Create a cluster level resource that enforces new behavior?

Well, flags are generally very inaccessible to cluster operators and
their scope is naturally very wide (or global). E.g. you could never
encode an open-ended list of protected resources in a flag. Managed
providers have to plumb flags through implementations, etc. Changing
your mind about a flag means changing apiserver config, which is kind
of scary.

On the other hand, we've historically shied away from hundreds of tiny
resources, but more recently that seems less of a concern. So, for
example, maybe (not a foregone conclusion) it makes sense to consider
an API policy object which encodes things like this or even has more
details like a selector, a resource type or apigroup, a RoleName, etc.
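
To make that concrete, a purely hypothetical shape for such an object (every name below is invented for illustration, not a proposal) might be:

```
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DeleteConfirmationPolicy is a hypothetical cluster-scoped API object; none
// of this exists today, it just illustrates the kind of fields mentioned above.
type DeleteConfirmationPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec DeleteConfirmationPolicySpec `json:"spec"`
}

type DeleteConfirmationPolicySpec struct {
	// Resources the policy covers, e.g. {Group: "", Resource: "namespaces"}.
	Resources []metav1.GroupResource `json:"resources,omitempty"`
	// Only objects matching this selector are covered.
	Selector *metav1.LabelSelector `json:"selector,omitempty"`
	// Subjects bound to this role are exempt (e.g. CI service accounts).
	ExemptRoleName string `json:"exemptRoleName,omitempty"`
	// What compliant clients should do: "Confirm", "Warn", or "Delay".
	Action string `json:"action"`
}
```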

Tim

Brendan Burns

unread,
Jul 13, 2021, 6:36:23 PM7/13/21
to Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion
FWIW, hanging namespace deletes are already a thing today; here's the recipe from what I have seen:

1) create a custom resource that is served by an aggregated (delegated) API server, not a CRD
2) take down the corresponding aggregated API server so there is no backend for the resource anymore
3) try to delete the namespace

It hangs forever because it can't delete the resource (since the aggregated API server's not there) and it can't delete the namespace because there is still a resource in it.

While that is annoying, it doesn't seem to be a blocker to adding locks to the API.

--brendan




raghvenders raghvenders

unread,
Jul 13, 2021, 10:20:31 PM7/13/21
to Brendan Burns, Eddie Zaneski, Kubernetes developer/contributor discussion, Tim Hockin
That’s a good analogy :) 

Certainly; as an avid learner in high school I came across floppies and CDs :)


Regards,
Raghvender

Daniel Smith

unread,
Jul 14, 2021, 12:42:01 PM7/14/21
to raghvenders raghvenders, Brendan Burns, Eddie Zaneski, Kubernetes developer/contributor discussion, Tim Hockin
I don't agree that the possibility of a protected item inside a namespace is a blocker for adding a "protected" bit, for two reasons:

a) As mentioned, we already can get stuck partway through namespace deletion in other cases (https://www.youtube.com/watch?v=0VWNWJktcHk)
b) It's easy (in fact, would probably happen without trying) to make DELETECOLLECTION ignore protected bits

raghvenders raghvenders

unread,
Jul 14, 2021, 2:51:23 PM7/14/21
to Daniel Smith, Brendan Burns, Eddie Zaneski, Kubernetes developer/contributor discussion, Tim Hockin
As of now, in file-system deletions with a recursive strategy, the resources that hold the file system prohibit the parent deletion.

I am curious how different that would be for Kubernetes resources, and whether being in line with that behavior would bring advantages or extra complexity?


Regards,
Raghvender

Eddie Zaneski

unread,
Aug 31, 2021, 7:57:55 PM8/31/21
to Kubernetes developer/contributor discussion
Tim and I had a quick chat to expand on the idea of opting users into breaking changes through a cluster-scoped resource, and I wound up with this PoC: https://youtu.be/H5yd1dRl44w.

The idea is to have some sort of configuration that can be set on a cluster by a cluster administrator. By adding this configuration, an admin can add safeguards that don't reduce an end user's access but do change their experience of interacting with resources. An approach like this can be consumed by clients other than kubectl (see https://github.com/helm/helm/issues/8981) and can cover other actions (e.g. apply).
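
As a rough sketch of the client side (the group/version/resource and field names below are invented for illustration, not taken from the PoC), a client could check for such an object before a destructive command and fall back to today's behavior when it isn't there:

```
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// Hypothetical cluster-scoped resource a client could consult.
var safetyGVR = schema.GroupVersionResource{
	Group:    "safety.k8s.io", // invented for illustration
	Version:  "v1alpha1",
	Resource: "deleteprotections",
}

// confirmationRequired reports whether the (hypothetical) cluster config asks
// clients to prompt before destructive deletes. Any error, including the
// resource not existing on older clusters, falls back to today's behavior.
func confirmationRequired(ctx context.Context, client dynamic.Interface) bool {
	obj, err := client.Resource(safetyGVR).Get(ctx, "cluster", metav1.GetOptions{})
	if err != nil {
		return false
	}
	required, _, _ := unstructured.NestedBool(obj.Object, "spec", "requireConfirmation")
	return required
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(config)
	fmt.Println("confirmation required:", confirmationRequired(context.TODO(), client))
}
```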

I think the focus of discussion right now should be whether configuring user experience and client behavior via cluster resources is a mechanism we want to use, as opposed to the technical implementation of that mechanism.

Thoughts?

kfox11...@gmail.com

unread,
Aug 31, 2021, 10:02:29 PM8/31/21
to Kubernetes developer/contributor discussion
Interesting solution.

But it may add to some confusion. Say I use a cluster that has the flag set, and 99% of the time I use that cluster. Then I go to a cluster that doesn't, and I accidentally delete something because I was careless and thought the system would protect me, not realizing the cluster admin had overridden a setting I didn't know existed.

Breaking changes in a new version of software are somewhat expected; that's why there are changelogs and breaking-change notes. Behavior changing per cluster is a bit less expected.

For other use cases this might be OK. This one feels like it potentially just moves the problem somewhere else? Still, some protection is better than no protection, so maybe it's still a good idea. Just thinking out loud.

Thanks,
Kevin


Tim Hockin

unread,
Aug 31, 2021, 11:34:19 PM8/31/21
to kfox11...@gmail.com, Kubernetes developer/contributor discussion


On Tue, Aug 31, 2021, 7:02 PM kfox11...@gmail.com <kfox11...@gmail.com> wrote:
Interesting solution.

But it may add to some confusion. Say I use a cluster that has the flag set and 99% of the time I use that cluster. and then I go to a cluster that doesn't, and I accidentally delete something because I was careless and thought the system would protect me because I didn't realize the cluster admin overrode some setting I didn't know existed?

Breaking changes on a new version of a software is somewhat expected. That's why there are change logs / breaking change notes. It changing behavior per cluster is a bit less expected.

I want to disagree.  While both are unfortunate, different clusters have different flags or plugins all the time.  

If you change something like this with the version, what happens is that applications which worked on Monday start failing on Tuesday because your managed provider upgraded the cluster.

At least as part of the API, it is in the hands of someone in the cluster-admin path.



For other use cases this might be ok. This one feels like it potentially just moves the problem somewhere else? Still though, some protection is still better then no protection, so maybe its still a good idea. Just thinking out loud.

Thanks,
Kevin


On Tuesday, August 31, 2021 at 4:57:55 PM UTC-7 eddi...@gmail.com wrote:
Tim and I had a quick chat to expand on the idea of opting users into breaking changes through a cluster scoped resource and I wound up with this poc https://youtu.be/H5yd1dRl44w.

The idea is to have some sort of configuration that can be set on a cluster by a cluster administrator. By adding this configuration an admin can add additional safeguards that don't reduce an end user's access but change their user experience of interacting with resources. An approach like this can be consumed by clients other than kubectl (see https://github.com/helm/helm/issues/8981) and other actions (e.g. apply).

I think the focus of discussion right now should be, is configuring user experience and client behavior with cluster resources a mechanism we want to use as opposed to the technical implementations of that mechanism.

Thoughts?


Maciej Szulik

unread,
Sep 9, 2021, 7:56:55 AM9/9/21
to Kubernetes developer/contributor discussion
On Wed, Sep 1, 2021 at 1:58 AM Eddie Zaneski <eddi...@gmail.com> wrote:
Tim and I had a quick chat to expand on the idea of opting users into breaking changes through a cluster scoped resource and I wound up with this poc https://youtu.be/H5yd1dRl44w.

The proposed approach, although pretty cool, has an important downside:
this mechanism has to be implemented by every client interacting with k8s.
You can instead implement it (as Brendan pointed out in https://groups.google.com/g/kubernetes-dev/c/y4Q20V3dyOk/m/yb5xfHDiBwAJ) through simple
admission + labelling/annotation. The benefit of that approach is that:
1. it's centrally managed by the cluster admin, who can easily enable, disable or configure it.
2. it works with every client, since you're still working with the well-known delete + label/annotate
operations.

Lastly, from my own experience, I can say that different administrators have
different needs w.r.t. deletion protection, and having a central mechanism is not
always desirable. The current extension points we have are pretty good at
protecting our clusters :-) But I agree that it's not always easy to put the pieces
together.
 

The idea is to have some sort of configuration that can be set on a cluster by a cluster administrator. By adding this configuration an admin can add additional safeguards that don't reduce an end user's access but change their user experience of interacting with resources. An approach like this can be consumed by clients other than kubectl (see https://github.com/helm/helm/issues/8981) and other actions (e.g. apply).

I think the focus of discussion right now should be, is configuring user experience and client behavior with cluster resources a mechanism we want to use as opposed to the technical implementations of that mechanism.

Thoughts?


Tim Hockin

unread,
Sep 9, 2021, 8:23:49 PM9/9/21
to Maciej Szulik, Kubernetes developer/contributor discussion
On Thu, Sep 9, 2021 at 4:56 AM Maciej Szulik <masz...@redhat.com> wrote:


On Wed, Sep 1, 2021 at 1:58 AM Eddie Zaneski <eddi...@gmail.com> wrote:
Tim and I had a quick chat to expand on the idea of opting users into breaking changes through a cluster scoped resource and I wound up with this poc https://youtu.be/H5yd1dRl44w.

The proposed approach, although pretty cool, has an important downside.
This mechanism has to be implemented for every client interacting with k8s.

The original problem statement was a client problem - kubectl delete --all makes it too easy to blow your own foot off.  It's not about protecting any particular resources (which is ALSO useful).
 
admission + labelling/annotation. The benefit of that approach is that:
1. it's centrally managed by the cluster admin who can easily enable, disable or configure it.
2. it works with every client, since you're still working with well known delete + label/annotate
operations.

Belts *and* suspenders?  I endorse the resource delete-inhibit KEP, but I still think this is an interesting extra layer.
 
Lastly, from my own experience, I can say that different administrators have
different needs wrt deletion protection and having a central mechanism is not
always desirable. The current extension points we have are pretty good
protecting our clusters :-) But I agree that it's not always easy to put the pieces
together.
 

The idea is to have some sort of configuration that can be set on a cluster by a cluster administrator. By adding this configuration an admin can add additional safeguards that don't reduce an end user's access but change their user experience of interacting with resources. An approach like this can be consumed by clients other than kubectl (see https://github.com/helm/helm/issues/8981) and other actions (e.g. apply).

I think the focus of discussion right now should be, is configuring user experience and client behavior with cluster resources a mechanism we want to use as opposed to the technical implementations of that mechanism.

Thoughts?



Daniel Smith

unread,
Sep 23, 2021, 7:37:04 PM9/23/21
to Tim Hockin, Maciej Szulik, Kubernetes developer/contributor discussion
Hi everyone,

I want to bump this thread to bring this KEP to your attention: https://github.com/kubernetes/enhancements/pull/2840. It proposes a mechanism ("liens") for protecting individual objects from deletion.

I'm not sure it will completely solve this use case, but it may be part of the solution. There are some open questions on it still, so please take a look!

Josh Berkus

unread,
Sep 28, 2021, 2:35:42 PM9/28/21
to Maciej Szulik, Kubernetes developer/contributor discussion
On 9/9/21 4:56 AM, Maciej Szulik wrote:
> The proposed approach, although pretty cool, has an important downside.
> This mechanism has to be implemented for every client interacting with k8s.
> You can implement (and Brendan pointed it out in
> https://groups.google.com/g/kubernetes-dev/c/y4Q20V3dyOk/m/yb5xfHDiBwAJ)
> through simple
> admission + labelling/annotation. The benefit of that approach is that:
> 1. it's centrally managed by the cluster admin who can easily enable,
> disable or configure it.
> 2. it works with every client, since you're still working with well
> known delete + label/annotate
> operations.

I don't want to lose track of the 80% use-case here. That use-case is:

- Interactive users should have to confirm cascading deletes.
- CI/CD systems should NOT have to.

So far, all of the proposed solutions -- whether Eddie's or Brendan's --
require modifying every single CI/CD system that works with Kubernetes in
order to re-enable regular operation. IMHO, that's a showstopper; it's
simply not worth forcing an ecosystem-wide change in order to disable a
footgun.

This is why I suggested upthread to make this a simple .kube/config option.

--
-- Josh Berkus
Kubernetes Community Architect
OSPO, OCTO

Brendan Burns

unread,
Sep 28, 2021, 5:06:07 PM9/28/21
to Josh Berkus, Maciej Szulik, Kubernetes developer/contributor discussion
I'm not convinced that everyone wants to give CI/CD a free pass. I've seen lots of examples of CI/CD systems causing unintended consequences because the author of the declarative state didn't understand the impact of what they were doing.

I want to highlight that one of the most important parts of my proposal is that it is opt-in.

Adding the admission-controller lock is a cluster administrator's decision.

Want to enable locks? Great, add this admission controller.
Worried about the impact on CI/CD of locks? Don't add the admission controller.

I don't think that safeties like this should be left to end-users. Requiring every user to modify their .kube/config isn't going to happen.

--brendan



From: kuberne...@googlegroups.com <kuberne...@googlegroups.com> on behalf of Josh Berkus <jbe...@redhat.com>
Sent: Tuesday, September 28, 2021 11:35 AM
To: Maciej Szulik <masz...@redhat.com>; Kubernetes developer/contributor discussion <kuberne...@googlegroups.com>

Subject: Re: [EXTERNAL] Re: [RFC] Protecting users of kubectl delete
On 9/9/21 4:56 AM, Maciej Szulik wrote:
> The proposed approach, although pretty cool, has an important downside.
> This mechanism has to be implemented for every client interacting with k8s.
> You can implement (and Brendan pointed it out in
> https://groups.google.com/g/kubernetes-dev/c/y4Q20V3dyOk/m/yb5xfHDiBwAJ)
> through simple
> admission + labelling/annotation. The benefit of that approach is that:
> 1. it's centrally managed by the cluster admin who can easily enable,
> disable or configure it.
> 2. it works with every client, since you're still working with well
> known delete + label/annotate
> operations.

I don't want to lose track of the 80% use-case here.  That use-case is:

- Interactive users should have to confirm cascading deletes.
- CI/CD systems should NOT have to.

So far, all of the proposed solutions -- whether Eddie's or Brendan's --
require modifying every single CI/CD system that works with Kubernetes in
order to re-enable regular operation.   IMHO, that's a showstopper; it's
simply not worth forcing an ecosystem-wide change in order to disable a
footgun.

This is why I suggested upthread to make this a simple .kube/config option.

--
-- Josh Berkus
    Kubernetes Community Architect
    OSPO, OCTO


Daniel Smith

unread,
Sep 28, 2021, 7:30:30 PM9/28/21
to Brendan Burns, Josh Berkus, Maciej Szulik, Kubernetes developer/contributor discussion
Let me advertise https://github.com/kubernetes/enhancements/pull/2840 again - if e.g. letting a CI/CD system ignore the proposed liens is a use case you have, please leave a comment saying so. Right now that's not in the design but I can imagine an easy change to enable that.

Josh Berkus

unread,
Sep 28, 2021, 7:56:27 PM9/28/21
to Brendan Burns, Maciej Szulik, Kubernetes developer/contributor discussion
On 9/28/21 2:05 PM, Brendan Burns wrote:
> Want to enable locks? Great, add this admission controller.
> Worried about the impact on CI/CD of locks? Don't add the admission
> controller.

That puts admins in a state where they can't implement safeties for
interactive developers unless they can be sure that 100% of their
deployment toolchain also has code supporting it. Which means I think
you'd see very few users actually enabling the admission controller for
several years after the feature is introduced.

I thought SIG-CLI wanted something folks would be able to use sooner
than that?

Brendan Burns

unread,
Sep 28, 2021, 8:40:28 PM9/28/21
to Josh Berkus, Maciej Szulik, Kubernetes developer/contributor discussion
Maybe. But I find that a number of significant outages due to fat-finger errors have a way of making people very open to making changes 🙂

I can't speak for SIG-CLI, but you certainly could implement locks sooner as something that only kubectl supports. I don't think that's a great solution, though, since people will assume the locks protect all clients, not just kubectl.

--brendan




Nathan Fisher

unread,
Sep 29, 2021, 9:44:06 AM9/29/21
to Brendan Burns, Josh Berkus, Maciej Szulik, Kubernetes developer/contributor discussion
Hi folks!

Would it be safe to say there's a sliding scale of concern based on the workload type:

1. statefulset - irreplaceable data if you aren't backing it up.
2a. deployment - user facing production workloads.
2b. daemonset - service meshes, observability, and similar fun.
3. cronjob.
4. standalone pod/job - debug tools most commonly.

The examples mentioned in this thread seemed to reference 3 fat-finger scenarios:
1. deleting resource types with the --all flag.
2. deleting a namespace.
3. deleting an individual resource.

Is there a significant number of people reporting that they've deleted their prod workload with --all by accident? Feels like one of those mistakes that happens once... and you don't repeat it.

With namespaces I've seen companies prevent foot-guns by using RBAC to limit users' access.

The real concern seems to be deletion of an individual resource with state.

I generally agree with Brendan Burns in terms of impact and/or fracturing of behaviour, as it would break a lot of other things in the ecosystem (operators, Helm, etc.) until they're upgraded to work with such a controller.

Would it make sense to instead document specific scenarios, how the existing capabilities could be employed to minimise impact, and then identify the gaps for those scenarios as documented?

Cheers,
Nathan



--
Nathan Fisher