Adding distributed search proposal #24
Thanks for putting this together, Matt. This looks like a good start to figuring out how we go about this.
- Should the Helm CLI have a way to query this API to search (instead of the local cache)?
- Is there a way we can make installing a chart from a non-checked-out repo easier, similar to checking out Homebrew taps and installing a formula in one command? (e.g. the `ks` CLI for ksonnet can be installed with `brew install ksonnet/tap/ks`; if the ksonnet tap is not checked out, Homebrew will add it.)
- This may be another proposal, but do we want to propose a way to maintain the level of trust in charts or repos? Ideas that have been thrown around are ratings/stars and some concept of "trusted" repos if all charts continuously pass the linting/test requirements used in kubernetes/charts.
proposals/distributed-search.md
## A Single Search Location
The goal of this is to have a search site and API that enables the search of many public repositories. Private repositories are a separate scope and can operate with existing tools.
Might be good to specifically state hosting repositories as a non-goal.
Good idea. Updated.
proposals/distributed-search.md
To help ensure, and possibly enforce, the quality of charts, we need to provide tools that can analyze charts to help validate their quality. These tools exist for the stable and incubator charts today. They will be packaged in a manner others can consume and leverage within their workflows.
Note: work on this step has already begun.
Is this referring to @unguiculus' work on https://github.com/kubernetes-helm/chart-testing? Do we want to link to that specifically?
Yes. I added a link.
proposals/distributed-search.md
The following are outstanding actions that need to be worked out but can happen after the proposal is accepted:
* [ ] Decide on the hosting location for this search.
* [ ] Decide on and documented the requirements for listed repositories
s/documented/document
Good catch. Fixed
I have two thoughts on this...
Basically, I would do Helm integration as a second proposal when the time is right.
Right now, in Helm 2, you can do something like:
$ helm install https://kubernetes-charts.storage.googleapis.com/acs-engine-autoscaler-2.2.0.tgz
What we're really talking about doing is aliases (similar to the way
I like the overall idea, but I would punt on it as a secondary thing that can come in a follow-up proposal where the nuances of the hard parts can be debated. I do have ideas for another proposal, coming soon, that would enable things like:
$ helm install https://kubernetes-charts.storage.googleapis.com/acs-engine-autoscaler
or
$ helm install https://kubernetes-charts.storage.googleapis.com/acs-engine-autoscaler#^2.2.0
The idea is to make the experience easier by not needing to know all the version information.
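A caret range like `^2.2.0` in the last example is conventionally read as "at least 2.2.0 but below 3.0.0". The sketch below shows how a client might pick the newest matching version from the versions a repository index lists for a chart. It assumes npm-style caret rules and plain `MAJOR.MINOR.PATCH` versions with no pre-release tags; `resolve_caret` and its helpers are hypothetical names, not part of Helm.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build metadata)."""
    major, minor, patch = (int(p) for p in v.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: Version) -> Version:
    """Exclusive upper bound for ^base, following npm-style caret rules."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def resolve_caret(constraint: str, available: Iterable[str]) -> Optional[str]:
    """Return the newest available version satisfying a ^X.Y.Z constraint, or None."""
    if not constraint.startswith("^"):
        raise ValueError("only caret constraints are handled in this sketch")
    low = parse(constraint[1:])
    high = caret_upper_bound(low)
    # Tuples compare element by element, so this is a numeric range check.
    matches = [v for v in available if low <= parse(v) < high]
    return max(matches, key=parse) if matches else None
```

For example, `resolve_caret("^2.2.0", ["2.1.0", "2.2.0", "2.4.1", "3.0.0"])` returns `"2.4.1"`: 2.1.0 is below the floor and 3.0.0 is excluded by the upper bound.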
Trust is a squishy thing. Who decides trust? It's the end users, and different people will decide differently. If we say something is trusted but it fails that trust, what happens to the Helm project? It'll lose some trust. I like the approach of giving people information and letting them decide how to trust. We set some minimum bar for inclusion, and after that the trust is variable. Some things we could do:
Oracle is a company that really brings up the trust question. Some people don't trust them because of all the closed-source work they do. Others trust them because they have contracts. Trust is not a one-size-fits-all situation. So, I would propose we empower people doing searches rather than try to decide trust for them.
Thanks, makes sense. I also agree that this should not be the default behaviour but a flag or another command.
Interesting, I didn't know this. Sounds like we could easily make this nicer with some sort of vanity URL, e.g.
You're absolutely right here, and I agree that trust should be decided based on all the information we can provide to a user. I do think it benefits users, though, to display repos and charts that are higher rated, have a higher frequency of contributions, and follow all CI best practices more prominently. We need to choose the metrics we use for "trust" carefully, but I don't think that prevents us from being able to sort by them.
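The distinction here is ranking rather than filtering: every listed repo is still shown, but some surface first. A minimal sketch, assuming hypothetical per-repository fields (`passes_ci`, `rating`, `commits_last_90d`) that a search service might collect. Which metrics to use is exactly the open question above, so these are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepoInfo:
    # Hypothetical metrics a search index might record for each listed repository.
    name: str
    passes_ci: bool         # did every chart pass the shared lint/test checks?
    rating: float           # e.g. an average user rating from 0.0 to 5.0
    commits_last_90d: int   # rough signal of recent maintenance activity


def rank(repos: List[RepoInfo]) -> List[RepoInfo]:
    """Order repos for display without dropping any of them.

    Repos passing CI come first, then higher ratings, then more recent activity.
    Ties keep their original order because Python's sort is stable.
    """
    return sorted(
        repos,
        key=lambda r: (not r.passes_ci, -r.rating, -r.commits_last_90d),
    )
```

Keeping the ordering rule in a single tuple key makes it easy to change which signals count, while the final trust decision stays with the user.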
@prydonius Interesting idea with the vanity locations like
@mattfarina This proposal summarizes all the important things I can recall from past conversations around helm registry (which I realize now has hit a dead end as a solution, but the key requirements are similar): distributed source model with centralized info, ability to register, discoverability, ease of DX for contributors and users, important analytics for users in deciding which package to try (the trust factor). @prydonius I like the account (user/org) namespacing and helm.sh vanity URL ideas too. Probably because I'm already familiar with this from other communities (packagist etc), but also because I think this helps with trust and distinguishing one author or group's version of a package from another. 👏 👏
The other thing this makes me think of is the yum/apt repo ecosystem. Yum repos can publish (and EPEL does, among I think others) a package that users can install that adds the signing keys for that repo as trusted keys, sets up the repo in the yum config, etc. I could see helm repos publishing some sort of signed metadata bundle that allows that too.
Also, as in the Linux package ecosystem, we're going to have to deal with package conflicts. Linux package repos currently handle this with repo priorities, set explicitly or implicitly: if multiple repos have a given package name/arch/version to be installed, the highest-priority repo wins. This also allows users to implicitly indicate, in a way, their trust level of a repo -- "if
We don't want to be in charge of setting trust levels, but, as we do with stable charts, I do think we want to set basic criteria for a repo's inclusion in whatever index is created. I like the compliance badge idea. We should make sure it's visually clear that we're not saying this repo or its charts are trustworthy, only that it passed some basic functional tests.
Tangentially, this also brings up a separate idea -- mirroring. It's out of scope for this discussion, but we may want to consider creating/promoting some tools for creating mirrors of repos, both for added resiliency to the public and for enabling repos to be easily used behind strict corporate firewalls.
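The priority rule described above can be stated precisely in a few lines. A minimal sketch, assuming each configured Helm repo carries a user-set priority number where a lower number is preferred, as with yum's priority setting. The `priority` field and `pick_sources` function are hypothetical; Helm's repo configuration has no such field today.

```python
from typing import Dict, List, NamedTuple, Tuple


class RepoEntry(NamedTuple):
    repo: str       # repository name as configured by the user
    priority: int   # hypothetical: lower number = preferred, as in yum
    chart: str
    version: str


def pick_sources(entries: List[RepoEntry]) -> Dict[Tuple[str, str], str]:
    """Map each (chart, version) to the repo that should supply it.

    When several repos publish the same chart name and version, the repo with
    the lowest priority number wins; ties go to the entry seen first.
    """
    best: Dict[Tuple[str, str], RepoEntry] = {}
    for e in entries:
        key = (e.chart, e.version)
        if key not in best or e.priority < best[key].priority:
            best[key] = e
    return {key: e.repo for key, e in best.items()}
```

Because the user sets the numbers, the resulting order doubles as the implicit statement of trust described above, without the project assigning trust levels itself.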
@omkensey I see where you are going with this. A few thoughts...
It's worth noting, for those that don't know, that you can do something like this today without having added a repo:
$ helm install https://kubernetes-charts.storage.googleapis.com/drupal-1.0.0.tgz
Mirroring
@mattfarina this all sounds great. In terms of mirroring @omkensey, this is something not yet implemented, but it has been asked for in chartmuseum. This can be part of the provided "Repository Tools".
A few items are unclear to me (and can probably be discussed in a separate PR after merge):
1.) "The goal of this is to have a search site and API" - there seems to be some overlap with functionality provided by Monocular. @prydonius Should we break off the Monocular API into its own project to support this functionality? Is this something that should be added to chartmuseum, with the ability to provide search on locally hosted repos out-of-the-box? I'm not sure if this proposal is suggesting a new tool or not.
2.) Auth. How is authentication handled? How/will the central service delegate the responsibility of authorization to individual repos? This is one of the bigger challenges here, in my opinion.
@jdolitsky To respond to your points...
I also think that Monocular is the best place to go and implement this, having been originally designed for this purpose. Monocular already supports aggregating multiple repositories today, though repositories can only currently be configured in the global configuration. We would want to extend this to make it possible for logged-in users to add their own repositories under a namespace. Perhaps a good next step here, after there is consensus on the overall goal, is to put together a proposal on the features we will need to implement in Monocular to get the desired functionality. |
@prydonius for a v0.1 is there anything beyond straight Monocular we should have?
@mattfarina Are we talking about hosting chart packages? Or only pointers to user-hosted repos?
That would be a good start, along with a Monocular config file to list the repositories the site will index. However, I think it would be good to start discussing what work we'd want to do beyond that (e.g. pagination, improved search).
@jdolitsky Just to restate here, we are not looking at hosting packages. Rather, we want to make packages that are hosted in a distributed manner, across many different repositories, discoverable.
@prydonius Here are a few ideas that come to mind for me:
Given a little time I'll try to come up with some more ideas.
One thing this document doesn't go into in much detail: who has permission to add or remove repositories from the list? I'm assuming that's a responsibility of the chart maintainers? Putting it another way: if I were to maintain my own repository of charts and there was decent uptake from the community, what process would I go through to get it added to this service?
I like the simplicity of the godoc.org model, where user traffic drives discovery of content. However, I think it'd be hard to apply here. Do we want this to be completely equalizing, or do we want to aggregate a list of trusted sources and have a push protocol for them to notify the indexer that new content is available?
@bacongobbler The discussion so far has been that the chart maintainers would. There would be documented criteria, which are still TBD.
@jzelinskie Here's my take on your questions...
I think this will change over time. Monocular already has a method. I expect we will iterate on and improve this over time. We would start with Monocular as it roughly sits today and then iterate to improve on it. This isn't a case of building something new.
The godoc model has lots of faults too. For example, how do you differentiate between the root and all the forks when they are picked up by godoc? For package search it has problems, especially when something has numerous contributors that end up being indexed. When you distribute, you don't have access to the download data to try to differentiate either. I'm personally partial to the packagist model... https://packagist.org/. Someone needs to choose to list something when they want it to be discoverable. The intended central sources of truth end up being listed rather than all the forks. Yet the search is really just a public search and metadata cache. The sources of truth reside with 3rd parties.
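A minimal sketch of the packagist-style flow, assuming the index only ever sees repositories that were explicitly listed and caches chart metadata (name, version, description) alongside a pointer back to the listing repo, never the chart packages themselves. `SearchIndex`, `ChartMeta`, and their methods are illustrative names, not Monocular's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChartMeta:
    name: str
    version: str
    description: str
    repo_url: str  # the listed repository remains the source of truth


@dataclass
class SearchIndex:
    """A hypothetical public search and metadata cache.

    Only repositories someone has explicitly listed are indexed, so forks that
    were never submitted do not appear. Only metadata pointers are stored.
    """
    listed: Dict[str, List[ChartMeta]] = field(default_factory=dict)

    def list_repo(self, repo_url: str, charts: List[ChartMeta]) -> None:
        # Re-listing a repo replaces its cached metadata (e.g. after a refresh).
        self.listed[repo_url] = charts

    def search(self, term: str) -> List[ChartMeta]:
        # Case-insensitive substring match on chart name or description.
        term = term.lower()
        return [
            c
            for charts in self.listed.values()
            for c in charts
            if term in c.name.lower() or term in c.description.lower()
        ]
```

Installing still goes to `repo_url`, so the index stays a discovery layer, which matches the "we are not looking at hosting packages" scope stated earlier in the thread.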
Signed-off-by: Matt Farina <matt@mattfarina.com>
This was voted on by the Helm maintainers and passed. We will be moving forward with distributed search of charts hosted by the Helm project.
cc: @prydonius @unguiculus @technosophos @michelleN
The best correlation I have to this is packagist for PHP.
Please nit as this is a draft.