Add In-Tree Plugin to CSI Driver Migration Design Proposal #2199
Conversation
> aware/dependent on what kind of PV ends up getting bound to their PVC (this is a
> core tenet of the k8s storage system).
>
> On Delete, the internal volume plugin will translate in-tree volume source into
Above it is written that "In-tree provisioner will be changed ... not register the in-tree volume plugin into PV controller". And here the PV controller needs the plugin to translate the spec to JSON.
IMO, it could register an "alternative plugin" that implements just the deleter interface and not the provisioner.
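To make the suggestion concrete, here is a minimal sketch of a deleter-only plugin, assuming simplified stand-ins for the real interfaces in k8s.io/kubernetes/pkg/volume; all names are illustrative:

```go
package main

import "fmt"

// Spec is a simplified stand-in for volume.Spec.
type Spec struct{ PVName string }

// Deleter is a simplified stand-in for the in-tree Deleter interface.
type Deleter interface {
	Delete(spec *Spec) error
}

// deleterOnlyGCEPlugin can clean up volumes provisioned before migration but
// deliberately implements no provisioning; new volumes go through CSI.
type deleterOnlyGCEPlugin struct{}

func (p *deleterOnlyGCEPlugin) Delete(spec *Spec) error {
	// Real code would call the cloud provider to delete the backing disk.
	fmt.Printf("deleting backing disk for %s via in-tree cloud calls\n", spec.PVName)
	return nil
}

func main() {
	var d Deleter = &deleterOnlyGCEPlugin{}
	_ = d.Delete(&Spec{PVName: "pv-pre-migration"})
}
```

The idea is that the PV controller keeps a way to delete pre-migration volumes without re-registering the full in-tree provisioner.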
I'm not quite sure what you are suggesting here, that we should still have a "partial" in-tree provisioner that is just able to clean up the volumes that it created before the migration?
> annotation with CSI source.
>
> ### Non-Dynamic Provisioned Volumes
This chapter applies to all volumes that were dynamically provisioned before the migration too.
done
> Especially CSI VolumeHandle must be carefully chosen! Volumes attached by CSI
> driver will have VolumeAttachment. In-tree volume plugin should be able to
> detach the volume in the usual way after downgrade, however, A/D controller
> should delete the VolumeAttachment.
Attach/detach is more complicated.
Migration in-tree -> csi:
Rephrasing the text above, A/D controller must "re-attach" all existing volumes to all nodes, hoping for idempotency, so VolumeAttachment is filled with the right AttachmentMetadata so kubelet can mount device(s).
CSI -> in-tree:
Here I would suggest the same: A/D controller should re-attach all volumes to all nodes, so node.status.volumesAttached.devicePath is populated and kubelet can mount the devices. There is a race between the A/D controller filling node.status.volumesAttached.devicePath and kubelet seeing the volume already attached (node.status.volumesAttached is present) but with node.status.volumesAttached.devicePath empty (-> some error).
We must make sure that internal volume plugins are idempotent too. Especially on AWS we must make sure that the internal volume plugin re-uses the device names assigned by the CSI driver - on AWS it's the client (A/D controller) who assigns device names to volumes on Attach. So the volume plugin has a table of free/used device names (including those that are "in flight" and not yet confirmed by AWS).
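A rough sketch of the device-name bookkeeping this implies for AWS; the allocator below is illustrative, not the actual in-tree code, and it assumes the table already contains names assigned earlier (possibly by the CSI driver):

```go
package main

import "fmt"

// deviceAllocator tracks device names that are used or "in flight".
type deviceAllocator struct {
	used map[string]string // device name -> volume ID
}

// assign reuses an existing assignment for volumeID (idempotency) before
// falling back to the first free candidate name.
func (a *deviceAllocator) assign(volumeID string, candidates []string) (string, error) {
	for name, vol := range a.used {
		if vol == volumeID {
			return name, nil // already assigned, e.g. by the CSI driver
		}
	}
	for _, name := range candidates {
		if _, taken := a.used[name]; !taken {
			a.used[name] = volumeID
			return name, nil
		}
	}
	return "", fmt.Errorf("no free device names for %s", volumeID)
}

func main() {
	a := &deviceAllocator{used: map[string]string{"/dev/xvdba": "vol-123"}}
	name, _ := a.assign("vol-123", []string{"/dev/xvdbb", "/dev/xvdbc"})
	fmt.Println(name) // prints /dev/xvdba: the CSI-assigned name is reused
}
```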
Migration in-tree -> csi:
I don't really understand this exact case. Kubelet does not need AttachmentMetadata to mount the device; I think it will need node.status.volumesAttached.devicePath like you mentioned in the second part though, so the reasoning is the same there.
Another problem: if we Attach using in-tree, then try to Detach with CSI, we will not have the VolumeAttachment object to use to "signal" the Detach operation. In this case, I think your solution is correct that we must somehow "re-attach" using CSI to create this VolumeAttachment object, and it should be idempotent by definition of the spec (we must ensure we use the same device name).
Am I misunderstanding?
CSI -> in-tree:
This one is tricky. I realized I never thought of this case because GCE PD doesn't get device paths from node.status for mount operations (it does for UnmountDevice though), but I just looked through AWS and that driver does.
The problem is that "node.status.volumesAttached.devicePath empty (-> some error)" won't be true; the devicePath will be filled with the csiAttachID. The race is a little more confusing at this point because the in-tree plugin will try to do operations using that, and we will have to confirm that it will retry later.
So the AI to add to the design:
- A/D Controller should check if each attached device was attached with in-tree or CSI
- If it was attached by the "other" mode, we kick off attach again
- CSI plugin must have idempotent attach (already true), we must make sure in-tree attach is idempotent (TBD)
- Kubelet must retry MountDevice even if node.status.volumesAttached.devicePath is filled but incorrect
This is specifically for the case that we switch on/off migration in between Attach and MountDevice/Mount.
We will not worry about what happens when switching between various mount options, or on the detach/Unmount side for now because of the requirement for draining nodes right now. I imagine that's a whole nother can of worms.
Let me know if this makes sense to you.
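A hedged sketch of the reconcile check behind those action items; every name is illustrative, and it assumes the A/D controller can already tell how a volume was attached (a VolumeAttachment exists vs. only an in-tree devicePath in node.status.volumesAttached):

```go
package main

import "fmt"

type attachMode int

const (
	attachedInTree attachMode = iota
	attachedCSI
)

type attachedVolume struct {
	name string
	mode attachMode // how the volume was actually attached
}

// reconcileAttachMode re-issues Attach through the currently active path when a
// volume was attached by the "other" mode; both paths must be idempotent so the
// re-attach only fills in the missing bookkeeping (VolumeAttachment or devicePath).
func reconcileAttachMode(vols []attachedVolume, migrationOn bool) {
	for _, v := range vols {
		isCSI := v.mode == attachedCSI
		if migrationOn == isCSI {
			continue // attached by the mode currently in use, nothing to do
		}
		path := "in-tree"
		if migrationOn {
			path = "CSI"
		}
		// Real code would call the CSI attacher (creating a VolumeAttachment)
		// or the in-tree plugin's Attach and rely on idempotency for safety.
		fmt.Printf("re-attaching %s via %s path (idempotent)\n", v.name, path)
	}
}

func main() {
	reconcileAttachMode([]attachedVolume{{name: "vol-1", mode: attachedInTree}}, true)
}
```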
added a section about this
in-tree -> csi:
I think we're on the same page now.
CSI -> in-tree:
> So the AI to add to the design:
> - A/D Controller should check if each attached device was attached with in-tree or CSI
> - If it was attached by the "other" mode, we kick off attach again
> - CSI plugin must have idempotent attach (already true), we must make sure in-tree attach is idempotent (TBD)
> - Kubelet must retry MountDevice even if node.status.volumesAttached.devicePath is filled but incorrect
Ack
> We will not worry about what happens when switching between various mount options, or on the detach/Unmount side for now because of the requirement for draining nodes right now. I imagine that's a whole nother can of worms.
Ack
This is actually solved by the combination of adding the NodeAnnotation when the Kubelet is migrated + the requirement to drain nodes before Kubelet Upgrade. We cannot hit these cases.
> And at this point users doing their own deployment and not installing the GCE PD
> CSI driver encounter an error.
>
> ## Testing
Do we want to test configuration skew? E.g. kubelets migrated to CSI drivers and controllers using in-tree plugins? Or half of the kubelets migrated and the second half not?
I don't think we will be able to support configuration skew.
We will have this devicePath incompatibility, the same issue as above (https://github.com/kubernetes/community/pull/2199/files#r192096628), except now there's no actual solution because the versions are actually different.
I will add a section about disabling this feature and ignoring the feature gates if Master/Node are not both on a version that supports Migration.
added a section about this
I am still not sure how to safely migrate. The admin disables the feature and updates the cluster. Now all components have the right version; however, they still use in-tree plugins. What now?
1. Admin enables the feature on masters. They will start re-attaching volumes and creating VolumeAttachment objects.
2. For each node:
   2.1 Admin drains the node.
   2.2 Admin enables the feature gate.
   2.3 Admin starts the node.
At point 2.1, the evicted pods are "moved" to other nodes (i.e. they're killed and a ReplicaSet/StatefulSet re-creates new ones somewhere else). The new node may or may not have the feature enabled. But the master already has it enabled, so it will go the CSI way and create a VolumeAttachment, and only a migrated node can understand it.
Is this reliable?
Just brainstorming here: can there be a flag (annotation) on a node that would tell if the feature (and which plugins) are migrated to CSI on the node? So A/D controller can either go the CSI way or internal plugin per node? It would be really complicated on the A/D controller side. We could make it easier if we allow to flip the feature only on drained nodes so A/D controller does not need to re-attach all volumes and create VolumeAttachments, because there is nothing to re-attach.
I think it is better here to not enable it on the A/D Controller unless the lowest node version is > X && migration enabled. This would solve the issue you bring up here. It may be possible to do it on a node-by-node basis, but I think that would be far too complex a solution for a problem that only exists in a very transitory state (1-2 release cycles).
Nevermind, I also see an issue with this: the newly upgraded node will be version X && migration enabled and has no way to know (?) what versions the other nodes are on. So that node cannot make a "decision" about whether to use migration or not.
In this case the A/D controller may have migration "off" because it sees there are still nodes at version X-1. But the node at version X will turn migration "on" because of the above reasoning (unless it can tell other node versions).
I don't really follow your sentence: "We could make it easier if we allow to flip the feature only on drained nodes so A/D controller does not need to re-attach all volumes and create VolumeAttachments, because there is nothing to re-attach."
Does this solve the problem I just outlined?
Let me rephrase my suggestion. When kubelet starts, it checks what feature gates are enabled and annotates itself with, say, csi.attach.kubernetes.io/gce-pd=true to tell the A/D controller that it supports GCE PD through CSI.
The A/D controller has the feature enabled too, and before it starts looking for a volume plugin, it checks the node annotation. If the node says it supports CSI for GCE PD, it will create a VolumeAttachment. If the node does not support CSI, the A/D controller uses the old volume plugin to attach the volume (and sets node.status.volumesAttached).
So the A/D controller must instantiate both the old volume plugins and their CSI counterparts and have the possibility to use both at the same time (for different nodes). This should not be that complicated, IMO.
Now, what if the admin flips the feature gate in one kubelet (e.g. because it's getting updated, or the admin enables the feature on each node)? There are volumes attached to the node. Here I suggest draining the node first, i.e. unmounting and detaching all volumes, and flipping the feature gate (and restarting kubelet) only when the node is completely empty. Kubelet could emit warnings/errors when it detects a volume attached by the old volume plugin.
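A small sketch of that per-node decision; the annotation key is the one suggested above (not an existing Kubernetes annotation) and the types are simplified stand-ins:

```go
package main

import "fmt"

// Suggested (illustrative) annotation a migrated kubelet would set on its Node.
const gcePDMigratedAnnotation = "csi.attach.kubernetes.io/gce-pd"

type node struct {
	name        string
	annotations map[string]string
}

// useCSIForGCEPD reports whether the node announced it handles GCE PD via CSI.
func useCSIForGCEPD(n node) bool {
	return n.annotations[gcePDMigratedAnnotation] == "true"
}

func main() {
	nodes := []node{
		{name: "node-a", annotations: map[string]string{gcePDMigratedAnnotation: "true"}},
		{name: "node-b", annotations: map[string]string{}},
	}
	for _, n := range nodes {
		if useCSIForGCEPD(n) {
			fmt.Printf("%s: create VolumeAttachment (CSI attach)\n", n.name)
		} else {
			fmt.Printf("%s: use in-tree plugin and set node.status.volumesAttached\n", n.name)
		}
	}
}
```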
I see, I will add some steps for this in the design. I am curious whether this would cause a race between adding the annotation and a volume being attached. For example, if the A/D Controller begins attach on the node after it is restarted but before the node is annotated, it will attach with the in-tree plugin and then the kubelet will be trying to mount with CSI (and probably failing).
I guess this will be reconciled by the other part we discussed (doing the re-attach during the reconcile loops), but it seems like a lot to reason about. I can't think of a better way.
See new commit for added information
I've actually reworked the entire section into `## Upgrade/Downgrade, Migrate/Un-migrate`. PTAL. I believe it succinctly covers all the bases we have discussed so far. I have included an upgrade and downgrade workflow in `Upgrade/Downgrade Migrate/Unmigrate Scenarios` to illustrate the solution.
Took a very first pass at proposal
> permissions. This is necessary because the CSI external controller components
> all GET the PV object and use it as a source of truth. We also foresee that
> future controller calls such as resize and snapshot will also use and update the
> PV as a source of truth. All these external components will also need to be
When resizing, for example: when a user resizes a PVC and the volume was originally provisioned by the in-tree provisioner, I would assume the external controller will update both the PV spec and the PV CSI spec within the annotation? It is somewhat problematic to update objects as a dependency chain because we don't have transactions.
The controller would ensure the annotation is updated/exists before calling CSI resize. The resize function in OperationGenerator would need to be modified to do this as well as call out to CSI instead of in-tree if the "migration" is on.
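A hedged sketch of that ordering (make sure a translated CSI source exists, then dispatch the resize to CSI); the annotation key and helpers are hypothetical, and note that the design later moved from annotations to a translation library:

```go
package main

import "fmt"

// Hypothetical annotation key used only for this illustration.
const csiSourceAnnotation = "example.k8s.io/translated-csi-source"

type pv struct {
	name        string
	annotations map[string]string
}

// translateToCSIJSON stands in for translating the in-tree source to a CSI source.
func translateToCSIJSON(p *pv) (string, error) {
	return `{"driver":"pd.csi.storage.gke.io"}`, nil
}

func ensureCSISourceThenResize(p *pv, newSizeGi int) error {
	if _, ok := p.annotations[csiSourceAnnotation]; !ok {
		src, err := translateToCSIJSON(p)
		if err != nil {
			return fmt.Errorf("cannot resize %s: %v", p.name, err)
		}
		p.annotations[csiSourceAnnotation] = src // real code would update the PV object first
	}
	fmt.Printf("calling CSI resize for %s to %dGi\n", p.name, newSizeGi)
	return nil
}

func main() {
	_ = ensureCSISourceThenResize(&pv{name: "pv-1", annotations: map[string]string{}}, 20)
}
```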
> have to attempt to DELETE the VolumeAttachment object that would have been
> created if that plugin was migrated. This is because Attach after migration
> creates a VolumeAttachment object, and if for some reason we are doing a detach
> with the in-tree plugin, the VolumeAttachment object becomes orphaned.
This would happen if the user disables the CSI migration feature gate, right? Another thing to keep in mind: when the feature gate is disabled, both controller-manager and kubelet should be able to "recover" state without relying on CSI-only objects. For example, the node.Status.VolumeAttached property set for the CSI plugin should be compatible with the in-tree plugin, in case the user downgrades to the in-tree plugin.
That is correct. With this proposal the node.Status.VolumeAttached property should be the same when migrated vs. not migrated, so the state should be recoverable here.
> annotation in. In this case we will add the CSI Source JSON to the
> VolumeToAttach object and in Attach we will put the Source in a new field in the
> VolumeAttachment object VolumeAttachment.Spec.Source.InlineVolumeSource. The CSI
> Attacher will have to be modified to also check this location for a source
It needs to be clarified when this VolumeAttachment will be created. I assume this will happen whenever an attach request is made and the feature flag is enabled?
That's correct. It will just call the CSI Attach, which creates the VolumeAttachment; there is no special code being executed here, we are just redirecting in-tree Attach to CSI Attach instead. VolumeAttachment is just an implementation detail of CSI that does not exist elsewhere.
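For reference, a sketch of the object that the redirected Attach ends up creating, written against the current storage.k8s.io/v1 types (the proposal predates v1); the object name and driver name are illustrative, and the real attacher derives the name from a hash of driver, volume, and node:

```go
package main

import (
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newVolumeAttachment builds the VolumeAttachment the external attacher acts on.
func newVolumeAttachment(pvName, nodeName, driver string) *storagev1.VolumeAttachment {
	return &storagev1.VolumeAttachment{
		ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("csi-%s-%s", pvName, nodeName)},
		Spec: storagev1.VolumeAttachmentSpec{
			Attacher: driver,
			NodeName: nodeName,
			Source: storagev1.VolumeAttachmentSource{
				PersistentVolumeName: &pvName,
			},
		},
	}
}

func main() {
	va := newVolumeAttachment("pv-123", "node-a", "pd.csi.storage.gke.io")
	fmt.Println(va.Name, va.Spec.Attacher, va.Spec.NodeName)
}
```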
> For downgrade, starting from a fully migrated cluster you must drain your node
> first, then turn off migration for your Kubelet, then turn off migration for the
> A/D Controller. The workflow is as follows:
> 1. A/D Controller and Kubelet are both migrated
Adding "unschedulable" flag to kubelet to make sure that the node is annotated before it can accept first pods:
- A/D Controller and Kubelet are both migrated
- Kubelet drained and made unschedulable, all volumes unmounted/detached with CSI code
- Kubelet restarted and un-migrated (flags flipped)
- Kubelet removes node annotation to tell A/D Controller this node is not migrated. In case kubelet does not have annotation removal code, admin must remove the annotation manually.
- Kubelet is made schedulable.
- At this point all volumes going onto the node would be using in-tree code for both A/D Controller(b/c of annotation) and Kublet
- Restart and un-migrate A/D Controller
done
> For upgrade, starting from a non-migrated cluster you must turn on migration for
> A/D Controller first, then drain your node before turning on migration for the
> Kubelet. The workflow is as follows:
> 1. A/D Controller and Kubelet are both not migrated
Adding "unschedulable" flag to kubelet to make sure that the node is annotated before it can accept first pods.
- A/D Controller and Kubelet are both not migrated
- A/D Controller restarted and migrated (flags flipped)
- A/D Controller continues to use in-tree code for this node b/c node
annotation doesn't exist - Node drained and made unschedulable. All volumes unmounted/detached with in-tree code
- Kubelet restarted and migrated (flags flipped)
- Kubelet annotates node to tell A/D controller this node has been migrated
- Kubelet is made schedulable.
- Both A/D Controller & Kubelet Migrated, node is in "fresh" state so all new volumes lifecycle is CSI
good suggestion, done
> 1. A/D Controller and Kubelet are both migrated
> 2. Kubelet drained, all volumes unmounted/detached with CSI code
> 3. Kubelet restarted and un-migrated (flags flipped)
> 4. Kubelet removes node annotation to tell A/D Controller this node is not migrated
There is very little documentation about cluster downgrade. Does it follow the same path as upgrade? Then the A/D controller is downgraded first and it does not know what to do with VolumeAttachment objects.
Can we suggest / document that nodes must be downgraded (drained) first? Then the new A/D controller will remove all VolumeAttachment objects when draining nodes, and the newly started old A/D controller will see just volumes attached via internal volume plugins.
Yep, the requirement to downgrade nodes first is stated in the Upgrade/Downgrade scenarios section. This will need to be called out very loudly when this feature rolls out.
I believe cluster admins in whatever form have the flexibility to downgrade however they want; there are no current constraints that I am aware of. So constraining them to downgrade nodes first does not conflict with current procedures. Hoping @k8s-mirror-cluster-lifecycle-pr-reviews can shed some more light on this.
We need review from someone who's familiar with the upgrade/downgrade process and how to enable/disable features like this at cluster scope. cc @kubernetes/sig-cluster-lifecycle-pr-reviews
@davidz627 @jsafrane do you guys think it is worth splitting inline CSI volumes from the in-tree migration stuff? I think we should not tie them together, and it may help us with adoption of CSI for a wider use case.
@gnufied I have answered your question here: kubernetes/kubernetes#64984 (comment)
> The provisioner will pick up the PVC, and create a PV with a CSI Source that it
> would do normally. Using CSISource in the PV might not be fully “backwards
> compatible” but it will be ok from a user facing behavior change perspective
This would be an API break. Tools that rely on the PV source would no longer be able to find that data. At best, you would have to mirror the data in the API into the CSI source and keep them in sync.
> compatible” but it will be ok from a user facing behavior change perspective
> because this is a generated PV anyway, we’re not changing user supplied objects.
> The PV is in the domain of the storage administrator and the user should not be
> aware/dependent on what kind of PV ends up getting bound to their PVC (this is a
Unfortunately PV is a public interface that clients can consume to perform administrative or display functionality. Just because they are automated does not mean they can be changed.
> If the volume was provisioned by the in-tree provisioner but CSI Migration is
> now turned on - On Delete, the internal volume plugin will translate in-tree
> volume source into CSI Source and put it as an annotation to the PV if
Why an annotation? Is CSI source not part of the PV already?
> translate in-tree volume source into CSI Source and put it as an annotation to the PV
I don't think representing this information twice (and the second time as a mutable annotation) is a good idea.
since we need a fixed transformation of VolumeSource/PersistentVolumeSource -> CSI driver name + parameters, that would seem to belong in a library that can be used by all the components that need to map existing in-tree volume sources to CSI:
- csi external-attacher (to know what CSI driver to call for an AWS/GCE PersistentVolumeSource)
- kubelet (to know what CSI driver to call for a PersistentVolumeSource)
- others?
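A minimal sketch of such a fixed-transformation library, using simplified source types; the CSI driver names are the well-known ones, but the volume-handle formats here are simplified (the real GCE PD handle, for example, is a fully qualified resource path):

```go
package main

import (
	"errors"
	"fmt"
)

type gcePersistentDiskSource struct{ PDName string }
type awsElasticBlockStoreSource struct{ VolumeID string }

// persistentVolumeSource is a simplified stand-in for the API type.
type persistentVolumeSource struct {
	GCEPersistentDisk    *gcePersistentDiskSource
	AWSElasticBlockStore *awsElasticBlockStoreSource
}

type csiSource struct {
	Driver       string
	VolumeHandle string
}

// translateInTreePVToCSI is a pure, fixed mapping usable by the external
// attacher, kubelet, and any other component that needs it.
func translateInTreePVToCSI(src persistentVolumeSource) (csiSource, error) {
	switch {
	case src.GCEPersistentDisk != nil:
		return csiSource{Driver: "pd.csi.storage.gke.io", VolumeHandle: src.GCEPersistentDisk.PDName}, nil
	case src.AWSElasticBlockStore != nil:
		return csiSource{Driver: "ebs.csi.aws.com", VolumeHandle: src.AWSElasticBlockStore.VolumeID}, nil
	default:
		return csiSource{}, errors.New("no CSI translation implemented for this volume source")
	}
}

func main() {
	out, _ := translateInTreePVToCSI(persistentVolumeSource{
		GCEPersistentDisk: &gcePersistentDiskSource{PDName: "my-disk"},
	})
	fmt.Println(out.Driver, out.VolumeHandle)
}
```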
> `provisionClaimOperation` to set the claim provisioner to the migrated external
> provisioner name, so the external provisioner will pick this up and treat it as
> a CSI Volume from then on. The controller will automatically annotate PVCs for
> external provisioning by a provisioner that handles e.g. “google.io/gce-pd”.
shouldn't provisioning be determined by something on the storage class, not on the PVC? I wouldn't expect any visible change on the PVC
> #### External Component Changes
> External provisioner will use a different provisioner name as the internal
> volume plugin name (e.g. “google.com/gce-pd”). Note that the CSI driver name can
> be different and this should be just a matter of configuration.
Why a different provisioner name? I thought we wanted as seamless a transition as possible.
> Caveat: On downgrade the CSI Driver must still exist to continue the lifecycle
> of the PV’s and volumes it provisioned, the in-tree driver will not pick up
> where the CSI Driver left off. We could have the in-tree volume pick up where
> the CSI driver left off, but in that case the two provisioners race with each
this seems like a good use case for leader election between in-tree plugin and CSI plugin... only one active at a time, either able to handle storageclass->pv, pv attach/detach, etc, and there's a way to tell if the other is active
With @smarterclayton's and @liggitt's comments about provisioning, we could make external provisioners that provision PVs with the "old" volume source (say ...). We need to write external provisioners for each volume plugin then. Or keep the in-tree ones forever.
@jsafrane I got the same impression from @smarterclayton's and @liggitt's comments about dynamic provisioning. However, in that case I don't understand why we would make external provisioners that provision PVs with the "old" volume source. They would be a carbon copy of the in-tree ones and would presumably be deployed separately from each CSI Driver, as they are not the CSI Driver Provisioner. With this it would be better to just keep the in-tree ones forever, rather than increasing complexity by just moving them out of tree and having leader election for the exact same functionality. AFAIK there is nothing to be gained by making an external provisioner that does this for each in-tree plugin. @saad-ali this is an interesting development, as it means basically keeping the in-tree provisioners forever even after migrating the other parts... would be interested to hear your thoughts on it as well.
Nevermind, a big benefit to having the external provisioners would be moving the CloudProvider code out of tree, which is a big push happening right now. It is probably worth doing it for that reason alone.
@jsafrane @liggitt @smarterclayton updated the "Dynamically Provisioned Volumes" section to address comments. You were correct that this is somewhat of an API break. Although the PVC->PVSource relation was never guaranteed, it may be interpreted implicitly through the PVC -> Storage Class -> PV relation and we are aware that some users may be relying on that behavior.
@jsafrane @liggitt sorry for the long silent period, other work got pushed to higher priority. I have now been able to come back to this and rewrite the translation layer part of the design to use an external translation library at various points instead of putting translated sources into annotations. PTAL and let me know what could still use some work.
> function will return an error when the translation is attempted.
>
> ### Non-Dynamic Provisioned Volumes (and volumes provisioned before migration)
"Pre-provisioned"
LGTM, let me know when the inline design closes and we can merge this.
> in-tree PV. The PV will then go through all the same steps outlined below in the
> "Non-Dynamic Provisioned Volumes" for the rest of the volume lifecycle.
>
> #### Leader Election
Might need to add logic to the in-tree provisioner to stand down when "migrated".
> #### Kubernetes Changes
> Dynamically Provisioned volumes will continue to be provisioned with the in-tree
> `PersistentVolumeSource`. We will create an external provisioner to pick up the
The csi-external provisioner can pick up the volume and have reverse translation logic to also create in-tree PVs.
> // TranslateToCSI takes a volume.Spec and will translate it to a
> // CSIPersistentVolumeSource if the translation logic for that
> // specific in-tree volume spec has been implemented
> TranslateToCSI(spec volume.Spec) (CSIPersistentVolumeSource, error)
Also CSI to in-tree is needed.
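A hedged sketch of what the bidirectional interface might look like; only TranslateToCSI is quoted from the proposal, while the other methods and the placeholder types are assumptions:

```go
package main

// Placeholder types standing in for volume.Spec and the CSI API type.
type volumeSpec struct{}
type csiPersistentVolumeSource struct{ Driver, VolumeHandle string }

type translator interface {
	// TranslateToCSI converts an in-tree volume spec to a CSI source, returning
	// an error when no translation is implemented (as quoted above).
	TranslateToCSI(spec volumeSpec) (csiPersistentVolumeSource, error)

	// TranslateToInTree is the reverse mapping, needed e.g. on downgrade or when
	// an external provisioner must emit an in-tree PV source.
	TranslateToInTree(source csiPersistentVolumeSource) (volumeSpec, error)

	// CanTranslate reports whether translation logic exists for this spec,
	// matching the boolean capability check mentioned later in the thread.
	CanTranslate(spec volumeSpec) bool
}

// Interface-only sketch; no implementation is provided here.
var _ translator = nil

func main() {}
```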
> For Controller calls, we will call the CSI calls instead of the in-tree calls.
> When CSI specific information is needed from the PV we can call the translation
> library to do a translation of the PV Source to the necessary CSI Source and get
look into doing translation before calling CSI
There seems to be some overlap between https://github.com/kubernetes/community/pull/2199/files#diff-a5b9eff044138c5af66d85efc24c7cacR228 and the discussion happening here: #2273. Could folks please look at the other proposal to make sure we have covered everything and aren't overlapping plans?
> name needs to be unique and A/D controller must be able to find the right
> VolumeAttachment when a pod is deleted (i.e. using only info in Node.Status).
> CSI driver in kubelet must be able to find the VolumeAttachment too to get
> AttachmentMetadata for NodeStage/NodePublish.
There may need to be some code added to deal with the security of namespaced secrets, which can be different between inline pod volumes and PV-based volumes? (RBD secrets behave this way, I think.)
In-tree PV-based volumes will translate to CSI PV-based volumes, and inline in-tree to CSI inline. I don't think we will need to explicitly deal with the differences between the two.
I'm wondering, rather than being all or nothing, should it be a flag per driver? Since there are so many drivers in-tree already, maybe it could take longer for some drivers to have alternative CSI drivers than others?
@kfox1111 do you mean feature flags per driver like here? https://github.com/kubernetes/community/pull/2199/files#diff-a5b9eff044138c5af66d85efc24c7cacR76 Also, the translation library will have a boolean method that returns whether the translation logic for a driver has been implemented or not.
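A tiny sketch of how a per-driver flag could combine with the library's boolean capability check; the gate map and lookup below are illustrative, not the real Kubernetes feature gates:

```go
package main

import "fmt"

// Hypothetical per-driver migration gates keyed by in-tree plugin name.
var migrationGates = map[string]bool{
	"kubernetes.io/gce-pd":  true,
	"kubernetes.io/aws-ebs": true,
	"kubernetes.io/cinder":  false, // e.g. translation or CSI driver not ready yet
}

// translationImplemented stands in for the translation library's boolean method.
func translationImplemented(inTreePluginName string) bool {
	supported := map[string]bool{
		"kubernetes.io/gce-pd":  true,
		"kubernetes.io/aws-ebs": true,
	}
	return supported[inTreePluginName]
}

// useCSIMigration is true only when both the driver's gate is on and the
// translation logic actually exists.
func useCSIMigration(plugin string) bool {
	return migrationGates[plugin] && translationImplemented(plugin)
}

func main() {
	for _, p := range []string{"kubernetes.io/gce-pd", "kubernetes.io/cinder"} {
		fmt.Printf("%s -> migrate: %v\n", p, useCSIMigration(p))
	}
}
```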
Ah. I missed that somehow. Sorry. Looks good to me. :)
@saad-ali ready for final review/approval. Looks like the inline volumes design hasn't merged yet but it seems pretty much set; I've added a little bit of description to that section and put a link to the PR.
Can add inline volume support, etc. in a follow-up PR. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: saad-ali. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Add In-Tree Plugin to CSI Driver Migration Design Proposal
cc @jsafrane, @msau42, @saad-ali
This is the design proposal for the mechanism to migrate In-Tree Storage Plugins to use CSI Drivers instead. PTAL
Biggest caveats to look out for: