Add In-Tree Plugin to CSI Driver Migration Design Proposal #2199


Merged
merged 1 commit into from
Sep 26, 2018

Conversation

davidz627
Contributor

@davidz627 davidz627 commented May 29, 2018

cc @jsafrane, @msau42, @saad-ali

This is the design proposal for the mechanism to migrate In-Tree Storage Plugins to use CSI Drivers instead. PTAL

Biggest caveats to look out for:

  • Dynamic provisioning creates CSI PVs, not in-tree plugin PVs
  • Nodes must be drained before turning the feature on/off in Kubelet
  • Deprecation of in-tree plugins will be a long process, and at some point it may break users who do not take specific action on their self-managed clusters.
  • Migration is turned on/off with feature gates, which may not be configurable by users in managed environments

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 29, 2018
@k8s-ci-robot k8s-ci-robot requested review from childsb and saad-ali May 29, 2018 22:53
@k8s-ci-robot k8s-ci-robot added kind/design Categorizes issue or PR as related to design. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels May 29, 2018
@davidz627
Contributor Author

/sig storage
/assign @msau42 @saad-ali

aware/dependent on what kind of PV ends up getting bound to their PVC (this is a
core tenet of the k8s storage system).

On Delete, the internal volume plugin will translate in-tree volume source into
Member

Above it is written that "In-tree provisioner will be changed ... not register the in-tree volume plugin into PV controller", yet here the PV controller needs the plugin to translate the spec to JSON.

IMO, it could register an "alternative plugin" that implements just the deleter interface and not the provisioner.

Contributor Author

I'm not quite sure what you are suggesting here: that we should still have a "partial" in-tree provisioner that is only able to clean up the volumes it created before the migration?
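For context, a minimal sketch of what such a deleter-only registration could look like; the interface and type names below are simplified stand-ins for illustration, not the actual in-tree volume plugin API:

```go
package sketch

// Simplified stand-ins for the in-tree plugin interfaces discussed above.
type Deleter interface {
	// Delete removes the backing storage asset for the named PV.
	Delete(pvName string) error
}

type Provisioner interface {
	Provision(claimName string) (pvName string, err error)
}

// gcePDDeleteOnlyPlugin implements only Deleter, so the PV controller can
// still clean up volumes provisioned before migration while all new
// provisioning is handled by the CSI external provisioner.
type gcePDDeleteOnlyPlugin struct{}

func (p *gcePDDeleteOnlyPlugin) Delete(pvName string) error {
	// Call the cloud provider to delete the disk backing pvName.
	return nil
}

// Compile-time check: the plugin is a Deleter, and deliberately not a Provisioner.
var _ Deleter = &gcePDDeleteOnlyPlugin{}
```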

annotation with CSI source.


### Non-Dynamic Provisioned Volumes
Member

This chapter applies to all volumes that were dynamically provisioned before the migration too.

Contributor Author

done

Especially the CSI VolumeHandle must be carefully chosen! Volumes attached by the CSI
driver will have a VolumeAttachment. The in-tree volume plugin should be able to
detach the volume in the usual way after downgrade; however, the A/D controller
should delete the VolumeAttachment.
Member

Attach/detach is more complicated.

Migration in-tree -> csi:

Rephrasing the text above, the A/D controller must "re-attach" all existing volumes to all nodes, hoping for idempotency, so that VolumeAttachment is filled with the right AttachmentMetadata and kubelet can mount the device(s).

CSI -> in-tree:

Here I would suggest the same: the A/D controller should re-attach all volumes to all nodes, so that node.status.volumesAttached.devicePath is populated and kubelet can mount the devices. There is a race between the A/D controller filling node.status.volumesAttached.devicePath and kubelet seeing the volume already attached (node.status.volumesAttached is present) but with node.status.volumesAttached.devicePath empty (-> some error).

We must make sure that internal volume plugins are idempotent too. Especially on AWS we must make sure that the internal volume plugin re-uses the device names assigned by the CSI driver; on AWS it is the client (A/D controller) who assigns device names to volumes on Attach, so the volume plugin has a table of free/used device names (including those that are "in flight" and not yet confirmed by AWS).

Contributor Author

Migration in-tree -> csi:

I don't really understand this exact case: Kubelet does not need AttachmentMetadata to mount the device. I think it will need node.status.volumesAttached.devicePath like you mentioned in the second part though, so the reasoning is the same there.

Another problem: if we Attach using in-tree, then try to Detach with CSI, we will not have the VolumeAttachment object to use to "signal" the Detach operation. In this case, I think your solution is correct: we must somehow "re-attach" using CSI to create this VolumeAttachment object, and it should be idempotent by definition of the spec (we must ensure we use the same device name).

Am I misunderstanding?

CSI -> in-tree:

This one is tricky. I realized I never thought of this case because GCE PD doesn't get device paths from node.status for mount operations (it does for UnmountDevice though), but I just looked through AWS and that driver does.

The problem is that "node.status.volumesAttached.devicePath empty (-> some error)" won't be true; the devicePath will be filled with the csiAttachID. The race is a little more confusing at this point because the in-tree plugin will try to do operations using that value; we will have to confirm that it will retry later.

So the action items (AI) to add to the design:

  • A/D Controller should check if each attached device was attached with in-tree or CSI
  • If it was attached by the "other" mode, we kick off Attach again
  • CSI plugin must have idempotent attach (already true); we must make sure in-tree attach is idempotent (TBD)
  • Kubelet must retry MountDevice even if node.status.volumesAttached.devicePath is filled but incorrect

This is specifically for the case that we switch migration on/off in between Attach and MountDevice/Mount.

We will not worry about what happens when switching between various mount options, or on the detach/Unmount side, for now because of the current requirement to drain nodes. I imagine that's a whole other can of worms.

Let me know if this makes sense to you.
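A rough sketch of the reconcile-time check described in the action items above; all type and helper names here are hypothetical illustrations, not part of the proposal:

```go
package sketch

type attachMode int

const (
	attachedInTree attachMode = iota
	attachedByCSI
)

type attachedVolume struct {
	name string
	mode attachMode // how the volume was originally attached
}

// reconcileAttachMode re-issues Attach through the currently desired path
// (CSI when migration is on, in-tree otherwise) for any volume that was
// attached by the "other" mode. Both attach paths must be idempotent so a
// volume that is already attached correctly becomes a no-op.
func reconcileAttachMode(vols []attachedVolume, migrationOn bool,
	attach func(name string, useCSI bool) error) error {
	for _, v := range vols {
		attachedViaCSI := v.mode == attachedByCSI
		if attachedViaCSI != migrationOn {
			if err := attach(v.name, migrationOn); err != nil {
				return err
			}
		}
	}
	return nil
}
```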

Contributor Author

added a section about this

Member

@jsafrane jsafrane Jun 1, 2018

in-tree -> csi:

I think we're on the same page now.

CSI -> in-tree:

So the action items (AI) to add to the design:

  • A/D Controller should check if each attached device was attached with in-tree or CSI
  • If it was attached by the "other" mode, we kick off Attach again
  • CSI plugin must have idempotent attach (already true); we must make sure in-tree attach is idempotent (TBD)
  • Kubelet must retry MountDevice even if node.status.volumesAttached.devicePath is filled but incorrect

Ack

Member

We will not worry about what happens when switching between various mount options, or on the detach/Unmount side, for now because of the current requirement to drain nodes. I imagine that's a whole other can of worms.

Ack

Contributor Author

This is actually solved by the combination of adding the NodeAnnotation when the Kubelet is migrated + the requirement to drain nodes before Kubelet Upgrade. We cannot hit these cases.

And at this point users doing their own deployment and not installing the GCE PD
CSI driver encounter an error.

## Testing
Member

Do we want to test configuration skew? E.g. kubelets migrated to CSI drivers and controllers using in-tree plugins? Or half of the kubelets migrated and the other half not?

Contributor Author

@davidz627 davidz627 May 31, 2018

I don't think we will be able to support configuration skew.
We will have the same devicePath incompatibility as above (https://github.com/kubernetes/community/pull/2199/files#r192096628), except now there's no actual solution because the versions are actually different.

I will add a section about disabling this feature and ignoring the feature gates if Master/Node are not both on a version that supports Migration.

Contributor Author

added a section about this

Member

I am still not sure how to safely migrate. The admin disables the feature and updates the cluster. Now all components have the right version; however, they still use in-tree plugins. What now?

  1. Admin enables the feature on masters. They will start re-attaching volumes and creating VolumeAttachment objects.
  2. For each node:
    2.1 Admin drains the node.
    2.2 Admin enables the feature gate.
    2.3 Admin starts the node.

At point 2.1, the evicted pods are "moved" to other nodes (i.e. they're killed and the ReplicaSet/StatefulSet re-creates new ones somewhere else). The new node may or may not have the feature enabled. But the master already has it enabled, so it will go the CSI way and create a VolumeAttachment, which only a migrated node can understand.

Is this reliable?

Just brainstorming here: can there be a flag (annotation) on a node that would tell whether the feature (and which plugins) is migrated to CSI on that node? Then the A/D controller could go either the CSI way or the internal-plugin way per node. It would be really complicated on the A/D controller side. We could make it easier if we allowed flipping the feature only on drained nodes, so the A/D controller does not need to re-attach all volumes and create VolumeAttachments, because there is nothing to re-attach.

Contributor Author

I think it is better here to not enable it on the A/D Controller unless the lowest node version is > X && migration is enabled. This would solve the issue you bring up here. It may be possible to do it on a node-by-node basis, but I think that would be far too complex a solution for a problem that only exists in a very transitory state (1-2 release cycles).

Contributor Author

Never mind, I also see an issue with this: the newly upgraded node will be at version X && migration enabled and has no way to know (?) what versions the other nodes are on. So that node cannot make a "decision" about whether to use migration or not.

In this case the A/D controller may have migration "off" because it sees there are still nodes at version X-1. But the node at version X will turn migration "on" because of the above reasoning (unless it can tell the other nodes' versions).

I don't really follow your sentence: "We could make it easier if we allowed flipping the feature only on drained nodes, so the A/D controller does not need to re-attach all volumes and create VolumeAttachments, because there is nothing to re-attach."

Does this solve the problem I just outlined?

Member

Let me rephrase my suggestion. When kubelet starts, it checks what feature gates are enabled and annotates itself with, say, csi.attach.kubernetes.io/gce-pd=true to tell the A/D controller that it supports GCE PD through CSI.

The A/D controller has the feature enabled too, and before it starts looking for a volume plugin, it checks the node annotation. If the node says it supports CSI for GCE PD, it will create a VolumeAttachment. If the node does not support CSI, the A/D controller uses the old volume plugin to attach the volume (and sets node.status.volumesAttached).

So the A/D controller must instantiate both the old volume plugins and their CSI counterparts and be able to use both at the same time (for different nodes). This should not be that complicated, IMO.

Now, what if the admin flips the feature gate in one kubelet (e.g. because it's getting updated or the admin enables the feature on each node)? There are volumes attached to the node. Here I suggest draining the node first, i.e. unmounting and detaching all volumes, and flipping the feature gate (and restarting kubelet) only when the node is completely empty. Kubelet could emit warnings/errors when it detects a volume attached by the old volume plugin.
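A minimal sketch of the per-node decision described above; the annotation key is the example from the comment and the helper is illustrative only:

```go
package sketch

// Example annotation key taken from the comment above; the real key, if
// adopted, would be defined by the proposal.
const gcePDMigratedAnnotation = "csi.attach.kubernetes.io/gce-pd"

// useCSIForNode reports whether the A/D controller should go the CSI way
// (create a VolumeAttachment) for this node, or fall back to the in-tree
// volume plugin.
func useCSIForNode(nodeAnnotations map[string]string, controllerMigrationOn bool) bool {
	return controllerMigrationOn && nodeAnnotations[gcePDMigratedAnnotation] == "true"
}
```

The key point is that the controller consults the annotation per node, so migrated and non-migrated kubelets can coexist during the rollout.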

Contributor Author

I see, I will add some steps for this in the design. I am curious whether this would cause a race between adding the annotation and a volume being attached. For example, if the A/D Controller begins an attach on the node after it is restarted but before the node is annotated, it will attach with the in-tree plugin and then the kubelet will be trying to mount with CSI (and probably failing).

I guess this will be reconciled by the other part we discussed (doing the re-attach during the reconcile loops), but it seems like a lot to reason about. I can't think of a better way.

Contributor Author

See new commit for added information

Contributor Author

I've actually reworked the entire section into ## Upgrade/Downgrade, Migrate/Un-migrate. PTAL. I believe it succinctly covers all the bases we have discussed so far. I have included an upgrade and downgrade workflow in Upgrade/Downgrade Migrate/Unmigrate Scenarios to illustrate the solution

Member

@gnufied gnufied left a comment

Took a first pass at the proposal

permissions. This is necessary because the CSI external controller components
all GET the PV object and use it as a source of truth. We also foresee that
future controller calls such as resize and snapshot will also use and update the
PV as a source of truth. All these external components will also need to be
Member

When resizing, for example: when a user resizes a PVC and the volume was originally provisioned by the in-tree provisioner, then I would assume the external controller will update both the PV spec and the PV CSI spec within the annotation? It is somewhat problematic to update objects as a dependency chain because we don't have transactions.

Contributor Author

The controller would ensure the annotation is updated/exists before calling CSI resize. The resize function in OperationGenerator would need to be modified to do this as well as call out to CSI instead of in-tree if the "migration" is on.

have to attempt to DELETE the VolumeAttachment object that would have been
created if that plugin was migrated. This is because Attach after migration
creates a VolumeAttachment object, and if for some reason we are doing a detach
with the in-tree plugin, the VolumeAttachment object becomes orphaned.
Member

This would happen if the user disables the CSI migration feature gate, right? Another thing to keep in mind: when the feature gate is disabled, both controller-manager and kubelet should be able to "recover" state without relying on CSI-only objects. For example, the node.Status.VolumesAttached property set for the CSI plugin should be compatible with the in-tree plugin, in case the user downgrades to the in-tree plugin.

Contributor Author

That is correct. With this proposal the node.Status.VolumesAttached property should be the same when migrated vs. not migrated, so the state should be recoverable here.

annotation in. In this case we will add the CSI Source JSON to the
VolumeToAttach object and in Attach we will put the Source in a new field in the
VolumeAttachment object VolumeAttachment.Spec.Source.InlineVolumeSource. The CSI
Attacher will have to be modified to also check this location for a source
Member

It needs to be clarified when this VolumeAttachment will be created. I assume this will happen whenever an attach request is made and the feature flag is enabled?

Contributor Author

That's correct. It will just call the CSI Attach, which creates the VolumeAttachment; there is no special code being executed here, we are just redirecting in-tree Attach to CSI Attach instead. VolumeAttachment is just an implementation detail of CSI that does not exist elsewhere.
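For illustration, a sketch of the shape the new field named in the excerpt above (VolumeAttachment.Spec.Source.InlineVolumeSource) might take; the surrounding types are simplified stand-ins for the storage API objects, not the actual definitions:

```go
package sketch

// CSISource is a simplified stand-in for the translated CSI source carried
// by the attachment.
type CSISource struct {
	Driver       string
	VolumeHandle string
}

// VolumeAttachmentSource is a simplified stand-in for the real API type.
type VolumeAttachmentSource struct {
	// Existing field: name of the PV to attach.
	PersistentVolumeName *string
	// Proposed field from the design excerpt: a translated CSI source for
	// in-line pod volumes that have no PV, filled in by the A/D controller
	// and read by the CSI external-attacher.
	InlineVolumeSource *CSISource
}
```

Per the excerpt, InlineVolumeSource would only be populated for in-line volumes; PV-backed volumes keep using PersistentVolumeName.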

For downgrade, starting from a fully migrated cluster you must drain your node
first, then turn off migration for your Kubelet, then turn off migration for the
A/D Controller. The workflow is as follows:
1. A/D Controller and Kubelet are both migrated
Member

@jsafrane jsafrane Jun 11, 2018

Adding "unschedulable" flag to kubelet to make sure that the node is annotated before it can accept first pods:

  1. A/D Controller and Kubelet are both migrated
  2. Kubelet drained and made unschedulable, all volumes unmounted/detached with CSI code
  3. Kubelet restarted and un-migrated (flags flipped)
  4. Kubelet removes node annotation to tell A/D Controller this node is not migrated. In case kubelet does not have annotation removal code, admin must remove the annotation manually.
  5. Kubelet is made schedulable.
  6. At this point all volumes going onto the node would be using in-tree code for both the A/D Controller (b/c of the annotation) and the Kubelet
  7. Restart and un-migrate A/D Controller

Contributor Author

done

For upgrade, starting from a non-migrated cluster you must turn on migration for
A/D Controller first, then drain your node before turning on migration for the
Kubelet. The workflow is as follows:
1. A/D Controller and Kubelet are both not migrated
Member

Adding "unschedulable" flag to kubelet to make sure that the node is annotated before it can accept first pods.

  1. A/D Controller and Kubelet are both not migrated
  2. A/D Controller restarted and migrated (flags flipped)
  3. A/D Controller continues to use in-tree code for this node b/c node
    annotation doesn't exist
  4. Node drained and made unschedulable. All volumes unmounted/detached with in-tree code
  5. Kubelet restarted and migrated (flags flipped)
  6. Kubelet annotates node to tell A/D controller this node has been migrated
  7. Kubelet is made schedulable.
  8. Both A/D Controller & Kubelet are migrated; the node is in a "fresh" state, so all new volumes' lifecycle is CSI

Contributor Author

good suggestion, done

1. A/D Controller and Kubelet are both migrated
2. Kubelet drained, all volumes unmounted/detached with CSI code
3. Kubelet restarted and un-migrated (flags flipped)
4. Kubelet removes node annotation to tell A/D Controller this node is not migrated
Member

There is very little documentation about cluster downgrade. Does it follow the same path as upgrade? Then the A/D controller is downgraded first and does not know what to do with VolumeAttachment objects.

Can we suggest/document that nodes must be downgraded (drained) first? Then the new A/D controller will remove all VolumeAttachment objects when draining nodes, and the newly started old A/D controller will see only volumes attached via internal volume plugins.

Contributor Author

Yep, the requirement to downgrade nodes first is stated in the Upgrade/Downgrade scenarios section. This will need to be called out very loudly when this feature rolls out.

I believe cluster admins, in whatever form, have the flexibility to downgrade however they want; there are no current constraints that I am aware of. So constraining them to downgrade nodes first does not conflict with current procedures. Hoping @k8s-mirror-cluster-lifecycle-pr-reviews can shed some more light on this.

@jsafrane
Member

We need a review from someone who's familiar with the upgrade/downgrade process and with how to enable/disable features like this at cluster scope.

cc @kubernetes/sig-cluster-lifecycle-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jun 11, 2018
@gnufied
Member

gnufied commented Jun 12, 2018

@davidz627 @jsafrane do you guys think it is worth splitting inline CSI volumes from the in-tree migration work? I think we should not tie them together, and it may help us with adoption of CSI for a wider set of use cases.

@davidz627
Contributor Author

@gnufied I have answered your question here: kubernetes/kubernetes#64984 (comment)


The provisioner will pick up the PVC, and create a PV with a CSI Source that it
would do normally. Using CSISource in the PV might not be fully “backwards
compatible” but it will be ok from a user facing behavior change perspective
Contributor

This would be an API break. Tools that rely on the PV source would no longer be able to find that data. At best, you would have to mirror the data in the API into the CSI source and keep them in sync.

compatible” but it will be ok from a user facing behavior change perspective
because this is a generated PV anyway, we’re not changing user supplied objects.
The PV is in the domain of the storage administrator and the user should not be
aware/dependent on what kind of PV ends up getting bound to their PVC (this is a
Contributor

Unfortunately PV is a public interface that clients can consume to perform administrative or display functionality. Just because they are automated does not mean they can be changed.


If the volume was provisioned by the in-tree provisioner but CSI Migration is
now turned on - On Delete, the internal volume plugin will translate in-tree
volume source into CSI Source and put it as an annotation to the PV if
Contributor

Why an annotation? Is CSI source not part of the PV already?

Member

@liggitt liggitt Jun 21, 2018

translate in-tree volume source into CSI Source and put it as an annotation to the PV

I don't think representing this information twice (and the second time as a mutable annotation) is a good idea.

since we need a fixed transformation of VolumeSource/PersistentVolumeSource -> CSI driver name + parameters, that would seem to belong in a library that can be used by all the components that need to map existing in-tree volume sources to CSI:

  • csi external-attacher (to know what CSI driver to call for an AWS/GCE PersistentVolumeSource)
  • kubelet (to know what CSI driver to call for a PersistentVolumeSource)
  • others?

`provisionClaimOperation` to set the claim provisioner to the migrated external
provisioner name, so the external provisioner will pick this up and treat it as
a CSI Volume from then on. The controller will automatically annotate PVCs for
external provisioning by a provisioner that handles e.g. “google.io/gce-pd”.
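A minimal sketch of the PVC hand-off described in this excerpt; the annotation key shown is the conventional external-provisioner key and should be treated as an assumption here, and the provisioner name is the example from the text:

```go
package sketch

// markClaimForExternalProvisioner records on the PVC which external
// provisioner is expected to handle it, so the in-tree provisioner stands
// down and the migrated external provisioner picks the claim up.
func markClaimForExternalProvisioner(annotations map[string]string, provisionerName string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	// e.g. provisionerName = "google.io/gce-pd", as in the excerpt above.
	annotations["volume.beta.kubernetes.io/storage-provisioner"] = provisionerName
	return annotations
}
```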
Member

shouldn't provisioning be determined by something on the storage class, not on the PVC? I wouldn't expect any visible change on the PVC

#### External Component Changes
External provisioner will use a different provisioner name than the internal
volume plugin name (e.g. “google.com/gce-pd”). Note that the CSI driver name can
be different and this should be just a matter of configuration.
Member

Why a different provisioner name? I thought we wanted as seamless a transition as possible.

Caveat: On downgrade the CSI Driver must still exist to continue the lifecycle
of the PVs and volumes it provisioned; the in-tree driver will not pick up
where the CSI Driver left off. We could have the in-tree volume plugin pick up
where the CSI driver left off, but in that case the two provisioners race with each
Member

@liggitt liggitt Jun 13, 2018

this seems like a good use case for leader election between in-tree plugin and CSI plugin... only one active at a time, either able to handle storageclass->pv, pv attach/detach, etc, and there's a way to tell if the other is active

@jsafrane
Member

With @smarterclayton's and @liggitt's comments about provisioning in mind, we could make external provisioners that provision PVs with the "old" volume source (say GCEPersistentDiskVolumeSource) instead of CSIPersistentVolumeSource. The CSI translation layer would then kick in and CSI would be used to attach/mount the volumes, but all PVs would be GCE PD ones.

We would then need to write external provisioners for each volume plugin. Or keep the in-tree ones forever.

@davidz627
Contributor Author

@jsafrane I got the same impression from @smarterclayton's and @liggitt's comments about dynamic provisioning. However, in that case I don't understand why we would make external provisioners that provision PVs with the "old" volume source. They would be a carbon copy of the in-tree ones and would presumably be deployed separately from each CSI Driver, as they are not the CSI Driver Provisioner.

With this it would be better to just keep the in-tree ones forever, rather than increasing complexity by just moving them out of tree and having leader election for the exact same functionality. AFAIK there is nothing to be gained by making an external-provisioner that does this for each in-tree plugin.

@saad-ali this is an interesting development, as it means basically keeping the in-tree provisioners forever even after migrating the other parts... would be interested to hear your thoughts on it as well.

@davidz627
Contributor Author

Never mind, a big benefit of having the external provisioners would be moving the CloudProvider code out of tree, which is a big push happening right now. It is probably worth doing it for that reason alone.

@davidz627
Contributor Author

@jsafrane @liggitt @smarterclayton updated the "Dynamically Provisioned Volumes" section to address comments. You were correct that this is somewhat of an API break. Although the PVC->PVSource relation was never guaranteed, it may be interpreted implicitly through the PVC -> Storage Class -> PV relation and we are aware that some users may be relying on that behavior.

@davidz627
Contributor Author

@jsafrane @liggitt sorry for the long silent period; other work got pushed to a higher priority. I have now been able to come back to this and rewrite the translation layer part of the design to use an external translation library at various points instead of putting translated sources into annotations. PTAL and let me know what could still use some work.

function will return an error when the translation is attempted.


### Non-Dynamic Provisioned Volumes (and volumes provisioned before migration)
Member

"Pre-provisioned"

Member

@saad-ali saad-ali left a comment

LGTM. Let me know when the inline design closes and we can merge this.

in-tree PV. The PV will then go through all the same steps outlined below in the
"Non-Dynamic Provisioned Volumes" for the rest of the volume lifecycle.

#### Leader Election
Contributor Author

Might need to add logic to the in-tree provisioner to stand down when "migrated".


#### Kubernetes Changes
Dynamically Provisioned volumes will continue to be provisioned with the in-tree
`PersistentVolumeSource`. We will create an external provisioner to pick up the
Contributor Author

The CSI external provisioner can pick up the volume and have reverse translation logic to also create in-tree PVs.

// TranslateToCSI takes a volume.Spec and will translate it to a
// CSIPersistentVolumeSource if the translation logic for that
// specific in-tree volume spec has been implemented
TranslateToCSI(spec volume.Spec) (CSIPersistentVolumeSource, error)
Contributor Author

A CSI-to-in-tree translation is also needed.
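A possible shape for that reverse direction, mirroring the TranslateToCSI signature from the excerpt above; the method name and the interface wrapper are illustrative assumptions, and placeholder types stand in for volume.Spec and CSIPersistentVolumeSource:

```go
package sketch

// Placeholder types standing in for volume.Spec and CSIPersistentVolumeSource
// from the excerpt above.
type VolumeSpec struct{}
type CSIPersistentVolumeSource struct{}

// Translator pairs the proposal's in-tree -> CSI method with a hypothetical
// CSI -> in-tree counterpart needed for downgrade/un-migration.
type Translator interface {
	// TranslateToCSI is the method shown in the design excerpt above.
	TranslateToCSI(spec VolumeSpec) (CSIPersistentVolumeSource, error)
	// TranslateToInTree maps a CSIPersistentVolumeSource back to the
	// equivalent in-tree volume spec, returning an error if no translation
	// logic exists for that driver.
	TranslateToInTree(source CSIPersistentVolumeSource) (VolumeSpec, error)
}
```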


For Controller calls, we will call the CSI calls instead of the in-tree calls.
When CSI specific information is needed from the PV we can call the translation
library to do a translation of the PV Source to the necessary CSI Source and get
Contributor Author

Look into doing the translation before calling CSI.

@kfox1111

There seems to be some overlap between https://github.com/kubernetes/community/pull/2199/files#diff-a5b9eff044138c5af66d85efc24c7cacR228 and the discussion happening here: #2273

Could folks please look at the other proposal to make sure we have covered everything and aren't overlapping plans?

name needs to be unique and A/D controller must be able to find the right
VolumeAttachment when a pod is deleted (i.e. using only info in Node.Status).
CSI driver in kubelet must be able to find the VolumeAttachment too to get
AttachmentMetadata for NodeStage/NodePublish.

Some code may need to be added to deal with the security of namespaced secrets, which can be different between inline pod volumes and PV-based volumes? (rbd secrets behave this way, I think.)

Contributor Author

In-tree PV-based volumes will translate to CSI PV-based volumes, and inline in-tree volumes to CSI inline volumes. I don't think we will need to explicitly deal with the differences between the two.

@davidz627 davidz627 force-pushed the master branch 2 times, most recently from 2abcca3 to ac07771 on August 21, 2018 23:12
@kfox1111

I'm wondering, rather than being all or nothing, should it be a flag per driver? Since there are so many drivers in-tree already, maybe it could take longer for some drivers to have alternative CSI drivers than for others?

@davidz627
Contributor Author

@kfox1111 do you mean feature flags per driver like here? https://github.com/kubernetes/community/pull/2199/files#diff-a5b9eff044138c5af66d85efc24c7cacR76

Also the translation library will have a boolean method that can return whether the translation logic for a driver has been implemented or not
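As a sketch, the boolean check mentioned above could look something like this (the interface and method names are illustrative, not from the proposal):

```go
package sketch

// MigrationChecker reports whether translation logic has been implemented
// for a given in-tree plugin, so callers can fall back to the in-tree path
// for plugins that are not yet migratable.
type MigrationChecker interface {
	// IsMigratable returns true if the translation library can translate
	// the named in-tree plugin (e.g. "kubernetes.io/gce-pd") to CSI.
	IsMigratable(inTreePluginName string) bool
}
```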

@kfox1111

Ah. I missed that somehow. Sorry. Looks good to me. :)

@davidz627
Contributor Author

@saad-ali ready for final review/approval. Looks like the inline volumes design hasn't merged yet but it seems pretty much set; I've added a little bit of description to that section and put a link to the PR.

@saad-ali
Member

Can add inline volume support, etc. in a follow-up PR.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 26, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2018
@k8s-ci-robot k8s-ci-robot merged commit 13c1be0 into kubernetes:master Sep 26, 2018
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
Add In-Tree Plugin to CSI Driver Migration Design Proposal