
[backend] Add the ability to specify driver labels/annotations as a KFP api server configuration #12015


Description

@mginfn

Edited by: @HumairAK

Based on the discussion below, this issue now tracks allowing admin users to configure which labels/annotations are added to driver pods at the KFP API Server level.
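To make the request concrete, here is a minimal sketch of what such a configuration could look like. The ConfigMap name and key are hypothetical placeholders, not an implemented KFP API; only the Istio label itself is standard:

# Hypothetical sketch: the key name below is an illustrative assumption,
# not the implemented KFP configuration surface.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfp-launcher          # assumption: reusing the existing launcher ConfigMap
  namespace: kubeflow
data:
  # Labels the API server would stamp onto every driver pod it creates.
  # For the mTLS problem described below, injecting the Istio sidecar
  # would let the drivers join the mesh:
  driverPodLabels: |
    sidecar.istio.io/inject: "true"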


Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Kubeflow manifests (kustomize)
  • KFP version:
    2.5.0
  • KFP SDK version:
    2.13.0

Steps to reproduce

Enable STRICT mTLS on the entire mesh, then run a pipeline.

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: mesh-traffic
  namespace: istio-system
spec:
  mtls:
    mode: STRICT

Expected result

Pipeline runs successfully

Materials and Reference

KFP driver pods run without Istio sidecars, so they cannot connect to services within the mesh when mTLS STRICT mode is enabled.

In particular, the default Kubeflow installation ships with the MinIO object store, and the driver pod fails when trying to connect to it:

level=info msg="Saving file to s3" bucket=mlpipeline endpoint="minio-service.kubeflow:9000" key=artifacts/mnist-pipeline-mdtkv/2025/06/25/mnist-pipeline-mdtkv-system-dag-driver-938794099/main.log path=/tmp/argo/outputs/logs/main.log
level=info msg="Transient error: Get \"http://minio-service.kubeflow:9000/mlpipeline/?location=\": read tcp 10.244.2.191:60656->10.108.206.88:9000: read: connection reset by peer"

Notice also that the MinIO pod's Istio sidecar reports a mismatch of the TLS configuration between
the downstream client (the driver pod) and the upstream server (the MinIO Istio sidecar):

"- - -" 0 NR filter_chain_not_found - "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.244.2.152:9000 10.244.2.191:60656 - -

Another impacted service is the MLMD gRPC service; the driver pod fails with the error:

KFP driver: driver.RootDAG(pipelineName=mnist-pipeline, runID=xyz, runtimeConfig, componentSpec) failed: Failed GetContextByTypeAndName(type="system.Pipeline", name="mnist-pipeline"): rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: read tcp 10.244.2.191:51708->10.105.73.97:8080: read: connection reset by peer"

I was able to fix the above errors by selectively disabling mTLS:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: minio-traffic
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: minio
  portLevelMtls:
    9000:
      mode: PERMISSIVE
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: metadata-grpc-server-traffic
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      component: metadata-grpc-server
  portLevelMtls:
    8080:
      mode: PERMISSIVE 

I wonder whether this is the right approach, or whether there is a better solution for handling mTLS communication.
Would it be an option to run the driver pods within the mesh as well?
Thank you
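
For reference, running the drivers inside the mesh essentially means getting Istio's injection label onto the driver pod metadata, which is exactly what configurable driver labels/annotations (see the description above) would enable. A minimal sketch of the resulting pod, assuming the standard injection label and an illustrative image name:

# Sketch only: the pod name and image are illustrative; the label is the
# standard Istio sidecar-injection label.
apiVersion: v1
kind: Pod
metadata:
  name: system-dag-driver-example
  namespace: kubeflow
  labels:
    sidecar.istio.io/inject: "true"   # ask Istio to inject the sidecar proxy
spec:
  containers:
    - name: main
      image: ghcr.io/kubeflow/kfp-driver:latest   # illustrative driver image

One caveat: driver pods are run-to-completion workloads, and an injected sidecar has historically kept such pods from terminating until the proxy is shut down explicitly, which may be why they run without sidecars today.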


Impacted by this bug? Give it a 👍.
