Description
Edited by: @HumairAK
Based on the discussion below, this issue now tracks letting admin users configure which labels/annotations are added to driver pods at the KFP API Server level.
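Purely as an illustration of the requested feature (nothing below exists in KFP today; the ConfigMap name and data keys are invented), an admin-level setting read by the KFP API Server could look roughly like this and would, for example, let driver pods opt into Istio sidecar injection:
# Hypothetical sketch only: the ConfigMap name and keys are invented, not an existing KFP API.
# The idea is that the API Server would read this and stamp the listed metadata onto every
# driver pod it creates.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfp-driver-pod-metadata   # invented name
  namespace: kubeflow
data:
  labels: |
    # Real Istio label that opts a pod into sidecar injection.
    sidecar.istio.io/inject: "true"
  annotations: |
    # Real Istio annotation; holds the app container until the proxy is ready.
    proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'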
Environment
- How did you deploy Kubeflow Pipelines (KFP)? Kubeflow manifests (kustomize)
- KFP version: 2.5.0
- KFP SDK version: 2.13.0
Steps to reproduce
Enable STRICT mTLS on the entire mesh, then run a pipeline.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: mesh-traffic
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
Expected result
Pipeline runs successfully
Materials and Reference
KFP driver pods run without Istio sidecars, so they cannot connect to services within the mesh when mTLS STRICT mode is enabled.
In particular, the default Kubeflow installation ships with a MinIO object store, and the driver pod fails when trying to connect to it:
level=info msg="Saving file to s3" bucket=mlpipeline endpoint="minio-service.kubeflow:9000" key=artifacts/mnist-pipeline-mdtkv/2025/06/25/mnist-pipeline-mdtkv-system-dag-driver-938794099/main.log path=/tmp/argo/outputs/logs/main.log
level=info msg="Transient error: Get \"http://minio-service.kubeflow:9000/mlpipeline/?location=\": read tcp 10.244.2.191:60656->10.108.206.88:9000: read: connection reset by peer"
Notice also that the MinIO pod's Istio sidecar reports a TLS configuration mismatch between the downstream client (the driver pod) and the upstream server (the MinIO Istio sidecar):
"- - -" 0 NR filter_chain_not_found - "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.244.2.152:9000 10.244.2.191:60656 - -
Another impacted service is the MLMD gRPC service; the driver pod fails with the following error:
KFP driver: driver.RootDAG(pipelineName=mnist-pipeline, runID=xyz, runtimeConfig, componentSpec) failed: Failed GetContextByTypeAndName(type="system.Pipeline", name="mnist-pipeline"): rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: read tcp 10.244.2.191:51708->10.105.73.97:8080: read: connection reset by peer"
I could fix the above errors by selectively disabling mTLS:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: minio-traffic
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: minio
  portLevelMtls:
    9000:
      mode: PERMISSIVE
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: metadata-grpc-server-traffic
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      component: metadata-grpc-server
  portLevelMtls:
    8080:
      mode: PERMISSIVE
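As a quick sanity check that the workaround policies were picked up (plain kubectl, no Istio tooling assumed):
# Both namespace-scoped policies should show up alongside the mesh-wide STRICT policy.
kubectl get peerauthentication -n kubeflow
kubectl get peerauthentication -n istio-system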
I wonder whether this is the right approach or whether there's a better way to handle mTLS communication.
Would it be an option to run the driver pods within the mesh as well?
Thank you
Impacted by this bug? Give it a 👍.