Create a Dataproc cluster
Requirements:
Name: The cluster name must start with a lowercase letter followed by up
to 51 lowercase letters, numbers, and hyphens, and cannot end with a hyphen.
Cluster region: You must specify a Compute Engine region for
the cluster, such as us-east1 or europe-west1, to
isolate cluster resources, such as VM instances and cluster metadata stored in
Cloud Storage, within the region.
See Available regions & zones
for information on selecting a region. You can also run the
gcloud compute regions list command to display a list of available regions.
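A quick way to sanity-check a candidate name against the naming rules above before calling the API is a regular expression. The pattern below is our own translation of the documented rules, not an official validation pattern:

```shell
# Hypothetical pre-flight check: first character is a lowercase letter,
# followed by up to 51 lowercase letters, digits, or hyphens, and the
# name must not end with a hyphen.
name="my-cluster-01"
if echo "$name" | grep -Eq '^[a-z]([a-z0-9-]{0,50}[a-z0-9])?$'; then
  echo "valid"
else
  echo "invalid"
fi
# prints "valid"
```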
The command creates a cluster with default Dataproc service settings
for your master and worker virtual machine instances, disk sizes and types,
network type, region and zone where your cluster is deployed, and other cluster
settings. See the
gcloud dataproc clusters create
command for information on using command line flags to customize cluster settings.
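A minimal create invocation that relies entirely on those defaults needs only the required name and region. The cluster name and region below are placeholder values to replace with your own:

```shell
# Creates a cluster with default Dataproc service settings.
# "my-cluster" and "us-east1" are placeholders.
gcloud dataproc clusters create my-cluster --region=us-east1
```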
Create a cluster with a YAML file
Run the following gcloud command to export the configuration
of an existing Dataproc cluster into a cluster.yaml
file.
Note: During the export operation, cluster-specific fields,
such as cluster name, output-only fields, and automatically applied labels are
filtered. These fields are disallowed in the imported YAML file used to create a cluster.
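An export/import pair might look like the following; the cluster names and region are placeholders, and the import command creates a new cluster from the exported file:

```shell
# Export the configuration of an existing cluster to a local YAML file.
gcloud dataproc clusters export my-cluster \
    --region=us-east1 \
    --destination=cluster.yaml

# Create a new cluster from the exported configuration.
gcloud dataproc clusters import my-new-cluster \
    --region=us-east1 \
    --source=cluster.yaml
```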
REST
This section shows how to create a cluster with required values and
the default configuration (1 master, 2 workers).
Before using any of the request data,
make the following replacements:
CLUSTER_NAME: cluster name
PROJECT: Google Cloud project ID
REGION: An available Compute Engine
region where the cluster will be created.
ZONE: An optional zone
within the selected region where the cluster will be created.
HTTP method and URL:
POST https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters
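As a sketch, the request could be sent with curl as shown below. The JSON body supplies only the cluster name and an optional zone, leaving all other settings at their defaults; PROJECT, REGION, and the name/zone values are placeholders to replace, and the command assumes you are authenticated with gcloud:

```shell
# Hedged example: requires gcloud authentication; replace the
# placeholders before running.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d '{
        "clusterName": "my-cluster",
        "config": {
          "gceClusterConfig": {
            "zoneUri": "us-east1-b"
          }
        }
      }' \
  "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters"
```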
Open the Dataproc
Create a cluster
page in the Google Cloud console in your browser, then
click Create in the Cluster on Compute Engine row
in the Create a Dataproc cluster on Compute Engine page. The
Set up cluster panel is selected with fields filled in with default values. You
can select each panel and confirm or change default values to customize your cluster.
Click Create to create the cluster. The cluster name appears in
the Clusters page, and its status is updated to Running after
the cluster is provisioned. Click the cluster name to open the cluster details
page where you can examine jobs, instances, and configuration settings for your
cluster and connect to web interfaces running on your cluster.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();

      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());
    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}
const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-2',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-2',
        },
      },
    },
  };

  // Create the cluster
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);
}

createCluster();
from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """
    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")
Last updated 2025-07-02 UTC.