com.mongodb.spark.rdd.partitioner
MongoSamplePartitioner
Related Docs: object MongoSamplePartitioner | package partitioner
class MongoSamplePartitioner extends Logging with MongoPartitioner
The Sample Partitioner.
Uses the average document size and a random sample of documents to determine suitable partitions for the collection.
Configuration Properties
When setting these options via sparkConf, prefix each property name with spark.mongodb.input.partitionerOptions. (for example, spark.mongodb.input.partitionerOptions.partitionSizeMB):
partitionKey, the field to partition the collection by. The field should be indexed and contain unique values. Defaults to _id.
partitionSizeMB, the size (in MB) for each partition. Defaults to 64.
samplesPerPartition, the number of samples for each partition. Defaults to 10.
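The three properties above can be set on a SparkConf using the prefix described earlier. A minimal sketch, assuming a local MongoDB instance and an illustrative database/collection name (the URI, app name, and option values are examples, not defaults from the connector):

```scala
import org.apache.spark.SparkConf

// Illustrative configuration: URI and values below are example choices.
val conf = new SparkConf()
  .setAppName("mongo-sample-partitioner-example")
  .set("spark.mongodb.input.uri", "mongodb://localhost/test.myCollection")
  .set("spark.mongodb.input.partitioner", "MongoSamplePartitioner")
  // Partitioner options use the spark.mongodb.input.partitionerOptions. prefix:
  .set("spark.mongodb.input.partitionerOptions.partitionKey", "_id")
  .set("spark.mongodb.input.partitionerOptions.partitionSizeMB", "64")
  .set("spark.mongodb.input.partitionerOptions.samplesPerPartition", "10")
```

The values shown match the documented defaults, so in practice you would only set the options you want to change.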
Note: Requires MongoDB 3.2+.
Note: Does not support views. Use MongoPaginateByCountPartitioner or create a custom partitioner instead.
Since
1.0
Linear Supertypes
MongoPartitioner, Serializable, Serializable, Logging, LoggingTrait, AnyRef, Any
Instance Constructors
new MongoSamplePartitioner()
Value Members
val partitionKeyProperty: String
The partition key property
val partitionSizeMBProperty: String
The partition size MB property
def partitions(connector: MongoConnector, readConfig: ReadConfig, pipeline: Array[BsonDocument]): Array[MongoPartition]
Calculates the partitions for the given collection.
val samplesPerPartitionProperty: String
The number of samples for each partition
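The partitions method needs a live MongoConnector, but the underlying idea of sample-based partitioning can be shown in plain Scala: sort the sampled partition-key values, then take every samplesPerPartition-th value as a partition boundary. This is an illustrative sketch of that selection step only (the object and method names are hypothetical, and the real connector also handles min/max bounds, BSON types, and the aggregation pipeline):

```scala
// Hypothetical sketch of sample-based split selection; not the connector's code.
object SampleSplitSketch {
  // Given sorted sampled keys, pick every samplesPerPartition-th key
  // as a right-hand partition boundary.
  def splitKeys[T](sortedSamples: Vector[T], samplesPerPartition: Int): Vector[T] =
    sortedSamples.zipWithIndex
      .collect { case (k, i) if (i + 1) % samplesPerPartition == 0 => k }

  def main(args: Array[String]): Unit = {
    val samples = (1 to 30).toVector     // pretend sampled _id values, already sorted
    val bounds  = splitKeys(samples, 10) // 30 samples / 10 per partition -> 3 bounds
    println(bounds)                      // Vector(10, 20, 30)
  }
}
```

With 30 samples and samplesPerPartition = 10 this yields three boundaries, i.e. three partitions, which is why raising samplesPerPartition improves boundary accuracy without changing the partition count.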