Data Source API V2
Wenchen Fan
2018-6-6 | SF | Spark + AI Summit
Databricks’ Unified Analytics Platform
(platform diagram: Collaborative Notebooks and a Cloud Native Service on top of the Databricks Runtime, with Delta, SQL, and Streaming, powered by Apache Spark)
• Unifies data engineers and data scientists
• Unifies data and AI technologies
• Eliminates infrastructure complexity
What is Data Source API?
• Hadoop: InputFormat/OutputFormat
• Hive: SerDe
• Presto: Connector
• …
Defines how to read/write data from/to a storage system.
Ancient Age: Custom RDD
HadoopRDD/CassandraRDD/HBaseRDD/…
rdd.mapPartitions { it =>
  // custom logic to write to external storage
}
This worked well in the ancient ages, when users wrote Spark applications
directly with the RDD API.
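Expanded slightly, the pattern looked like the following minimal sketch. StorageClient and connect are hypothetical stand-ins for a real client library (e.g. a Cassandra or HBase driver), not actual APIs:

// Hypothetical storage client, standing in for a real driver:
trait StorageClient extends AutoCloseable {
  def insert(row: String): Unit // hypothetical record type
}
def connect(host: String): StorageClient = ??? // hypothetical factory

rdd.foreachPartition { rows =>
  // One connection per partition; batching, retries, and atomicity are
  // entirely the application's responsibility.
  val client = connect("storage-host:9042")
  try rows.foreach(client.insert)
  finally client.close()
}

Everything beyond basic parallelism (filter pushdown, transactions, data encoding) had to be hand-rolled per application.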
New Requirements When Switching to Spark SQL
How to read data?
• How to read data concurrently and distributedly? (the RDD
approach satisfies only this one)
• How to skip reading data by using filters?
• How to speed up certain operations? (aggregate, limit, etc.)
• How to convert data using Spark’s data encoding?
• How to report extra information to Spark? (data statistics,
data partitioning, etc.)
• Structured streaming support
• …
How to write data?
• How to write data concurrently and distributedly? (the RDD
approach satisfies only this one)
• How to make the write operation atomic?
• How to clean up if a write fails?
• How to convert data using Spark’s data encoding?
• Structured streaming support
• …
Data Source API V1 for Spark SQL
Data Source API V1
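For reference, an abridged sketch of the V1 interfaces (org.apache.spark.sql.sources). Note how SQLContext, RDD, and DataFrame all leak into the signatures, which is the first con listed below:

abstract class BaseRelation {
  def sqlContext: SQLContext
  def schema: StructType
}
trait RelationProvider {
  def createRelation(sqlContext: SQLContext, parameters: Map[String, String]): BaseRelation
}
trait TableScan {
  def buildScan(): RDD[Row]
}
trait PrunedFilteredScan {
  def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row]
}
trait InsertableRelation {
  def insert(data: DataFrame, overwrite: Boolean): Unit
}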
Data Source API V1
Pros:
• Simple
• Works well for the most common cases
Data Source API V1
Cons:
• Coupled with other APIs. (SQLContext, RDD, DataFrame)
• Hard to push down other operators. Each new pushdown requires yet
another buildScan overload, which quickly explodes:
buildScan(limit)
buildScan(limit, requiredCols)
buildScan(limit, filters)
buildScan(limit, requiredCols, filters)
...
• Hard to add different data encoding. (columnar scan)
• Hard to implement writing.
• No streaming support
How to read data?
• How to read data concurrently and distributedly?
• How to skip reading data by filters?
• How to speed up certain operations?
• How to convert data using Spark’s data encoding?
• How to report extra information to Spark?
• Structured streaming support
How to write data?
• How to write data concurrently and distributedly?
• How to make the write operation atomic?
• How to clean up if write failed?
• How to convert data using Spark’s data encoding?
• Structured streaming support
What’s the design of Data Source API V2?
API Sketch (read)
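The read-side interfaces, abridged. The actual definitions are Java interfaces in org.apache.spark.sql.sources.v2; they are shown here as Scala traits using the Spark 2.4-era names:

trait ReadSupport extends DataSourceV2 {
  // Called on the driver to instantiate a reader for this scan.
  def createReader(options: DataSourceOptions): DataSourceReader
}
trait DataSourceReader {
  def readSchema(): StructType
  def planInputPartitions(): java.util.List[InputPartition[InternalRow]]
}
trait InputPartition[T] extends Serializable {
  // Serialized and sent to an executor, which creates the actual reader.
  def createPartitionReader(): InputPartitionReader[T]
}
trait InputPartitionReader[T] extends java.io.Closeable {
  def next(): Boolean
  def get(): T
}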
Like RDD: InputPartition plays the role of an RDD partition, and
InputPartitionReader the computation that reads it.
Easy to extend
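The optional mixin traits are what make the API easy to extend: a reader opts in to an optimization by implementing the corresponding trait. An abridged sketch (2.4-era names, again Scala renderings of Java interfaces):

trait SupportsPushDownFilters extends DataSourceReader {
  // Returns the filters the source cannot handle; Spark evaluates those.
  def pushFilters(filters: Array[Filter]): Array[Filter]
  def pushedFilters(): Array[Filter]
}
trait SupportsPushDownRequiredColumns extends DataSourceReader {
  def pruneColumns(requiredSchema: StructType): Unit
}
trait SupportsScanColumnarBatch extends DataSourceReader {
  def planBatchInputPartitions(): java.util.List[InputPartition[ColumnarBatch]]
}
trait SupportsReportStatistics extends DataSourceReader {
  def estimateStatistics(): Statistics
}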
Read Process
(diagram: the Spark driver plans the scan, and the Spark executors read from
the external storage)
1. A query plan is generated by the user; the leaf data scan node generates
a DataSourceReader.
2. DataSourceReader (on the driver): connects to the external storage,
pushes down operators, and generates InputPartitions.
3. InputPartition: carries the information necessary to create a reader on
the executor side.
4. InputPartitionReader (on an executor): talks to the external storage and
fetches the data.
API Sketch (write)
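The write-side interfaces, abridged (Java interfaces in Spark itself; 2.4-era shape):

trait WriteSupport extends DataSourceV2 {
  def createWriter(writeUUID: String, schema: StructType, mode: SaveMode,
      options: DataSourceOptions): java.util.Optional[DataSourceWriter]
}
trait DataSourceWriter {
  def createWriterFactory(): DataWriterFactory[InternalRow]
  def commit(messages: Array[WriterCommitMessage]): Unit
  def abort(messages: Array[WriterCommitMessage]): Unit
}
trait DataWriterFactory[T] extends Serializable {
  // Serialized and sent to executors; one DataWriter per task (and epoch).
  def createDataWriter(partitionId: Int, taskId: Long, epochId: Long): DataWriter[T]
}
trait DataWriter[T] {
  def write(record: T): Unit
  def commit(): WriterCommitMessage
  def abort(): Unit
}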
Write Process
(diagram: the Spark driver coordinates the job, and the Spark executors
write to the external storage)
1. A query plan is generated by the user; the root data write node generates
a DataSourceWriter.
2. DataSourceWriter (on the driver): connects to the external storage,
prepares to write (WAL, locks, etc.), and generates a DataWriterFactory.
3. DataWriterFactory: carries the information necessary to create a writer
on the executor side.
4. DataWriter (on an executor): talks to the external storage and writes the
data.
5. If a DataWriter succeeds, it commits its task and sends a CommitMessage
to the driver; if it fails, it aborts the task, cleans up, and propagates
the exception to the driver.
6. DataSourceWriter: if all writers succeed, commit the job; if some writers
fail, abort the job and clean up. (all or nothing)
Streaming Data Source API V2
Structured Streaming Deep Dive:
https://tinyurl.com/y9bze7ae
Continuous Processing in Structured Streaming:
https://tinyurl.com/ydbdhxbz
Ongoing Improvements
• Catalog support: standardize the DDL logical plans, proxy DDL commands
to the data source, and integrate data source catalogs. (SPARK-24252)
• More operator pushdown: limit pushdown, aggregate pushdown, join
pushdown, etc. (SPARK-22388, SPARK-22390, SPARK-24130, ...)
Thank you
Wenchen Fan (wenchen@databricks.com)
Apache Spark Data Source V2: Example
Gengliang Wang
Spark Summit 2018, SF
About me
• Gengliang Wang (Github: gengliangwang)
• Software Engineer at Databricks
About this talk
• Part II of the Apache Spark Data Source V2 session.
• See Wenchen’s talk for background and design details.
• How to implement a Parquet data source with the V2 API.
Spark Data Source V2: we are migrating...
Read Parquet files
Query example
trainingData = spark.read.parquet("/data/events")
    .where("city = 'San Francisco' and year = 2018")
    .select("timestamp").collect()
Goal
• Understand the data and skip unneeded data
• Split files into partitions for parallel reads
Parquet 101
ref: http://grepalex.com/2014/05/13/parquet-file-format-and-object-model/
ref: Understanding how Parquet integrates with Avro, Thrift and Protocol Buffers
Data layout
(diagram: the Events dataset is stored as partition directories year=2018,
year=2017, year=2016, year=2015, each holding parquet files; a parquet file
consists of row groups 0..N, each storing column chunks for city, timestamp,
OS, browser and the other columns, followed by a footer holding metadata)
pseudo-code
class ParquetDataSource extends DataSourceReader {
  override def readSchema(): StructType = {
    fileIndex
      .listFiles()
      .map(readSchemaInFooter) // read each file's schema from its footer
      .reduce(mergeSchema)     // merge them into a single schema
  }
}
Prune partition columns
(same layout diagram: the partition filter year = 2018 means only the
year=2018 directory is listed; the other partition directories are skipped)
spark.read.parquet("/data/events")
    .where("city = 'San Francisco' and year = 2018")
    .select("timestamp").collect()
Skip row groups
(same layout diagram and query: using the column statistics stored in each
file’s footer, row groups that cannot contain city = 'San Francisco' are
skipped without being read)
pseudo-code
class ParquetDataSource extends DataSourceReader with SupportsPushDownFilters {
  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    // Separate filters on partition columns from filters on data columns.
    val (partitionFilters, dataFilters) =
      filters.partition(_.references.toSet.subsetOf(partitionColumns))
    // For the selected row groups, we still need to evaluate the data
    // filters in Spark, so return them as the residual filters.
    dataFilters // to be continued in #planInputPartitions
  }
}
Prune columns
(same layout diagram and query: only the column chunks for timestamp, plus
the columns referenced by the filters, are read; the other columns are
skipped)
pseudo-code
class ParquetDataSource extends DataSourceReader with SupportsPushDownFilters
    with SupportsPushDownRequiredColumns {
  var requiredSchema: StructType = _
  override def pruneColumns(requiredSchema: StructType): Unit = {
    // Spark tells the source which columns the query actually needs.
    this.requiredSchema = requiredSchema
  }
}
// To be continued in #planInputPartitions
Goal
• Understand data and skip unneeded data
• Split files into partitions for parallel read
Partitions of the same size
(diagram: files File 0, File 1, File 2 on HDFS are split and packed into
Spark partitions Partition 0, Partition 1, Partition 2 of roughly equal size)
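A minimal sketch of such packing (greedy bin-packing by size; the names and strategy below are illustrative, not Spark's internals, which also split large files first):

case class FileSplit(path: String, start: Long, length: Long)

def makeFilePartitions(splits: Seq[FileSplit], maxBytes: Long): Seq[Seq[FileSplit]] = {
  val partitions = Seq.newBuilder[Seq[FileSplit]]
  var current = Vector.empty[FileSplit]
  var currentBytes = 0L
  // Biggest splits first, then greedily fill each partition up to maxBytes.
  for (split <- splits.sortBy(-_.length)) {
    if (current.nonEmpty && currentBytes + split.length > maxBytes) {
      partitions += current
      current = Vector.empty
      currentBytes = 0L
    }
    current :+= split
    currentBytes += split.length
  }
  if (current.nonEmpty) partitions += current
  partitions.result()
}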
Driver: plan input partitions
(diagram: 1. the driver splits the input files into partitions;
2. it launches read tasks, assigning the partitions to executors)
Executor: read distributedly
(diagram: 3. each executor performs the actual reading of its partitions)
pseudo-code
class ParquetDataSource extends DataSourceReader with SupportsPushDownFilters
    with SupportsPushDownRequiredColumns {
  override def planInputPartitions(): List[InputPartition[Row]] = {
    // List only the files that survive the partition filters.
    val filePartitions = makeFilePartitions(fileIndex.listFiles(partitionFilters))
    filePartitions.map { filePartition =>
      // Row group skipping: push the data filters into the Parquet reader.
      ParquetInputFormat.setFilterPredicate(hadoopConf, dataFilters)
      // Read the requested columns from the parquet files into Spark rows.
      ParquetReader(filePartition, requiredSchema)
    }
  }
}
Summary
• Basic
  - determine schema
  - plan input partitions
• Mixins for optimization
  - push down filters
  - push down required columns
  - scan columnar data
  - ...
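Putting the basics together, a toy end-to-end v2 source, written as a sketch against the Spark 2.4-era API; it just returns the integers 0 to 9 split across two partitions:

import java.util
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, InputPartitionReader}
import org.apache.spark.sql.types.StructType

class SimpleDataSource extends DataSourceV2 with ReadSupport {
  override def createReader(options: DataSourceOptions): DataSourceReader =
    new SimpleReader
}

class SimpleReader extends DataSourceReader {
  override def readSchema(): StructType = new StructType().add("i", "int")
  override def planInputPartitions(): util.List[InputPartition[InternalRow]] =
    util.Arrays.asList[InputPartition[InternalRow]](
      new SimplePartition(0, 5), new SimplePartition(5, 10))
}

class SimplePartition(start: Int, end: Int) extends InputPartition[InternalRow] {
  override def createPartitionReader(): InputPartitionReader[InternalRow] =
    new SimplePartitionReader(start, end)
}

class SimplePartitionReader(start: Int, end: Int)
    extends InputPartitionReader[InternalRow] {
  private var current = start - 1
  override def next(): Boolean = { current += 1; current < end }
  override def get(): InternalRow = InternalRow(current)
  override def close(): Unit = ()
}

// Usage: spark.read.format(classOf[SimpleDataSource].getName).load().show()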
Parquet Writer on HDFS
Query example
data = spark.read.parquet("/data/events")
    .where("city = 'San Francisco' and year = 2018")
    .select("timestamp")
data.write.parquet("/data/results")
Goal
• Parallel
• Transactional
(diagram: 1. the Spark driver launches a write task on each executor;
2. each task writes its output file, part-00000, part-00001, part-00002;
each task writes to a different temporary path)
Everything should be temporary
• Job output is staged under results/_temporary.
• Files should be isolated between jobs: results/_temporary/<job id>.
• Task output is also temporary: results/_temporary/<job id>/_temporary.
• Files should be isolated between tasks: each task attempt writes its
parquet files under results/_temporary/<job id>/_temporary/<task attempt id>.
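These staging paths can be built mechanically. A sketch of helpers matching the layout above (they mirror the pendingJobAttemptsPath / pendingTaskAttemptPath names used in the pseudo-code later, but the exact shape is illustrative):

import org.apache.hadoop.fs.Path

def pendingJobAttemptsPath(output: Path, jobId: String): Path =
  new Path(new Path(output, "_temporary"), jobId)

def pendingTaskAttemptPath(output: Path, jobId: String, taskAttemptId: String): Path =
  new Path(new Path(pendingJobAttemptsPath(output, jobId), "_temporary"), taskAttemptId)

// e.g. results/_temporary/<job id>/_temporary/<task attempt id>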
Commit task
(diagram: 1. write task; 2. write to file; 3. commit task, reported back to
the driver)
File layout
(while a task is in progress, its files live under
results/_temporary/<job id>/_temporary/<task attempt id>; once committed,
they are moved to results/_temporary/<job id>/<task id>)
If a task aborts...
(diagram: 1. write task; 2. write to file; 3. abort task)
File layout
(on task abort, delete the task attempt’s output path under
results/_temporary/<job id>/_temporary)
Relaunch task
(diagram: 1. write task; 2. write to file; 3. abort task; 4. relaunch the
task and try again)
Distributed and Transactional Write
(diagram: 1. write task; 2. write to file; 3. commit task; 4. commit job)
File layout
(after job commit, the parquet files are moved out of the temporary
directories and sit directly under results/)
Almost transactional
• Spark stages output files in a temporary location.
• Commit? Move them to their final locations.
• Abort? Delete the staged files.
• The window of failure is small.
• See Eric Liang’s talk at Spark Summit 2017.
pseudo-code
class ParquetDataSource extends DataSourceWriter with SupportsWriteInternalRow {
  override def createInternalRowWriterFactory(): DataWriterFactory[InternalRow] = {
    val parquetOutputFactory = ParquetOutputFactory(dataSchema, partitionSchema)
    ParquetWriterFactory(this.outputPath, parquetOutputFactory)
  }
  override def commit(messages: Array[WriterCommitMessage]): Unit = {
    // Job commit: move every committed task's output to the final location.
    committedTaskPaths.foreach { taskPath =>
      mergePath(taskPath, this.outputPath)
    }
  }
  override def abort(messages: Array[WriterCommitMessage]): Unit = {
    // Job abort: delete all staged output.
    fs.delete(pendingJobAttemptsPath)
  }
}
class ParquetWriterFactory(
    path: Path,
    outputFactory: ParquetOutputFactory)
  extends DataWriterFactory[InternalRow] {
  override def createDataWriter(
      partitionId: Int,
      attemptNumber: Int,
      epochId: Long): DataWriter[InternalRow] = {
    // Runs on the executor: create one writer per task attempt.
    val writer = outputFactory.newInstance()
    ParquetWriter(writer, partitionId, attemptNumber)
  }
}
class ParquetWriter(writer: ParquetOutputWriter, partitionId: Int, attemptNumber: Int)
  extends DataWriter[InternalRow] {
  val pendingPath = pendingTaskAttemptPath(partitionId, attemptNumber)
  override def write(record: InternalRow): Unit = {
    writer.write(record) // the underlying writer outputs to pendingPath
  }
  override def commit(): WriterCommitMessage = {
    // Task commit: move this task's output out of the pending directory.
    mergePath(pendingPath, pendingJobAttemptsPath)
  }
  override def abort(): Unit = {
    // Task abort: delete this task's pending output.
    fs.delete(pendingPath)
  }
}
Thank you
Gengliang Wang (gengliang.wang@databricks.com)
