DATA STREAMING IN SPARK
Bartosz Kowalik
ABOUT ME
Scala dev @ VirtusLab
Functional programming
fun
GitHub: bkowalik
Twitter: bkowalikpl
QUESTIONS FOR THE AUDIENCE
Who knows Scala?
Who has come across Spark?
?
WHAT THIS PRESENTATION IS NOT
AN A-TO-Z TUTORIAL
WHAT IS SPARK?
EXAMPLE
TYPES OF OPERATIONS
TRANSFORMATIONS
map(func)                                 reduceByKey(func, [numTasks])
filter(func)                              aggregateByKey(zeroValue)(seqOp, combOp, [numTasks])
flatMap(func)                             sortByKey([ascending], [numTasks])
mapPartitions(func)                       join(otherDataset, [numTasks])
mapPartitionsWithIndex(func)              cogroup(otherDataset, [numTasks])
sample(withReplacement, fraction, seed)   cartesian(otherDataset)
union(otherDataset)                       pipe(command, [envVars])
intersection(otherDataset)                coalesce(numPartitions)
distinct([numTasks])                      repartition(numPartitions)
groupByKey([numTasks])                    repartitionAndSortWithinPartitions(partition
https://spark.apache.org/docs/latest/programming-guide.html#transformations
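Transformations are lazy: each one only describes a new RDD, and nothing runs until an action is called. The following is a minimal sketch of that idea, assuming Spark core is on the classpath. The object name, the `wordCounts` helper, the sample lines and the `local[2]` master are illustrative assumptions, not code from the talk.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TransformationsSketch {
  // Hypothetical helper: chains several transformations, then runs one action.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val counts = sc.parallelize(lines)
      .flatMap(_.split(" "))   // transformation: one element per word
      .filter(_.length > 2)    // transformation: drop short words such as "is"
      .map(word => (word, 1))  // transformation: pair each word with a count of 1
      .reduceByKey(_ + _)      // transformation: sum the counts per word

    // Up to this point no job has run. collect() is an action,
    // used here only to trigger the computation and return the result.
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val conf = new SparkConf().setAppName("transformations-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      wordCounts(sc, Seq("spark streams data", "spark is fun")).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The order in which the pairs are printed is not guaranteed, because `reduceByKey` redistributes the data across partitions.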
ACTIONS
reduce(func)   takeSample(withReplacement, num, [seed])
collect()      takeOrdered(n, [ordering])
count()        saveAsTextFile(path)
first()        saveAsSequenceFile(path)
take(n)        saveAsObjectFile(path)
https://spark.apache.org/docs/latest/programming-guide.html#actions
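Actions are the operations that actually run a job and return a value to the driver (or write it out). A small sketch of a few of the actions listed above, assuming Spark core on the classpath. The object name, the input numbers and the `local[2]` master are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ActionsSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val conf = new SparkConf().setAppName("actions-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val numbers = sc.parallelize(Seq(5, 3, 8, 1))

      // Every call below is an action: it triggers a job and returns a local value.
      println(numbers.reduce(_ + _))                 // 17
      println(numbers.count())                       // 4
      println(numbers.first())                       // 5
      println(numbers.take(2).mkString(", "))        // 5, 3
      println(numbers.takeOrdered(2).mkString(", ")) // 1, 3
      println(numbers.collect().mkString(", "))      // 5, 3, 8, 1
    } finally {
      sc.stop()
    }
  }
}
```

Note that `collect()` brings the whole dataset to the driver, so it is only suitable for small results such as this one.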
PARTITIONS AND RDDs
http://blog.cloudera.com/wp-content/uploads/2014/03/spark-devs1.png
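An RDD is split into partitions, and the partition count can be inspected and changed with the `coalesce` and `repartition` transformations listed earlier. A minimal sketch, assuming Spark core on the classpath. The object name, the input range, the partition counts and the `local[2]` master are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionsSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val conf = new SparkConf().setAppName("partitions-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // The second argument asks for 4 partitions explicitly.
      val rdd = sc.parallelize(1 to 8, 4)
      println(rdd.partitions.length)                // 4

      // glom() turns every partition into an array, which makes the layout visible.
      rdd.glom().collect().foreach(p => println(p.mkString("[", ", ", "]")))

      // coalesce lowers the partition count without a full shuffle;
      // repartition always shuffles and can also raise the count.
      println(rdd.coalesce(2).partitions.length)    // 2
      println(rdd.repartition(6).partitions.length) // 6
    } finally {
      sc.stop()
    }
  }
}
```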
EXAMPLE
COMPONENTS USED
Kafka
Cassandra
Akka HTTP
ARCHITECTURE
CODE!
4developers 2016 - Data Streaming in Spark - Bartosz Kowalik
MONITORING
WHAT I DID NOT SHOW
accumulators
clustering: YARN, Mesos, etc.
CAP theorem
RECOMMENDED BOOKS
Learning Spark: Lightning-Fast Big Data Analysis
Advanced Analytics with Spark: Patterns for Learning from Data at Scale
ONLINE RESOURCES
Apache Spark
SO tag: apache-spark
YouTube: Apache Spark
User mailing list
