Extending Flink Metrics:
real-time BI atop existing streaming pipelines
Andrew Torson, Walmart Labs
CHI150406 Category Prioritization ...
Smart Pricing @ Walmart Labs
• Algorithmic: competitive analysis, economic modeling, cost/profit margin/sales/inventory data
• Rich ingestion: Walmart Stores, Walmart.com, Merchants, Competition, 3rd party data
• Large catalog size: ~5M 1st party catalog; ~100M 3rd party/marketplace catalog
• Multi-strategy: competitive leadership/revenue management/liquidation/trending/bidding
• Real-time: essential inputs ingestion (competition/trends/availability)
• Real-time: any 1P item price is refreshed within 5 minutes on all important input triggers
• Quasi real-time: push-rate-controlled 3P catalog price valuation
• Throughput control: probabilistic filtering/caching/micro-batching/streaming API/backpressure
• Regular batch jobs for quasi-static data: sales forecast/demand elasticity/marketplace statistics
Data flow (slide diagram):
• Item data: Merchant app, Catalog, 3P sources
• Competition data: Matching, Crawling, 3P sources
• Walmart data: Walmart Stores, Walmart.com, SCM/WMS
• Smart pricing data: Forecasts, Elasticity, Profit margin
• Pricing algo run: ML, Optimization, Rule-based
• Price push: Portal push, Cache refresh, Merchant push
Tech Stack: Flink, Kafka, Cassandra, Redis, gRPC, Hive, ElasticSearch, Druid, PySpark, Scikit, Azkaban
Monitoring & BI
• Health-check: metrics/Graphite/Prometheus/Grafana/New Relic
• Reporting: Hive/Tableau
• Auditing: Hive/Presto
• BI: Druid
Going beyond Druid: real-time BI
• Multi-dimensional KPI monitoring & alerting: meet/beat score, stop-loss category P&L
• Categorized Top-K counters: merchant 'hot-item' search ranking, trending items
• Anomaly detection: stateful outlier input detectors, categorized tail statistics
• Price strategy selection: item-level descriptive/prescriptive time-window snapshot
If all the data is already flowing through Flink pipelines, why not enrich them to compute real-time BI metrics as well?
• Pros: existing Flink pipelines scalability; data locality; incremental code decoration with metric aspects; powerful Flink function APIs
• Cons: metrics dimensionality curse/abuse; Flink performance side-effects
Quick tour of Flink-metrics
• Dropwizard-like metrics: counters/gauges/histograms/meters
• Metric key scoping: system-level (TM/task/operator) + custom user-level (Metric Group feature)
• Flink Runtime Context: lets operators register operator-level metric groups via the Flink RichFunction API
• Flink Metric Reporter: is notified of metric registry changes and micro-batches the metrics push
• Built-in metrics and reporters: scheduled Dropwizard reporter; Kafka metrics; IO/latency metrics
• Flink web-GUI: basic metric navigation & charting (no aggregates)
✓ Good fit for health-checking
✓ Simple and extensible
x How to define metrics logic?
x How to decorate/attach metrics to existing Flink operators?
x How to make adding more/new metrics a quick/simple exercise?
x How to handle dynamic/BI-like metric dimensions?
x How to handle metrics aggregation?
Million-dollar question: KSQL vs Flink for real-time BI?
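To make the scoping idea above concrete, here is a minimal sketch in plain Java. It is a toy model, not the Flink MetricGroup API: the class ToyMetricGroup, its methods and the scope strings are all made up for illustration. It only shows how a system-level scope plus a user-level group yields one fully qualified metric identifier that a reporter could later read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy stand-in for a scoped metric group (hypothetical; NOT the Flink API).
class ToyMetricGroup {
    private final String scope;                      // e.g. "tm1.pricingTask.mapOperator"
    private final Map<String, AtomicLong> registry;  // shared by the whole group hierarchy

    ToyMetricGroup(String rootScope) {
        this(rootScope, new LinkedHashMap<>());
    }

    private ToyMetricGroup(String scope, Map<String, AtomicLong> registry) {
        this.scope = scope;
        this.registry = registry;
    }

    // Appends one user-level scope component, similar in spirit to a nested metric group.
    ToyMetricGroup addGroup(String name) {
        return new ToyMetricGroup(scope + "." + name, registry);
    }

    // Registers (or returns the existing) counter under the fully scoped identifier.
    AtomicLong counter(String name) {
        return registry.computeIfAbsent(scope + "." + name, k -> new AtomicLong());
    }

    // What a scheduled reporter would read on each push: identifier -> current value.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new LinkedHashMap<>();
        registry.forEach((id, value) -> out.put(id, value.get()));
        return out;
    }

    public static void main(String[] args) {
        ToyMetricGroup operatorScope = new ToyMetricGroup("tm1.pricingTask.mapOperator");
        AtomicLong events = operatorScope.addGroup("SuccessfulPricing-Map").counter("EventCounter");
        events.incrementAndGet();
        events.incrementAndGet();
        System.out.println(operatorScope.snapshot());
    }
}
```

Each distinct identifier is one more entry in the registry, which is why BI-style dynamic dimensions need care with this model.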
Extending Flink metrics:
• Define metric logic: add a hierarchy of stateful Flink Metric Calculators; offer built-in calculator bundles
• Attach to existing Flink operators: add a hierarchy of rich Flink Metered Operator Provider decorators; overload calls to StreamExecutionEnvironment.addOperator() to implicitly decorate pipelines with metrics
• Rapid metric development: offer metric calculator base classes, extensible with FP lambda arguments
• Ease of new metric set-up: programmatic and/or annotation and/or configuration
• Dynamic metric key dimension extraction: FP lambda to extract key; state per key; register per key
Calculator classes (slide diagram): MetricCalculator<>, StatefulMetricCalculator<>, ScalarMetricCalculator<>, VectorMetricCalculator<>
MeteredFunction<T> extends RichFunction, ResultTypeQueryable<T>, Supplier<MetricCalculatorsRegistry>

public interface MetricCalculator<T> extends
        Consumer<MetricsInvokationContext<T,?>>,
        Supplier<Map<List<String>, Metric>>, Serializable {
    void bootstrap(RuntimeContext ctx);
}

abstract class MeteredOperatorProvider<OPER extends Function> implements Supplier<OPER> {
    protected OPER innerFunction;
    protected MetricCalculatorsRegistry metricsCalculators;
    protected List<String> userMetricGroupScope;
    // remaining members are not shown on the slide
}
Operator provider classes (slide diagram): MeteredOperatorProvider<>, MeteredOperator1Provider<>, SourceOperatorProvider<>, SinkOperatorProvider<>, MeteredOperator2Provider<>, MapOperatorProvider<>
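The decorator idea behind the calculator and provider hierarchies can be sketched without Flink on the classpath. The code below is an illustration under stated assumptions, not the Walmart Labs implementation: SketchCalculator, EventCounter and metered() are hypothetical names, and java.util.function.Function stands in for a Flink map function. It shows an existing function wrapped so that every input is first fed to a list of calculators while the business logic itself stays untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class MeteredDecoratorSketch {

    // Hypothetical, simplified calculator contract: consume each record, expose named values.
    interface SketchCalculator<T> {
        void accept(T record);
        Map<String, Long> values();
    }

    // Simplest possible calculator: counts every record it sees.
    static final class EventCounter<T> implements SketchCalculator<T> {
        private long count;

        public void accept(T record) {
            count++;
        }

        public Map<String, Long> values() {
            Map<String, Long> m = new LinkedHashMap<>();
            m.put("EventCounter", count);
            return m;
        }
    }

    // Decorator: feeds the input to all calculators, then delegates to the wrapped function.
    static <I, O> Function<I, O> metered(Function<I, O> inner, List<SketchCalculator<I>> calculators) {
        return input -> {
            for (SketchCalculator<I> c : calculators) {
                c.accept(input);
            }
            return inner.apply(input);
        };
    }

    public static void main(String[] args) {
        EventCounter<String> counter = new EventCounter<>();
        List<SketchCalculator<String>> calcs = new ArrayList<>();
        calcs.add(counter);
        Function<String, Integer> length = metered(String::length, calcs);
        System.out.println(length.apply("price") + " " + length.apply("push"));
        System.out.println(counter.values());
    }
}
```

In a real pipeline the wrapper would presumably be a rich function so that the calculators can register their metrics at open time; that part is omitted here.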
Basic Code Examples:
Metric calculator:
    public class MetricUtils {
        public static <T> MetricsCalculator<T> basicMetricsPackage() {
            return new CompositeMetricsCalculator<T>()
                .withCalculator(new ResettingEventCounterCalculator<T>().withScope("EventCounter"))
                .withCalculator(new SimpleRateMeterCalculator<T>().withScope("RateMeter"))
                .withCalculator(new SimpleTrafficHistogramCalculator<T>().withScope("TrafficHistogram"));
        }
    }
Programmatic pipeline decoration:
    stream.map(MeteredMapProvider.<ItemPushRequest, Tuple2<String,String>>of(
        jsonMarshallerOperator, MetricUtils.basicMetricsPackage(), "SuccessfulPricing-Map", null).get());
Configuration pipeline decoration:
YAML:
    metrics:
      packages:
        - core:
            class: 'com.walmartlabs.smartpricing.streaming.metrics.MetricUtils'
            staticmethods: [ basicMetricsPackage ]
      pipelines:
        - OptimizationPricing:
            operators:
              - SuccessfulPricing-Map:
                  inputcalculators:
                    - core: basicMetricsPackage
Java:
    stream.uid("SuccessfulPricing").map(jsonMarshallerOperator);
Annotation pipeline decoration:
    @MetricsCalculator(package="basicMetricsPackage", scope="SuccessfulPricing-Map")
    private MapFunction<ItemPushRequest,Tuple2<String,String>> jsonMarshaller;

    stream.map(jsonMarshallerOperator);
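The configuration-driven variant names a class and a static method, so at start-up something has to turn those two strings into a calculator instance. One plausible way to do this is plain Java reflection, sketched below. This is an assumption about the mechanism, not something the slides confirm: ConfiguredPackageSketch, DemoPackages and resolve() are hypothetical, and a Supplier of String stands in for the calculator bundle.

```java
import java.lang.reflect.Method;
import java.util.function.Supplier;

class ConfiguredPackageSketch {

    // Hypothetical factory holder, playing the role of the class named under "packages".
    public static final class DemoPackages {
        public static Supplier<String> basicMetricsPackage() {
            return () -> "EventCounter,RateMeter,TrafficHistogram";
        }
    }

    // Resolves a public static no-arg factory method from its class name and method name.
    static Object resolve(String className, String staticMethod) {
        try {
            Class<?> holder = Class.forName(className);
            Method factory = holder.getMethod(staticMethod);
            return factory.invoke(null); // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot resolve metrics package " + className + "#" + staticMethod, e);
        }
    }

    public static void main(String[] args) {
        // In a real setup these two strings would come from the parsed configuration file.
        Object pkg = resolve(DemoPackages.class.getName(), "basicMetricsPackage");
        System.out.println(((Supplier<?>) pkg).get());
    }
}
```

Failing fast with a descriptive exception is a deliberate choice in this sketch: a typo in the configuration should stop the job at submission rather than silently drop metrics.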
BI metrics: in-process
In-process: fits small BI dimension cardinality & domain cases
• just a regular operator metric extending VectorMetricsCalculator, with custom key extractor FP lambda
• metric keys will be extracted on the fly from data and new Metrics will be registered & reported dynamically
• keyed metric state will be kept per each metric key
• keyed metric state will be updated in-process within each operator invocation (only for the extracted key)
• each task operator will keep a partition of metric keys that it has observed
• no aggregation in Flink: BI metrics are kept & reported without aggregation as regular Metric Groups
• abuse of dimensionality: may bloat the TM memory and/or operator state in Flink and/or Flink metrics reporter
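    public static MetricsCalculator<ItemPushRequest> priceChangePackage() {
        return new ResettingVectorCounterCalculator<ItemPushRequest>().withSimpleEvaluator(
            // key extractor: fixed metric name plus the price-change description as a dimension
            (data, s) -> Arrays.asList("PriceChangeDescTotalCounter",
                                       data.getItemPricePushData().getPriceChangeDesc()),
            // evaluator: contribute 1 when the key is non-empty, otherwise nothing
            (k, s) -> !k.get().isEmpty() ? 1L : null);
    }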
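The in-process pattern can be illustrated with a small, Flink-free sketch. VectorCounterSketch and its methods are hypothetical, and "resetting" is assumed here to mean that the counts are cleared each time the reporter reads them; the slides do not spell out the actual semantics. A key is extracted from each record, and only that key's counter is updated.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical keyed counter: one counter per dynamically extracted metric key.
class VectorCounterSketch<T> {
    private final Function<T, List<String>> keyExtractor;
    private final Map<List<String>, Long> counts = new LinkedHashMap<>();

    VectorCounterSketch(Function<T, List<String>> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    // Called once per record: only the state of the extracted key is touched.
    void accept(T record) {
        List<String> key = keyExtractor.apply(record);
        if (key == null || key.isEmpty()) {
            return; // no usable dimension, nothing to count
        }
        counts.merge(key, 1L, Long::sum);
    }

    // Called by the reporter: returns the current counts and resets them (assumed semantics).
    Map<List<String>, Long> reportAndReset() {
        Map<List<String>, Long> snapshot = new LinkedHashMap<>(counts);
        counts.clear();
        return snapshot;
    }

    public static void main(String[] args) {
        VectorCounterSketch<String> counter = new VectorCounterSketch<>(
            desc -> Arrays.asList("PriceChangeDescTotalCounter", desc));
        counter.accept("UP");
        counter.accept("DOWN");
        counter.accept("UP");
        System.out.println(counter.reportAndReset());
    }
}
```

The map grows with the number of distinct keys seen by the task, which is the memory risk the last bullet warns about.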
BI metrics: side output
Side output: fits large BI dimension cardinality & domain cases
• just a regular operator metric extending VectorMetricsCalculator, with custom key extractor FP lambda
• metric keys will be extracted on the fly from data
• no new Metrics will be registered & reported dynamically
• Flink ProcessFunction Operator Provider will be used
• keyed metric state will be updated as async side effect of each operator invocation (only for the extracted key)
• aggregation in Flink: BI metric side-stream must be explicitly handled in Flink, with aggregation & metric push done by another downstream Flink stage (scheduled independently, typically involves a data shuffle)
• abuse of dimensionality: either use a proper aggregation or a firehose sink in the downstream Flink metrics stage
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/side_output.html
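The split between the main operator and a downstream aggregation stage can be modeled in a few lines of plain Java. This is a toy model of the side-output idea, not Flink code: SideOutputSketch, withSideOutput() and aggregate() are hypothetical, an in-memory queue stands in for the side stream, and the shuffle between stages is not modeled.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.function.Function;

class SideOutputSketch {

    // Main operator: returns its normal result and, as a side effect, emits one
    // (metricKey, 1) event into the side channel instead of registering a metric.
    static <I, O> Function<I, O> withSideOutput(Function<I, O> inner,
                                                Function<I, String> keyExtractor,
                                                Queue<Map.Entry<String, Long>> sideChannel) {
        return input -> {
            sideChannel.add(new SimpleEntry<>(keyExtractor.apply(input), 1L));
            return inner.apply(input);
        };
    }

    // Downstream metrics stage: drains the side channel and aggregates per key.
    static Map<String, Long> aggregate(Queue<Map.Entry<String, Long>> sideChannel) {
        Map<String, Long> totals = new TreeMap<>();
        Map.Entry<String, Long> event;
        while ((event = sideChannel.poll()) != null) {
            totals.merge(event.getKey(), event.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Queue<Map.Entry<String, Long>> side = new ArrayDeque<>();
        Function<String, Integer> op =
            withSideOutput(String::length, (String s) -> s.substring(0, 1), side);
        op.apply("apple");
        op.apply("avocado");
        op.apply("banana");
        System.out.println(aggregate(side));
    }
}
```

Because the operator only emits small events, the per-key state lives in the aggregation stage, which can be scaled and scheduled separately from the pricing pipeline.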
A final bit of advice:
• Make a proper choice of the metrics DB/aggregation tech
• Pay attention to the limits of metric dashboarding tools

Flink Forward San Francisco 2018: Andrew Torson - "Extending Flink metrics: Real-time BI atop existing Flink streaming pipelines"
