This guest lecture was delivered at NYU’s Computer Science department. It dives into what most ML tutorials leave out — the ML lifecycle, production pitfalls, and how MLOps brings machine learning into the real world.
From Notebook to Production: What Most ML Tutorials Don’t Teach
1. From Notebook to Production: What Most ML Tutorials Don’t Teach
Vivek Bharti - Sr. MLE @ Roku
Guest Lecture at NYU Computer Science – Spring 2025
2. Agenda
● Intro to ML
● ML Case Study
● ML Development Lifecycle
● MLOps
● Q&A
3. What is Machine Learning?
● Machine Learning (ML) is a method of teaching computers to learn patterns from data and make decisions or
predictions without being explicitly programmed.
● Types of ML:
○ Supervised – Learn from labeled examples (e.g. spam detection)
○ Unsupervised – Discover patterns without labels (e.g. customer clustering)
○ Reinforcement – Learn by interacting and getting rewarded (e.g. game-playing AIs)
4. Why is ML Important?
Real-World Applications:
● Product Recommendations (Amazon, Netflix, and other streaming platforms)
● Fraud Detection (banking, credit cards)
● Self-driving Cars (Tesla, Waymo)
● Chatbots & Language Models
● Disease Diagnosis (medical imaging)
Why It’s Growing:
● Explosion of data
● Affordable compute power
● Open-source ML tools (TensorFlow, PyTorch, scikit-learn)
Key Takeaway:
ML is transforming every industry — learning how it works is essential for modern software engineers.
5. Real-World Problem – Classifying Emails as Spam or Not Spam
Goal: Build a model that can classify incoming emails as "Spam" or "Not Spam".
● Input (Features):
○ Email subject
○ Body content
○ Sender address
○ Keywords (e.g., “free”, “win”, “offer”)
● Output (Label):
○ 1 = Spam
○ 0 = Not Spam
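To make this concrete, a single training example might look like the sketch below. The field names and values are purely illustrative, not from a real dataset:

```python
# One hypothetical training example for the spam classifier.
# Field names and values are illustrative placeholders.
example_email = {
    "subject": "You WIN a FREE offer!!!",
    "body": "Click here to claim your free prize...",
    "sender": "promo@unknown-domain.example",
    "keywords_present": ["free", "win", "offer"],
}
label = 1  # 1 = spam, 0 = not spam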
6. Classifying Emails as Spam or Not Spam - Training the Model
● Dataset: Thousands of emails labeled by humans (spam vs not spam)
● Steps:
○ Text Preprocessing – clean & tokenize text
○ Feature Extraction – e.g., TF-IDF or embeddings
○ Model Selection – e.g., Logistic Regression, Decision Tree, or a simple Neural Net
○ Training – Feed data into model and adjust weights to minimize error
● Learning Objective: Find a function f(email) → {0,1}
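A minimal sketch of these steps using scikit-learn. The toy emails and labels below are placeholders; a real dataset would contain thousands of human-labeled examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data; a real dataset would have thousands of labeled emails.
emails = [
    "WIN a FREE offer now, click here",
    "Meeting notes from Tuesday attached",
    "Claim your free prize, limited offer",
    "Lunch tomorrow at noon?",
]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Text preprocessing + TF-IDF feature extraction + model, chained in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(emails, labels)  # training: fit weights to minimize error

# The learned function f(email) -> {0, 1}
print(model.predict(["free prize offer, click now"]))
```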
7. Classifying Emails as Spam or Not Spam - Evaluation and Deployment
● Evaluation Metrics:
○ Accuracy
○ Precision / Recall (important for spam)
○ Confusion Matrix
● Once the model is good enough:
○ Deploy into an email server
○ Continuously monitor performance (concept drift)
○ Retrain as new types of spam emerge
Note: There are many other important concepts, such as overfitting, underfitting, and the bias-variance tradeoff.
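As a sketch, these metrics can be computed with scikit-learn on a held-out test set. The label arrays below are made up for illustration:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical labels from a held-out test set (1 = spam, 0 = not spam).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # overall correctness
print("Precision:", precision_score(y_true, y_pred))  # of flagged spam, how much was real
print("Recall:   ", recall_score(y_true, y_pred))     # of real spam, how much we caught
print("Confusion matrix:")
print(confusion_matrix(y_true, y_pred))               # rows = true class, cols = predicted
```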
8. ML Development Lifecycle
Now that we’ve seen how the model is trained, let’s explore how to scale and maintain it across the ML lifecycle.
● Data Collection & Preparation
○ Continuously collect new email data (including labeled spam vs not spam)
○ Feature engineering for better classification
● Model Development & Training
○ Experimentation with different models and hyperparameters
● Model Evaluation & Validation
○ Cross-validation, Hyperparameter tuning
○ Continuous evaluation against a validation set to ensure quality
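A small sketch of hyperparameter tuning with cross-validation, reusing the toy data from the training sketch above. The parameter grid and fold count are illustrative only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Same toy data as in the training sketch.
emails = [
    "WIN a FREE offer now, click here",
    "Meeting notes from Tuesday attached",
    "Claim your free prize, limited offer",
    "Lunch tomorrow at noon?",
]
labels = [1, 0, 1, 0]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LogisticRegression())])

# Try a few regularization strengths; cv=2 only because the toy set is tiny
# (a real run would use cv=5 or more and a larger grid).
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2, scoring="recall")
search.fit(emails, labels)
print(search.best_params_, search.best_score_)
```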
9. ML Development Lifecycle
● Deployment
○ Integrate the model with the production email system, e.g., serialize it to a pickle file and expose it through an API
○ Use containerized environments (Docker) for portability
● Monitoring & Maintenance
○ Monitoring: Performance tracking, model drift detection
○ Feedback loops: Real-time feedback for improving the model
Note: A/B tests are run before full deployment to compare the performance of new models against the current one.
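A minimal serving sketch with FastAPI, assuming the trained pipeline was pickled to a placeholder path spam_model.pkl:

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the trained pipeline from disk; "spam_model.pkl" is a placeholder path.
with open("spam_model.pkl", "rb") as f:
    model = pickle.load(f)

class Email(BaseModel):
    text: str

@app.post("/predict")
def predict(email: Email):
    # The pipeline expects a list of raw email texts, as in the training sketch.
    label = int(model.predict([email.text])[0])
    return {"spam": bool(label)}
```

Run it with, e.g., uvicorn main:app (the module name depends on the file); packaging this service in a Docker image is what keeps it portable across development, staging, and production.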
10. ML Development Lifecycle - Challenges
● How do we scale these processes for continuous updates? Models may need frequent retraining and every experiment must be tracked; doing this manually is very inefficient.
● How do we ensure the model stays accurate over time and adapts to new patterns? This requires monitoring, drift detection, feedback loops, and retraining.
● How do we deploy and monitor the model consistently across environments? We need to keep the environment consistent.
To implement the ML development lifecycle and overcome these challenges, we need MLOps.
11. What is MLOps
MLOps (Machine Learning Operations) is the DevOps-inspired discipline that streamlines the development, deployment,
and lifecycle management of machine learning models.
Why MLOps is Needed:
● ML projects aren’t just code — they involve data, models, experiments, metrics, and retraining loops
● Models decay over time — retraining and monitoring are critical (drift: new kinds of spam keep appearing)
● Collaboration across PMs, UI/UX engineers, data scientists, engineers, and DevOps teams needs clear processes
Core Goals:
● Automate the ML lifecycle
● Ensure reproducibility and traceability
● Enable continuous delivery (CI/CD) for ML
● Monitor and maintain models in production in real time
12. MLOps in Action - MLOps Tools
CI/CD Pipelines
● Automate model training, testing, validation, and deployment
● Tools: GitLab CI, Jenkins, GitHub Actions, Kubeflow Pipelines
Model & Data Versioning
● Track model iterations, datasets, code, and performance metrics
● Tools: MLflow, DVC, Weights & Biases
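A sketch of experiment tracking with MLflow; the stand-in model, parameter, and metric values below are placeholders for a hypothetical run:

```python
import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A stand-in model; in practice this is the pipeline from the training step.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0))
model.fit(["free prize offer", "meeting notes"], [1, 0])

# Log one experiment run: hyperparameters, metrics, and the model artifact.
with mlflow.start_run(run_name="tfidf-logreg"):
    mlflow.log_param("C", 1.0)
    mlflow.log_metric("recall", 0.92)  # placeholder value from a hypothetical evaluation
    mlflow.sklearn.log_model(model, "spam-classifier")
```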
13. MLOps in Action - MLOps Tools
Model Serving
● Package models into APIs for real-time or batch use
● Tools: Flask, FastAPI, BentoML, TorchServe
Monitoring & Logging
● Track performance, data drift, latency, and errors in production in real time
● Tools: Prometheus, Grafana, ELK stack, Arize, WhyLabs
Scheduled Retraining
● Keep models up-to-date with new data
● Tools: Apache Airflow, Prefect
● Trigger retraining on schedule or on drift detection
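A minimal Airflow 2-style sketch of a weekly retraining job; the dag_id, schedule, and the retraining body are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_spam_model():
    # Placeholder: pull the latest labeled emails, retrain the pipeline,
    # evaluate it, and publish the new model only if it beats the current one.
    ...

# Retrain every week; a drift-detection check could also trigger this DAG.
with DAG(
    dag_id="spam_model_retraining",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    PythonOperator(task_id="retrain", python_callable=retrain_spam_model)
```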
14. MLOps in Action - Team Practices Supporting MLOps
Agile Development:
● Daily standups, weekly sprints, retrospectives
Kanban Boards:
● Track tasks, model experiments, bugs using Jira or Trello
Collaboration:
● Clear handoffs between PMs, UI/UX engineers, data scientists, ML engineers, and DevOps — coordinated via tools like
Slack, email, and Jira.
15. Key Takeaways and Q&A
● ML touches everyone’s lives in today’s world
● The ML Lifecycle is Continuous:
○ From data collection to model deployment, it’s an ongoing process that requires constant updates and
iterations.
● MLOps Enforces Automation & Collaboration:
○ The MLOps pipeline automates critical aspects like model retraining, monitoring, and scaling while
fostering collaboration between teams, making processes more efficient and less manual.
● Agile & Tools Drive Efficiency:
○ Practices like Agile sprints, Kanban boards, and tools like Jira and Trello keep tasks organized and ensure
timely updates.
Remember, building a machine learning model is only the beginning. Keeping that model effective and scalable over
time is where the actual value lies.
#1: Let’s discuss the entire journey of ML models from notebook to production
#3:
LLMs are typically trained using a combination of unsupervised, self-supervised, and sometimes supervised learning.
Pretraining = self-supervised (a form of unsupervised learning)
Fine-tuning = supervised or reinforcement learning
#7: In spam detection, missing spam is worse than occasionally flagging a legit email as spam.
Precision (a.k.a. Positive Predictive Value)
Definition: Of all the predicted positives, how many were actually positive? Precision = TP / (TP + FP)
Recall (a.k.a. Sensitivity or True Positive Rate)
Definition: Of all actual positives, how many were correctly predicted? Recall = TP / (TP + FN)
Accuracy
Definition: The proportion of total correct predictions. Accuracy = (TP + TN) / (TP + TN + FP + FN)
#8: Feature engineering ideas – Time of Day Sent, Number of Links, Has Attachment, HTML or Plain Text, Email Length, Excessive Punctuation, Previous Spam Reports on Similar Emails