Hadoop Security @ Comcast
Dushyanth Vaddi
Ray Harrison
About Comcast & Big Data
“Comcast brings together the best in media and technology. We
drive innovation to create the world's best entertainment and
online experiences.”
• 28 million customer relationships across 5 lines of business
• Video
• High Speed Internet
• Telephony
• Home Security & Automation
• Mobile
• $80B in Revenue (2016)
• Some Big Data use cases
• Ginormous numbers of customer devices
• Cable modems + everything behind the modems such as iPhones, tablets, IoT devices
• Set-top boxes
• Clickstream (video and internet)
• Network equipment & backbone
• Security
• Content delivery
About our Environment
Evolution of Hadoop in the Enterprise
All images: Creative Commons
After your first security audit…
The Cost of Security & Data Breaches
• Average total cost of a data breach: $4 million
• Average cost per stolen record: $158.00
• Caused mostly by: Hackers & criminal insiders
Impacts:
• Direct monetary loss
• Loss of existing customers
• Loss of potential customers
• Servicing each stolen record
• Stock price degradation
• Brand & reputation degradation
• Lawsuits
Big Examples:
• Netflix
• AOL
• Target
• Sony
The Cost of Security & Data Breaches
Big Data = Big, Very Expensive Problem
Hadoop Security Challenges
• Hadoop security model maturity
– Relatively recent focus
– Third-party vendor add-ons complicate rather than complement
• Complex
– Kerberos is complex and may not mix well with existing corporate standards
– Moving parts in authentication
• Kerberos RPC authentication for applications and Hadoop Services
• HTTP SPNEGO authentication for web consoles
• The use of delegation tokens, block tokens, and job tokens
– Multiple network encryption components
– Easy to make mistakes
• Staff experience levels needed to implement
• Existing corporate security policies
• Corporate politics
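
To make the "moving parts" above concrete, here is a minimal sketch of the core-site.xml settings that typically switch a cluster to Kerberos RPC authentication and SPNEGO for web consoles. The property names are standard Hadoop ones; the chosen values (for example `privacy` for RPC protection) are assumptions for illustration, not the settings used at Comcast, and a real rollout also needs principals, keytabs, and per-service settings that are omitted here.

```python
import xml.etree.ElementTree as ET

# Standard Hadoop property names. The values are illustrative assumptions.
KERBEROS_CORE_SITE = {
    "hadoop.security.authentication": "kerberos",   # Kerberos for RPC
    "hadoop.security.authorization": "true",        # service-level authorization
    "hadoop.rpc.protection": "privacy",             # encrypt RPC traffic
    "hadoop.http.authentication.type": "kerberos",  # SPNEGO for web consoles
}


def build_core_site(properties):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_core_site(KERBEROS_CORE_SITE))
```

Generating the fragment from one dictionary keeps the authentication settings in a single reviewable place, which helps with the "easy to make mistakes" point.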
Upcoming Hadoop Changes
• https://issues.apache.org/jira/browse/HADOOP-9331 - Crypto components for encryption of data at rest
• https://issues.apache.org/jira/browse/HADOOP-9392 and https://issues.apache.org/jira/browse/HADOOP-9466 - Token-Based Authentication & Unified Authorization Framework
• Continued evolution of Apache Knox, Ranger and other Hadoop-related security projects
Comcast: How we did it
Hadoop Security @ Comcast
Agenda
• The Comcast Hadoop security journey
• Challenges in a multi-tenant environment
– Users
– Applications
– Technical
• Security framework design and guidelines
• Working with stakeholders
• Building your DevOps plan
• Your communication plan
• Implementation
• Next steps: Apache Ranger & HDFS Data Encryption
Hadoop security journey
Research & Design
Planning
Collaboration
Initiation
Project initiation
• WHAT: Educate the stakeholders on Hadoop security
• WHY: Customer data, security, regulatory requirements
• WHERE: System, data, cluster gateway
• WHEN: Define implementation timeline
• WHO: Corporate security, Linux, Hadoop DevOps, development teams
• HOW: Kerberos, Ranger, Knox
Collaboration
• All the stakeholders agree to the implementation timeline
• Focus on the end goal and user interaction with the cluster
• Identify costs and timelines related to implementing security
• Conduct brainstorming sessions with all the stakeholders
Security Framework Design
• Research: Available security options within the organization
• Plan: Which tools to use & how to implement the security
• Design: Design the end-to-end solution with minimal disruption for the end user
• Build: Build the security model, test & verify
• Implementation: Finalize the security model
Hadoop Security Framework guidelines
• Use: AES 256-bit encryption and a password management tool
• Prioritize: Open source tools for server management
• Enable: SSO for users & applications
• Avoid: Selecting the easiest solution because it is the easiest
DevOps Team Plan
• Timeline: Keep track of the schedule & milestones
• Support: Monitor the ticket queue and diagnose issues
• Documentation: Document all the changes for implementing security with 3rd party applications & user code
• Guide: Guide developers with the code changes for Kerberos
• Training: Train the Hadoop admin team on Kerberos
• Follow-up: Follow up on issues even if they seem to have been solved, so the problem doesn't come back
• Collaboration: Plan & collaborate with the development teams on the deployment schedules
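
As an illustration of the kind of Kerberos change developers usually need guidance on, the sketch below builds the commands a scheduled job might run to log in from a keytab before touching the cluster. `kinit -kt` and `klist -s` are standard MIT Kerberos commands; the principal name and keytab path are hypothetical, and the helper names are invented for this example. The script only prints the commands and does not execute them.

```python
import shlex


def kinit_command(principal, keytab_path):
    """Build the argv for a non-interactive Kerberos login from a keytab."""
    return ["kinit", "-kt", keytab_path, principal]


def ticket_check_command():
    """`klist -s` is silent and exits 0 only if a valid ticket cache exists."""
    return ["klist", "-s"]


def render(argv):
    """Quote an argv list so it can be pasted into a shell or a job wrapper."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Hypothetical service principal and keytab location.
    login = kinit_command("etl_svc@EXAMPLE.COM",
                          "/etc/security/keytabs/etl_svc.keytab")
    print(render(login))
    print(render(ticket_check_command()))
```

A job wrapper could run the ticket check first and only call `kinit` when it fails, which avoids unnecessary round trips to the KDC.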
User Community plan
• Analyze: Document & analyze all the jobs and applications that require code changes
• Business: Communicate the impact of security to the business & plan the implementation
• Contact: Constant communication with the business on the changes coming due to security
• Deliver: Coordinate with the DevOps teams to implement the code changes
Business Challenges
• Ownership of testing applications for Kerberos changes.
• Security is seen as slowing the business deliverables.
• Reluctance to understand the technical details of security.
• Communicating the impact and difficulty of the security implementation.
• Cultural change from no security to totally secured.
Development Team challenges
• Time devoted to testing.
• Competing priorities & collaboration with DevOps.
• Quick-fix mentality.
• Time allocated to learn.
• Lack of knowledge to debug.
DevOps Challenges
• Bringing the off-shore team up to speed.
• Testing 3rd party applications for Windows & Mac.
• Communication not reaching all stakeholders.
• Testing different versions of the 3rd party applications.
Difference between Hadoop & RDBMS Security
Hadoop:
• Evolving
• Complex to implement
• Too many moving parts
• Code changes
• 3rd party application compatibility
RDBMS:
• Matured
• Refined & well understood
Apache Ranger
• Ranger is for implementing data authorization policies.
• Kerberos is the basis and must be in place before implementing Ranger.
• Work with data architects & the business to define the policies.
• Ranger offers a centralized security framework to manage fine-grained access control across the Hadoop cluster.
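
To show what "fine-grained access control" means when defining policies with data architects, here is a toy model of a resource-based policy check. This is not Ranger's API or its evaluation engine (Ranger also supports deny conditions, tags, and more); the `Policy` class, group names, and paths are all hypothetical and only illustrate the idea of matching resource, group, and access type with deny-by-default.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Policy:
    resource: str        # path pattern, e.g. "/data/billing/*"
    groups: frozenset    # groups the policy applies to
    accesses: frozenset  # permitted access types, e.g. {"read"}


def is_allowed(policies, user_groups, resource, access):
    """Deny by default; allow if any policy matches resource, group and access."""
    groups = set(user_groups)
    for policy in policies:
        if (fnmatch(resource, policy.resource)
                and policy.groups & groups
                and access in policy.accesses):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical policy: analysts may read, but not write, billing data.
    policies = [
        Policy("/data/billing/*", frozenset({"analysts"}), frozenset({"read"})),
    ]
    print(is_allowed(policies, ["analysts"], "/data/billing/2016/part-0", "read"))
    print(is_allowed(policies, ["analysts"], "/data/billing/2016/part-0", "write"))
    print(is_allowed(policies, ["interns"], "/data/billing/2016/part-0", "read"))
```

Writing policies down in this resource, group, access form before entering them into a tool makes them easier to review with the business.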
Data Encryption
• Data encryption options:
– Volume encryption
– Application level encryption
– HDFS data at rest encryption
• Data at rest encryption.
• Encryption adds overhead to the cluster.
• Consider which datasets need to be encrypted.
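
For the HDFS data at rest option, a minimal sketch of the usual command sequence for creating an encryption zone is shown below. `hadoop key create` and `hdfs crypto -createZone` are standard Hadoop commands, but the key name and path are hypothetical, the helper function is invented for this example, and it assumes a KMS is already configured. The script only prints the commands. Encrypting only selected directories is one way to limit the overhead mentioned above.

```python
def encryption_zone_commands(key_name, path, key_bits=256):
    """Return the commands that create a key and an HDFS encryption zone.

    Assumes a Hadoop KMS is configured and that `path` is new or empty,
    since an encryption zone can only be created on an empty directory.
    """
    return [
        ["hadoop", "key", "create", key_name, "-size", str(key_bits)],
        ["hdfs", "dfs", "-mkdir", "-p", path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path],
    ]


if __name__ == "__main__":
    # Hypothetical key name and dataset directory.
    for argv in encryption_zone_commands("billing_key", "/data/billing_secure"):
        print(" ".join(argv))
```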
QUESTIONS
Dushyanth Vaddi
Dushyanth_vaddi@comcast.com
Ray Harrison
Ray_harrison@comcast.com

Implementing Security on a Large Multi-Tenant Cluster the Right Way
