- Data science domains like statistics, natural language processing, predictive analytics, and visualization have entered the market, while image processing, internet of things, and artificial intelligence are still in exploration.
- The "3 V's of BIG DATA" are volume, variety, and velocity.
- Popular programming languages for data science include R, Python, and SQL.
- Apache Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. The core Hadoop modules are Hadoop Common, HDFS, YARN, and MapReduce.
- A sample data science methodology includes defining a problem statement, choosing an appropriate machine learning algorithm, running models/analysis in R/Python