From the course: Transition from Data Science to Data Engineering
Using data to problem-solve
- [Instructor] Focus on problem solving, an essential skill for data engineers, because we are responsible for collecting, cleaning, storing, and processing large amounts of data. Data can come from a variety of sources, in multiple formats, and with a wide range of quality issues. Data engineers must identify and solve problems with this data in order to make it useful for business intelligence and machine learning applications. There are a few important aspects of problem solving that data engineers rely on heavily. Once we collect the data, we first clean it, so we have a clear picture of the business outcomes we would like to achieve. Data engineers need to identify errors and inconsistencies, then remove them. This can be a challenging task, as data can be dirty for a variety of reasons, such as human error, machine error, or formatting issues. Next, we integrate data from a variety of sources. This can be complex as different sources may use…