Splitting a dataset into a training and test set
In this recipe, you will split the data into training and test sets using the SSIS Percentage Sampling transformation. You will use 70 percent of the data for the training set and the remaining 30 percent for the test set.
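To make the idea of a percentage split concrete, here is a minimal Python sketch of the concept. It is not how SSIS implements the transformation; the function name `percentage_split`, the `sample_pct` and `seed` parameters, and the integer stand-in rows are all hypothetical and chosen only for illustration. Each row is routed at random to one of two outputs, so the resulting sizes are approximately, not exactly, 70 and 30 percent.

```python
import random


def percentage_split(rows, sample_pct=70, seed=42):
    """Randomly route each row to a training or a test list.

    Hypothetical helper for illustration only: every row lands in the
    training list with probability sample_pct / 100, otherwise in the
    test list. A fixed seed makes the split repeatable.
    """
    rng = random.Random(seed)
    training, test = [], []
    for row in rows:
        if rng.random() * 100 < sample_pct:
            training.append(row)
        else:
            test.append(row)
    return training, test


# Stand-in data: 1,000 integer "rows" instead of a real table.
rows = list(range(1, 1001))
training, test = percentage_split(rows)

# The two sets are disjoint and together cover every input row.
print(len(training), len(test))
```

Because the assignment is random per row, a fixed seed is what keeps the split reproducible between runs in this sketch.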
Getting ready
This recipe has no special prerequisites beyond an SSIS 2016 installation and the AdventureWorksDW2014 database available in your SQL Server instance.
How to do it...
Open SQL Server Data Tools (SSDT) and create a new project using the Integration Services Project template. Place the solution in the C:\SSIS2016Cookbook folder and name the project Chapter08:
- Rename the default package to SplitData.dtsx.
- In the Control Flow tab in the Package Designer, add a new data flow task by dragging and dropping it from the SSIS toolbox.
- Right-click the task and select Rename from the pop-up menu. Change the task's name to SplitData.
- Click the Data Flow tab.
- Create a new OLE DB source. Name it AW_DW_Source.
- Double-click the AW_DW_Source...