Feature Scaling Using R

Last Updated : 26 Jun, 2025

Feature scaling is a preprocessing technique that puts the features of a dataset on a comparable scale before a machine learning model is trained. It does not remove data points; it transforms each feature's values so that differences in units or magnitude do not distort what the model learns. It is especially helpful for algorithms that rely on distances or gradient descent, and it is widely used in many fields, including business analytics and clinical data science.

Feature Scaling Using R Programming Language

Feature scaling in R refers to the process of standardizing or normalizing the range of independent variables or features in a dataset. It ensures that each feature contributes equally to the model, preventing features with larger values from disproportionately influencing the results.

There are mainly two types of feature scaling techniques.

1. Standardization

Standardization (z-score scaling) rescales each feature so that it has a mean of zero and a standard deviation of one. Each value is transformed as (x - mean) / sd. For example, in a dataset with two variables (age and height), you would compute the mean and standard deviation of each variable and then apply this formula to every value of that variable.

2. Normalization

Normalization (min-max scaling) rescales each feature to a fixed range, usually 0 to 1. Each value is transformed as (x - min) / (max - min), so the smallest value of a feature becomes 0 and the largest becomes 1. Because the result depends on the minimum and maximum, this method is sensitive to outliers.

Creating a Dataset to apply feature scaling in R

First, we create a data frame to which the feature scaling techniques will be applied. We will then explore different methods and libraries for scaling it.

age <- c(19,20,21,22,23,24,24,26,27)

salary <- c(10000,20000,30000,40000,
            50000,60000,70000,80000,90000)

df <- data.frame( "Age" = age,
                 "Salary" = salary,
                 stringsAsFactors = FALSE)
df

Output:

Once the dataset is created we can start implementing Feature Scaling.
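
Before scaling, it can help to see how different the two columns are. The sketch below rebuilds the same df so that it runs on its own, then compares the spread of each column and the Euclidean distance between the first and last rows. The names col_sd, raw_dist and salary_share are illustrative and not part of the original example.

```r
# Recreate the article's data frame so this block runs on its own
age <- c(19, 20, 21, 22, 23, 24, 24, 26, 27)
salary <- c(10000, 20000, 30000, 40000,
            50000, 60000, 70000, 80000, 90000)
df <- data.frame(Age = age, Salary = salary)

# Standard deviation of each column: Salary varies on a far larger scale
col_sd <- sapply(df, sd)
print(col_sd)

# Euclidean distance between the first and last rows on the raw values
m <- as.matrix(df)
raw_dist <- sqrt(sum((m[1, ] - m[9, ])^2))
print(raw_dist)

# Share of the squared distance that comes from Salary alone
salary_share <- (m[1, "Salary"] - m[9, "Salary"])^2 / raw_dist^2
print(salary_share)
```

On the raw values, nearly all of the distance comes from Salary, which is the kind of imbalance that scaling is meant to remove.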

Using the General Formula

We know the formulas for both standardization and normalization. Let's apply them one by one.

1. For Standardization

We are manually standardizing the dataset df using the z-score formula. Each column is transformed with (x - mean(x)) / sd(x), and the result is saved as a new data frame, scaled_data.

scaled_data <- as.data.frame(sapply(df, function(x)
                      (x-mean(x))/sd(x)))
scaled_data

Output:
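
One way to sanity-check the manual result is to confirm that each column now has a mean of 0 and a standard deviation of 1, and that it agrees with base R's scale() function. The following is a minimal sketch that rebuilds df so it is self-contained; col_means and col_sds are illustrative names.

```r
# Recreate the article's data frame and the manual standardization
age <- c(19, 20, 21, 22, 23, 24, 24, 26, 27)
salary <- c(10000, 20000, 30000, 40000,
            50000, 60000, 70000, 80000, 90000)
df <- data.frame(Age = age, Salary = salary)
scaled_data <- as.data.frame(sapply(df, function(x)
                      (x - mean(x)) / sd(x)))

# Column means should be 0 (up to floating-point error) and sds should be 1
col_means <- sapply(scaled_data, mean)
col_sds <- sapply(scaled_data, sd)
print(round(col_means, 10))
print(col_sds)

# Base R's scale() centers and scales each column; it returns a matrix,
# so compare the values only and ignore the extra attributes
same_as_scale <- all.equal(as.matrix(scaled_data), scale(df),
                           check.attributes = FALSE)
print(same_as_scale)
```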

2. For Normalization

We are manually normalizing the dataset df to a 0–1 range using the min-max formula. Each column is transformed with (x - min(x)) / (max(x) - min(x)), and the result is stored as a new data frame, scaled_data2.

scaled_data2 <- as.data.frame(sapply(df, function(x) 
                      (x-min(x))/(max(x)-min(x))))
scaled_data2

Output:
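
The min-max formula divides by max(x) - min(x), so a column whose values are all identical would produce NaN. The sketch below wraps the formula in a small helper, min_max(), which is a hypothetical name introduced here for illustration. Returning 0 for a constant column is one possible convention, not the only one.

```r
# Hypothetical helper: min-max scaling with a guard for constant columns
min_max <- function(x) {
  rng <- max(x) - min(x)
  # Assumption: map a constant column to 0 rather than dividing by zero
  if (rng == 0) {
    return(rep(0, length(x)))
  }
  (x - min(x)) / rng
}

# Recreate the article's data frame so this block runs on its own
age <- c(19, 20, 21, 22, 23, 24, 24, 26, 27)
salary <- c(10000, 20000, 30000, 40000,
            50000, 60000, 70000, 80000, 90000)
df <- data.frame(Age = age, Salary = salary)

# Apply the helper to every column
scaled_data2 <- as.data.frame(lapply(df, min_max))
print(sapply(scaled_data2, range))  # smallest and largest value per column

# A constant input no longer yields NaN
print(min_max(c(5, 5, 5)))
```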

Using Caret Library

Let's load the caret library and then apply standardization and normalization.

1. Standardization Using Caret Library

We are standardizing the dataset df by centering and scaling its numeric features using the caret package. First, we create a preprocessing model with preProcess(), then apply it to the data using predict().

# Install caret once if it is not already available, then load it
install.packages("caret")
library(caret)

data1.pre <- preProcess(df, method=c("center", "scale"))
data1<- predict(data1.pre, df)
data1

Output:
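
One reason to separate preProcess() from predict() is that the centering and scaling values are learned once and can then be reused on other data, such as a test set. The base R sketch below illustrates that idea without caret. Here new_df is hypothetical data made up for the example, and train_mean and train_sd are illustrative names.

```r
# Recreate the article's data frame; treat it as the training data
age <- c(19, 20, 21, 22, 23, 24, 24, 26, 27)
salary <- c(10000, 20000, 30000, 40000,
            50000, 60000, 70000, 80000, 90000)
df <- data.frame(Age = age, Salary = salary)

# Learn the parameters from the training data only
train_mean <- sapply(df, mean)
train_sd <- sapply(df, sd)

# Hypothetical new observations (not from the article)
new_df <- data.frame(Age = c(25, 30), Salary = c(55000, 100000))

# Apply the training mean and sd to the new rows
new_scaled <- as.data.frame(scale(new_df,
                                  center = train_mean,
                                  scale = train_sd))
print(new_scaled)
```

With caret, the analogous step would be to call predict() on the new data with the preprocessing object that was fitted on df, rather than fitting a new one.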

2. Normalization Using Caret Library

We are normalizing the dataset df to a 0–1 range using the caret package. First, we create a preprocessing model with preProcess(method = "range"), then apply it to the data using predict().

library("caret")

data2.pre <- preProcess(df, method="range")
data2 <- predict(data2.pre, df)
data2

Output:

Using Dplyr Library

Let's load the dplyr library and then apply standardization and normalization.

1. Standardization Using Dplyr Library

We are standardizing only the "Salary" column of the dataset df. After loading the dplyr package, we use mutate_at() to apply the base R scale() function, which performs z-score standardization, to that column and store the result in data_s.

# Install dplyr once if it is not already available, then load it
install.packages("dplyr")
library(dplyr)

data_s <- df %>%
  mutate_at(vars("Salary"), scale)
data_s

Output:
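
In dplyr 1.0.0 and later, mutate_at() and mutate_all() are superseded by across(). The sketch below writes the same Salary standardization that way. Wrapping scale() in as.numeric() is a choice made here so that the column stays a plain numeric vector, since scale() returns a one-column matrix. The name data_s2 is illustrative.

```r
library(dplyr)

# Recreate the article's data frame so this block runs on its own
age <- c(19, 20, 21, 22, 23, 24, 24, 26, 27)
salary <- c(10000, 20000, 30000, 40000,
            50000, 60000, 70000, 80000, 90000)
df <- data.frame(Age = age, Salary = salary)

# Standardize only Salary; Age is left unchanged
data_s2 <- df %>%
  mutate(across(Salary, ~ as.numeric(scale(.x))))
data_s2
```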

2. Normalization Using Dplyr Library

We are normalizing all columns in the dataset df to a 0–1 range. The scale() function performs z-score standardization, so here we instead pass the min-max formula to mutate_all(), which applies it to every column, and save the result in data_n.

library(dplyr)

# Apply the min-max formula to every column
data_n <- df %>%
  mutate_all(~ (. - min(.)) / (max(.) - min(.)))
data_n

Output:

Using BBmisc package

BBmisc is an R package that provides a normalize() function, which we can use for both standardization and normalization.

1. Standardization Using BBmisc package

We are standardizing the entire dataset df using the BBmisc package. The normalize() function with method = "standardize" applies z-score normalization to all numeric columns and stores the result in df_standardized.

# Install BBmisc once if it is not already available, then load it
install.packages("BBmisc")
library(BBmisc)

df_standardized <- BBmisc::normalize(df, method = "standardize")

df_standardized

Output:

2. Normalization Using BBmisc package

We are normalizing the dataset df to a 0–1 range using the BBmisc package. The normalize() function with method = "range" scales all numeric columns and stores the result in df_normalized.

library(BBmisc)

df_normalized <- BBmisc::normalize(df, method = "range")

df_normalized

Output:

In this article, we explored various methods for performing feature scaling in R using different libraries and techniques.

rohithk30
