Data preprocessing involves cleaning data by handling missing values and outliers, integrating multiple data sources, and transforming data through normalization, aggregation, and dimensionality reduction. The goals are to improve data quality, resolve inconsistencies, and reduce data volume before analysis. The major tasks are data cleaning, data integration, data transformation, and data reduction. Reduction uses methods such as feature selection, clustering, sampling, and discretization of continuous variables. Preprocessing typically accounts for the majority of the work in a data mining project.
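Several of the cleaning, transformation, and reduction steps named above can be sketched on a single numeric column. The following is a minimal Python sketch that uses only the standard library. The specific techniques are illustrative assumptions, not the only options: median imputation for missing values, IQR-based clipping (k = 1.5) for outliers, min-max normalization, equal-width binning for discretization, and simple random sampling. All function names are hypothetical. Integration, aggregation, and dimensionality reduction are not shown.

```python
import random
from statistics import median, quantiles


def fill_missing(values):
    """Cleaning (illustrative choice): replace missing entries (None) with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Cleaning (illustrative choice): cap values outside the IQR fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def min_max_normalize(values):
    """Transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, bins=3):
    """Reduction: discretize continuous values into `bins` equal-width intervals (0-based labels)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall into bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def simple_random_sample(rows, k, seed=0):
    """Reduction: draw k rows uniformly at random without replacement."""
    return random.Random(seed).sample(rows, k)


if __name__ == "__main__":
    raw = [2.0, None, 4.0, 6.0, 100.0]
    cleaned = fill_missing(raw)        # [2.0, 5.0, 4.0, 6.0, 100.0]
    clipped = clip_outliers(cleaned)   # [2.0, 5.0, 4.0, 6.0, 9.0]
    print(min_max_normalize(clipped))
    print(equal_width_bins(clipped))   # [0, 1, 0, 1, 2]
    print(simple_random_sample(list(range(100)), k=5))
```

The median is used for imputation here because the outlier 100.0 would pull the mean of the observed values up to 28.0. Clipping is also applied before normalization, so that a single extreme value does not compress every other value into a narrow part of [0, 1].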