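Data preprocessing prepares raw data for mining. Data cleaning fills in missing values, smooths noise, and identifies or removes outliers. Data integration combines data from multiple sources and resolves inconsistencies among them. Data transformation converts data into forms suited to mining, for example through normalization (rescaling attribute values into a common range such as [0, 1]) and aggregation. Data reduction shrinks the data set while preserving its analytical value, for example through dimensionality reduction and sampling. Together, these steps improve data quality, resolve inconsistencies, and reduce the size of the data to be mined. Binning and clustering can be used both to smooth noise and to discretize continuous attributes. Discretization replaces continuous values with intervals or with concept hierarchies (e.g., mapping numeric ages to "youth", "adult", and "senior") so that attributes can be analyzed at a more general level.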
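A few of the steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: missing values are assumed to be encoded as `None`, mean imputation is used as one possible way to fill them, normalization is min-max scaling, and binning uses equal-width intervals followed by smoothing by bin means. The function names (`impute_mean`, `min_max_normalize`, `equal_width_bins`, `smooth_by_bin_means`, `discretize`) and the sample price data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Cleaning: replace missing entries (assumed to be None) with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Transformation: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: every value maps to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def equal_width_bins(values, k):
    """Binning: assign each value a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land at index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def smooth_by_bin_means(values, bins):
    """Noise smoothing: replace each value with the mean of its bin."""
    members = {}
    for v, b in zip(values, bins):
        members.setdefault(b, []).append(v)
    bin_mean = {b: mean(vs) for b, vs in members.items()}
    return [bin_mean[b] for b in bins]


def discretize(values, labels):
    """Discretization: map each value to a concept label, one equal-width interval per label."""
    bins = equal_width_bins(values, len(labels))
    return [labels[b] for b in bins]


if __name__ == "__main__":
    # Hypothetical sample attribute with one missing value.
    prices = [4, 8, None, 15, 21, 21, 24, 25, 28, 34]
    filled = impute_mean(prices)
    print("filled:    ", filled)
    print("normalized:", [round(x, 3) for x in min_max_normalize(filled)])
    bins = equal_width_bins(filled, 3)
    print("bins:      ", bins)
    print("smoothed:  ", smooth_by_bin_means(filled, bins))
    print("labels:    ", discretize(filled, ["low", "medium", "high"]))
```

With three equal-width bins over the range 4 to 34, each interval is 10 units wide, so the filled values fall into bins `[0, 0, 1, 1, 1, 1, 2, 2, 2, 2]` and smoothing replaces them with the bin means 6, 19.25, and 27.75. Equal-width binning is only one choice; equal-frequency binning or clustering-based discretization would partition the same data differently.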