This document discusses the challenges of, and methods for, extracting data records from web pages, with a focus on automatic web information extraction (WIE). It outlines a systematic approach for filtering and clustering relevant information from semi-structured web pages, and highlights the complexity that arises from the heterogeneous nature of web content. The proposed algorithm aims to make data integration and retrieval more efficient by emphasizing the visual characteristics and layout features of web page design.