WebSmatch is a platform for integrating open data from heterogeneous sources (1). It addresses the difficulty of handling large numbers of data sources in different formats, many of them poorly structured Excel files (2). WebSmatch crawls, classifies, documents, and references data sources, then extracts and structures their data so that it can be visualized through APIs (3). It combines machine learning and concept matching to extract metadata from Excel files, detecting tables, attributes, and concepts (4,5,6,7,10,11). The results are exported in structured formats such as DSPL for third-party use and visualization (13,14,16). Future work includes automating extraction at scale, clustering documents, and integrating with other tools (