Sinmin is a continuously updated, scalable corpus of the Sinhala language. It covers a wide range of structured and unstructured Sinhala text drawn from sources such as news, academic writing, and fiction. The architecture consists of crawlers that fetch web pages, data-cleaning steps that handle problems such as erroneous characters and short forms, and Cassandra as the main storage system. Through the API and user interface, users can query the corpus for word and n-gram frequencies, the latest articles, and more. Performance tests were run on the storage systems and on the API.
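To make the frequency queries concrete, here is a minimal sketch of what a word or n-gram frequency count computes over cleaned text. It is not Sinmin's implementation. Sinmin stores its counts in Cassandra and serves them through its API, while this sketch uses an in-memory `collections.Counter`. The tokenization rule (treating whitespace, ASCII digits, and ASCII punctuation as separators) and the names `tokenize` and `ngram_frequencies` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical cleaning rule (an assumption, not Sinmin's): whitespace, ASCII
# digits and ASCII punctuation act as word separators. Sinhala letters, vowel
# signs and the zero-width joiner (U+200D) are kept, because the joiner is
# part of some Sinhala conjunct forms.
_SEPARATORS = re.compile(r"[\s0-9!-/:-@\[-`{-~]+")


def tokenize(text: str) -> List[str]:
    """Split text into word tokens, dropping empty strings."""
    return [tok for tok in _SEPARATORS.split(text) if tok]


def ngram_frequencies(docs: Iterable[str], n: int) -> Counter:
    """Count n-grams (tuples of n consecutive words) across all documents.

    n = 1 gives plain word frequencies. This sketch ignores sentence
    boundaries, so an n-gram may span two sentences of the same document.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

For example, `ngram_frequencies(["මම ගෙදර යමි. මම පාසල් යමි."], 1)` counts both "මම" and "යමි" twice. Calling it with `n=2` returns five distinct bigrams, each counted once. A production corpus would precompute and persist these counts rather than recompute them for each query.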