This paper presents DCSM, a deep image clustering model that addresses limitations of existing methods by exploiting both class-level and instance-level representations through mutual information maximization. A backbone network learns discriminative features, and maximizing the mutual information between different views of the same image reduces intra-class diversity. Experimental results show that DCSM outperforms current state-of-the-art clustering models.
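To make "maximizing mutual information across views" concrete at the class level, the sketch below uses the formulation from Invariant Information Clustering (IIC). This is an assumed stand-in, not necessarily DCSM's exact objective. Each view of an image is mapped to a soft cluster-assignment vector. A joint distribution over cluster pairs is estimated from the batch and symmetrized, and the mutual information between the two views' assignments is computed from it. The function names `joint_distribution` and `mutual_information` are hypothetical. The code uses plain Python so that it stays dependency-free.

```python
import math


def joint_distribution(p1, p2):
    """Estimate the symmetric k x k joint distribution of cluster assignments.

    p1, p2: lists of n soft-assignment vectors (each of length k, summing to 1)
    for two views of the same n images. (Hypothetical helper, IIC-style.)
    """
    n, k = len(p1), len(p1[0])
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(p1, p2):
        # Outer product of the two views' assignments, averaged over the batch.
        for i in range(k):
            for j in range(k):
                joint[i][j] += a[i] * b[j] / n
    # Symmetrize so that neither view is treated as privileged.
    return [[(joint[i][j] + joint[j][i]) / 2 for j in range(k)] for i in range(k)]


def mutual_information(p1, p2, eps=1e-12):
    """Mutual information I(z1; z2) between the two views' cluster assignments."""
    joint = joint_distribution(p1, p2)
    k = len(joint)
    row = [sum(joint[i]) for i in range(k)]                       # marginal of view 1
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]  # marginal of view 2
    mi = 0.0
    for i in range(k):
        for j in range(k):
            pij = joint[i][j]
            if pij > eps:  # skip empty cells; 0 * log 0 is taken as 0
                mi += pij * math.log(pij / (row[i] * col[j]))
    return mi


if __name__ == "__main__":
    # Two views that agree perfectly and use both clusters equally.
    view_a = [[1.0, 0.0], [0.0, 1.0]]
    view_b = [[1.0, 0.0], [0.0, 1.0]]
    print(mutual_information(view_a, view_b))  # equals math.log(2) here
```

A training loop would minimize `-mutual_information(...)` with respect to the network producing the assignments. The measure rewards two properties at once: the views must agree on a cluster, and the clusters must be used in a balanced way. Uniform (uninformative) assignments therefore give zero mutual information, whereas confident, consistent, and balanced assignments reach the maximum of log k.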