This document describes the development of a search engine classifier that predicts the relevance of URLs. It compares several machine learning models: decision trees, naive Bayes, random forests, and support vector machines (SVMs). After hyperparameter tuning, the SVM achieved the highest accuracy at predicting relevance, 68.8%. Key steps included feature selection, cross-validation, and tuning parameters such as the kernel, the cost, and the number of trees and variables to optimize model performance. The most accurate model was an SVM with a radial kernel and a cost of 263.
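The tuning step summarized above (choosing an SVM cost by cross-validation with a radial kernel) can be sketched roughly as follows. This is a minimal illustration, not the original workflow: it assumes Python with scikit-learn, which the summary does not name as the tool used, and it runs on synthetic data because the URL features are not given. The candidate costs other than 263, the fold count, and the feature scaling step are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the real URL relevance features
# are not available in this summary.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Candidate cost values. 263 is the value the summary reports as best;
# the remaining values are illustrative assumptions.
param_grid = {"svc__C": [1, 10, 100, 263, 1000]}

# Radial (RBF) kernel SVM; scaling the features first is an assumption,
# added because RBF kernels are sensitive to feature scale.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))

# 5-fold cross-validation (fold count is an assumption), scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_cost = search.best_params_["svc__C"]
best_accuracy = search.best_score_
print(f"best cost: {best_cost}, mean CV accuracy: {best_accuracy:.3f}")
```

On real data, the selected cost and the resulting accuracy depend on the features and the folds, so the values printed here say nothing about the 68.8% figure reported in the summary.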