Tech Stack: Python, Kaggle Notebook

As baselines, Naive Bayes and Random Forest models were trained on the full set of 33,480 labeled samples, reaching accuracies of 93.6% and 93.9%, respectively. Pool-based active learning was then applied to both models. With active learning, the Naive Bayes model reached 94.1% accuracy using only 3,600 labeled samples and 99.1% using 10,600. Similarly, the Random Forest model reached 94.5% accuracy using only 5,350 labeled samples and 98.9% using 10,600.

From these results, we can conclude that active learning allowed both models to match and then exceed their full-data baselines while using far fewer labeled samples. For example, the Naive Bayes model with active learning hit 94.1% accuracy with only 3,600 samples, already beating the 93.6% obtained from all 33,480 samples, and climbed to 99.1% with 10,600 samples. This suggests that active learning selected the most informative samples for labeling, improving model performance with a fraction of the labeled data.
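To illustrate the mechanism, here is a minimal sketch of pool-based active learning with uncertainty sampling, the standard query strategy in which the model repeatedly asks for labels on the pool points it is least confident about. This is not the notebook's exact code; the synthetic dataset, seed-set size, batch size, and query budget below are assumptions chosen only to mirror the sample counts reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical synthetic data standing in for the 33,480-sample dataset.
X, y = make_classification(n_samples=33_480, n_features=20, random_state=42)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Seed the labeled set with a small random sample; the rest is the
# "unlabeled" pool the learner may query.
rng = np.random.default_rng(42)
labeled = rng.choice(len(X_pool), size=100, replace=False)
unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)

model = GaussianNB()  # RandomForestClassifier() plugs in the same way
batch_size, budget = 250, 10_600  # assumed query schedule

while len(labeled) < budget:
    model.fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty sampling: query the pool points whose top-class
    # probability is lowest, i.e. where the model is least confident.
    probs = model.predict_proba(X_pool[unlabeled])
    uncertainty = 1.0 - probs.max(axis=1)
    query = unlabeled[np.argsort(uncertainty)[-batch_size:]]
    labeled = np.concatenate([labeled, query])
    unlabeled = np.setdiff1d(unlabeled, query)

model.fit(X_pool[labeled], y_pool[labeled])
print(f"Accuracy with {len(labeled)} labels:",
      accuracy_score(y_test, model.predict(X_test)))
```

In each round the model is retrained on the labels gathered so far, so every new batch of queries is chosen by a strictly better-informed model; this is what lets accuracy grow quickly relative to random labeling.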

Overall, these findings indicate that active learning is an effective way to boost the performance of machine learning models: it can achieve high accuracy while requiring a much smaller labeled dataset.

Code: https://www.kaggle.com/code/zemosi/pool-based-active-learning-implementation