
A self-training approach to cost sensitive uncertainty sampling. (English) Zbl 1470.68136

Summary: Uncertainty sampling is an effective method for active learning that is computationally more efficient than other active learning methods, such as loss-reduction methods. However, unlike loss-reduction methods, uncertainty sampling cannot minimize total misclassification costs when errors incur different costs. This paper introduces a method for performing cost-sensitive uncertainty sampling that makes use of self-training. We show that, even when misclassification costs are equal, this self-training approach yields a faster reduction in loss as a function of the number of points labeled and more reliable posterior probability estimates than standard uncertainty sampling. We also show why other, more naive methods of modifying uncertainty sampling to minimize total misclassification costs will not always work well.
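
The summary describes the ingredients of the method only at a high level. As a rough illustration of how those ingredients can fit together (a query rule driven by expected misclassification cost, plus a self-training step that pseudo-labels confident points to sharpen the posterior estimates used for querying), the sketch below is one possible realization, not the authors' algorithm. The logistic-regression base learner, the margin-over-expected-cost query rule, the confidence threshold, the assumption that class labels are 0..k-1 and all appear in the seed labeled set, and the `oracle` callback are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def expected_costs(probs, C):
    """Expected cost of predicting each class for each point.

    probs: (n, k) posterior estimates; C: (k, k) cost matrix with C[i, j] the
    cost of predicting class j when the true class is i.
    Entry [n, j] of the result is sum_i P(y=i | x_n) * C[i, j].
    """
    return probs @ C


def cost_sensitive_query(probs, C):
    """Pick the unlabeled point whose best and second-best expected-cost
    decisions are closest, i.e. the most ambiguous point under the cost matrix."""
    ec = np.sort(expected_costs(probs, C), axis=1)
    return int(np.argmin(ec[:, 1] - ec[:, 0]))


def self_training_active_loop(X_lab, y_lab, X_unlab, oracle, C, budget,
                              conf_thresh=0.95):
    """Illustrative active-learning loop: pseudo-label confident unlabeled
    points (self-training) before estimating posteriors, then query the true
    label of the most cost-ambiguous remaining point. `oracle(x)` is a
    hypothetical callback that returns the true label of x."""
    for _ in range(budget):
        # Posterior estimates from the labeled data alone.
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        probs = clf.predict_proba(X_unlab)

        # Self-training step: add confidently pseudo-labeled points to the
        # training set so the posteriors used for querying are more reliable.
        confident = probs.max(axis=1) >= conf_thresh
        X_aug = np.vstack([X_lab, X_unlab[confident]])
        y_aug = np.concatenate(
            [y_lab, clf.classes_[probs[confident].argmax(axis=1)]])
        probs = LogisticRegression(max_iter=1000).fit(
            X_aug, y_aug).predict_proba(X_unlab)

        # Cost-sensitive uncertainty sampling on the refined posteriors.
        q = cost_sensitive_query(probs, C)
        X_lab = np.vstack([X_lab, X_unlab[q:q + 1]])
        y_lab = np.append(y_lab, oracle(X_unlab[q]))
        X_unlab = np.delete(X_unlab, q, axis=0)

    return LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
```

With a uniform cost matrix (all off-diagonal entries equal), the query rule above reduces to ordinary margin-based uncertainty sampling; the self-training step is what distinguishes the loop in that case.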

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

CLUTO; UCI-ml

References:

[1] Ando, R., & Zhang, T. (2005). A high-performance semi-supervised learning method for text chunking. In Proceedings of the 43rd annual meeting on association for computational linguistics (pp. 1–9). NJ: Association for Computational Linguistics Morristown.
[2] Asuncion, A., & Newman, D. (2007). UCI machine learning repository.
[3] Beygelzimer, A., Dasgupta, S., & Langford, J. (2009). Importance weighted active learning. In ICML ’09: Proceedings of the 26th international conference on machine learning. · Zbl 1162.68516
[4] Chen, Y., Crawford, M., & Ghosh, J. (2007). Knowledge based stacking of hyperspectral data for land cover classification. In 2007 IEEE symposium on computational intelligence and data mining (CIDM 2007) (pp. 316–322).
[5] Dasgupta, S., & Hsu, D. (2008). Hierarchical sampling for active learning. In ICML ’08: Proceedings of the 25th international conference on Machine learning (pp. 208–215). Helsinki, Finland.
[6] Dasgupta, S., Hsu, D., & Monteleoni, C. (2008). A general agnostic active learning algorithm. Advances in Neural Information Processing Systems, 20, 353–360.
[7] Elkan, C. (2001). The foundations of cost-sensitive learning. In Proceedings of the seventeenth international joint conference on artificial intelligence (pp. 973–978).
[8] Ham, J., Chen, Y., Crawford, M. M., & Ghosh, J. (2005). Investigation of the random forest framework for classification of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(3), 492–501. · doi:10.1109/TGRS.2004.842481
[9] Karypis, G. (2002). CLUTO–a clustering toolkit. University of Minnesota technical report 02-017.
[10] Kumar, S., Ghosh, J., & Crawford, M. M. (2001). Best-bases feature extraction algorithms for classification of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 39(7), 1368–1379. · doi:10.1109/36.934070
[11] Landgrebe, D. (2002). Hyperspectral image data analysis. Signal Processing Magazine, IEEE, 19, 17–28. · doi:10.1109/79.974718
[12] Lewis, D. D., & Catlett, J. (1994). Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the eleventh international conference on machine learning (pp. 148–156). San Mateo: Morgan Kaufmann.
[13] Margineantu, D. D. (2005). Active cost-sensitive learning. In The nineteenth international joint conference on artificial intelligence. Edinburgh, Scotland.
[14] McCallum, A., & Nigam, K. (1998). A comparison of event models for Naive Bayes text classification. In AAAI-98 workshop on learning for text categorization.
[15] McClosky, D., Charniak, E., & Johnson, M. (2006). Effective self-training for parsing. In Proceedings of HLT-NAACL 2006.
[16] Morgan, J. T. (2002). Adaptive hierarchical classifier with limited training data. PhD thesis, University of Texas at Austin.
[17] Rosenberg, C., Hebert, M., & Schneiderman, H. (2005). Semi-supervised self-training of object detection models. In Seventh IEEE workshop on applications of computer vision (vol. 1, pp. 29–36).
[18] Roy, N., & McCallum, A. (2001). Toward optimal active learning through sampling estimation of error reduction. In Proceedings of the 18th international conference on machine learning (pp. 441–448). San Mateo: Morgan Kaufmann.
[19] Saar-Tsechansky, M., & Provost, F. (2004). Active sampling for class probability estimation and ranking. Machine Learning, 54(2), 153–178. · Zbl 1057.68089 · doi:10.1023/B:MACH.0000011806.12374.c3
[20] Saar-Tsechansky, M., & Provost, F. (2007). Decision-centric active learning of binary-outcome models. Information Systems Research, 18(1), 1–19. · Zbl 1222.68295 · doi:10.1287/isre.1070.0111
[21] Settles, B. (2009). Active learning literature survey. Computer sciences technical report 1648, University of Wisconsin–Madison.
[22] Seung, H. S., Opper, M., & Sompolinsky, H. (1992). Query by committee. In Proceedings of the fifth annual workshop on computational learning theory (pp. 287–294). Pittsburgh: ACM.
[23] Yarowsky, D. (1995). Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd annual meeting of the Association for Computational Linguistics (pp. 189–196).
[24] Zhong, S., & Ghosh, J. (2003). A comparative study of generative models for document clustering. In SDM workshop on clustering high dimensional data and its applications.
[25] Zhu, X. (2005). Semi-supervised learning literature survey. Tech. Rep. 1530, Computer Sciences, University of Wisconsin-Madison.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.