
Large margin cost-sensitive learning of conditional random fields. (English) Zbl 1209.68464

Summary: We tackle the structured-output classification problem using Conditional Random Fields (CRFs). Unlike the standard 0/1-loss setting, we consider cost-sensitive learning, in which a non-0/1 misclassification cost matrix is given at the level of individual outputs. Although cost-sensitive classification has many practical applications whose output spaces retain domain-specific scales (e.g., hierarchical or ordinal scales), most CRF learning algorithms cannot handle cost-sensitive scenarios effectively because they assume a nominal scale (and hence 0/1 loss) in the output space. In this paper, we incorporate the cost-sensitive loss into the large-margin learning framework. Through large-margin learning, the proposed algorithm inherits most of the benefits of SVM-like margin-based classifiers, such as provable generalization error bounds. Moreover, the soft-max approximation employed in our approach yields a convex optimization problem similar to standard CRF learning, requiring only a slight modification of the potential functions. We also provide a theoretical cost-sensitive generalization error bound. We demonstrate the improved prediction performance of the proposed method over existing approaches on a diverse set of sequence and image structured prediction problems that arise frequently in pattern recognition and computer vision.
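The core idea, augmenting the log-partition function with the misclassification cost so that the large-margin objective stays convex, can be sketched as follows. This is an illustrative softmax-margin-style loss for a linear-chain model, not the paper's exact formulation; all function and variable names here are our own assumptions.

```python
import numpy as np

def softmax_margin_loss(unary, pairwise, y_true, cost):
    """Cost-augmented negative log-likelihood (softmax-margin) for a
    linear-chain CRF; an illustrative sketch, not the paper's algorithm.

    unary    : (T, K) per-position label scores
    pairwise : (K, K) transition scores
    y_true   : (T,) true label sequence
    cost     : (K, K) misclassification cost matrix, cost[i, j] = cost of
               predicting j when the truth is i (cost[i, i] == 0)
    """
    T, K = unary.shape
    # Cost augmentation: add the per-position cost of each candidate label
    # to its unary potential; this is the "slight modification in the
    # potential functions" that keeps the objective convex.
    aug = unary + cost[y_true]                      # (T, K)

    # Forward recursion in log space for the cost-augmented partition.
    alpha = aug[0]
    for t in range(1, T):
        m = alpha[:, None] + pairwise               # m[i, j]: from i to j
        mmax = m.max(axis=0)
        alpha = aug[t] + mmax + np.log(np.exp(m - mmax).sum(axis=0))
    amax = alpha.max()
    log_z_aug = amax + np.log(np.exp(alpha - amax).sum())

    # Score of the true sequence (its own cost term is zero).
    score = unary[np.arange(T), y_true].sum()
    score += pairwise[y_true[:-1], y_true[1:]].sum()
    return log_z_aug - score
```

Because the cost matrix has a zero diagonal, the true sequence contributes exp(score) to the augmented partition sum, so the loss is always nonnegative; setting `cost` to the all-zeros matrix recovers the standard CRF negative log-likelihood.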

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence

Software:

CamVid

References:

[1] J. Lafferty, A. McCallum, F. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in: International Conference on Machine Learning, Williamstown, MA, USA, 2001.
[2] F. Sha, F. Pereira, Shallow parsing with conditional random fields, in: Proc. of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, Canada, 2003, pp. 134-141.
[3] Kumar, S.; Hebert, M., Discriminative random fields, International Journal of Computer Vision, 68, 179-201 (2006)
[4] McDonald, R.; Pereira, F., Identifying gene and protein mentions in text using conditional random fields, BMC Bioinformatics, 6, Suppl. 1 (2005)
[5] X. He, R. Zemel, M.A. Carreira-Perpinan, Multiscale conditional random fields for image labelling, in: IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2004.
[6] Zhang, J.; Gong, S., Action categorization with modified hidden conditional random field, Pattern Recognition, 43, 1, 197-203 (2010) · Zbl 1176.68191
[7] M. Kim, V. Pavlovic, Conditional state space models for discriminative motion estimation, in: IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007.
[8] S. Vishwanathan, N. Schraudolph, M. Schmidt, K. Murphy, Accelerated training of conditional random fields with stochastic meta-descent, in: International Conference on Machine Learning, Pittsburgh, PA, USA, 2006.
[9] B. Taskar, C. Guestrin, D. Koller, Max-margin Markov networks, in: Neural Information Processing Systems, Vancouver, BC, Canada, 2003.
[10] Y. Altun, I. Tsochantaridis, T. Hofmann, Hidden Markov support vector machines, in: International Conference on Machine Learning, Washington, DC, USA, 2003.
[11] C. Elkan, The foundations of cost-sensitive learning, in: Proceedings of International Joint Conference on Artificial Intelligence, Seattle, Washington, USA, 2001, pp. 973-978.
[12] Domingos, P., MetaCost: a general method for making classifiers cost-sensitive, (Proceedings of International Conference on Knowledge Discovery and Data Mining (1999), ACM Press), 155-164
[13] B. Zadrozny, J. Langford, N. Abe, Cost-sensitive learning by cost-proportionate example weighting, in: IEEE International Conference on Data Mining, Melbourne, Florida, USA, 2003, pp. 435-442.
[14] Sun, Y.; Kamel, M. S.; Wong, A. K.; Wang, Y., Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition, 40, 12, 3358-3378 (2007) · Zbl 1122.68505
[15] Xia, F.; Yang, Y.-w.; Zhou, L.; Li, F.; Cai, M.; Zeng, D. D., A closed-form reduction of multi-class cost-sensitive learning to weighted multi-class learning, Pattern Recognition, 42, 7, 1572-1581 (2009) · Zbl 1183.68512
[16] Sen, P.; Getoor, L., Cost-sensitive learning with conditional Markov networks, Data Mining and Knowledge Discovery, 17, 2, 136-163 (2008)
[17] Bartlett, P. L., The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Transactions on Information Theory, 44, 2, 525-536 (1998) · Zbl 0901.68177
[18] J. Shawe-Taylor, P. Bartlett, R. Williamson, M. Anthony, A framework for structural risk minimisation, in: Proceedings of the Ninth Annual Conference on Computational Learning Theory, Desenzano sul Garda, Italy, 1996.
[19] Vapnik, V. N., The Nature of Statistical Learning Theory (1995), Springer · Zbl 0934.62009
[20] Zhang, T., Covering number bounds of certain regularized linear function classes, Journal of Machine Learning Research, 2, 527-550 (2002) · Zbl 1007.68157
[21] Jordan, M. I., Graphical models, Statistical Science, 19, 140-155 (2004) · Zbl 1057.62001
[22] J. Yedidia, W. Freeman, Y. Weiss, Understanding belief propagation and its generalizations, in: Exploring Artificial Intelligence in the New Millennium, Science & Technology Books, 2003, pp. 239-269 (Chapter 8).
[23] Crammer, K.; Singer, Y., On the algorithmic implementation of multiclass kernel based vector machines, Journal of Machine Learning Research, 2, 5, 265-292 (2001) · Zbl 1037.68110
[24] F. Sha, L.K. Saul, Large margin hidden Markov models for automatic speech recognition, Neural Information Processing Systems, Vancouver, BC, Canada, 2007.
[25] T.-P. Tian, R. Li, S. Sclaroff, Articulated pose estimation in a learned smooth space of feasible solutions, in: Proceedings of IEEE Workshop in Computer Vision and Pattern Recognition, 2005.
[26] W. Chu, S.S. Keerthi, New approaches to support vector ordinal regression, International Conference on Machine Learning, Bonn, Germany, 2005. · Zbl 1127.68080
[27] R.A. Locarnini, A.V. Mishonov, J.I. Antonov, T.P. Boyer, H.E. Garcia, World ocean atlas 2005, in: S. Levitus (Ed.), NOAA Atlas NESDIS, US Government Printing Office, Washington, DC, pp. 61-64.
[28] Brostow, G. J.; Fauqueur, J.; Cipolla, R., Semantic object classes in video: a high-definition ground truth database, Pattern Recognition Letters, 30, 2, 88-97 (2008)
[29] G.J. Brostow, J. Shotton, J. Fauqueur, R. Cipolla, Segmentation and recognition using structure from motion point clouds, in: European Conference on Computer Vision, 2008, pp. 44-57.
[30] S. Kumar, M. Hebert, Man-made structure detection in natural images using a causal multiscale random field, IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, USA, 2003.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect matching.