
Multilabel classification with meta-level features in a learning-to-rank framework. (English) Zbl 1243.68257

Summary: Effective learning in multi-label classification (MLC) requires an appropriate level of abstraction for representing the relationship between each instance and multiple categories. Current MLC methods have focused on learning to map from instances to categories in a relatively low-level feature space, such as individual words. The fine-grained features in such a space may not be sufficiently expressive for learning to rank categories, which is essential in multi-label classification. This paper presents an alternative solution by transforming the conventional representation of instances and categories into meta-level features, and by leveraging successful learning-to-rank retrieval algorithms over this feature space. Controlled experiments on six benchmark datasets using eight evaluation metrics show strong evidence for the effectiveness of the proposed approach, which significantly outperformed other state-of-the-art methods such as Rank-SVM, ML-\(k\)NN (multi-label \(k\)NN), and IBLR-ML (instance-based logistic regression for multi-label classification) on most of the datasets. Thorough analyses are also provided to isolate the factors responsible for the improved performance.
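To make the approach concrete, below is a minimal sketch (Python, assuming numpy and scikit-learn) of one way to realize the two steps the summary describes: meta-level features computed per (instance, category) pair from nearest neighbors, followed by a pairwise linear ranker in the spirit of Rank-SVM. This is not the authors' implementation; the two feature choices, all function names, and the synthetic data are illustrative assumptions.

# Sketch only: kNN-based meta-level features + a Rank-SVM-style pairwise ranker.
# Feature choices, names, and data below are illustrative assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import LinearSVC

def meta_features(X_train, Y_train, X, k=10):
    """One meta-feature vector per (instance, category) pair."""
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X_train).kneighbors(X)
    n, n_labels = X.shape[0], Y_train.shape[1]
    feats = np.zeros((n, n_labels, 2))
    for i in range(n):
        neigh = Y_train[idx[i]]                 # (k, n_labels) 0/1 matrix
        feats[i, :, 0] = neigh.mean(axis=0)     # fraction of neighbors with the label
        w = 1.0 / (dist[i] + 1e-8)              # distance-weighted vote
        feats[i, :, 1] = (w[:, None] * neigh).sum(axis=0) / w.sum()
    return feats                                # (n_instances, n_categories, n_meta)

def pairwise_rank_data(feats, Y):
    """Rank-SVM-style reduction: differences between relevant and irrelevant categories."""
    diffs, signs = [], []
    for i in range(feats.shape[0]):
        for p in np.flatnonzero(Y[i]):          # relevant categories
            for q in np.flatnonzero(1 - Y[i]):  # irrelevant categories
                diffs.append(feats[i, p] - feats[i, q]); signs.append(1)
                diffs.append(feats[i, q] - feats[i, p]); signs.append(-1)
    return np.asarray(diffs), np.asarray(signs)

# Synthetic usage: train a linear scorer, then rank each instance's categories.
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 20))
Y_tr = (rng.random((100, 5)) < 0.3).astype(int)
feats = meta_features(X_tr, Y_tr, X_tr, k=5)
D, s = pairwise_rank_data(feats, Y_tr)
ranker = LinearSVC().fit(D, s)
scores = feats @ ranker.coef_.ravel()           # (n_instances, n_categories)
print(scores.shape)                             # rank each row's categories by score

The pairwise transform is the standard reduction used by Rank-SVM-style methods: a linear scorer that ranks each instance's relevant categories above its irrelevant ones is equivalent to a classifier on meta-feature differences.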

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Arya, S., Mount, D., Netanyahu, N., Silverman, R., & Wu, A. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM , 45(6), 891-923. · Zbl 1065.68650 · doi:10.1145/293347.293348
[2] Boutell, M., Luo, J., Shen, X., & Brown, C. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757-1771. · doi:10.1016/j.patcog.2004.03.009
[3] Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In Proceedings of the 22nd international conference on machine learning (pp. 89-96). New York: ACM.
[4] Burges, C., Ragno, R., & Le, Q. (2007). Learning to rank with nonsmooth cost functions. In Advances in neural information processing systems (Vol. 19, pp. 193-200). Cambridge: MIT Press.
[5] Cao, Z., Qin, T., Liu, T., Tsai, M., & Li, H. (2007). Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on machine learning (pp. 129-136). New York: ACM.
[6] Cheng, W., & Hüllermeier, E. (2009). Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 76(2-3), 211-225. · Zbl 1470.68091 · doi:10.1007/s10994-009-5127-5
[7] Creecy, R., Masand, B., Smith, S., & Waltz, D. (1992). Trading MIPS and memory for knowledge engineering. Communications of the ACM, 35(8), 48-64. · doi:10.1145/135226.135228
[8] Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1-30. · Zbl 1222.68184
[9] Donmez, P., Svore, K., & Burges, C. (2009). On the local optimality of LambdaRank. In Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval (pp. 460-467). New York: ACM.
[10] Elisseeff, A., & Weston, J. (2001). Kernel methods for multi-labelled classification and categorical regression problems. In Advances in neural information processing systems (Vol. 14, pp. 681-687). Cambridge: MIT Press.
[11] Freund, Y., Iyer, R., Schapire, R., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933-969. · Zbl 1098.68652
[12] Ganapathiraju, A., Hamaker, J., & Picone, J. (1998). Support vector machines for speech recognition. In Proceedings of the international conference on spoken language processing (pp. 2923-2926).
[13] García, S., & Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677-2694. · Zbl 1225.68178
[14] Gopal, S., & Yang, Y. (2010). Multilabel classification with meta-level features. In Proceedings of the 33rd international ACM SIGIR conference on research and development in information retrieval (pp. 315-322). New York: ACM.
[15] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). New York: Springer. · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[16] Järvelin, K., & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (pp. 41-48). New York: ACM. · doi:10.1145/345508.345545
[17] Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods: support vector learning. Cambridge: MIT Press.
[18] Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 133-142). New York: ACM. · doi:10.1145/775047.775067
[19] Kleinberg, J. (1997). Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on theory of computing (pp. 599-608). New York: ACM. · Zbl 0963.68049 · doi:10.1145/258533.258653
[20] Lewis, D., Schapire, R., Callan, J., & Papka, R. (1996). Training algorithms for linear text classifiers. In Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval (pp. 298-306). New York: ACM. · doi:10.1145/243199.243277
[21] Li, P., Burges, C., & Wu, Q. (2007). McRank: Learning to rank using multiple classification and gradient boosting. In Advances in neural information processing systems (Vol. 20). Cambridge: MIT Press.
[22] Qin, T., Liu, T., Xu, J., & Li, H. (2010). LETOR: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 1-29.
[23] Roussopoulos, N., Kelley, S., & Vincent, F. (1995). Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD international conference on management of data (pp. 71-79). New York: ACM.
[24] Schapire, R., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297-336. · Zbl 0945.68194 · doi:10.1023/A:1007614523901
[25] Schapire, R., & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2), 135-168. · Zbl 0951.68561 · doi:10.1023/A:1007649029923
[26] Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. (2008). Multilabel classification of music into emotions. In Proceedings of the 9th international conference on music information retrieval (ISMIR 2008), Philadelphia, PA, USA.
[27] Tsai, M., Liu, T., Qin, T., Chen, H., & Ma, W. (2007). FRank: A ranking method with fidelity loss. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (pp. 383-390). New York: ACM.
[28] Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453-1484. · Zbl 1222.68321
[29] Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., & Vlahavas, I. (2011). MULAN: A Java library for multi-label learning. Journal of Machine Learning Research, 12, 2411-2414.
[30] Vapnik, V. (2000). The nature of statistical learning theory. Berlin: Springer. · Zbl 0934.62009
[31] Voorhees, E. (2003). Overview of TREC 2002. In NIST special publication (pp. 1-16).
[32] Xu, J., & Li, H. (2007). AdaRank: A boosting algorithm for information retrieval. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (pp. 391-398). New York: ACM.
[33] Yang, Y. (1994). Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval (pp. 13-22). New York: Springer.
[34] Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1), 69-90. · doi:10.1023/A:1009982220290
[35] Yang, Y. (2001). A study of thresholding strategies for text categorization. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 137-145). New York: ACM. · doi:10.1145/383952.383975
[36] Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 42-49). New York: ACM. · doi:10.1145/312624.312647
[37] Yang, Y., & Pedersen, J. (1997). A comparative study on feature selection in text categorization. In Proceedings of the 14th international conference on machine learning (pp. 412-420).
[38] Yianilos, P. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM symposium on discrete algorithms (pp. 311-321). Philadelphia: SIAM. · Zbl 0801.68037
[39] Yue, Y., Finley, T., Radlinski, F., & Joachims, T. (2007). A support vector method for optimizing average precision. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (pp. 271-278). New York: ACM.
[40] Zhang, M., & Zhou, Z. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038-2048. · Zbl 1111.68629 · doi:10.1016/j.patcog.2006.12.019