International conference (ISBN/ISSN)

Choosing data points to label for semi-supervised learning based on neighborhood

Journal / Conference: Lecture Notes in Networks and Systems
Unit: NT&TT
DOI / Link:

Authors

Dinh-Minh Vu; Thanh-Son Nguyen; Trung-Nghia Phung; Trinh-Hoang Vu

Corresponding author

Abstract

Semi-supervised fuzzy clustering extends fuzzy clustering by using additional information to guide and monitor the clustering process. This additional information can take the form of seeds (labeled data), constraints (must-link/cannot-link), or predefined memberships. In seed-based clustering, the result depends on which data points are selected, so different selections produce different clustering results, and in some cases poorly selected data points can even reduce clustering performance. This is a real problem for semi-supervised clustering algorithms. In addition, selecting the right data points helps reduce the number of queries to experts. For this purpose, we propose a new learning algorithm, called WHFMN, for seed collection; it identifies candidates for additional information using a fuzzy min-max neural network. The four most notable characteristics of WHFMN are that (1) it automatically determines the data points for which additional information should be obtained, (2) it automatically adds new clusters without retraining, (3) it automatically detects noise objects according to the density of the natural clusters, and (4) it can find the natural cluster structure even when clusters differ in density and shape. Experimental results show that the proposed algorithm improves the efficiency of label selection compared with previously published methods.
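The abstract describes WHFMN only at a high level. As a rough illustration of how a fuzzy min-max network can nominate data points for labeling, the Python sketch below grows hyperboxes using the membership function of the general fuzzy min-max network (Gabrys and Bargiela) and then proposes, for each hyperbox, the member nearest its center as a seed candidate. This is not the authors' WHFMN algorithm: the helper names (`grow_hyperboxes`, `seed_candidates`), the parameters `theta` (maximum hyperbox side) and `gamma` (membership sensitivity), the nearest-to-center selection rule, the toy data, and the omission of the overlap test and contraction steps are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def ramp(r: float, gamma: float) -> float:
    """Ramp threshold f(r, gamma) = r * gamma clipped to [0, 1]."""
    return min(1.0, max(0.0, r * gamma))


def membership(x: Sequence[float], v: Sequence[float], w: Sequence[float],
               gamma: float) -> float:
    """GFMM-style membership of x in the hyperbox with min corner v and max corner w.

    Returns 1.0 inside the box; outside, it decreases linearly (rate gamma) with the
    largest per-dimension distance to the box, down to 0.0.
    """
    return min(
        min(1.0 - ramp(xi - wi, gamma), 1.0 - ramp(vi - xi, gamma))
        for xi, vi, wi in zip(x, v, w)
    )


def grow_hyperboxes(points: List[Point], theta: float, gamma: float) -> List[list]:
    """Simplified fuzzy min-max clustering pass (overlap test and contraction omitted).

    Each point joins the highest-membership hyperbox that can expand to contain it
    while keeping every side length <= theta; otherwise it starts a new hyperbox.
    Each hyperbox is stored as [v (min corner), w (max corner), member indices].
    """
    boxes: List[list] = []
    for idx, x in enumerate(points):
        best, best_score = None, -1.0
        for box in boxes:
            v, w, _ = box
            fits = all(max(wi, xi) - min(vi, xi) <= theta
                       for xi, vi, wi in zip(x, v, w))
            score = membership(x, v, w, gamma)
            if fits and score > best_score:
                best, best_score = box, score
        if best is None:
            boxes.append([list(x), list(x), [idx]])
        else:
            v, w, members = best
            for i, xi in enumerate(x):
                v[i], w[i] = min(v[i], xi), max(w[i], xi)
            members.append(idx)
    return boxes


def seed_candidates(points: List[Point], boxes: List[list]) -> List[int]:
    """Hypothetical rule: per hyperbox, propose the member closest to its center."""
    seeds = []
    for v, w, members in boxes:
        center = [(vi + wi) / 2 for vi, wi in zip(v, w)]
        seeds.append(min(members, key=lambda m: sum(
            (p - c) ** 2 for p, c in zip(points[m], center))))
    return seeds


# Toy example: two well-separated groups of 2-D points in [0, 1]^2.
points = [(0.10, 0.10), (0.15, 0.12), (0.12, 0.18),
          (0.80, 0.85), (0.85, 0.80), (0.82, 0.90)]
boxes = grow_hyperboxes(points, theta=0.2, gamma=4.0)
print(len(boxes), seed_candidates(points, boxes))  # 2 [1, 3]
```

With `theta = 0.2` the six toy points form two hyperboxes and the sketch proposes indices 1 and 3 for expert labeling; a larger `theta` merges more points per box and so requests fewer labels. The density-based noise detection and automatic cluster addition claimed for WHFMN are not modeled here.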

Keywords

Hyperbox; Fuzzy min-max; Seed form; Data point; Neighborhood

References

A. Allahyar, H. S. Yazdi, and A. Harati, “Constrained semi-supervised growing self-organizing map,” Neurocomputing, vol. 147, pp. 456–471, Jan. 2015, doi: 10.1016/j.neucom.2014.06.039.
B. Gabrys and A. Bargiela, “General fuzzy min-max neural network for clustering and classification,” IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 769–783, May 2000, doi: 10.1109/72.846747.
K. Wagstaff, C. Cardie, S. Rogers, and S. Schrödl, “Constrained k-means clustering with background knowledge,” in Proc. 18th Int. Conf. Mach. Learn. (ICML), Williams College, MA, USA, 2001, pp. 577–584.
T. T. Khuat and B. Gabrys, “Accelerated learning algorithms of general fuzzy min-max neural network using a novel hyperbox selection rule,” Inf. Sci., vol. 547, pp. 887–909, Feb. 2021, doi: 10.1016/j.ins.2020.08.046.
T. T. Khuat and B. Gabrys, “A comparative study of general fuzzy min-max neural networks for pattern classification problems,” Neurocomputing, vol. 386, pp. 110–125, Apr. 2020, doi: 10.1016/j.neucom.2019.12.090.
L. Lelis and J. Sander, “Semi-supervised density-based clustering,” in Proc. 9th IEEE Int. Conf. Data Mining (ICDM), Miami, FL, USA, 2009, pp. 842–847.
C. Le, V. V. Vu, and N. T. H. Yen, “Choosing seeds for semi-supervised graph based clustering,” J. Comput. Sci. Cybern., vol. 35, no. 4, pp. 373–384, 2019, doi: 10.15625/1813-9663/35/4/14123.
R. Yan, J. Zhang, J. Yang, and A. Hauptmann, “A discriminative learning framework with pairwise constraints for video object classification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 578–593, Apr. 2006, doi: 10.1109/TPAMI.2006.76.
P. K. Simpson, “Fuzzy min-max neural networks—Part II: Clustering,” IEEE Trans. Fuzzy Syst., vol. 1, no. 1, pp. 32–45, Feb. 1993, doi: 10.1109/TFUZZ.1993.390282.
S. Basu, A. Banerjee, and R. Mooney, “Semi-supervised clustering by seeding,” in Proc. 19th Int. Conf. Mach. Learn. (ICML), Sydney, Australia, 2002, pp. 27–34.
T. N. Tran, D. M. Vu, M. T. Tran, and B. D. Le, “The combination of fuzzy min-max neural network and semi-supervised learning in solving liver disease diagnosis support problem,” Arab. J. Sci. Eng., vol. 44, no. 4, pp. 2933–2944, Apr. 2019, doi: 10.1007/s13369-018-3351-7.
V.-V. Vu, “An efficient semi-supervised graph based clustering,” Intell. Data Anal., vol. 22, no. 2, pp. 297–317, 2018, doi: 10.3233/IDA-163296.
D. M. Vu, V. H. Nguyen, and B. D. Le, “Semi-supervised clustering in fuzzy min-max neural network,” in Proc. Int. Conf. Adv. Inf. Commun. Technol. (ICTA), Hue City, Vietnam, 2016, pp. 541–550. Springer International Publishing.
V. D. Minh et al., “An improvement in integrating clustering method and neural network to extract rules and application in diagnosis support,” Iran. J. Fuzzy Syst., 2022.
SIPU datasets. [Online]. Available: https://cs.joensuu.fi/sipu/datasets
UCI Machine Learning Repository. [Online]. Available: https://archive.ics.uci.edu/ml/datasets.html