Abstract
Data streaming applications such as the Internet of Things (IoT) require processing of, or prediction from, sequential data from various sensors. However, most of the data are unlabeled, making it impossible to apply fully supervised learning algorithms. The online manifold regularization approach allows sequential learning from partially labeled data, which is useful in environments where labels are scarce. Unfortunately, the manifold regularization technique does not work out of the box, as it requires determining the width parameter of the radial basis function (RBF) kernel. The width parameter directly impacts performance because it is used to inform the model to which class each piece of data most likely belongs. It is often determined offline via hyperparameter search, which requires a vast amount of labeled data. This limits the utility of the technique in applications where collecting a great deal of labeled data is difficult, such as data stream mining. To address this issue, we propose eliminating the RBF kernel from the manifold regularization technique altogether by combining it with a prototype learning method, which uses a finite set of prototypes to approximate the entire data set. Unlike other manifold regularization approaches, the proposed approach queries the prototype-based learner to find the most similar samples for each sample instead of relying on the RBF kernel. It therefore no longer requires the RBF kernel, which improves its practicality. In experiments on benchmark data sets, the proposed approach learns faster and achieves higher classification performance than other manifold regularization techniques. The results show that the proposed approach performs well even without the RBF kernel, which improves the practicality of manifold regularization techniques for semi-supervised learning.
Problem Background
The manifold regularization method still lacks practicality in real-world environments because the width parameter λ of the radial basis function (RBF) kernel is difficult to determine.
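To illustrate why the width parameter is hard to choose, the following minimal sketch computes the edge weight between two fixed points for three candidate widths. It assumes the common Gaussian form of the RBF kernel, exp(-||x - y||^2 / (2 * width^2)); the exact kernel form used in the paper is not stated here, and the names `rbf_weight` and `width` are illustrative only.

```python
import math


def rbf_weight(x, y, width):
    """Gaussian RBF edge weight between points x and y (assumed kernel form)."""
    squared_distance = math.dist(x, y) ** 2
    return math.exp(-squared_distance / (2.0 * width ** 2))


# Two points one unit apart, weighted under three candidate widths.
a, b = (0.0, 0.0), (1.0, 0.0)
for width in (0.1, 1.0, 10.0):
    print(f"width={width:>4}: weight={rbf_weight(a, b, width):.3g}")
```

Under this assumed form, the same pair of points is treated as essentially unrelated at a small width and as almost identical at a large width, so the graph built from these weights depends heavily on a value that cannot be tuned without labeled data.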
Methods
We propose combining a prototype-based learning algorithm, specifically the enhanced self-organizing incremental neural network (ESOINN) [19], with the semi-supervised online sequential extreme learning machine (SOS-ELM) [20], the manifold regularization approach on which the proposed approach is built.
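The idea of replacing kernel weights with prototype queries can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the prototypes and their edges are fixed inputs standing in for a trained ESOINN, two samples are assumed to be neighbors when their nearest prototypes coincide or are joined by an edge, and the names `nearest_prototype` and `prototype_laplacian` are hypothetical. The only standard ingredient is the unnormalized graph Laplacian L = D - W.

```python
import math


def nearest_prototype(x, prototypes):
    """Index of the prototype closest to x in Euclidean distance."""
    return min(range(len(prototypes)), key=lambda k: math.dist(x, prototypes[k]))


def prototype_laplacian(samples, prototypes, edges):
    """Unnormalized graph Laplacian L = D - W over a batch of samples.

    Assumption: samples i and j are linked (W[i][j] = 1) when their nearest
    prototypes are the same or are connected by an edge. No kernel width
    is involved.
    """
    linked = {frozenset(e) for e in edges}
    winners = [nearest_prototype(x, prototypes) for x in samples]
    n = len(samples)

    # Binary adjacency matrix W built from prototype lookups.
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p, q = winners[i], winners[j]
            if p == q or frozenset((p, q)) in linked:
                W[i][j] = W[j][i] = 1

    # L = D - W, where D is the diagonal degree matrix.
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])
    return L


# Hypothetical prototypes and edges, standing in for a trained ESOINN.
prototypes = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
edges = [(0, 1)]
samples = [(0.1, 0.0), (0.9, 0.0), (10.0, 9.5), (0.0, 0.2)]
for row in prototype_laplacian(samples, prototypes, edges):
    print(row)
```

In this sketch the neighborhood structure comes entirely from the prototype lookups, so the only quantities to set are those of the prototype learner itself rather than a kernel width.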
Results
Experiments show that the proposed approach learns faster by using the prototypes to construct the Laplacian graph. This result supports the manifold assumption, which states that high-dimensional data are locally Euclidean. Moreover, the approach shows a significant performance improvement on an imbalanced data set. However, these gains come at the cost of a slight increase in computational complexity to train the modified ESOINN.
Conclusion
According to the hypothesis test, the proposed approach significantly outranks some current manifold regularization approaches. However, this comes at a higher computational cost for training the ESOINN. In future work, we suggest designing an approach that determines all the hyperparameters itself rather than relying on default values. Ideally, such an approach would determine all the hyperparameters online to suit data streaming applications.