Abstract
For long-term online prediction of nonlinear time series, determining a feasible network architecture that conforms to the time-varying data stream is recognized to be a challenging problem. To deal with this issue, a dynamic ELM with balanced variance and bias is proposed. A fitting degree that reflects model applicability during the sequential learning phase is taken into consideration. Based on the shifting error of the sequence fragments, an automatic model update strategy is exploited. Transformable parameters help reduce overfitting and underfitting at the same time and effectively avoid the trial and error caused by user intervention, so as to guarantee feasibility for long-term online prediction. Furthermore, the hidden node number and the regularization parameter can be calculated according to the fast-changing test data, thus building an optimum network architecture quantitatively. Experimental results verify that the proposed algorithm has better generalization performance on various long-term regression problems.
1 Introduction
Long-term online prediction is an essential requirement in practical applications such as power generation [1], traffic flow [2], air quality sensing [3], and temperature prediction [4]. For complex system modeling, large numbers of parameters need to be determined, and the interactive learning process usually incurs intolerable computational overhead. In a long-term online learning process, the neural network cannot always be rebuilt through trial and error driven by user intervention. Therefore, how to adjust the model and its parameters according to the arriving data has become an active research topic [5].
For regression problems, the key target is fitting. A single best training model for all datasets does not exist, and fitting deviation cannot be avoided [6]. Compared with underfitting, overfitting is a hard nut to crack, because the degree of overfitting can only be evaluated at the test stage [7]. A penalty constraint can be added to the candidate model to prevent overfitting [8]. Due to its asymptotic convergence property, the Akaike information criterion (AIC) is frequently used for small samples [9]. In the online scenario, data arrive one by one or chunk by chunk. The regularity of the input characteristics may change as time goes on, and gradient descent algorithms degrade predictive efficiency. As a single-hidden-layer feedforward network (SLFN), the Extreme Learning Machine (ELM) needs no iterations and is widely used for classification and regression problems [10]. The online sequential ELM (OS-ELM) extends the original ELM learning strategy with a dynamic network architecture to make it adequate for online sequences [11]. Some novel ELMs increase the hidden node number as data arrive; they have not only low computational overhead but also remarkable capabilities for nonlinear mappings [12]. To avoid enlarging the network scale without restriction, the incremental ELM (I-ELM) sets a termination condition using the expected learning accuracy and a maximum hidden node number [13]. The convex I-ELM (CI-ELM) [14] achieves a faster convergence rate by adjusting the output weights, and the error-minimized ELM (EM-ELM) [15] moves toward the prediction target by adding hidden nodes in groups of varying size. To make the neural networks more compact, the adaptive ELM [16] and the sparse ELM [17] have been proposed. These pruning methods all generate succinct links between the nodes of different layers, but they inevitably bear a high computational complexity due to the estimation of probability densities [18].
So far, the most representative dynamic ELM for long-term online prediction is the D-ELM [19], which combines EM-ELM and AG-ELM [20]. The method has a clear advantage in data fitting when the target function is not explicit. To optimize multiple parameters, some researchers introduce biologically inspired optimization methods to acquire the best topologies of the neural network [21,22,23], but such batch learning processes are not suitable for online data streams, especially in long-term prediction. Han et al. [24] present a dynamic ELM based on particle swarm optimization (PSO) [25] to obtain appropriate network parameters, considering training error and conditioning performance at the same time. Although the above online forecast methods can reach a locally optimal model, they all depend on predefined settings, including the maximum hidden node number and the maximum number of iteration steps. If overfitting and underfitting cannot be eliminated in time, generalization performance goes from bad to worse, and the long-term prediction model eventually fails.
In this paper, a novel dynamic ELM with balanced variance and bias (VB-ELM) is presented. Extending the traditional OS-ELM, the proposed method acquires beneficial parameters quantitatively. Relying on the degree of overfitting or underfitting, the VB-ELM structure can be optimized automatically. Also, to reduce the approximation error, spatial-temporal features are considered in the update process. Since both model selection and constrained optimization affect the fitting degree, the proposed method deals with the hidden node number and the regularization parameter simultaneously. The whole forecast process can be performed without manual trial-and-error operation, which has significant application value for long-term online prediction.
The remainder of this paper is organized as follows. In Sect. 2, we describe the preliminaries involving EM-ELM and approximation error theory. In Sect. 3, the new ELM with balanced variance and bias is introduced. Experimental results and discussion are presented in Sect. 4. Finally, the conclusion is given in the last section.
2 Preliminaries
ELM has low computational complexity and easy hardware implementation [26]. This has led researchers to propose improved ELMs for special applications, especially big data processing and high-dimensional prediction [27]. In this section, the preliminaries of EM-ELM and the relationship between approximation error and hidden node number are described.
2.1 Brief on EM-ELM
EM-ELM is a modified ELM that adds hidden nodes in groups of different sizes. As the neural network grows, the output weights are updated by error minimization [28]. Suppose that the sample data are \(\{ (\mathbf{{x}}_i ,y_i )\} _{i = 1}^N\); then the basic mapping relation can be expressed as
\(\sum \nolimits _{p = 1}^{L} {\beta _p \phi \left( {\mathbf{{a}}_p \cdot \mathbf{{x}}_i + b_p } \right) } = y_i ,\quad i = 1,\ldots ,N\)
where \(\phi (\cdot )\) is the activation function, L is the number of hidden nodes, and \(\mathbf{{a}}_p\) and \(b_p\) are the input weight vector and the bias of the pth hidden node, respectively. The only goal in confirming the network structure is to determine the output learning parameters \({\varvec{\beta }}\). Therefore, Eq. (1) can be rewritten in the more concise form
\(\mathbf{{H}}{\varvec{\beta }} = \mathbf{{Y}}\)
where \(\mathbf{{H}} = \left[ {\begin{array}{lll} {\phi (\mathbf{{a}}_1 \cdot \mathbf{{x}}_1 + b_1 )} &{} \ldots &{} {\phi (\mathbf{{a}}_L \cdot \mathbf{{x}}_1 + b_L )} \\ \vdots &{} \ldots &{} \vdots \\ {\phi (\mathbf{{a}}_1 \cdot \mathbf{{x}}_N + b_1 )} &{} \ldots &{} {\phi (\mathbf{{a}}_L \cdot \mathbf{{x}}_N + b_L )} \\ \end{array}} \right] _{N \times L}\), \({\varvec{\beta }} = [\beta _1 ,\beta _2 ,\ldots ,\beta _L ]^T\) and \(\mathbf{{Y}} = \left[ {y_1 ,y_2 ,\ldots ,y_N } \right] ^T\).
Here \(\mathbf{{H}}\) denotes the hidden-layer output matrix of the network. According to the least approximation error, the output weights can be determined as
\({\varvec{\beta }} = \mathbf{{H}}^{\dag } \mathbf{{Y}}\)
where \(\mathbf{{H}}^{\dag }\) is the Moore–Penrose generalized inverse of \(\mathbf{{H}}\), given by \(\mathbf{{H}}^{\dag } = \left( {\mathbf{{H}}^{\mathrm{T}} \mathbf{{H}}} \right) ^{-1} \mathbf{{H}}^{\mathrm{T}}\) when \(\mathbf{{H}}^{\mathrm{T}} \mathbf{{H}}\) is nonsingular.
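To make the preceding relations concrete, the following minimal sketch builds the hidden-layer output matrix \(\mathbf{{H}}\) with randomly drawn input weights and biases and solves for the output weights through the Moore–Penrose pseudo-inverse. All function and variable names, the sigmoid activation, and the toy data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation phi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output_matrix(X, A, b):
    """N x L hidden-layer output matrix H with H[i, p] = phi(a_p . x_i + b_p).

    X : (N, d) inputs, A : (d, L) random input weights, b : (L,) random biases.
    """
    return sigmoid(X @ A + b)

def elm_output_weights(H, Y):
    """Minimum-norm least-squares output weights: beta = pinv(H) @ Y."""
    return np.linalg.pinv(H) @ Y

def elm_predict(X, A, b, beta):
    """Forward pass of the trained network: y_hat = H(X) @ beta."""
    return hidden_output_matrix(X, A, b) @ beta

# Toy regression problem with random hidden-layer parameters
rng = np.random.default_rng(0)
N, d, L = 200, 5, 20
X = rng.standard_normal((N, d))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)
A = rng.uniform(-1.0, 1.0, (d, L))
b = rng.uniform(-1.0, 1.0, L)
H = hidden_output_matrix(X, A, b)
beta = elm_output_weights(H, Y)
print("training RMSE:", np.sqrt(np.mean((H @ beta - Y) ** 2)))
```

Using np.linalg.pinv rather than forming \((\mathbf{{H}}^{\mathrm{T}}\mathbf{{H}})^{-1}\) explicitly also covers the rank-deficient case.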
For EM-ELM, the target error \(\varepsilon \) and the initial hidden node number \(L_0\) are predefined. Firstly, the output weight matrix is calculated by \({\varvec{\beta }}_1 = \mathbf{{H}}_{1}^{\dag } \mathbf{{Y}}\). If the approximation error is greater than \(\varepsilon \), new hidden nodes are added. Let \(\delta \mathbf{{H}}_1\) denote the hidden output matrix of the new nodes; the generalized inverse matrix is then calculated as
By substituting matrix \( \mathbf{{A}} = \left[ {\begin{array}{cc} {\mathbf{{A}}_{11} } &{}\quad {\mathbf{{A}}_{12} } \\ {\mathbf{{A}}_{21} } &{}\quad {\mathbf{{A}}_{22} } \\ \end{array}} \right] = \left( {\left[ {\begin{array}{c} {\mathbf{{H}}_1^{\mathrm{T}} } \\ {\delta \mathbf{{H}}_1^{\mathrm{T}} } \\ \end{array}}\right] ~\left[ {\mathbf{{H}}_1 ,\delta \mathbf{{H}}_1 } \right] } \right) ^{ - 1}\), the \(\mathbf{{H}}_2^\dag \) can be expressed as following block matrix
Based on the Schur complement, the \(\mathbf{{H}}_2^{\dag }\) can be evolved as
The output weight can be calculated as
Similarly, the \((k+1)\)th output weights are updated as follows
where \(\mathbf{{D}}_k = \left( {\left( {\mathbf{{I}} - \mathbf{{H}}_k \mathbf{{H}}_k^\dag } \right) \delta \mathbf{{H}}_1 } \right) ^\dag \) and \(\mathbf{{U}}_k = \mathbf{{H}}_k^\dag \left( {\mathbf{{I}} - \delta \mathbf{{H}}_k^{\mathrm{T}} \mathbf{{D}}_k } \right) \). The mapping function can take various forms, such as the Sigmoid function or the Gaussian function, all of which keep ELM well adapted.
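The sketch below grows the hidden layer in the spirit of EM-ELM until a target training error or a node budget is reached. For brevity it recomputes the pseudo-inverse after every growth step instead of applying the incremental \(\mathbf{{D}}_k\)/\(\mathbf{{U}}_k\) update above, so it is numerically equivalent but not as cheap; it reuses hidden_output_matrix from the earlier sketch, and all other names and default values are assumptions.

```python
def em_elm_grow(X, Y, L0=5, delta_L=5, L_max=100, eps=1e-2, rng=None):
    """Grow hidden nodes until the training RMSE drops below eps.

    Returns the input weights A, biases b, and output weights beta.
    """
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    A = rng.uniform(-1.0, 1.0, (d, L0))
    b = rng.uniform(-1.0, 1.0, L0)
    while True:
        H = hidden_output_matrix(X, A, b)
        beta = np.linalg.pinv(H) @ Y            # full recomputation for clarity
        err = np.sqrt(np.mean((H @ beta - Y) ** 2))
        if err <= eps or A.shape[1] >= L_max:
            return A, b, beta
        # Append delta_L new random hidden nodes (the delta H_k block above)
        A = np.hstack([A, rng.uniform(-1.0, 1.0, (d, delta_L))])
        b = np.hstack([b, rng.uniform(-1.0, 1.0, delta_L)])

A_g, b_g, beta_g = em_elm_grow(X, Y, eps=0.12)
```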
2.2 Approximation Error Theory
ELM calculates the output weights analytically. As the number of hidden nodes grows, the training error keeps decreasing. Considering the original ELM framework, the learning target is to minimize the norm of the approximation error
\(\left\| {\mathbf{{H}}{\varvec{\beta }}^* - \mathbf{{Y}}} \right\| = \min _{{\varvec{\beta }}} \left\| {\mathbf{{H}}{\varvec{\beta }} - \mathbf{{Y}}} \right\| \)
Because \({\varvec{\beta }}^*\) is calculated by the Moore–Penrose generalized inverse, it satisfies the necessary condition of the minimum-norm least-squares solution, namely \(\left\| {{\varvec{\beta }}^*} \right\| _n \le \left\| {\varvec{\beta }} \right\| _n\) [29]. The relationship of the approximation errors can be written as follows
where \(\left\| \cdot \right\| _n\) and \(\left\| \cdot \right\| _m\) are the norms of \(R^n\) and \(R^m\), and \(\left\{ \cdot \right\} \) denotes the least-squares solution set of \(\mathbf{{H}}{\varvec{\beta }} =\mathbf{{Y}}\). When the number of hidden nodes grows from \(L_1\) to \(L_2\), the new error is smaller than or equal to that of the previous ELM. The proof can be indicated as follows.
Because \({\varvec{\beta }}_{L_2}\) is the Moore–Penrose generalized-inverse solution, it has the minimum norm. Since \(\left[ {{\varvec{\beta }}_{L_1 } ,0} \right] ^{\mathrm{T}}\) may not be the minimum-norm least-squares solution, the inequality in formula (13) holds. As the hidden node number becomes greater, the ever-decreasing approximation error leads to overfitting, so the learning target must be chosen again.
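This monotonicity is easy to check numerically. Continuing the earlier sketch, the snippet below grows a nested network (new random columns are appended to a fixed hidden layer), so each larger model contains the smaller one and the least-squares training error can only stay the same or shrink.

```python
# Nested growth: new random columns are appended, so each larger network
# contains the smaller one and its least-squares residual cannot increase.
A_all = rng.uniform(-1.0, 1.0, (d, 60))
b_all = rng.uniform(-1.0, 1.0, 60)
errors = []
for L_cur in range(5, 65, 5):
    H_cur = hidden_output_matrix(X, A_all[:, :L_cur], b_all[:L_cur])
    beta_cur = np.linalg.pinv(H_cur) @ Y
    errors.append(np.linalg.norm(H_cur @ beta_cur - Y))

# The residual norms form a (numerically) non-increasing sequence.
assert all(e2 <= e1 + 1e-8 for e1, e2 in zip(errors, errors[1:]))
```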
3 The Proposed Method
In this section, we introduce the proposed VB-ELM in detail. The proposed algorithm includes two phases: initialization learning and model update. At the initial learning step, only a small portion of the training data is used. Then, at the model update phase, the hidden node number and the regularization parameter are adjusted according to the fitting degree, and the model parameters are confirmed according to the arriving data.
3.1 Initialization Learning Phase
First of all, it is necessary to obtain the original model. Considering the initial \(t_0\) sample chunks \({ Set}_0 = \{(\mathbf{{x}}_i,y_i )\} _{i = 1}^{t_0 \times M}\), \(t_0\)-fold cross validation is employed in the training process: the whole training set is divided into \(t_{0}\) parts, each part is selected as the test dataset in turn, and the remaining \(t_{0}-1\) parts form the corresponding training dataset. \(C_0\) is the regularization parameter, which is confirmed according to the error of the \(t_0\)-fold cross validation. Mathematically, for the first fold, the fitting problem turns into the following optimization process
where M is the length of each sample chunk, \({\varvec{\beta }}_0\) is the output weight, and \(\mathbf{{h}}\left( {\mathbf{{x}}_i } \right) \) is the hidden-layer mapping vector. Furthermore, \(y_i\) is the expected output and \(\xi _i\) is the prediction error. According to the KKT theorem [30], the solution is described as follows
where \(L_0\) is the hidden node number at the initial learning step.
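As a concrete reading of this step, the sketch below uses the common ridge-regularized closed form \({\varvec{\beta }} = (\mathbf{{I}}/C + \mathbf{{H}}^{\mathrm{T}}\mathbf{{H}})^{-1}\mathbf{{H}}^{\mathrm{T}}\mathbf{{Y}}\) for the KKT solution and selects \(C_0\) by cross validation. Whether this matches the paper's exact equation cannot be verified from the excerpt, and the candidate grid and function names are assumptions; hidden_output_matrix comes from the earlier sketch.

```python
def regularized_output_weights(H, Y, C):
    """Ridge-regularized ELM solution: beta = (I/C + H^T H)^-1 H^T Y."""
    L = H.shape[1]
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ Y)

def pick_C_by_cv(X, Y, A, b, C_grid, n_folds):
    """Return the C from C_grid with the smallest mean cross-validation MSE."""
    folds = np.array_split(np.arange(len(Y)), n_folds)
    best_C, best_err = None, np.inf
    for C in C_grid:
        fold_errs = []
        for v in range(n_folds):
            test = folds[v]
            train = np.hstack(folds[:v] + folds[v + 1:])
            H_tr = hidden_output_matrix(X[train], A, b)
            beta_v = regularized_output_weights(H_tr, Y[train], C)
            H_te = hidden_output_matrix(X[test], A, b)
            fold_errs.append(np.mean((H_te @ beta_v - Y[test]) ** 2))
        if np.mean(fold_errs) < best_err:
            best_C, best_err = C, float(np.mean(fold_errs))
    return best_C, best_err

C0, cv_err = pick_C_by_cv(X, Y, A, b,
                          C_grid=[2.0 ** k for k in range(-5, 6)], n_folds=5)
```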
By comparing the errors of the \(t_{0}\)-fold cross validation, the \({\bar{\varvec{\beta }}}_0\) with minimum error is chosen as the forecast model, and the average validation error plays the role of a tolerance threshold. Considering the test sample chunk \(\mathbf{{X}}_1\), the generalization error can be decomposed as [31]
where \({ Bias}_1^2 (\mathbf{{X}}_1 )\) is the squared prediction bias, \({ Var}_1 (\mathbf{{X}}_1 )\) is the variance, and \(\varepsilon \) represents the noise. Usually, \(\varepsilon \) is Gaussian white noise, so its mean value can be regarded as zero. After the test processing, the bias and variance in the initialization learning phase are given as
where \(\mathbf{{H}}_1^v {\varvec{\beta }}_1^v\) is the predicted value in the vth fold. When the \((t_0+1)\)th sample chunk arrives, the training data turn into \(t_1\) folds.
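A sketch of how the bias and variance terms can be estimated from the per-fold predictions \(\mathbf{{H}}_1^v {\varvec{\beta }}_1^v\) on a common test chunk, taking the mean over folds as the expected prediction; this is an illustrative reading, not the paper's code.

```python
def bias_variance_on_chunk(fold_predictions, y_true):
    """Estimate squared bias and variance over one test chunk.

    fold_predictions : (n_folds, n_samples) array of per-fold predictions
                       H_1^v beta_1^v on the same chunk.
    y_true           : (n_samples,) targets of the chunk.
    """
    mean_pred = fold_predictions.mean(axis=0)           # expected prediction
    bias_sq = np.mean((mean_pred - y_true) ** 2)         # Bias^2 term
    variance = np.mean(fold_predictions.var(axis=0))     # Var term
    return float(bias_sq), float(variance)
```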
3.2 Model Update Phase
In the model update phase, the hyper-parameters are determined according to the prior chunk. The degree of change in bias or variance decides the fitting deviation [8]. In general, bias and variance move in opposite directions, which is known as the bias-variance dilemma. The relationship between generalization error, bias and variance is shown in Fig. 1 [32].
From Fig. 1, we can see that bias is the primary component of the generalization error when the fitting ability is insufficient. The overfitting or underfitting phenomenon can then be judged as follows
where \({ Bias}_k = \left\| {{ Bias}_k (\mathbf{{X}}_k )} \right\| \) and \({ Var}_k = \left\| {{ Var}_k (\mathbf{{X}}_k )} \right\| \) are the mean bias and variance of the kth test chunk, and \({ Bias}_{k - 1}\) and \({ Var}_{k - 1}\) are the mean bias and variance of the \((k-1)\)th test chunk, respectively. If event \(H_1\) holds, overfitting occurs; otherwise, if event \(H_2\) holds, underfitting exists. Following [33], the fitting degree p is considered as follows
where \({ Bias}_{k,\max }^v\) is the least error of the vth validation fold, \({ Bias}_k\) is the test error of the kth chunk, and \({\hat{\gamma }}_{k}\) is the no-information error rate in the current phase. \({ Bias}_{k,\max }^v\) is calculated as follows
\({\hat{\gamma }}_{k}\) describes the least error rate in the known dataset, which is estimated as
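The exact formulas for p, \({ Bias}_{k,\max }^v\) and \({\hat{\gamma }}_{k}\) appear in equations not reproduced in this excerpt. As one plausible reading consistent with the stated range of p and with the relative overfitting rate of [33], the fitting degree could be computed as in the following sketch; the clipping to \([0,1]\) and the degenerate-case handling are assumptions.

```python
def fitting_degree(test_bias, val_bias, gamma_hat):
    """Fitting degree p in [0, 1]: 0 means the model fits the new chunk,
    1 means a large deviation.

    test_bias : mean test error Bias_k of the current chunk
    val_bias  : reference validation error Bias_{k,max}^v
    gamma_hat : no-information error rate gamma_hat_k
    """
    denom = gamma_hat - val_bias
    if denom <= 0:                      # degenerate case: no room to overfit
        return 0.0
    p = (test_bias - val_bias) / denom
    return float(np.clip(p, 0.0, 1.0))
```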
It is obvious that p ranges from zero to one. When the value is zero, the model and the new data fit each other well; if p rises to one, the model has a large deviation. On the basis of p, the regularization parameter threshold for the \((k+1)\)th chunk moves as follows
where \(C_k\) is the current value, and \(\left\lfloor \cdot \right\rfloor \) and \(\left\lceil \cdot \right\rceil \) denote rounding down and up to the nearest integer, respectively. The threshold of the hidden node number changes according to the following formula
where \(L_k\) is the current threshold value. The joint optimal solution is then obtained by the PSO method as follows
where V and P express velocity and position, and n is the iteration number. \(B_{L_i ,C_j }^n\) is the nth personal best particle and \(B_{L_g ,C_j }\) is the group optimal location. \(L_{\min } = \min \left\{ {L_{k + 1}^{{ bound}} ,L_k } \right\} \), \(L_{\max } = \max \left\{ {L_{k + 1}^{{ bound}} ,L_k } \right\} \), \(C_{\min } = \min \left\{ {C_{k + 1}^{{ bound}} ,C_k } \right\} \), and \(C_{\max } = \max \left\{ {C_{k + 1}^{{ bound}} ,C_k } \right\} \). \(r_1\) and \(r_2\) are uniform random numbers in \(\left[ {0,1} \right] \), and \(c_1\) and \(c_2\) are positive constants. The fitness function is calculated as follows
The update formula for getting the best parameter group is shown as follows
After the optimal fitness is acquired, the corresponding particle is \((L_i ^\prime , C_i ^\prime )\), so we set \(L_{k + 1} = L_i ^\prime \) and \(C_{k + 1} = C_i ^\prime \). Finally, the solution is described as follows
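A compact PSO sketch for the joint search over \((L, C)\) inside the bounds \([L_{\min }, L_{\max }] \times [C_{\min }, C_{\max }]\). The inertia weight, swarm size, and use of a generic fitness callback are assumptions; the paper's own fitness function is not reproduced in this excerpt, so the example fitness below (training MSE of a regularized ELM) is illustrative only.

```python
def pso_search(fitness, L_bounds, C_bounds, n_particles=20, n_iter=30,
               c1=2.0, c2=2.0, w=0.7, rng=None):
    """Particle swarm search for the (L, C) pair minimizing `fitness`."""
    rng = rng or np.random.default_rng()
    lo = np.array([L_bounds[0], C_bounds[0]], dtype=float)
    hi = np.array([L_bounds[1], C_bounds[1]], dtype=float)
    pos = rng.uniform(lo, hi, (n_particles, 2))
    vel = np.zeros((n_particles, 2))
    pbest = pos.copy()
    pbest_val = np.array([fitness(int(round(p[0])), p[1]) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, 2))
        r2 = rng.random((n_particles, 2))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(int(round(p[0])), p[1]) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return int(round(gbest[0])), float(gbest[1])    # (L_{k+1}, C_{k+1})

# Example fitness: training MSE of a regularized ELM with L_cand nodes (assumption)
def candidate_fitness(L_cand, C_cand):
    A_c = rng.uniform(-1.0, 1.0, (d, L_cand))
    b_c = rng.uniform(-1.0, 1.0, L_cand)
    H_c = hidden_output_matrix(X, A_c, b_c)
    beta_c = regularized_output_weights(H_c, Y, C_cand)
    return np.mean((H_c @ beta_c - Y) ** 2)

L_next, C_next = pso_search(candidate_fitness, L_bounds=(10, 60),
                            C_bounds=(0.01, 100.0))
```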
The method alters the parameters according to the fitting degree, and the hidden nodes need not be tuned one by one, so VB-ELM is better suited to long-term online applications. Taking the characteristics of the learning procedure into account, the complete VB-ELM algorithm is presented in Algorithm 1.
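Since Algorithm 1 itself is not reproduced in this excerpt, the outline below is only one possible reading of the procedure in Sects. 3.1–3.2, stitched together from the earlier sketches; the places where the fitting degree, threshold moves and PSO refit would plug in are marked by comments, and all names are assumptions.

```python
def vb_elm_online(chunks, L0, C_grid, n_folds, rng=None):
    """Illustrative outline of the VB-ELM loop over arriving data chunks.

    chunks : list of (X_chunk, Y_chunk) pairs arriving over time.
    """
    rng = rng or np.random.default_rng()
    (X0, Y0), rest = chunks[0], chunks[1:]
    d0 = X0.shape[1]
    A0 = rng.uniform(-1.0, 1.0, (d0, L0))
    b0 = rng.uniform(-1.0, 1.0, L0)
    C, _ = pick_C_by_cv(X0, Y0, A0, b0, C_grid, n_folds)            # Sect. 3.1
    beta = regularized_output_weights(hidden_output_matrix(X0, A0, b0), Y0, C)
    chunk_rmse = []
    for X_k, Y_k in rest:
        y_hat = hidden_output_matrix(X_k, A0, b0) @ beta             # predict first
        chunk_rmse.append(float(np.sqrt(np.mean((y_hat - Y_k) ** 2))))
        # Here: estimate bias/variance, compute the fitting degree p, move the
        # (L, C) thresholds, and call pso_search(...) to pick (L_{k+1}, C_{k+1}).
        beta = regularized_output_weights(
            hidden_output_matrix(X_k, A0, b0), Y_k, C)               # then update
    return beta, chunk_rmse
```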

Remark A
Regarding the main computational complexity, we express the node number of one layer as L and the particle range as M. Considering the iteration number, the proposed method requires \(O\left( {N_k L_E^3 + N_k^{2}nM} \right) \) multiplications, covering the machine learning process and the model update process. Because the PSO method is used to find the optimal parameters, the computational complexity of the proposed algorithm is obviously greater than that of ELM. However, it is worth noting that the proposed algorithm does not rely on trial and error, and a long-term automatic prediction model is achieved.
4 Experimental Results and Analysis
This section describes the experiments. Firstly, we list the datasets and evaluation criteria. Then, VB-ELM is compared with other online regression methods. The parameter analysis is presented in the last part.
4.1 Datasets Description
The VB-ELM method proposed in this paper is mainly aimed at long-term online prediction. In order to evaluate its effectiveness, four UCI datasets with different attributes are selected [34]. Table 1 presents the specifications of each dataset.
The first dataset (Temperature) is collected from a sensor network in a house; the output is the dining-room temperature. The second dataset (Household) measures electric power consumption; the global active power is the output value. The averaged CO concentration is the forecasting target in the third dataset (AirQuality). The fourth dataset (Energy) includes appliance energy data; its second attribute, called energy use, is chosen as the prediction label.
After analyzing the spatial–temporal distribution, the input vector is composed of two parts: the reference values over the last week and some related factors. All the time series must be normalized as
\(s' = \frac{s - s_{\min }}{s_{\max } - s_{\min }}\)
where s is an input numeric value, \(s'\) is its normalized value, and \(s_{\min }\) and \(s_{\max }\) are the minimum and maximum of the selected feature data, respectively.
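A small helper for this normalization, assuming the usual \([0,1]\) min-max scaling; the function name is illustrative.

```python
import numpy as np

def min_max_normalize(s):
    """Scale a feature series to [0, 1]: (s - s_min) / (s_max - s_min)."""
    s = np.asarray(s, dtype=float)
    s_min, s_max = s.min(), s.max()
    return (s - s_min) / (s_max - s_min)
```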
4.2 Evaluation Criteria
Multiple metrics are required to evaluate the algorithm performance. In this paper, the generalization performance is judged in accordance with the Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Pearson Correlation Coefficient (PCC), defined by the following three equations.
where \(y_i\) is the ith actual value in the tth chunk, \(i = 1, \ldots ,N\), and \({\hat{y}}_i\) is the ith predicted value. \({\bar{y}}_i\) and \(\overline{{\hat{y}}} _i\) are the means of \(y_i\) and \({\hat{y}}_i\), respectively.
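The three criteria in their usual forms; the paper's exact equations are not reproduced here, so details such as whether MAPE is expressed in percent are assumptions.

```python
import numpy as np

def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))

def rmse(y, y_hat):
    """Root Mean Square Error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def pcc(y, y_hat):
    """Pearson Correlation Coefficient between actual and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.corrcoef(y, y_hat)[0, 1])
```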
4.3 Experimental Results
To verify the effectiveness of the proposed algorithm, online SVM, OS-ELM, D-ELM and PSO-EM-ELM are chosen as comparison methods. The test values of these methods are calculated over 50 independent Monte Carlo trials. In the first experiment, the prediction result for Dataset 1 is shown. Figure 2 shows the evaluation results of one execution round for the Temperature dataset.
The forecast values for the last 4 days are shown in Fig. 2a. The black line with dots describes the true values and the blue line describes the predicted values of the proposed VB-ELM method; the two lines fit each other well. In Fig. 2b and c, in terms of RMSE and MAPE, the proposed approach has a smaller generalization error. Particularly in Fig. 2c, the RMSE is close to zero, far smaller than that of the other comparison methods. In Fig. 2d, the PCC of the proposed method is always above 0.95, which reflects a higher correlation and better nonlinear mapping ability. These phenomena indicate that, owing to its fast model adaptation ability, the new method is more suitable for long-term online prediction.
In the second experiment, the prediction result for Dataset 2 is shown. Figure 3 shows the evaluation values of one execution round for the Household dataset.
As seen in Fig. 3a, although the data change sharply, the proposed method does not lose efficacy. In Fig. 3b and c, it is obvious that the proposed method has lower RMSE and MAPE than the state-of-the-art methods; the RMSE is under 0.2, less than half that of the other methods. In Fig. 3d, the PCC of the proposed method is close to one. This reflects that the proposed model has good fitting ability and sound adjustment, because the proposed method can match the change of the nonlinear features and overfitting is effectively avoided.
In the third experiment, the prediction result for Dataset 3 is shown. Figure 4 shows the evaluation values of one execution round for the AirQuality dataset.
As seen in Fig. 4a, the proposed method remains efficient even though the sequential periodicity is blurry. In Fig. 4b and c, as time goes on, all generalization errors trend downward, while the proposed method always keeps the smallest generalization error, because the model can handle serious overfitting phenomena. In Fig. 4d, the proposed method shows the best PCC among all methods, which proves that it has a better capacity for adjustment. This advantage helps generate a more suitable model from the arriving data.
In the last experiment, the prediction result for the Energy dataset is shown. Figure 5 demonstrates the forecast and evaluation values.
As seen in Fig. 5a, although the adjacent data change a lot, the proposed method still produces accurate forecast values. In Fig. 5b, although the MAPE of the VB-ELM rises during the online learning process, its slope is smaller than that of the PSO-EM-ELM and the generalization performance remains the best. In Fig. 5c, the RMSE first increases and then decreases, because the test data are occasionally discrete. In Fig. 5d, the proposed method has a better PCC than the comparison methods, which indicates that the new dynamic ELM does not lead to excessive learning. Although the VB-ELM changes the model structure, it controls the optimal parameters, so the fluctuation of the generalization performance is not severe.
Compared with the four contrast methods, the proposed method delivers satisfactory results. The test accuracy improves significantly over online SVM and OS-ELM, and in general the generalization performance is always better than that of D-ELM and PSO-EM-ELM. In the proposed method, the error trend is stable, which proves that the method matches the data changes well in long-term online prediction.
4.4 Parameter Analysis
In order to explain the validity of VB-ELM, representative parameters are analyzed in this part. Taking the second chunk of the Household dataset as an example, the parameter analysis is shown in Fig. 6.
Firstly, it is instructive to analyze the regularization parameter C and the hidden node number L, which are selected at the same time. As shown in Fig. 6a, when overfitting occurs, the result forces both C and L to become smaller. In this case, based on the PSO method, C is invariant and L decreases from 28 to 25. Thus, if the current chunk and the last chunk are not alike, the new parameters will still support the online prediction well. As shown in Fig. 6b, with the growth of days, the difference between the variance and the squared bias tends to zero, because the automatic optimization guarantees the balance of variance and bias. This reflects the foresight of the model when fitting deviation happens.
In the above experiments, the VB-ELM exhibits lower generalization error and higher correlation. Equipped with a reliable fitting degree, the model can cope with changing overfitting and underfitting. Also, considering the decomposition of the generalization error, the adaptive parameters are beneficial for nonlinear time series. This helps the VB-ELM minimize user intervention, so long-term online prediction can be realized fully automatically.
5 Conclusion
In this paper, an efficient VB-ELM is proposed for long-term online prediction. Different from traditional online models, the proposed method determines the model according to the changing test error. The update strategy not only balances the variance and bias but also obtains jointly optimal parameters by PSO, so the proposed method has better generalization performance. The fitting degree helps judge overfitting and underfitting quantitatively, and the network reestablishment process based on user intervention is avoided. This makes the forecasting model suitable for non-expert users, and the long-term online prediction process can be controlled automatically. Because PSO is used to determine the hyper-parameters, the proposed method certainly spends more time than the original ELM. With the development of hardware resources, computational time in practical applications is becoming shorter. In the future, the research work will look at how to predict unstructured data.
References
Golestaneh F, Pinson P, Gooi HB (2016) Very short-term nonparametric probabilistic forecasting of renewable energy generation—with application to solar energy. IEEE Trans Power Syst 31(5):3850–3863
Hu W, Yan L, Liu K et al (2016) A short-term traffic flow forecasting method based on the hybrid PSO-SVR. Neural Process Lett 43(1):155–172
Kumar P, Martani C, Morawska L et al (2016) Indoor air quality and energy management through real-time sensing in commercial buildings. Energy Build 111:145–153
Tian HX, Mao ZZ (2010) An ensemble ELM based on modified AdaBoost.RT algorithm for predicting the temperature of molten steel in ladle furnace. IEEE Trans Autom Sci Eng 7(1):73–80
Cawley GC, Talbot NLC (2010) On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 11(Jul):2079–2107
Ding S, Li Y, Zhu J et al (2015) Sequential sample consensus: a robust algorithm for video-based face recognition. IEEE Trans Circuits Syst Video Technol 25(10):1586–1598
Trillos NG, Murray R (2017) A new analytical approach to consistency and overfitting in regularized empirical risk minimization. Eur J Appl Math 28(6):886–921
Srivastava N, Hinton GE, Krizhevsky A et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Richards SA, Whittingham MJ, Stephens PA (2011) Model selection and model averaging in behavioural ecology: the utility of the IT-AIC framework. Behav Ecol Sociobiol 65(1):77–89
Huang G, Huang GB, Song S et al (2015) Trends in extreme learning machines: a review. Neural Netw 61:32–48
Liang NY, Huang GB, Saratchandran P et al (2006) A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans Neural Netw 17(6):1411–1423
Liu D, Wu YX, Jiang H (2016) FP-ELM: An online sequential learning algorithm for dealing with concept drift. Neurocomputing 207:322–334
Huang GB, Chen L, Siew CK (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892
Huang GB, Chen L (2007) Convex incremental extreme learning machine. Neurocomputing 70(16):3056–3062
Feng G, Huang GB, Lin Q et al (2009) Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Trans Neural Netw 20(8):1352–1357
Cao J, Lin Z, Huang G-B (2012) Self-adaptive evolutionary extreme learning machine. Neural Process Lett 36(3):285–305
Bai Z, Huang GB, Wang D et al (2014) Sparse extreme learning machine for classification. IEEE Trans Cybern 44(10):1858–1870
Castaño A, Fernández-Navarro F, Hervás-Martínez C (2013) PCA-ELM: a robust and pruned extreme learning machine approach based on principal component analysis. Neural Process Lett 37(3):377–392
Zhang R, Lan Y, Huang GB et al (2013) Dynamic extreme learning machine and its approximation capability. IEEE Trans Cybern 43(6):2054–2065
Zhang R, Lan Y, Huang G et al (2012) Universal approximation of extreme learning machine with adaptive growth of hidden nodes. IEEE Trans Neural Netw Learn Syst 23(2):365–371
Grigorievskiy A, Miche Y, Ventelä AM et al (2014) Long-term time series prediction using OP-ELM. Neural Netw 51:50–56
Savitha R, Suresh S, Kim HJ (2014) A meta-cognitive learning algorithm for an extreme learning machine classifier. Cogn Comput 6(2):253–263
Figueiredo EMN, Ludermir TB (2014) Investigating the use of alternative topologies on performance of the PSO-ELM. Neurocomputing 127:4–12
Han F, Zhao MR, Zhang JM et al (2017) An improved incremental constructive single-hidden-layer feedforward networks for extreme learning machine based on particle swarm optimization. Neurocomputing 228:133–142
Du KL, Swamy MNS (2016) Particle swarm optimization. In: Search and optimization by metaheuristics. Springer, pp 153–173
Han M, Zhang R, Xu M (2017) Multivariate chaotic time series prediction based on ELM-PLSR and hybrid variable selection algorithm. Neural Process Lett 46(2):705–717
Cao J, Lin Z (2015) Extreme learning machines on high dimensional and large data applications: a survey. Math Probl Eng 2015:1–13
Zhai J, Shao Q, Wang X (2016) Architecture selection of ELM networks based on sensitivity of hidden nodes. Neural Process Lett 44(2):471–489
Shao Z, Er MJ, Wang N (2016) An efficient leave-one-out cross-validation-based extreme learning machine (ELOO-ELM) with minimal user intervention. IEEE Trans Cybern 46(8):1939–1951
Wu HC (2007) The Karush–Kuhn–Tucker optimality conditions in an optimization problem with interval-valued objective function. Eur J Oper Res 176(1):46–59
Taieb SB, Atiya AF (2016) A bias and variance analysis for multistep-ahead time series forecasting. IEEE Trans Neural Netw Learn Syst 27(1):62–76
Lever J, Krzywinski M, Altman N (2016) Points of significance: model selection and overfitting. Nat Methods 13(9):703–704
Hothorn T, Lausen B (2003) Bagging tree classifiers for laser scanning images: a data-and simulation-based strategy. Artif Intell Med 27(1):65–79
Frank A, Asuncion A (2017) UCI machine learning repository. University California Irvine. http://archive.ics.uci.edu/ml
Acknowledgements
The work was supported by the National Key Research Project of China under Grant No. 2016YFB1001304, the National Natural Science Foundation of China under Grant 61572229, the JLUSTIRT High-level Innovation Team, and the Fundamental Research Funds for Central Universities under Grant No. 2017TD-19. The authors gratefully acknowledge financial support from the Research Centre for Intelligent Signal Identification and Equipment, Jilin Province.