cost forecasting
Physica A 493 (2018) 239–252
Contents lists available at ScienceDirect
Physica A
journal homepage: www.elsevier.com/locate/physa
Forecasting Construction Cost Index based on visibility graph: A network approach Rong Zhang a, Baabak Ashuri b, Yu Shyr c, Yong Deng a,d,e,* a School of Computer and Information Science, Southwest University, Chongqing, 400715, China b Construction Research Center (CRC) and Economics of the Sustainable Built Environment (ESBE) Lab, School of Building Construction/School of Civil & Environmental Engineering, Georgia Institute of Technology, 280 Ferst Drive, Atlanta, GA, 30332-0680, USA c Center for Quantitative Sciences, Vanderbilt University, School of Medicine, Nashville, TN 37232, USA d Institute of Fundamental and Frontier Science, University of Electronic Science and Technology of China, Chengdu, 610054, China e Big Data Decision Institute, Jinan University, Guangzhou, Guangdong, 510632, China
h i g h l i g h t s
• To forecast Construction Cost Index, time series is firstly converted into a graph. • Future data is predicted by similarities between nodes based on link prediction. • The application shows the method is able to provide more accurate estimations. • It is superior in forecasting both short-term trend and long-term trend.
a r t i c l e i n f o
Article history: Received 17 July 2017 Received in revised form 2 October 2017 Available online 4 November 2017
Keywords: Forecasting Time series Visibility graph Link prediction CCI
a b s t r a c t
Engineering News-Record (ENR), a professional magazine in the field of global construction engineering, publishes Construction Cost Index (CCI) every month. Cost estimators and contractors assess projects, arrange budgets and prepare bids by forecasting CCI. However, fluctuations and uncertainties of CCI cause irrational estimations now and then. This paper aims at achieving more accurate predictions of CCI based on a network approach in which time series is firstly converted into a visibility graph and future values are forecasted relied on link prediction. According to the experimental results, the proposed method shows satisfactory performance since the error measures are acceptable. Compared with other methods, the proposed method is easier to implement and is able to forecast CCI with less errors. It is convinced that the proposed method is efficient to provide considerably accurate CCI predictions, which will make contributions to the construction engineering by assisting individuals and organizations in reducing costs and making project schedules.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
Engineering News-Record (ENR) is a professional magazine in the field of global engineering construction. Each month, ENR publishes Construction Cost Index (CCI) that is widely used in the construction industry. CCI is the weighted aggregate of average prices of constant quantities of common labor, standard structural steel, portland cement and lumber in 20 cities in the United States (refer to [1]). To some extent, CCI increases smoothly year by year. However, it has quite fluctuations
* Corresponding author at: School of Computer and Information Science, Southwest University, Chongqing, 400715, China. E-mail address: [email protected] (Y. Deng).
https://doi.org/10.1016/j.physa.2017.10.052 0378-4371/© 2017 Elsevier B.V. All rights reserved.
240 R. Zhang et al. / Physica A 493 (2018) 239–252
in many years. These variations usually cause inaccurate estimations, which may lead to much capital loss. As accuracy is significant in preparing bids, cost, and budgets in most of projects, cost estimators and project managers show great interest in more accurate CCI predictions.
Until now, people have made much efforts to make predictions in many areas, such as biological mechanism [2], disease propagation [3] and dynamic systems [4]. Generally, two different strategies are widely used (refer to [5]). One is casual theory and the other is time series. The former argues that cause determines effect and the future variable is predicted by explanatory variables. The general concepts include augmented Dickey–Fuller (ADF), Granger causality, cointegration and so on (see [6,7] for a survey). While the latter holds the view that future values are dependent on past values and corresponding errors, where exponential smoothing (ES), autoregressive and moving average (ARMA), autoregressive integrated moving average (ARIMA), and seasonal ARIMA are all well-known and widely-used methods in forecasting time series (refer to [8,9]). Besides, many novel methods are proposed to make predictions, which brings much convenience to practical applications, such as grey system [10] and artificial intelligence models [11]. It should be noted that fuzzy sets and evidence theory are applied to time series prediction since these tools are efficient to deal with uncertain information [12,13].
Previously, approaches of forecasting CCI have been investigated a lot. Ashuri et al. studied many time series models, such as simple moving average(SMA), Holt ES, ARIMA and seasonal ARIMA [14]. These univariate time series models do have impact on forecasting CCI as the errors are acceptable. It is also implied that different forecasting methods are suitable for specific objects. For instance, the seasonal ARIMA model is proved to be best fit for one-step-ahead prediction of CCI, but multi-step-ahead prediction of CCI is best modeled by Holt-winter ES. Then the authors further conducted empirical tests to identify leading indicators of CCI [15]. Based on unit root test, Granger causality test and Johansen cointegration test, the authors demonstrated that there are eight leading indicators, of which money supply and crude oil price have long-term relationships with CCI. Since proper explanatory variables have an effect on the ability to forecast CCI, multivariate time series models were proposed to improve the accuracy [16]. Owing to five vector error correction models (VECM), the authors proved that the bivariate VECM with CCI and producer price index (PPI) is the most accurate approach, which successfully provided better predictions. Apart from that, based on linear regression (LR) models, Seokyon Hwang proposed Dynamic Regression (DR) models to predict construction costs [17]. The results show that the proposed DR-4 and DR-5 perform better than ES and Box–Jenkins. To understand the trend of construction costs, Seokyon Hwang also studied ARMA models and multivariate autoregressive (VAR) models and found that ARMA(5,5) and VAR(12) can predict accurately in the range of 24 months [18].
Even though the referred work is of great value, there are still some defects that prevent the methods from forecasting more accurately. Considering that CCI is an index about construction cost, it is related to many factors, such as society, market, economy and different regions. These have become the difficulties in forecasting CCI. In 1994, the first examinations done by Williams et al. were suggestive of arduous efforts in forecasting CCI [19]. Besides, taking benefits and interests into account, more accurate predictions and more simple implementations are always in demand. For example, in Ref. [20], the authors conducted contemporaneous time series and improved the accuracy of productivity prediction, which is crucial and significant for time and cost estimators in projects.
In recent years, complex networks have captured much attention. Findings of complex networks wildly used, such as small-world characteristics [21] and intelligent systems [22,23]. For instance, based on multilayer networks and multiplex networks, Wang et al. studied evolutionary games as a more appropriate description of social system [24] and put forward two kinds of acquaintance immunization strategies [25]. In addition, many researches show that networks are also capable of making predictions via proper analysis, (refer to [26,27]). These indicated that there exist relationships between networks and time series. A efficient tool called visibility graph was proposed by Lacasa et al. [28]. According to their algorithm, time series can be used to generate a network. Furthermore, Lan et al. developed a faster transformation and successfully reduced the complexity of the visibility graph algorithm [29]. Basing on network analysis, visibility graph is able to determine the weights of Ordered Weighted Averaging (OWA) operator [30,31], which is very important to make decisions under uncertain environment [32]. Besides, Li et al. proposed a new probability transformation method based on visibility graph, which shows significant relationships between probability and time series [33]. From this point of view, time series can be studied as a network where many theories in complex networks can be applied. Nevertheless, in order to forecast time series, it is pointed out that difficulties in link prediction of networks require to be overcome urgently (refer to [34,35]). A predominant approach for link prediction based on local random walk was proposed by Liu et al. [36]. In their strategy of link prediction, similarity between nodes is defined, which is the probability that two nodes will be connected. In Ref. [37], the author explained the contributions of link prediction physically and introduced many useful applications. Resorting to visibility graph and link prediction, the authors defined node similarity and made predictions according to state of link [38]. Thus, time series and networks are closely connected by probability which makes prediction more simplified.
To address the issues mentioned before, this paper is aimed at improving the precision of CCI predictions as well as the efficiency of the method. Specifically, time series is converted into a network by visibility graph algorithm. Then, the similarities between the nodes will be computed and an edge will be generated between the nodes with the highest similarity. After that, the two newly-linked nodes are used to determine the future node. Finally, the network is converted back into time series and the value of future node is estimated.
This paper is organized as follows: In Section 2, some preliminaries about visibility graph and link prediction will be introduced. The proposed method will be detailed in Section 3. And the predictability of the proposed method will be discussed in Section 4. A brief conclusion is given in Section 5.
R. Zhang et al. / Physica A 493 (2018) 239–252 241
Fig. 1. Illustration of visibility algorithm: Time series {(t1, y1), (t2, y2), . . . , (t8, y8)} is plotted in the upper histogram. y axis represents the corresponding values yi at time point ti. The converted visibility graph is pictured below.
2. Preliminaries
In this section, the theories regarding visibility graph and link prediction, which are referred in the proposed method, will be introduced.
2.1. Visibility graph
The visibility graph is a network constructed by time series through the visibility algorithm [39,40]. It becomes the bond between time series and complex networks and provides a new way to solve many problems (refer to [41–43]). In the visibility graph, time sequences and real values are replaced by nodes. According to [28], the nodes are connected by links according to the visibility algorithm, which is described specifically as follows:
Two arbitrary data values (ta, ya) and (tb, yb) will have visibility, and consequently will become two connected nodes of the associated graph, if any other data (tc , yc) placed between them fulfills:
yc < yb + (ya − yb) tb − tc tb − ta
(1)
For example, Fig. 1 plots a time series with 8 data points. In the upper histogram, every bar is considered as a node. Two bars are linked only if they satisfy the visibility criterion (Eq. (1)). Apparently, there exists a link between a node and its nearest neighbors.
2.2. Link prediction
Since time series has been converted into a graph, forecasting is then associated with the graph. Link prediction, which is applied in the graph, is able to generate potential links (refer to [37,44,45]). In other words, it predicts future development of the graph by adding links between certain nodes. Based on local random walk, a competitive prediction method was proposed by Liu et al. [36]. The similarity between two nodes is defined so that two nodes with higher similarity are more possible to be connected in the future. According to the authors, dealing with link prediction is detailed as follows.
A visibility graph is represented by an undirected network G(V , E), where V is the set of nodes and E is the set of links. Multiple links and self-connections are not allowed. For two arbitrary nodes x, y ∈ V , the similarity between them is denoted by a score Sxy.
First and foremost, the random walk process is introduced. It is suggested that random walk can be regarded as a Markov chain which describes the sequence of nodes visited by a random walker in network (refer to [46–49]). Namely, it can be represented by the transition probability that a random walker, who is staying at node x, will walk to node y in the next step. It is denoted by matrix P and is obtained by Eq. (2).
Pxy = axy kx
, (2)
242 R. Zhang et al. / Physica A 493 (2018) 239–252
where axy = 1 if node x is connected to node y, otherwise axy = 0. kx is the degree of node x. Thus, the probability that a random walker departs from node x and arrives at node y after t steps is obtained by Eq. (3).
π⃗x(t) = P T π⃗x(t − 1), (3)
where π̃x(0) is an N × 1 vector with the xth element equaling to 1 and others equaling to 0. Eq. (4) defines the similarity between node x and node y (refer to [36]).
SLRWxy (t) = kx
2 |E| • πxy(t) +
ky 2 |E|
• πyx(t), (4)
where |E| is the number of links in the network and LRW is the abbreviation of local random walk. As is implied in the equation, the similarity is an extent that is concerned with node degree, number of edges and probability, which quantifies the relationship between node x and node y. In addition, two points deserve to be noted. In the first place, t here means the time that a random walker spends in walking in the network, which should be distinguished from the sequence in time series. Secondly, it has Sxy = Syx evidently. However, when walking from x to y, it is probable that a random walker will go too far away from both x and y even though x is adjacent to y, which may result in unfavorable predictions. To guarantee that a random walker walks in a local part rather than other parts of the network, a higher similarity is obtained by Eq. (5) (refer to [36]).
SSRWxy (t) = t∑
l=1
SLRWxy (l) (5)
Because the contribution of each walk is superposed, SRW is the abbreviation of superposed random walk. In Ref. [36], the authors also proved that the similarity defined by Eqs. (4) and (5) is efficient by comparing with other methods on five real networks.
3. Proposed method
By utilizing visibility graph and link prediction, a new method is proposed to forecast construction cost index (CCI). The proposed method will be divided into two parts. In Section 3.1, a model of one-step-ahead prediction is illustrated. It is used to forecast the value at next one time point, which is able to provide estimations in a short term. In Section 3.2, a model of multi-step-ahead prediction is illustrated. It is designed to forecast the values at next time period, which is able to provide long-term estimations. Both of them are detailed step by step as follows.
3.1. One-step-ahead prediction
Step 1: Input data The input data includes training set U and testing set V: U = {(t1, y1), (t2, y2), . . . , (ti, yi), . . . , (tn, yn)} V = {(tn+1, yn+1), (tn+2, yn+2), . . . , (tN , yN)}, where U represents the data used in the model and V represents the data needed to be forecasted. yi represents the CCI
value at the time point ti, where ti = 1, 2, 3, . . . , N and N is the total number of CCI. Step 2: Construct time series network Utilizing the visibility graph algorithm mentioned in Section 2.1, U is converted into a network. Step 3: Link prediction based on random walk Based on Eqs. (3)–(5), the similarities between the last node n and other nodes are calculated. The node with the highest
similarity is denoted by (t∗n , y ∗ n).
Step 4: Return to time series Add the future node to the network with an edge linked to the last node, as Fig. 2 shows. The time series (tn+1, ŷn+1) is
forecasted by Eq. (6).
ŷn+1 = yn − y∗n tn − t∗n
(tn+1 − tn) + yn, (6)
where ŷn+1 is the predicted value of CCI at time point tn+1. Step 5: Add (tn+1, yn+1) in V to U and go to Step 2 until all data in V is forecasted. The whole process can be summarized in a flow chart as is shown in Fig. 4. To be clear, an example is introduced.
The input data is listed in Table 1. According to Step 1, data in Table 1 is input in the model, i.e. time series U = {(1, 4680), (2, 4685), . . . , (10, 4771)}, V = {(11, 4787), (12, 4777), . . . , (20, 4892)}. Afterwards, the time series is con- verted into the visibility graph in Step 2. Next, in Step 3, the similarities between the 10-th node and previous 9 nodes are calculated, as is presented in Table 1. Obviously, the 6-th node shows the highest similarity. Thus, it is linked to the 10-th node, as is indicated by the red dotted line in Fig. 3. Then, based on Eq. (6) in Step 4, the forecasted CCI in November 1990 (t11, ŷ11) is obtained, as is indicated by the solid green line in Fig. 3. Finally add (t11, y11) = (11, 4787) in V to time series U so as to get the forecasted CCI in December 1990. Repeat Step 1 to Step 4 until the forecasted CCI in August 1991 is obtained. The results are also shown in Table 1.
R. Zhang et al. / Physica A 493 (2018) 239–252 243
Table 1 Data set and results of the example.
Node Time CCI One-step-ahead Multi-step-ahead
Testing set U Node similarity 1 Jan-90 4680 1.2152 1.0816 2 Feb-90 4685 1.2152 1.0816 3 Mar-90 4691 1.7228 1.5332 4 Apr-90 4693 0.4597 0.4089 5 May-90 4707 1.1978 1.0656 6 Jun-90 4732 3.0353 2.7014 7 Jul-90 4734 0.4764 0.4242 8 Aug-90 4752 0.4764 0.4242 9 Sep-90 4774 2.4998 2.2295 10 Oct-90 4771 0.3206 Testing set V Forecasted CCI 11 Nov-90 4787 4780.75 4802.00 12 Dec-90 4777 4793.50 4816.00 13 Jan-91 4777 4778.00 4829.89 14 Feb-91 4773 4777.75 4843.06 15 Mar-91 4772 4772.80 4855.60 16 Apr-91 4766 4771.67 4868.15 17 May-91 4801 4764.86 4880.69 18 Jun-91 4818 4803.33 4893.23 19 Jul-91 4854 4822.43 4905.78 20 Aug-91 4892 4862.38 4918.32
Fig. 2. Illustration of adding a new node: The new node is added to the network by linking to the last node as the red dotted line shows. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Link prediction of one-step-ahead-prediction: The 6-th node is linked to the 10-th node by the red dotted line. The green solid line means that the 11-th node is determined by the 6-th and 10-th node together. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
3.2. Multi-step-ahead prediction
Step 1: Input data The input data only includes training set U. Step 2: Construct time series network Utilizing the visibility graph algorithm mentioned in Section 2.1, U is converted into a network. Step 3: Link prediction based on random walk
244 R. Zhang et al. / Physica A 493 (2018) 239–252
Fig. 4. The flow chart of one-step-ahead prediction: i is the number of loops, from 0 to (N − n). And (N − n) is the number of CCI requiring to be forecasted.
Add the future node to the network according to Fig. 2. In the newly-generated network, the similarities between the future node and other nodes are calculated by Eqs. (3)–(5). The node with the highest similarity is denoted by (tj, yj).
Step 4: Return to time series Find out the node (tj∗, yj∗) that satisfies: 1) (tj∗, yj∗) is on the right side of the node (tj, yj); and 2) (tj∗, yj∗) is the last one
that node (tj, yj) is linked to. Hence, the predicted value (tn+1, ŷn+1) is calculated by Eq. (7).
ŷn+1 = y∗j − yj t∗j − tj
(tn+1 − tj) + yj (7)
Step 5: Add (tn+1, ŷn+1) to U and go to Step 2 until the required CCIs are forecasted. The process is illustrated by the flow chart in Fig. 5. In Table 1, CCIs in training set U are used to estimated the following 10
CCIs, under the assumption that we do not know CCIs in testing set V . Likewise, data in training set is denoted by time series U = {(1, 4680), (2, 4685), . . . , (10, 4771)} in Step 1. Then it is converted into a network in Step 2. However, in Step 3, the similarities between the 11-th and previous 10 nodes are computed, as is presented in Table 1. It is obvious that the 6-th node is with the highest similarity. As is implied by the red dotted line in Fig. 6, the 9-th node is the last one that is linked to 6-th node on the right side. Afterwards, in Step 4, as is illustrated by the green solid line in Fig. 6, the forecasted CCI in November 1990 is obtained by Eq. (7), which is denoted by (t11, ŷ11). It is worth noticing that (t11, ŷ11) = (11, 4802) will be added to time series U in order to forecast CCI in December 1990. Finally, Step 1 to Step 4 are repeated until (t20, ŷ20) = (20, 4918.32) is obtained. The results are also shown in Table 1.
3.3. Discussions
It is worth noting the following points of the proposed method.
(1) In both models of one-step-ahead and multi-step-ahead prediction, time series is converted into a visibility graph at first, because visibility graph not only maintains the numerical properties of original data but also shows data in a geometrical form.
(2) Based on the results of random walk process, both the two models calculate the similarity between nodes, since random walk process well associates the links with nodes and similarity well characterizes the close relationship between two nodes. So it is reasonable to consider similarity as the possibility that there will be a link between two nodes. In the model of one-step-ahead prediction, the similarities between last node and previous nodes are calculated, because the data constructing the visibility graph are all actual values and the last node is the nearest node linked to the future node. However, in the model of multi-step-ahead prediction, the time series in the visibility graph include forecasted values, so the similarities between future node and previous nodes are calculated. And the node with highest similarity is linked to the last node that it has visibility with on the right side, since the last node in the visibility is a forecasted value instead of an actual value.
R. Zhang et al. / Physica A 493 (2018) 239–252 245
Fig. 5. The flow chart of multi-step-ahead prediction: K represents the number of CCI requiring to be forecasted. i is the number of loops, from 1 to K.
Fig. 6. Link prediction of multi-step-ahead prediction: The 6-th node with the highest similarity is linked to the 9-th node by the red dotted line. And they are used to determine the 11-th node by the green solid line. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
(3) In the proposed method, the visibility graph is used as a tool and link prediction is link prediction provides a way of calculating the node similarity. The novelty of the proposed method lies in returning visibility graph to time series, as is indicated in Eqs. (6) and (7). As a result, different from other methods, the proposed method is based on a network approach.
4. Result and analysis
In this section, the proposed method will be applied to forecast real CCI data in several experiments. The values of CCIs from January 1990 to July 2014 are plotted in Fig. 7, which is taken as universal set of the experiments. It is evident that CCIs indeed increased over the long time. But as is exampled in Fig. 8, it decreased in quite a few months, which may result in bad estimations. To test the proposed method in a relatively complete way, the subsets used in the experiments will be selected from the universal set according to the following criterions:
246 R. Zhang et al. / Physica A 493 (2018) 239–252
Fig. 7. Increasing trend of CCI time series data in the long run.
Fig. 8. Variations of CCI time series data in some periods.
(1) Sufficient: The subsets should be sufficient to implement the model and be sufficient to represent the real situations. (2) Stochastic: The subsets should be selected from the universal set as randomly as possible.
Besides, three error measures will be adopted to evaluate the predictability, including mean absolute percentage error (MAPE), mean square error (MSE) and mean absolute error (MAE). These errors are calculated by the following equations (refer to [50–52]):
MAPE = 1 N
N∑ t=1
⏐⏐⏐Ŷ(t) − Ỹ(t)⏐⏐⏐ Ỹ(t)
× 100 (8)
MSE = 1 N
N∑ t=1
[ Ŷ(t) − Ỹ(t)
]2 (9)
MAE = 1 N
N∑ t=1
⏐⏐⏐Ŷ(t) − Ỹ(t)⏐⏐⏐ , (10) where Ŷ(t) represents the forecasted CCI and Ỹ(t) is the actual CCI correspondingly. As more accurate data is required, it is obvious that low MAPE, MSE or MAE shows the high predictability of the method. After the proposed method is implemented, three error measures will be computed and be compared with previous achievements.
4.1. Predictability of one-step-ahead prediction
To show the predictability of one-step-ahead prediction, the whole universal data set will be used. Namely, CCIs from January 1990 to June 2014 are used to forecast CCIs from March 1990 to July 2014. For one thing, at least 2 CCI data should
R. Zhang et al. / Physica A 493 (2018) 239–252 247
Table 2 Errors of one-step-ahead CCI prediction.
Forecasting method MAPE MSE MAE
The proposed method 0.29% 860.01 20.05 Simple moving average (SMA)[14] 0.40% 548.75 19.16 Seasonal ARIMA [14] 0.34% 548.75 16.67
Fig. 9. Data input for one-step-ahead prediction. The last CCI is predicted by CCIs from January 1990 to June 2014 as the green dotted arrow points. And real CCIs from March 1990 to July 2014 compose data set V. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 10. The predicted and true CCIs from year 2000 to 2001.
be input in the model to make one-step-ahead predictions. For another thing, choosing the whole data is able to include all the situations where subsets are selected randomly. As is illustrated in Fig. 9, the experiment starts from CCI in January 1990 and February 1990, where CCI in March 1990 is forecasted. The simulation is repeated until CCI in July 2014 is forecasted.
After inputting the required CCI data, the predicted CCIs are obtained by using the proposed method, together with the forecasting errors presented in Table 2. As can be seen, the errors have dropped when comparing with the models, i.e. SMA and seasonal ARIMA, referred in [14]. Especially, the MAPE has fallen by 0.11% and 0.05% respectively. To give an example, Fig. 10 pictured both predicted and true CCIs from January 2000 to December in 2001. As is shown in the figure, the blue solid curve (forecasted values) well fits the red dotted curve (true values) to a large degree. Therefore, the predictability of one-step-ahead prediction is proved to be accurate.
4.2. Predictability of multi-step-ahead prediction
To demonstrate the predictability of multi-step-ahead prediction, the test is divided into two parts: short-term trend and long-term trend. Details are as follows.
248 R. Zhang et al. / Physica A 493 (2018) 239–252
Table 3 Data input for multi-step-ahead prediction.
Categories Cases Input U Forecasted values
Short-term trend Scenario 1 1990.01–2010.12 2011.12 Scenario 2 1990.01–2011.12 2012.12 Scenario 3 1990.01–2012.12 2013.12
Long-term trend Group 1 1990.01–1998.12 1999.01–2001.12 Group 2 1990.01–1999.12 2000.01–2002.12 Group 3 1990.01–2000.12 2001.01–2003.12
Table 4 Errors of short-term trend forecasting.
Forecasting method Annotation MAPE
The proposed method Scenario 1 0.55% Scenario 2 0.49% Scenario 3 0.66% Average 0.57%
Seasonal ARIMA Ref. [14] 1.06%
(1) Short-term Trend Short-term trend refers to forecasting the next year December’s CCI. To satisfy the randomness, 3 scenarios are conducted. Specifically, Table 3 presents the process of data input. It shows that Scenario 1 uses CCIs from January 1990 to December 2010. Scenario 2 uses CCIs from January 1990 to December 2011. And Scenario 3 uses CCIs from January 1990 to December 2012. In this way, the number of CCI in each subset is about 260, which is sufficient to make predictions in the proposed model when comparing with 295 CCIs in the universal set. What is more, the forecasted months, including December 2011, December 2012 and December 2013, are close to recent months, which is valid to model the current situations. Table 4 presents the errors of the 3 scenarios by using the proposed method. The errors of seasonal ARIMA [14] are also listed in the table. In general, 3 scenarios provide rather accurate estimations since MAPE, MSE and MAE become lower. On average, the errors of the proposed method have reduced almost by half when compared with seasonal ARIMA.
(2) Long-term Trend When referring to long-term trend, it is intended to forecast the following 3 years’ CCI data. The experiments are divided into 3 groups to avoid occasionality. That is to say, three periods of CCI data will be used to forecast the next 36 months’ CCIs. As is also presented in Table 3, Group 1 uses CCIs from January 1990 to December 1998. Group 2 uses CCIs from January 1990 to December 1999. And Group 3 uses CCIs from January 1990 to December 2000. Each group has over half of the universal CCI data set, which is sufficient for the proposed method. And the true values of the forecasted CCIs have been already contained in the universal data set, which provides a basis for comparison. Last but not least, the three periods that differ from the subsets in short-term trend are not very far away from now, which can be considered as reflections of the real situations nowadays. Table 5 lists the forecasting results of the 3 groups by using the proposed method and other referred methods. For convenient comparison, only MAPE is provided in the table. Generally speaking, the proposed method gives good predictions. In the first 12 months, the average MAPE of the proposed method is better than DR-4,D5-5,VAR(12) and Holt-Winter model but worse than ARMA(5,5). However, in 24 months, the average errors of the proposed method are much lower than ARMA(5,5) and VAR(12) even though higher than DR-4 and DR-5. In 36 months, the proposed method is still more accurate than VECM by 0.06%. To give an example, Fig. 11 plots the following three years’ CCIs that are estimated by CCIs from January 1990 to December 1999. It is implied that the blue solid curve (forecasted values) is generally consistent with the red dotted curve (true values). Thus, multi-step-ahead prediction of the proposed method offers relatively correct future tendency. To further demonstrate the predictability, uncertainties in the models are discussed. In Table 5, the results of VECM are rather accurate within 36 months, because it is based on three statistical tests and comparisons of five different VEC models, which means the model deals with uncertainties from other factors that affect CCI. Besides, by handling uncertainties of CCI itself, models such as DR and ARMA are able to provide accurate predictions in a short period. In the proposed method, time series is converted into a visibility graph. Then, the uncertainty of CCI itself is indicated in the probability that a random walker walks from one node to another. Based on the results of a random walk process, the node similarity quantifies the uncertainty. To ensure the acceptable uncertainty, a link is generated on condition that they are the nodes with highest similarity. As a result, the proposed method forecasts the following 36 months’ CCI accurately.
R. Zhang et al. / Physica A 493 (2018) 239–252 249
Table 5 MAPE of long-term trend forecasting (–: Data not provided).
12-month 24-month 36-month
The proposed method Group 1 0.50% 0.82% 1.06% Group 2 0.21% 0.41% 0.60% Group 3 0.61% 0.56% 0.68% Average 0.44% 0.60% 0.78% The referred methods Holt-Winter [17] 0.70% N/A N/A DR-4[18] 0.52% 0.48% – DR-5[18] 0.45% 0.37% – ARMA (5, 5)[17] 0.36% 1.04% – VAR (12)[17] 0.98% 1.01% – VECM [16] – – 0.84%
Fig. 11. The predicted and true CCIs from year 2000 to 2002: data is detailed in Appendix.
4.3. Sensitivity analysis
Generally, the proposed method performs good based on the error measures and comparison results. Above all, according to Dicky and Fuller’s simulations [53,54], 10% can be taken as a threshold. Since the MAPE in Tables 2–5 is well below 10%, it is illustrated that the precisions of both one-step-ahead and multi-step-ahead predictions are acceptable. Moreover, through a wide comparison, the proposed method is proved to improve the accuracy. Although the errors seem to drop a little, it is considerably significant because one of the most important applications of CCI is in capital budgeting phase of the project. Index-based cost estimation is utilized to develop a preliminary estimate for the cost of the project in an early phase. The more-accurate forecast for CCI leads to more-accurate cost estimate for the project. The larger (in dollar value) the project is the greater the error can be in cost estimation as CCI is often applied as a multiplier to adjust the baseline cost estimate. While enhancing the accuracy of forecasting CCI is useful for improving the cost estimation for any project from any size, the impact will be more significant in large projects (mega- or giga-projects). Think of a multibillion-dollar transportation or a power-plant project. Improving the accuracy of cost estimation by 0.01% translates into several hundred thousand savings for the project.
In addition, the proposed method has some advantages. On one hand, the proposed method provides more simple and convenient implementations, which can save more time for preparations. Only two CCI data is able to make predictions in the proposed method, as is shown in Section 4.1. And only two important steps, i.e. visibility graph algorithm and similarity calculation, are involved in the proposed method. Although simple moving average (SMA) and Holt-Winter models are both simple to implement, the error measures are not accurate enough as is shown in Tables 2 and 5. The methods, i.e. Seasonal ARIMA, VECM, DR and VAR presented in Tables 4 and 5, are relatively accurate as well. But more tests and coefficients have to be identified in these models, such as augmented Dickey–Fuller (ADF) test, maximum likelihood estimation for AR and MA coefficients. Also compared with Ref. [38], the proposed method in this paper is simplified without judging the threshold value of the similarity to identify the state of links.
On the other hand, this paper forecasts CCI based on a network approach instead of statical or causal methods. That is to say, future CCI is forecasted by link prediction that is based on random walk. Since a walker walks randomly, the relationships among the past time series are included in the probability matrix P that is used to compute similarity. Additionally, potential links are established according to the similarity, which represents the connections between the past
250 R. Zhang et al. / Physica A 493 (2018) 239–252
Fig. 12. Forecasting based on different nodes: Forecasted 1 represents CCI forecasted by computing similarities based on the last node and Forecasted 2 represents CCI forecasted by the proposed method.
nodes and the future node. In the proposed method, the future CCI is forecasted by the nodes with highest similarities that is useful for accuracy. Thereby, the nodes with lower similarities that may cause bad estimations are excluded. What is more, this paper provides different strategies for one-step-ahead and multi-step-ahead prediction accordingly. As is implied in Section 3.1, the similarities between the last node and other nodes are calculated for one-step-ahead prediction. Yet Section 3.2 indicates that the future node and other nodes are used to calculate similarities. This is mainly because forecasted CCIs will be added to the training data set, which will make the last node not accurate in multi-step-ahead prediction. To be more specific, Fig. 12 shows multi-step-ahead-predictions by using CCIs from January to May in 2006. As can seen from the three curves, CCI forecasted by similarity based on the last node is less accurate than CCI forecasted by the proposed method. The newly-generated link in the proposed method basically incorporates the future node together with past nodes. In this way, the similarity between the past nodes and future node is computed and the error is reduced consequently. Finally, the similarity, which is an important index in the proposed method, is related to node degree. As a vital property of complex networks, degree not only characterizes the relationships among the nodes but also represents resources that a node has. According to references, the node degree influences connectivity [55], dynamics [56] and evolution [57] of networks. Also compared with Ref. [19], even though the authors make predictions based on a network model, that is called a back- propagation neural-network model, the results turn out to be worse that ES and linear regression. However, the proposed method based on networks constructed by visibility graph can predict more accurately than ES and DR according to Table 5. In a word, the proposed method is combined with both time series and networks by utilizing similarity.
Nevertheless, there also exist some drawbacks. It is unavoidable that inaccurate estimation will take place due to unfavorable changes in CCI. As is shown in Figs. 10 and 11, both the purple and green rectangles indicate that the forecasted value deviates noticeably from the true value when big fluctuations happen to CCI. Note that CCI is effected by crude oil price, consumer price index and so on, more essential factors should be taken into account so as to obtain more accurate CCI in special months. Secondly, the range of the accurate predictions is limited. As is presented in Table 5, the errors turn out to increase along with more CCIs forecasted. This is because only uncertainties of CCI itself are included, instead of considering the uncertainties from other factors affecting CCI such as crude oil price. Besides, seasonality is not involved in the proposed method, which may be related to the ability of long term predictions. By taking uncertainties and seasonality into account, a more robust solution should be designed to overcome the limited forecasting range so that a longer term ahead can be predicted.
5. Conclusion
It is still problematic to make accurate enough predictions of CCI due to the abnormal fluctuations during the growing tendency. Since more accurate CCI predictions are crucial to construction cost, in this paper, a novel method is proposed to avoid risks in forecasting CCI. Specifically, a CCI time series is converted into a network based on visibility graph theory at first. Then future data is forecasted based on link prediction, in which the prior nodes with highest similarity play an important role in both one-step-ahead and multi-step-ahead predictions.
Besides, the experimental results manifest that the proposed method is able to make relatively accurate predictions of CCI. For both one-step-ahead and multi-step-ahead prediction, the errors have been declined when compared with Holt- winter, seasonal ARIMA and dynamic regression model. Besides, the proposed method is more effective since probability and network are utilized instead of complex causality tests and model coefficients in comparison with models of VAR and VECM. It is confirmed that the proposed method will be beneficial to cost estimators for contractors in preparing more accurate bids and planning more rational investments.
R. Zhang et al. / Physica A 493 (2018) 239–252 251
It goes without doubt that even one percentage error would result in terrible consequence and individuals or organiza- tions might suffer from huge capital loss. The proposed model does not perform well in some aspects. For example, when more CCIs need to be forecasted in multi-step-ahead prediction, the errors tend to increase. In addition, the proposed method fail to account for seasonality. Hence, our future work will attempt to improve the forecasting model by 1) taking directions of links into consideration, which includes more factors that effect CCI, to obtain more accurate predictions; 2) reconsidering the visibility graph algorithm by adding the seasonality so that properties of time series can be explained well; 3) establishing multi-layer visibility graphs to address uncertainties from more factors.
Acknowledgments
The authors greatly appreciate the reviews’ suggestions and the editor’s encouragement. The work is partially supported by National Natural Science Foundation of China (Grant Nos. 61573290, 61503237).
Appendix. Data used in Fig. 11
References
[1] ENR, Engineering News-Record, http://enr.construction.com/economics/, 2011. [2] C.N. Pace, F. Vajdos, L. Fee, G. Grimsley, T. Gray, How to measure and predict the molar absorption coefficient of a protein, Protein Sci. 4 (11) (1995)
2411–2423. [3] P.S. Kamath, R.H. Wiesner, M. Malinchoc, W. Kremers, T.M. Therneau, C.L. Kosberg, G. D’Amico, E.R. Dickson, W. Kim, A model to predict survival in
patients with end-stage liver disease, Hepatology 33 (2) (2001) 464–470. [4] A.E. BozorgMagham, S.D. Ross, D.G. Schmale, Real-time prediction of atmospheric Lagrangian coherent structures based on forecast data: An
application and error analysis, Physica D 258 (2013) 47–60. [5] R. Taylor, P. Bowen, Building price-level forecasting: An examination of techniques and applications, Construct. Manage. Econom. 5 (1) (1987) 21–44. [6] S.E. Said, D.A. Dickey, Testing for unit roots in autoregressive-moving average models of unknown order, Biometrika 71 (3) (1984) 599–607. [7] B. Pfaff, Analysis of Integrated and Cointegrated Time Series with R, Springer, 2008.
252 R. Zhang et al. / Physica A 493 (2018) 239–252
[8] R.G. Brown, Smoothing, Forecasting and Prediction of Discrete Time Series, Courier Dover Publications, 2004. [9] G.P. Zhang, M. Qi, Neural network forecasting for seasonal and trend time series, Eur. J. Oper. Res. 160 (2) (2005) 501–514.
[10] Y.-H. Lin, P.-C. Lee, Novel high-precision grey forecasting model, Automat. Construct. 16 (6) (2007) 771–777. [11] H.-l. Yip, H. Fan, Y.-h. Chiang, Predicting the maintenance cost of construction equipment: Comparison between general regression neural network
and Box–Jenkins time series models, Autom. Constr. 38 (2014) 30–38. [12] H. Zheng, Y. Deng, Y. Hu, Fuzzy evidential influence diagram and its evaluation algorithm, Knowl.-Based Syst. 131 (2017) 28–45. [13] X. Zhou, X. Deng, Y. Deng, S. Mahadevan, Dependence assessment in human reliability analysis based on D numbers and AHP, Nucl. Eng. Des. 313
(2017) 243–252. [14] B. Ashuri, J. Lu, Time series analysis of ENR construction cost index, J. Construct. Eng. Manage. 136 (11) (2010) 1227–1237. [15] B. Ashuri, S.M. Shahandashti, J. Lu, Empirical tests for identifying leading indicators of ENR Construction Cost Index, Construct. Manage. Econom. 30 (11)
(2012) 917–927. [16] S. Shahandashti, B. Ashuri, Forecasting engineering news-record construction cost index using multivariate time series models, J. Construct. Eng.
Manage. 139 (9) (2013) 1237–1243. [17] S. Hwang, Time series models for forecasting construction costs using time series indexes, J. Construct. Eng. Manage. 137 (9) (2011) 656–662. [18] S. Hwang, Dynamic regression models for prediction of construction costs, J. Construct. Eng. Manage. 135 (5) (2009) 360–367. [19] T.P. Williams, Predicting changes in construction cost indexes using neural networks, J. Construct. Eng. Manage. 120 (2) (1994) 306–320. [20] S. Hwang, L.Y. Liu, Contemporaneous time series and forecasting methodologies for predicting short-term productivity, J. Construct. Eng. Manage.
136 (9) (2009) 1047–1055. [21] D.J. Watts, S.H. Strogatz, Collective dynamics of small-worldnetworks, Nature 393 (6684) (1998) 440–442. [22] Y. Dong, J. Wang, F. Chen, Y. Hu, Y. Deng, Location of facility based on simulated annealing and ‘‘ZKW algorithms, Math. Probl. Eng. 2017 (2017).
http://dx.doi.org/10.1155/2017/4628501. Article ID 4628501. [23] Q. Zhang, M. Li, Y. Deng, Measure the structure similarity of nodes in complex networks based on relative entropy, Physica A 491 (2018) 749–763. [24] Z. Wang, L. Wang, A. Szolnoki, M. Perc, Evolutionary games on multilayer networks: A colloquium, Eur. Phys. J. B 88 (5) (2015) 1–15. [25] Z. Wang, D.-W. Zhao, L. Wang, G.-Q. Sun, Z. Jin, Immunity of multiplex networks via acquaintance vaccination, Europhys. Lett. 112 (4) (2015) 48002. [26] J. Ludescher, A. Gozolchiani, M.I. Bogachev, A. Bunde, S. Havlin, H.J. Schellnhuber, Improved el niño forecasting by cooperativity detection, Proc. Natl.
Acad. Sci. 110 (29) (2013) 11742–11745. [27] J. Ludescher, A. Gozolchiani, M.I. Bogachev, A. Bunde, S. Havlin, H.J. Schellnhuber, Very early warning of next el niño, Proc. Natl. Acad. Sci. 111 (6) (2014)
2064–2066. [28] L. Lacasa, B. Luque, F. Ballesteros, J. Luque, J.C. Nuño, From time series to complex networks: The visibility graph, Proc. Natl. Acad. Sci. 105 (13) (2008)
4972–4975. [29] X. Lan, H. Mo, S. Chen, Q. Liu, Y. Deng, Fast transformation from time series to visibility graphs, Chaos 25 (8) (2015) 083105. [30] S. Chen, Y. Hu, S. Mahadevan, Y. Deng, A visibility graph averaging aggregation operator, Physica A 403 (2014) 1–12. [31] H. Wang, H. Mo, R. Sadiq, Y. Hu, Y. Deng, Ordered visibility graph weighted averaging aggregation operator: A methodology based on network analysis,
Comput. Ind. Eng. 88 (2015) 181–190. [32] H. Mo, Y. Deng, A new aggregating operator in linguistic decision making based on D numbers, Internat. J. Uncertain. Fuzziness Knowledge-Based
Systems 24 (6) (2016) 831–846. [33] M. Li, Q. Zhang, Y. Deng, A new probability transformation based on the ordered visibility graph, Int. J. Intell. Syst. 31 (1) (2016) 44–67. [34] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks, J. Am. Soc. Inf. Sci. Technol. 58 (7) (2007) 1019–1031. [35] R.N. Lichtenwalter, J.T. Lussier, N.V. Chawla, New perspectives and methods in link prediction, in: Proceedings of the 16th ACM SIGKDD international
conference on Knowledge discovery and data mining, ACM, 2010, pp. 243–252. [36] W. Liu, L. Lü, Link prediction based on local random walk, Europhys. Lett. 89 (5) (2010) 58007. [37] L. Lü, T. Zhou, Link prediction in complex networks: A survey, Physica A 390 (6) (2011) 1150–1170. [38] S. Chen, X. Lan, Y. Hu, Q. Liu, Y. Deng, The time series forecasting: From the aspect of network, arXiv preprint arXiv:1403.1713, 2014. [39] B. Luque, L. Lacasa, F. Ballesteros, J. Luque, Horizontal visibility graphs: Exact results for random time series, Phys. Rev. E 80 (4) (2009) 046103. [40] Y. Yang, J. Wang, H. Yang, J. Mang, Visibility graph approach to exchange rate series, Physica A 388 (20) (2009) 4431–4437. [41] L. Lacasa, B. Luque, J. Luque, J.C. Nuno, The visibility graph: A new method for estimating the Hurst exponent of fractional Brownian motion, Europhys.
Lett. 86 (3) (2009) 30001. [42] C. Liu, W.-X. Zhou, W.-K. Yuan, Statistical properties of visibility graph of energy dissipation rates in three-dimensional fully developed turbulence,
Physica A 389 (13) (2010) 2675–2681. [43] M. Ahmadlou, H. Adeli, Visibility graph similarity: A new measure of generalized synchronization in coupled dynamic systems, Physica D 241 (4)
(2012) 326–332. [44] S. Scellato, A. Noulas, C. Mascolo, Exploiting place features in link prediction on location-based social networks, in: Proceedings of the 17th ACM
SIGKDD international conference on Knowledge discovery and data mining, ACM, 2011, pp. 1046–1054. [45] M. Al Hasan, M.J. Zaki, A survey of link prediction in social networks, in: Social Network Data Analytics, Springer, 2011, pp. 243–275. [46] K. Pearson, The problem of the random walk, Nature 72 (1865) (1905) 294. [47] D.G. Kirikos, Forecasting exchange rates out of sample: random walk vs markov switching regimes, Appl. Econom. Lett. 7 (2) (2000) 133–136. [48] F. Wang, D.P. Landau, Efficient, multiple-range random walk algorithm to calculate the density of states, Phys. Rev. Lett. 86 (10) (2001) 2050. [49] L. Kilian, M.P. Taylor, Why is it so difficult to beat the random walk forecast of exchange rates?, J. Int. Econ. 60 (1) (2003) 85–107. [50] J.S. Armstrong, T.S. Overton, Estimating nonresponse bias in mail surveys, J. Marketing Res. (1977) 396–402. [51] A.A. Weiss, A. Andersen, Estimating time series models using the relevant forecast evaluation criterion, J. R. Statist. Soc. Ser. A (General) (1984) 484–487. [52] F.-M. Tseng, H.-C. Yu, G.-H. Tzeng, Combining neural network model with seasonal time series ARIMA model, Technol. Forecast. Soc. Change 69 (1)
(2002) 71–87. [53] D.A. Dickey, W.A. Fuller, Distribution of the estimators for autoregressive time series with a unit root, J. Am. Statist. Assoc. 74 (366a) (1979) 427–431. [54] J.M. Wong, A.P. Chan, Y. Chiang, Time series forecasts of the construction labour market in Hong Kong: the Box-Jenkins approach, Construct. Manage.
Econom. 23 (9) (2005) 979–991. [55] C. Bettstetter, On the minimum node degree and connectivity of a wireless multihop network, in: Proceedings of the 3rd ACM international symposium
on Mobile ad hoc networking & computing, ACM, 2002, pp. 80–91. [56] K. Suchecki, V.M. Eguíluz, M. San Miguel, Voter model dynamics in complex networks: Role of dimensionality, disorder, and degree distribution, Phys.
Rev. E 72 (3) (2005) 036132. [57] Z. Wang, L. Wang, M. Perc, Degree mixing in multilayer networks impedes the evolution of cooperation, Phys. Rev. E 89 (5) (2014) 052813.
- Forecasting Construction Cost Index based on visibility graph: A network approach
- Introduction
- Preliminaries
- Visibility graph
- Link prediction
- Proposed method
- One-step-ahead prediction
- Multi-step-ahead prediction
- Discussions
- Result and analysis
- Predictability of one-step-ahead prediction
- Predictability of multi-step-ahead prediction
- Sensitivity analysis
- Conclusion
- Acknowledgments
- Data used in Fig. 11
- References