
Automated Time-Series Cost Forecasting System for Construction Materials

Sungjoo Hwang1; Moonseo Park, M.ASCE2; Hyun-Soo Lee, M.ASCE3; and Hyunsoo Kim4

Abstract: As large-scale building projects increase in frequency, their construction costs become a matter of great concern, especially because of their lengthy construction periods. In particular, recent volatile fluctuations of construction material prices have aggravated problems in cost forecasting. Many researchers try to accurately estimate cost escalations, but price forecasting for numerous construction materials requires a simplified and automated process. The research in this paper develops an automated time-series material cost forecasting (ATMF) system including both autoselected procedures for determining a best-fitting model and an autoextracting module for forecasting values using the Box-Jenkins approach. If the modeling process is simplified and the modeler’s iterative, arbitrary decisions are eliminated, the future price of each of a large number of materials can be forecast individually. Thus, the ATMF system can be utilized for predicting future trends in construction material costs. Further, an out-of-sample forecast applying several material price data confirms that this system can be effectively applied to material cost estimation at a more detailed level in object-based cost planning. The proposed system can thus help decision makers in the construction industry deal with changes in economic conditions and design by estimating cost escalations caused by volatile factors such as inflation. DOI: 10.1061/(ASCE)CO.1943-7862.0000536. © 2012 American Society of Civil Engineers.

CE Database subject headings: Auto-regressive moving-average models; Construction costs; Predictions; Time series analysis; Construction materials.

Author keywords: Autoregressive moving-average models; Construction costs; Pricing; Predictions; Time-series analysis.

Introduction

Large-scale construction projects have recently increased in number for residential, commercial, and government facilities worldwide. Examples of this trend are the numerous high-rise buildings (over 100 stories) being constructed or planned as urban landmarks. In these circumstances, construction costs for large-scale building projects have been a matter of great concern because of their significant cost implications and frequent design changes during their lengthy construction periods. Over the long time span from project startup to completion, many factors affect the final project cost (Shane et al. 2009). These volatile factors, such as resource prices, can lead to under- or overestimations of the total project cost, as resource prices vary because of changes in demand, market conditions, and macroeconomic conditions (Williams 1994). For example, structural steel prices tripled from 2001 to 2008 in Korea, according to recent global economic reports [Construction Association of Korea (CAK) 2008]. Therefore, additional material costs caused by inflation can decrease general contractors’ profitability over the course of 5 or more years for most large-scale projects, which are generally awarded with a lump-sum contract. As material costs represent approximately one-fourth of the total project cost, an accurate prediction of raw material prices is important for an accurate prediction of the total cost.

To solve these problems, many researchers have attempted to accurately estimate cost escalations in construction projects with a focus on total costs or construction cost index using estimating techniques such as time-series analysis, artificial neural networks, or other probabilistic methods. Time-series analysis, in particular, has been widely used because it represents the time-lagged relationship of correlated observations among both single and interrelated multiple series (Hwang and Liu 2010). Despite such strengths, however, two main problems are associated with applying this method to forecasting material cost escalations. First, numerous materials are required for construction projects. Because prices of various materials increase or decrease at different rates, total material cost escalation should be the sum of each material’s price escalation. Second, the complicated and iterative procedures of most forecasting methods require significant time and effort both to determine suitable models and to forecast price escalations for each material required. For these reasons, a simplified and automated forecasting system is desirable for material cost estimation on large-scale construction projects.

This research develops an automated time-series system to easily determine a suitable forecasting model for a corresponding material. This system includes both autoselected procedures for determining a best-fitting forecasting model and an autoextracting module for forecast values.

According to estimators in construction companies, the material price trends to be forecast have the following characteristics: (1) no seasonal variations can be found in most material price time-series data; (2) material prices tend to remain steady even if they increase once in a recessionary period; (3) because numerous factors affect material prices, extensive data collection is required to make a single prediction (Williams 1994). To check the nonseasonality of material price data, this research analyzes the plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) with regard to three main kinds of construction material: high-tensile rebar (SD400, D51, 15.9 kg/m), H-type steel beam (H700, B300, 185 kg/m), and ready-mixed concrete (30 MPa, 15 cm, 40 cm). Fig. 1 shows the ACF and PACF of seasonal-transformed price data of the three materials. When there is seasonality in data sets, significant autocorrelation can be found at all high time lags except for the first or second time lag in the PACF. According to these plots, which have no significant seasonal autocorrelation, the nonseasonality of the material price data can be identified.

1Ph.D. Student, Dept. of Architecture and Architectural Engineering, Seoul National Univ., 599 Gwanak-ro, Seoul, Korea. E-mail: nkkt14@snu.ac.kr

2Professor, Dept. of Architecture and Architectural Engineering, Seoul National Univ., 599 Gwanak-ro, Seoul, Korea (corresponding author). E-mail: [email protected]

3Professor, Dept. of Architecture and Architectural Engineering, Seoul National Univ., 599 Gwanak-ro, Seoul, Korea. E-mail: [email protected]

4Ph.D. Student, Dept. of Architecture and Architectural Engineering, Seoul National Univ., 599 Gwanak-ro, Seoul, Korea. E-mail: verserk13@naver.com

Note. This manuscript was submitted on September 2, 2011; approved on January 31, 2012; published online on February 2, 2012. Discussion period open until April 1, 2013; separate discussions must be submitted for individual papers. This paper is part of the Journal of Construction Engineering and Management, Vol. 138, No. 11, November 1, 2012. © ASCE, ISSN 0733-9364/2012/11-1259-1269/$25.00.

JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT © ASCE / NOVEMBER 2012 / 1259

With a focus on nonseasonality and trends, this research applies the autoregressive integrated moving average (ARIMA) modeling method suggested by Box and Jenkins (1994), one of the univariate time-series modeling methods. Although various economic variables are believed to affect trends in material prices, multivariate approaches are not always feasible for quantifying the effect of various factors or representing relationships mathematically. In addition, future values of some factors are not available but must be estimated (Hwang 2010). On the other hand, the univariate time-series approach requires an examination of the relationship based solely on past behavior (Wong et al. 2005). Despite the simplicity of this analysis, the ARIMA method often produces the most accurate forecasting models for any set of data and employs a more systematic approach to building, analyzing, and forecasting time-series models by considering the trends and cyclic and seasonal elements of past data (Lu and AbouRizk 2009). When there are both nonstationary behavior and seasonality in time-series data, seasonal ARIMA (SARIMA) can be applied by conducting additional seasonal differencing to remove seasonality. With material price data that are nonseasonal, however, this process for seasonality analysis can be omitted. The ARIMA model can thus be effectively applied to develop a simplified and automated forecasting system specialized for a large number of construction materials.

The paper begins with a study of existing cost-forecasting methods and time-series forecasting. With a focus on possibilities and limitations, an automated time-series forecasting system for construction materials is developed. Using an out-of-sample forecast on several structural construction materials, the proposed system is then evaluated. Finally, application methods of the system are outlined with conclusions.

Literature Review

Research on Cost Escalations

Many studies have focused on the rapidly changing construction material market and have attempted to address cost escalation factors to make cost planning more feasible. The main issues here are identifying escalation factors and estimating project costs accurately and simply. Ranasinghe (1996), for example, presented a simplified model for total project cost by considering the effects of inflation. Akpan and Igwe (2001) developed a suitable model for the evaluation of cost overruns due to inflation, governmental policies, and delay during project execution. Trost and Oberlender (2003) presented a mathematical model for evaluating the accuracy of early estimates using factor analysis and multivariate regression analysis. Touran (2003) proposed a probabilistic model according to Poisson processes for the calculation of project cost contingency. Sonmez (2008) developed an integrated approach to conceptual cost estimation including the advantages of parametric and probabilistic estimating techniques such as regression models. Finally, Shane et al. (2009) categorized individual cost increase factors to assess future total project cost.

Although these models are useful for addressing cost escalation factors and preliminary estimation in the early design phase, some restrictions exist for time-varying variables and for reflecting different time lags between influence factors. Because much time-related data are dependent or have an autocorrelation in reality (Lu and AbouRizk 2009), one way to overcome these limitations is to apply time-related approaches to predicting trends in material prices.

Time-Series Models for Cost Estimating

In an attempt to solve time-related problems in the aforementioned methods, time-series approaches, which determine future trends of a certain variable on the basis of past values of itself and other escalation factors, have been applied to cost estimation in construction projects. For example, Fellows (1991) provided reliable forecasts of tender prices, building costs, and the effects of inflation on building projects using a time-series model. Williams (1994) illustrated the difficulties of applying neural networks to predict changes in construction cost indexes. Akintoye et al. (1998) identified leading indicators of construction cost oscillations in the United Kingdom using a time-series approach. Ng et al. (2004) outlined the procedures for integrating regression analysis and time-series models to develop a tender price index for Hong Kong construction projects. Wong et al. (2005) applied a time-series projection approach in the labor market of the Hong Kong construction industry using the Box-Jenkins model. Hwang (2009) presented a method for predicting changes in construction costs due to economic conditions in the market using a dynamic regression model. Finally, Ashuri and Lu (2010) developed a time-series model that determines trends based on past values and corresponding errors and provides a more accurate prediction of a construction cost index.

Fig. 1. Seasonality analysis of material price data

These time-series models provide systematic and time-related approaches to forecasting trends. That is, it is possible to make useful projections based on historical patterns (Wong et al. 2005). However, any research that predicts weighted aggregate indexes of prices or total cost will be limited by the necessary inclusion of detailed information required by construction managers, particularly in recent object-based cost-planning practices, even though it is useful for owners and bidders. Because existing time-series forecasting approaches also involve complicated iterative procedures and arbitrary decisions, a modeler must spend considerable time and effort on forecasting different future trends in numerous types of construction material. Also, as new data are acquired during lengthy project durations, the model must be rebuilt, requiring additional expenditures of time and effort. Lu and AbouRizk (2009) thus tried to develop an integrated, robust, and automated Box-Jenkins modeling tool that could be applied to the capital planning of infrastructure systems.

To make the research in this paper applicable to detailed and updatable material cost estimating, an automated forecasting system is developed on the basis of both the ARIMA modeling process and a simplified forecasting procedure proposed by Lu and AbouRizk. The suggested ATMF system improves data input and output processes and the process of determining suitable models compared to previous research. Data usability is also supplemented so that the system can be applied to both current data-based management tools (mostly performed in Microsoft Excel-based platforms) and object-based parametric estimating tools (such as building information modeling). Particularly because construction management is performed mostly via data-model-based computing software, data usability and compatibility should be a very important consideration. Using Microsoft Excel 2007 and its Visual Basic for Applications (VBA), this system can be further simplified and made more suitable for related cost-estimating software.

Time-Series Analysis

Time-Series Models and the Box-Jenkins Approach

A time-series data set is a sequence of data points, typically measured at successive times and spaced at uniform time intervals. A time-series model determines trends based on past values and corresponding errors. Since the model requires only the historical data of the forecast variable, it is widely used to develop predictive models (Ashuri and Lu 2010). Among these models, the Box-Jenkins approach, a procedure suggested by Box and Jenkins (1994) for carrying out stochastic time-series modeling, is useful for univariate time-series forecasting. The Box-Jenkins approach is also called the autoregressive integrated moving average (ARIMA) model, which considers the underlying trends, cyclic and seasonal elements, and the particular repetitive continuing patterns exhibited by past data (Ng et al. 2004). The ARIMA model takes into consideration any particular repetitive or continuing pattern exhibited in the historical trend (Fan et al. 2010). The autoregressive (AR) model estimates the stochastic process underlying a time series where its values exhibit a nonzero autocorrelation, while the moving average (MA) model estimates the process where the current value is related to the random errors from previous time periods (Ng et al. 2004). In other words, the AR model is a relational equation between the current value and its past values, and the MA model represents a relational equation between the current value and its past forecast errors. The ARIMA model, including both the AR and MA models, calculates current value on the basis of both past values and past forecast errors.

Existing Analysis Procedures of ARIMA Models

The Box-Jenkins approach has three stages: (1) tentative model identification, (2) parameter estimation, and (3) diagnostic checking. Because the ARIMA model is applied to analyze stationary time-series data, first it must be determined whether or not the data are stationary, in terms of both mean and variance (Bowerman et al. 2005). In the tentative model identification stage, therefore, stationary time-series data are judged and nonstationary data are transformed into stationary data by applying normal differencing or logarithmic transformation. Fig. 2 is an example of data transformation using normal differencing. The nonstationary raw price data of high-tensile rebar (SD400, D51, 15.9 kg/m) can be transformed using first- or second-order differencing. Because transformed data show stationarity with regard to their mean, they can be applied to the following modeling stage. Then, to determine tentative ARIMA (p, d, q) models, the order of the AR model (p), the order of the MA model (q), and the order of differencing (d) are determined by analyzing the ACF plot and PACF plot.

In the parameter estimation stage, fitted ARIMA models are identified based on goodness-of-fit (GOF) statistics, and model parameters are estimated. To select the most suitable model, the following GOF criteria can be used: smaller values of root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), Akaike’s information criterion (AIC), and normalized Bayesian information criterion (BIC). These indicate a higher GOF for predicting values; the same is indicated by larger values of normalized R-squared. BIC rules are used to select tentative parameters for p and q.

The most suitable model is finally determined by the residual test and an overfitting examination in the diagnostic checking stage. The residual ACF is used to examine the residual series of the tentative models. A residual is the difference between the actual (observed) and fitted values; the closer to zero and the more random the residuals, the better the fit (Khosrowshahi and Alani 2003). The Ljung-Box chi-squared test examines the autocorrelation of residuals.
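The residual check described above can be illustrated with a short sketch. It uses the standard definitions of the sample autocorrelation and the Ljung-Box statistic; the function names are hypothetical, the code is written in Python rather than the Excel and VBA environment used in this research, and the comparison of Q against a chi-square critical value is left out.

```python
def residual_acf(residuals, lag):
    """Sample autocorrelation of the residual series at a given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    num = sum((residuals[t] - mean) * (residuals[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic: N(N+2) * sum over lags k of r_k^2 / (N - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        residual_acf(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
```

A residual series that looks like white noise gives a small Q; strongly patterned residuals, such as a series that alternates in sign, give a large Q.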

Required Improvement on Application to Material Cost Forecasting

The ARIMA model described previously analyzes trends in historical data, estimates appropriate values for parameters, checks the model’s suitability, and then repeats this process until a best-fitting model is determined (Lu and AbouRizk 2009). In other words, the traditional analysis procedure of the ARIMA model involves a complicated and iterative process with arbitrary decisions made by the modeler, particularly in judging stationarity, in determining parameters of the model, and in selecting a suitable model based on GOF criteria. To solve these problems, Lu and AbouRizk (2009) developed a simplified and integrated analysis tool with the aid of computer technology such as MATLAB. Although this research improved existing procedures, problems remained, and so the tool was not applicable to material cost forecasting due to numerous types of material to be forecast and a requirement for detailed information in current object-based estimating practices. Lu and AbouRizk’s system was also developed for the capital planning of infrastructure systems for the purpose of combining the forecast method and simulation techniques, as well as enhancing data exchange with other software within the simulation procedure. Despite its strength, the model requires a direct data input process, several user choices and decisions within the modeling phases, and an output data transmission process. While the system can effectively be applied in the case of analyzing macrolevel variables such as price index, its application might be time consuming given detail-level variables such as numerous types of material. Also, because forecast price data should be matched with their input time and quantity within the object- and spreadsheet-based database, a spreadsheet-based forecasting system can eliminate the data transmission process.

On the basis of the advantages and disadvantages of previous work, the research in this paper analyzes the required improvement for material cost estimation, as described in Table 1. The proposed ATMF system is developed in an MS Excel environment in order to be compatible with existing estimating software. Excel-based calculation methods, an optimal solution algorithm, a GOF score, and an improved data autoinput method are also suggested to minimize the time, effort, and user choices required to forecast the prices of numerous types of material.

Table 1. Improvement of This Research over Existing Methods

| Category | Existing methods | Lu and AbouRizk’s system (2009) | ATMF system |
| --- | --- | --- | --- |
| Analysis tool | e.g., SPSS, SAS | MATLAB | Microsoft Excel and VBA |
| Data input | Direct input of raw data | Direct input of raw data | Select material names (data input from material unit price database) and forecast duration and time |
| Determination of tentative model | Graphical judgment and unit root test | Input tentative 32 models except logarithmic transformation and seasonal differencing (p, d, q = 0 to 3) | Input all tentative 48 models (p, d, q = 0 to 2, logarithmic transformation) |
| Logarithmic transformation | Graphical judgment and unit root test | Performed by computer program | Included in tentative models |
| Parameter estimation | Calculating by, e.g., SPSS and SAS using various estimation methods | Calculating by programs using maximum likelihood estimation | Calculating by Excel using unconditional least-squares method in optimal solution algorithm |
| Goodness-of-fit test | Modelers’ choice among various goodness-of-fit criteria | RMSE | Goodness-of-fit score considering RMSE, MAE, AIC, BIC, and R-square |
| Output | Best-fitting model determined by modeler and forecast value | Best-fitting model and forecast value | Best-fitting three models and forecast values |
| Arbitrary decision | Analysis of ACF and PACF, determination of differencing and logarithmic transformation, determination of tentative model, goodness-of-fit test | Determination of differencing and logarithmic transformation | None |
| User choice and run | Numerous choices and runs | Several choices and runs | 3 choices (material name, forecast time, and duration) and 1 run |

Fig. 2. Example of data transformation: Rebar (SD400, D51, 15.9 kg/m) (KPI 2011)


Automated Time-Series Material Cost Forecasting System

This research develops an automated time-series forecasting system for construction materials. The system focuses on usability for construction practitioners, compatibility with existing estimation software, and simplicity in analysis procedures. This system involves a simplified and automated input module using a price database and autoinput of raw time-series data, a parameter estimation module for all tentative models using an optimal solution algorithm, a GOF calculation module using a GOF score, and a diagnostic checking module.

Model Development

Fig. 3 illustrates the modeling procedures of this paper’s ATMF system as compared to an existing method. As shown in this procedure, all tentative models are incorporated into the system; they cover the order of the AR model (p), the order of the MA model (q), and the order of differencing (d), each from 0 to 2, as well as the presence or absence of logarithmic transformation. In other words, the number of combinations of p (0 to 2) and q (0 to 2) is 8 (i.e., the combination of p = 0 and q = 0 is excluded), and each combination can be paired with three orders of differencing (0 to 2) and two options for logarithmic transformation (applied or not). The total number of tentative models is thus calculated to be 48. Goodness-of-fit and test statistics corresponding to each model allow for examinations of suitability and are automatically calculated; this eliminates arbitrary decisions that otherwise would have been made by the modeler.
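The count of 48 tentative models can be reproduced by direct enumeration. The following is a minimal sketch in Python, not the Excel and VBA implementation of this research; the function name and the tuple layout (p, d, q, log flag) are assumptions made for illustration.

```python
from itertools import product


def tentative_models(max_order=2):
    """Enumerate (p, d, q, log_transformed) for all tentative ARIMA models.

    p, d, and q each run from 0 to max_order; the combination p = 0 and
    q = 0 is excluded; each model is taken with and without a
    logarithmic transformation.
    """
    orders = range(max_order + 1)
    models = []
    for p, d, q, log_transformed in product(orders, orders, orders, (False, True)):
        if p == 0 and q == 0:
            continue  # no AR or MA term at all, so nothing to estimate
        models.append((p, d, q, log_transformed))
    return models


if __name__ == "__main__":
    print(len(tentative_models()))  # 8 (p, q) pairs x 3 orders of d x 2 = 48
```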

The ATMF system framework, as described in Fig. 4, shows the detailed modeling procedures. The system is composed of four modules: time-series data input, parameter estimation, GOF calculation, and diagnostic checking.

• Time-series data input module

To input raw data of material prices to be forecast in this paper’s system, a user selects required material names and forecast time and duration, whereas in the previous system raw data had to be typed in by hand. To implement the module, this research first builds a material unit price database. Construction materials are categorized by type, and the prices of each material are assigned by date. Material unit price data from December 2000, including structural steels and ready-mixed concrete, are provided by the Korea Price Information (KPI) Corporation. When a user selects required material names, corresponding raw time-series data of prices are extracted from the material price database and then automatically entered into the system. In addition, raw time-series data are automatically transformed into five types of data by normal differencing and natural logarithmic transformation: data transformed by first-order differencing, second-order differencing, logarithmic transformation, both first-order differencing and logarithmic transformation, and both second-order differencing and logarithmic transformation. Finally, transformed data are input into the system to be utilized for parameter estimation and diagnostic checking for all 48 tentative models.
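The five transformed series named in the data input module can be sketched as follows. This is an illustrative Python sketch only; the function names and the dictionary keys (order of differencing, log flag) are hypothetical, and it assumes that the logarithm is taken before differencing, which matches the back-transformation in Eq. (8).

```python
import math


def difference(series, order=1):
    """Apply normal (non-seasonal) differencing the given number of times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def transformed_inputs(raw):
    """Return the raw series plus its five transformed versions.

    Keys are (d, log_transformed). Prices must be positive for the
    natural logarithm to be defined.
    """
    logged = [math.log(v) for v in raw]
    return {
        (0, False): list(raw),
        (1, False): difference(raw, 1),
        (2, False): difference(raw, 2),
        (0, True): logged,
        (1, True): difference(logged, 1),
        (2, True): difference(logged, 2),
    }
```

Each differencing step shortens the series by one observation, so the second-order differenced series has two fewer points than the raw series.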

Fig. 3. Model determination procedures of ATMF system compared to existing methods

• Parameter estimation module

The parameter estimation module estimates parameters of all tentative ARIMA models and then tests the significance of the parameters. While the traditional method requires significant time and effort to estimate the parameters of even a single model, this new system performs parameter estimation of all tentative models simultaneously, with only one automatic run, by using optimal solution algorithms. The unconditional least-squares method is utilized in this algorithm with the aid of VBA. Using this method, the system automatically calculates the optimum values of the AR parameters (φ1 and φ2, for p from 0 to 2), the MA parameters (θ1 and θ2, for q from 0 to 2), and a constant term included in an estimation equation; the optimum values are those that minimize the sum of squared residuals (SSR). The whole ATMF system, including the computing codes of the optimal solution algorithm for parameter estimation, can be found on the author’s research website (http://blog.naver.com/nkkt14).
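The idea of choosing parameters that minimize the SSR can be illustrated with a deliberately simplified sketch. It is not the optimal solution algorithm of the ATMF system: it handles only an ARMA(1,1) model, treats the constant term as given, sets the pre-sample residual to zero (a conditional rather than unconditional least-squares approach), and replaces the optimizer with a coarse grid search. All function names are hypothetical.

```python
def one_step_residuals(y, phi, theta, mu):
    """One-step-ahead residuals of an ARMA(1,1) model, using the sign
    convention of Eq. (9): prediction = phi*(y[t-1]-mu) - theta*eps[t-1] + mu."""
    residuals = []
    prev_eps = 0.0  # assumption: residual before the sample is zero
    for t in range(1, len(y)):
        prediction = phi * (y[t - 1] - mu) - theta * prev_eps + mu
        prev_eps = y[t] - prediction
        residuals.append(prev_eps)
    return residuals


def ssr(y, phi, theta, mu):
    """Sum of squared residuals for the given parameter values."""
    return sum(e * e for e in one_step_residuals(y, phi, theta, mu))


def fit_arma11_grid(y, mu, step=0.1):
    """Grid search for (phi, theta) inside (-1, 1) that minimizes the SSR."""
    n = round(1 / step)
    grid = [i / n for i in range(-n + 1, n)]
    candidates = ((phi, theta) for phi in grid for theta in grid)
    return min(candidates, key=lambda pt: ssr(y, pt[0], pt[1], mu))
```

Restricting the grid to the open interval (-1, 1) keeps every candidate inside the stationarity and invertibility regions of a first-order model. A practical implementation would use a numerical optimizer and would also estimate the constant term.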

• Goodness-of-fit calculating module

The GOF calculating module determines the suitability of all tentative models by using five GOF criteria: root mean square error (RMSE), mean absolute error (MAE), Akaike’s information criterion (AIC), normalized Bayesian information criterion (BIC), and R-squared values. These are calculated by the following equations, respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N-k}\sum_{t=1}^{N}\left(y_t-\hat{y}_t\right)^{2}} \tag{1}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y_t-\hat{y}_t\right| \tag{2}$$

$$\mathrm{AIC} = \ln\left(\mathrm{RMSE}^{2}\right) + \frac{2(p+q)}{N} \tag{3}$$

$$\mathrm{BIC} = \ln\left(\mathrm{RMSE}^{2}\right) + \frac{k\ln N}{N} \tag{4}$$

$$R^{2} = 1 - \frac{\sum_{t=1}^{N}\left(y_t-\hat{y}_t\right)^{2}}{\sum_{t=1}^{N}\left(y_t-\bar{y}\right)^{2}} \tag{5}$$

where N = total number of data points; y_t = actual material price; ŷ_t = forecasted material price; ȳ = mean of actual material prices; and k = total number of estimated parameters. Because each criterion has a different standard of judgment, the greatest GOF of a single criterion cannot guarantee the best predictability of the model. This research therefore suggests a GOF score composed by combining the five criteria. Goodness-of-fit scores are thus calculated using the following equation:

$$\text{GOF Score} = \sum\left|\frac{\mathrm{GOF}-\mathrm{OV}}{\mathrm{OV}-\mathrm{WV}}\right| \tag{6}$$

where the sum is taken over the five criteria; GOF = value of GOF of corresponding model; OV = optimal value of 48 tentative models; and WV = worst value of 48 tentative models.
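Eqs. (1) to (6) can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the Excel calculation used in this research: the function names and dictionary keys are hypothetical, and Eq. (6) is implemented literally as printed. Taken literally, it gives zero to a model that attains the optimal value on every criterion; the scale on which the reported scores are expressed is not spelled out in the text, so no rescaling is attempted here.

```python
import math

# True where a smaller value of the criterion indicates a better fit.
LOWER_IS_BETTER = {"rmse": True, "mae": True, "aic": True, "bic": True, "r2": False}


def gof_criteria(actual, fitted, p, q, k):
    """Compute the five criteria of Eqs. (1)-(5) for one model.

    k is the total number of estimated parameters. A perfect fit
    (RMSE of zero) is not handled, because ln(0) is undefined.
    """
    n = len(actual)
    squared = [(a - f) ** 2 for a, f in zip(actual, fitted)]
    rmse = math.sqrt(sum(squared) / (n - k))
    mae = sum(abs(a - f) for a, f in zip(actual, fitted)) / n
    mean = sum(actual) / n
    return {
        "rmse": rmse,
        "mae": mae,
        "aic": math.log(rmse ** 2) + 2 * (p + q) / n,
        "bic": math.log(rmse ** 2) + k * math.log(n) / n,
        "r2": 1 - sum(squared) / sum((a - mean) ** 2 for a in actual),
    }


def gof_scores(criteria_by_model):
    """Eq. (6) as printed: sum over criteria of |(GOF - OV) / (OV - WV)|."""
    scores = {name: 0.0 for name in criteria_by_model}
    for criterion, lower in LOWER_IS_BETTER.items():
        values = [c[criterion] for c in criteria_by_model.values()]
        optimal = min(values) if lower else max(values)
        worst = max(values) if lower else min(values)
        if optimal == worst:
            continue  # all models tie on this criterion
        for name, c in criteria_by_model.items():
            scores[name] += abs((c[criterion] - optimal) / (optimal - worst))
    return scores
```

Normalizing each criterion by the spread between its optimal and worst values puts the five criteria on a common 0-to-1 scale before they are summed, which is what allows criteria with different units to be combined.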

Fig. 4. ATMF system framework

• Diagnostic checking module

The diagnostic checking module performs a residual test with a focus on residuals’ correlations using the Ljung-Box chi-squared test. The larger the Ljung-Box Q-statistic (Q*) is, the smaller the probability of a type 1 error and the larger the autocorrelation of the residual. Q*, which follows the chi-square distribution, is calculated by the following equations (Bowerman et al. 2005):

$$Q^{*} = N(N+2)\sum_{k=1}^{K}\frac{\gamma_k^{2}(\hat{\varepsilon})}{N-k}; \qquad \gamma_k = \frac{\sum_{t=1}^{N-k}\left(\hat{\varepsilon}_t-\bar{\varepsilon}\right)\left(\hat{\varepsilon}_{t+k}-\bar{\varepsilon}\right)}{\sum_{t=1}^{N}\left(\hat{\varepsilon}_t-\bar{\varepsilon}\right)^{2}} \tag{7}$$

where N = total number of data points; ε̂_t = forecasted residual; ε̄ = mean of residuals; k = time lag; and K = number of time lags tested. When the probability of Q* is larger than the level of significance (generally 0.05), the null hypothesis that the correlation coefficients of white noises are all zero [i.e., H0: ρ1(ε) = ρ2(ε) = ··· = ρK(ε) = 0] is adopted. This test checks whether the residual series of the tentative models follow the white noise assumption, because the error term, e(t), should be a white noise process (Ashuri and Lu 2010). The stationarity and invertibility of the parameters are also examined. The stationary condition of the AR(1) process is |φ1| < 1, and those of the AR(2) process are |φ2| < 1, φ2 + φ1 < 1, and φ2 − φ1 < 1. The invertibility condition of MA(1) is also |θ1| < 1, and those of the MA(2) process are |θ2| < 1, θ2 + θ1 < 1, and θ2 − θ1 < 1 (Bowerman et al. 2005). By performing the residual test, stationarity test, and invertibility test, any overfitting of the model is examined, and inappropriate models are excluded from the tentative models. Among the remaining appropriate models, the three best-fitting ones, which have high goodness-of-fit scores, are finally selected for the forecasting model. The forecast values that correspond to each best-fitting model are automatically calculated using the following equations:

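$$\hat{Y}_{t+1} = \begin{cases}
\hat{y}_{t+1} & \text{LT = No},\ d = 0 \\
Y_t + \hat{y}_{t+1} & \text{LT = No},\ d = 1 \\
2Y_t - Y_{t-1} + \hat{y}_{t+1} & \text{LT = No},\ d = 2 \\
\exp\left(\hat{y}_{t+1}\right) & \text{LT = Yes},\ d = 0 \\
\exp\left(\ln Y_t + \hat{y}_{t+1}\right) & \text{LT = Yes},\ d = 1 \\
\exp\left(2\ln Y_t - \ln Y_{t-1} + \hat{y}_{t+1}\right) & \text{LT = Yes},\ d = 2
\end{cases} \tag{8}$$

$$\hat{y}_{t+1} = \hat{\varphi}_1\left(y_t-\mu\right) + \hat{\varphi}_2\left(y_{t-1}-\mu\right) - \hat{\theta}_1\varepsilon_t - \hat{\theta}_2\varepsilon_{t-1} + \mu \tag{9}$$

where Ŷ_t = forecasted values of raw data; ŷ_t = forecasted values of transformed data; μ = a constant term; and LT = logarithmic transformation.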
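The one-step forecast of Eq. (9) and the back-transformation of Eq. (8) can be sketched as two small functions. This is an illustrative Python sketch, not the spreadsheet formula used in the ATMF system; the function and argument names are hypothetical.

```python
import math


def arma_one_step(y_t, y_tm1, eps_t, eps_tm1, mu,
                  phi1=0.0, phi2=0.0, theta1=0.0, theta2=0.0):
    """Eq. (9): one-step forecast on the transformed scale.

    Parameters that a given model does not use are left at zero.
    """
    return (phi1 * (y_t - mu) + phi2 * (y_tm1 - mu)
            - theta1 * eps_t - theta2 * eps_tm1 + mu)


def back_transform(y_hat, raw_t, raw_tm1, d, log_transformed):
    """Eq. (8): convert a transformed-scale forecast into a raw price.

    raw_t and raw_tm1 are the last two observed raw prices; d is the
    order of differencing (0, 1, or 2).
    """
    if log_transformed:
        base_t, base_tm1 = math.log(raw_t), math.log(raw_tm1)
    else:
        base_t, base_tm1 = raw_t, raw_tm1
    if d == 0:
        level = y_hat
    elif d == 1:
        level = base_t + y_hat            # undo first-order differencing
    elif d == 2:
        level = 2 * base_t - base_tm1 + y_hat  # undo second-order differencing
    else:
        raise ValueError("d must be 0, 1, or 2")
    return math.exp(level) if log_transformed else level
```

For example, with a last observed price of 100, a previous price of 90, and a forecast second difference of 5, the raw-scale forecast without logarithmic transformation is 2 × 100 − 90 + 5 = 115.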

System Templates

Fig. 5 represents the main templates of the ATMF system. To predict future trends of unit prices of required materials, a user selects corresponding material names, forecast time, and forecast duration in Fig. 5(a). The time-series data of selected materials are automatically input into the left part of Fig. 5(b). When the user clicks the RUN button, three best-fitting models and their GOF scores are determined in Fig. 5(c). Future values predicted by each model are shown in the right part of Fig. 5(b).

The report screen (Fig. 6) presents the modeling process. If the estimated parameter values of AR(1), AR(2), MA(1), MA(2), and a constant term exist, then they are described in Fig. 6(a). Their significance test results [Fig. 6(a)], goodness-of-fit values [Fig. 6(b)], and test results for residual, stationarity, and invertibility [Fig. 6(c)] are also presented for all 48 tentative models.

System Evaluation

This research conducts a comparative analysis of the prediction capability of the three selected models using an out-of-sample forecast. At this stage, 124 data sets of material prices from December 2000 to March 2011 are employed. To perform the out-of-sample forecast, only the 112 data sets from December 2000 to March 2010 are included in the modeling phase. The last 12 data sets, from April 2010 to March 2011, are used only to analyze the predictive power of the models and to evaluate the system. Three main kinds of construction material are selected to evaluate the system: high-tensile rebar (SD400, D51, 15.9 kg/m), H-type steel beam (H700, B300, 185 kg/m), and ready-mixed concrete (30 MPa, 15 cm, 40 cm). The forecast values of the three best-fitting models and their absolute errors are listed in Table 2. The mean absolute percentage error (MAPE) values of the three best-fitting models, with respect to the three materials, are also presented to compare the predictability of the three models. The MAPE is calculated using the following equation:

$$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{10}$$
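Eq. (10) can be sketched in a few lines of Python. The function name and the two-point example are assumptions made for illustration; the example values are not material price data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, as in Eq. (10), returned in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    n = len(actual)
    return 100.0 / n * sum(abs((y - f) / y) for y, f in zip(actual, forecast))


# Illustrative values only: errors of 10% and 5% average to 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
print(round(example, 2))
```

Because each error is divided by the observed value, the measure is undefined when an observed price is zero, which is not a concern for the price series considered here.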

A MAPE of 10% is the generally accepted threshold for a robust forecast model (Fan et al. 2010). For the models in Table 2, the MAPE of the forecast values ranges from 0.97 to 3.50%, and the overall MAPE ranges from 0.52 to 1.85%, which means the models offer considerable prediction capability. In general, the higher a model's goodness-of-fit (GOF) score, the better its prediction capability and the smaller its MAPE. However, when there are only slight differences between the GOF scores of the determined models, the models' prediction capabilities need to be checked. For example, the ARIMA(1,1,0)L model with rank 2 (GOF score of 45.88) for the material RMC_30_15_40 has a better prediction capability (MAPE value of 0.97) than the ARIMA(0,1,1)L model with rank 1 (GOF score of 45.88 and MAPE value of 0.99). Such a reversal occurs only where the GOF scores of the determined models are much the same. For this reason, the system provides the top three scoring models together with their forecast values, GOF values, and test results for residuals, stationarity, and invertibility. By comprehensively considering all these results and conducting a predictability test, users can make a final decision according to their preferences (e.g., whether to focus on the predictability or the reliability of the determining procedures).
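The check described above can be illustrated with a short Python sketch that uses the GOF scores and forecast MAPE values reported in Table 2. The tie-breaking rule and the `gof_tolerance` threshold are assumptions introduced here for illustration; the ATMF system itself leaves the final choice to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str
    gof: float           # goodness-of-fit score (Table 2)
    holdout_mape: float  # MAPE of the 12 out-of-sample forecasts (Table 2)


def pick_model(candidates, gof_tolerance=0.05):
    """Among models whose GOF is within gof_tolerance of the best score,
    return the one with the lowest hold-out MAPE (hypothetical rule)."""
    best_gof = max(c.gof for c in candidates)
    near_ties = [c for c in candidates if best_gof - c.gof <= gof_tolerance]
    return min(near_ties, key=lambda c: c.holdout_mape)


rmc = [
    Candidate("ARIMA(0,1,1)L", 45.88, 0.99),
    Candidate("ARIMA(1,1,0)L", 45.88, 0.97),
    Candidate("ARIMA(1,1,1)L", 45.82, 1.02),
]
rebar = [
    Candidate("ARIMA(2,1,2)", 50.0, 2.88),
    Candidate("ARIMA(1,1,0)L", 47.95, 3.50),
    Candidate("ARIMA(1,1,0)", 47.92, 3.27),
]
chosen_rmc = pick_model(rmc)
chosen_rebar = pick_model(rebar)
print(chosen_rmc.model, chosen_rebar.model)
```

With these inputs the rule keeps the rank 1 model for rebar, where the GOF gap is large, and prefers the rank 2 model for ready-mixed concrete, where the two top GOF scores are equal.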

Fig. 5. System templates for data input and output: (a) material information; (b) time-series and forecasting data; (c) results of run


Table 3 shows the results of a comparative analysis between the time-series models determined by the ATMF system and a linear regression model in which the time period of the data sets is the independent variable. The lower MAPE values of the time-series models show that, in terms of predictability, the ATMF models outperform simple models such as linear regression.
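The linear regression baseline and the out-of-sample split can be sketched as follows. The price series below is synthetic and generated only so that the example runs; it is not the Korean price data used in this study, and the helper names are hypothetical. The split mirrors the evaluation setup: 112 periods for fitting and the last 12 for testing.

```python
import math


def fit_line(ts, ys):
    """Ordinary least squares fit of y = beta * t + alpha."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    beta = sxy / sxx
    alpha = y_mean - beta * t_mean
    return beta, alpha


def mape(actual, forecast):
    """Mean absolute percentage error in percent, as in Eq. (10)."""
    return 100.0 / len(actual) * sum(
        abs((y - f) / y) for y, f in zip(actual, forecast))


# Synthetic monthly series (assumption): linear trend plus a slow oscillation.
periods = list(range(1, 125))  # 124 periods
prices = [400_000 + 4_500 * t + 20_000 * math.sin(t / 6) for t in periods]

# Fit on the first 112 periods, hold out the last 12.
train_t, test_t = periods[:112], periods[112:]
train_y, test_y = prices[:112], prices[112:]

beta, alpha = fit_line(train_t, train_y)
baseline_forecast = [beta * t + alpha for t in test_t]
holdout_mape = mape(test_y, baseline_forecast)
print(f"beta={beta:.1f}, alpha={alpha:.0f}, hold-out MAPE={holdout_mape:.2f}%")
```

The same hold-out MAPE computed for a time-series model's forecasts gives a like-for-like comparison with this baseline.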

Despite the simplified and automated procedures of this system, its considerable estimating accuracy means it can be effectively applied to forecasting future prices of construction materials. Further, the system has the potential to accurately estimate material costs in a rapidly changing economic environment.

Application

The proposed ATMF system can forecast future prices of all materials required in construction projects through automated and simplified procedures; thus, it can be easily applied to material cost estimating in object-based estimating environments at a detailed level. Fig. 7 shows the material-cost-forecasting framework, which applies the material price trends forecast by the ATMF system. For material cost estimation, the future material prices predicted by the ATMF system can be useful if combined with other object properties related to scheduling and quantity estimation.

In the object-based model, each object has a unique ID, classified according to the area of the object's location and installation. The ID also includes various properties encompassing the types and sizes of the installed materials (length, area, and volume). This information is imported to the quantity-estimating module, which is connected to the scheduling module. Information on activity start times and material stocked times extracted from this model is added to the objects as new properties. The predicted material unit prices are extracted from the proposed time-series system and then added, as price properties, to the objects corresponding to the activity time periods in which each material is installed. Finally, each object includes properties for material type, quantity information, stocked time, and the future price corresponding to the material's stocked time.

By using the extracted properties, including material type, quantity information, stocked time, and future prices, the quantities and times of inputs are classified according to the corresponding activity start time. Thus, monthly (or weekly or quarterly) material quantity inputs can be obtained from the quantity take-off system.
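The grouping of object quantities by period can be sketched as below. The class, field names, object IDs, and quantities are all hypothetical; they only illustrate how object properties might be aggregated into monthly quantity inputs and do not reproduce the data model of an actual object-based system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildingObject:
    object_id: str      # unique ID (hypothetical format)
    material: str       # material type
    quantity: float     # quantity in the unit used for pricing
    stocked_month: str  # month in which the material is stocked, "YYYY-MM"


def monthly_quantities(objects):
    """Sum object quantities per (material, month)."""
    totals = defaultdict(float)
    for obj in objects:
        totals[(obj.material, obj.stocked_month)] += obj.quantity
    return dict(totals)


# Hypothetical objects, for illustration only.
objects = [
    BuildingObject("B1-COL-001", "Rebar_SD400_D51", 12.5, "2010-04"),
    BuildingObject("B1-COL-002", "Rebar_SD400_D51", 7.5, "2010-04"),
    BuildingObject("F1-SLB-001", "RMC_30_15_40", 40.0, "2010-05"),
]
totals = monthly_quantities(objects)
print(totals)
```

Weekly or quarterly inputs follow from the same grouping with a different period key.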

Fig. 6. System templates for reports of determining procedures: (a) description of estimated parameter values and constant term with significance test results; (b) goodness-of-fit values; (c) residual, stationarity, and invertibility test results

Table 2. Forecast Values Determined by Selected Models [forecast value in won, with absolute error (%) in parentheses]

| Item | Rebar_SD400_D51, rank 1 | Rebar_SD400_D51, rank 2 | Rebar_SD400_D51, rank 3 | Steel H-Beam_700_300, rank 1 | Steel H-Beam_700_300, rank 2 | Steel H-Beam_700_300, rank 3 | RMC_30_15_40, rank 1 | RMC_30_15_40, rank 2 | RMC_30_15_40, rank 3 |
|---|---|---|---|---|---|---|---|---|---|
| Model | ARIMA(2,1,2) | ARIMA(1,1,0)L | ARIMA(1,1,0) | ARIMA(2,1,2) | ARIMA(1,1,0)L | ARIMA(1,1,0) | ARIMA(0,1,1)L | ARIMA(1,1,0)L | ARIMA(1,1,1)L |
| GOF score | 50 | 47.95 | 47.92 | 49.23 | 49.23 | 49.15 | 45.88 | 45.88 | 45.82 |
| Apr 2010 | 894,997 (2.64) | 895,363 (2.68) | 895,698 (2.72) | 992,174 (0.22) | 992,542 (0.26) | 991,566 (0.16) | 63,875 (0.15) | 63,858 (0.12) | 63,895 (0.18) |
| May 2010 | 911,507 (1.14) | 909,939 (1.31) | 910,468 (1.25) | 995,440 (4.28) | 996,661 (4.17) | 994,192 (4.40) | 63,972 (0.30) | 63,953 (0.27) | 63,994 (0.34) |
| Jun 2010 | 917,389 (4.64) | 919,993 (4.37) | 920,349 (4.33) | 999,400 (4.82) | 1,001,768 (4.59) | 997,537 (5.00) | 64,070 (0.45) | 64,052 (0.43) | 64,091 (0.49) |
| Jul 2010 | 924,307 (2.47) | 927,752 (2.86) | 927,554 (2.83) | 1,003,693 (0.37) | 1,007,502 (0.75) | 1,001,368 (0.14) | 64,167 (0.61) | 64,151 (0.58) | 64,187 (0.64) |
| Aug 2010 | 927,854 (2.87) | 934,364 (3.59) | 933,292 (3.47) | 1,008,178 (2.88) | 1,013,640 (3.43) | 1,005,529 (2.60) | 64,265 (0.76) | 64,251 (0.74) | 64,284 (0.79) |
| Sep 2010 | 932,628 (3.40) | 940,410 (4.26) | 938,227 (4.02) | 1,012,761 (3.34) | 1,020,045 (4.09) | 1,009,912 (3.05) | 64,363 (0.91) | 64,352 (0.90) | 64,381 (0.94) |
| Oct 2010 | 936,086 (3.55) | 946,186 (4.67) | 942,722 (4.28) | 1,017,398 (0.73) | 1,026,633 (1.65) | 1,014,447 (0.44) | 64,461 (1.07) | 64,452 (1.05) | 64,477 (1.09) |
| Nov 2010 | 940,262 (1.76) | 951,841 (3.01) | 946,977 (2.49) | 1,022,063 (1.72) | 1,033,352 (0.64) | 1,019,084 (2.01) | 64,559 (1.22) | 64,553 (1.21) | 64,574 (1.25) |
| Dec 2010 | 943,859 (1.60) | 957,452 (3.06) | 951,100 (2.38) | 1,026,744 (0.32) | 1,040,168 (0.99) | 1,023,791 (0.60) | 64,657 (1.38) | 64,653 (1.37) | 64,671 (1.40) |
| Jan 2011 | 947,823 (6.02) | 963,056 (7.72) | 955,150 (6.84) | 1,031,433 (2.12) | 1,047,064 (3.67) | 1,028,544 (1.84) | 64,755 (1.53) | 64,754 (1.53) | 64,768 (1.55) |
| Feb 2011 | 951,518 (0.80) | 968,673 (2.61) | 959,161 (1.61) | 1,036,127 (5.81) | 1,054,026 (4.18) | 1,033,329 (6.06) | 64,854 (1.68) | 64,855 (1.69) | 64,866 (1.70) |
| Mar 2011 | 955,394 (3.79) | 974,313 (1.88) | 963,151 (3.01) | 1,040,823 (9.49) | 1,061,048 (7.73) | 1,038,136 (9.73) | 64,953 (1.84) | 64,957 (1.84) | 64,963 (1.86) |
| MAPE (%) | 2.88 | 3.50 | 3.27 | 3.01 | 3.01 | 3.01 | 0.99 | 0.97 | 1.02 |

Table 3. Comparative Analysis of Predictive Power with Simple Models

| Material | Rank | Forecast model | GOF score | MAPE (%), total data (Dec 2000–Mar 2011) | MAPE (%), forecast data (Apr 2010–Mar 2011) |
|---|---|---|---|---|---|
| Rebar_SD400_D51 | 1 | ARIMA(2,1,2) | 50 | 1.22 | 2.88 |
| Rebar_SD400_D51 | 2 | ARIMA(1,1,0)L | 47.95 | 1.73 | 3.50 |
| Rebar_SD400_D51 | n/a | Linear regression, y = βt + α (β = 4720.0, α = 404585) | n/a | 8.75 | 5.07 |
| Steel H-Beam_700_300 | 1 | ARIMA(2,1,2) | 49.23 | 1.47 | 3.01 |
| Steel H-Beam_700_300 | 2 | ARIMA(1,1,0)L | 49.23 | 1.83 | 3.01 |
| Steel H-Beam_700_300 | n/a | Linear regression, y = βt + α (β = 6158.9, α = 363127) | n/a | 9.78 | 5.94 |
| RMC_30_15_40 | 1 | ARIMA(0,1,1)L | 45.88 | 0.52 | 0.99 |
| RMC_30_15_40 | 2 | ARIMA(1,1,0)L | 45.88 | 0.54 | 0.97 |
| RMC_30_15_40 | n/a | Linear regression, y = βt + α (β = 38.31, α = 57487) | n/a | 3.56 | 2.81 |

In the cost-estimating module, each monthly quantity is multiplied by the forecasted material price of the equivalent period, yielding total material costs. Total construction costs can also be estimated by applying other cost information such as labor, subcontract, and equipment costs.
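The multiplication performed in the cost-estimating module can be sketched as follows. The forecast prices are the rank 1 rebar forecasts for April to June 2010 from Table 2; the monthly quantities and the function name are hypothetical and serve only to make the example runnable.

```python
def material_cost(monthly_quantity, forecast_price):
    """Sum of quantity x forecast unit price over all periods.

    Both arguments are dicts keyed by period ("YYYY-MM"). Every period
    with a quantity must have a forecast price.
    """
    missing = set(monthly_quantity) - set(forecast_price)
    if missing:
        raise KeyError(f"no forecast price for periods: {sorted(missing)}")
    return sum(q * forecast_price[m] for m, q in monthly_quantity.items())


# Forecast unit prices in won (Table 2, Rebar_SD400_D51, rank 1 model).
rebar_price = {"2010-04": 894_997, "2010-05": 911_507, "2010-06": 917_389}

# Hypothetical monthly quantities, in the unit on which the price is quoted.
rebar_quantity = {"2010-04": 10, "2010-05": 20, "2010-06": 5}

total_rebar_cost = material_cost(rebar_quantity, rebar_price)
print(f"{total_rebar_cost:,} won")
```

Repeating the calculation per material and summing the results gives the total material cost, to which labor, subcontract, and equipment costs can then be added.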

Meanwhile, the time-series data sets are updated with new observations as construction progresses. Updating yields more accurate forecasts of material prices, which can be used to modify the budgeted costs for residual quantities and work schedules. The cost update module can thus more accurately estimate the modified total construction cost by combining the budgeted cost of residual work and the actual cost of work performed. As described previously, the ATMF system can play a salient part in an object-oriented cost-estimating system by offering detailed information on the price escalation of each material in a rapidly changing economic environment.
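A minimal sketch of this update step is given below, assuming that the modified total is the actual cost of work performed plus the residual quantities priced at the updated forecasts. The function name, the actual cost, and the residual quantities are hypothetical; the two prices are reused from Table 2 (rank 1 rebar, July and August 2010) purely as placeholders for re-forecast values.

```python
def updated_total_cost(actual_cost_to_date, residual_quantity, updated_price):
    """Actual cost of work performed plus re-priced residual work."""
    residual_budget = sum(
        q * updated_price[m] for m, q in residual_quantity.items())
    return actual_cost_to_date + residual_budget


# Hypothetical inputs, for illustration only.
actual_cost = 1_000_000                        # won spent so far
residual_qty = {"2010-07": 2, "2010-08": 3}    # remaining quantities
new_price = {"2010-07": 924_307, "2010-08": 927_854}  # placeholder forecasts

revised_total = updated_total_cost(actual_cost, residual_qty, new_price)
print(f"{revised_total:,} won")
```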

Conclusions

The research in this paper developed an automated time-series forecasting system for construction materials, including both autoselected procedures for determining a best-fitting forecasting model and an autoextracting module for forecast values, using Box-Jenkins methodologies and an Excel-based statistical approach. The ATMF system can automatically determine the optimal forecasting model for corresponding materials and forecast trends of unit prices. This system was evaluated through an out-of-sample forecast with respect to time-series price data of three structural construction materials: rebar, steel beam, and ready-mixed concrete. The autodetermination procedures involved in this system can be used to predict future trends in prices of a large number of construction materials by simplifying the forecast modeling process and by eliminating the iterative arbitrary decisions made by the modeler. Construction practitioners can thus minimize the time and effort spent on material cost estimation.

Although this system was evaluated using only material price data in Korea, which do not have seasonality, it can be applied to any geographical region as long as there is no seasonality in the region's material price data. Although seasonal fluctuations are hard to identify in most regions, supplementing the ATMF system with seasonality analysis could enhance its applicability worldwide.

With the recent volatile economic conditions, advance preparation for material price escalation can help increase the accuracy of cost estimates by being applied to both current data-based management platforms and object-based cost planning at a detailed level. Therefore, the proposed system can help in dealing with changes in economic conditions and designs by estimating future material prices.

This research attempted to simplify and automate the forecast process for required construction materials. Nevertheless, the developed system can produce large forecast errors if discontinuities occur within the projection time periods. Minor statistical shortcomings of the system, such as a unit root test used for statistically determining stationarity, are acknowledged and should be addressed in future research. Lastly, the developed system can forecast trends reliably only in the short run. In future research, several attempts should be made to reduce prediction errors by developing additional functions in this system and, further, by building multivariate time-series and intervention models that consider various influence factors and sudden changes in market conditions.

Acknowledgments

This research was supported by a grant (Code No. 09 R&D A01) from Super-Tall Building R&D Project funded by the Ministry of Land, Transport and Maritime Affairs of the Korean government.

Fig. 7. Material cost estimation using ATMF system


References

Akintoye, A., Bowen, P., and Hardcastle, C. (1998). “Macro-economic leading indicators of construction contract prices.” Constr. Manage. Econ., 16(2), 159–175.

Akpan, E. O., and Igwe, O. (2001). “Methodology for determining price variation in project execution.” J. Constr. Eng. Manage., 127(5), 367–373.

Ashuri, B., and Lu, J. (2010). “Forecasting ENR construction cost index: A time series analysis approach.” Proc., Construction Research Congress 2010, ASCE, Reston, VA, 1345–1355.

Bowerman, B. L., O’Connell, R. T., and Koehler, A. B. (2005). Forecasting, time series, and regression: An applied approach, 4th Ed., Duxbury Press, Pacific Grove, CA, 401–590.

Box, G., and Jenkins, G. M. (1994). Time series analysis: Forecasting and control, 3rd Ed., Prentice Hall, Upper Saddle River, NJ.

Construction Association of Korea (CAK). (2008). “Statistics in production cost of completed construction projects.” ⟨http://www.cak.or.kr⟩ (Dec. 21, 2010).

Fan, R., Ng, T., and Wong, J. (2010). “Reliability of the Box-Jenkins model for forecasting construction demand covering times of economic austerity.” Constr. Manage. Econ., 28(3), 241–254.

Fellows, R. F. (1991). “Escalation management: Forecasting the effects of inflation on building projects.” Constr. Manage. Econ., 9(2), 187–204.

Hwang, S. (2009). “Dynamic regression models for prediction of construction costs.” J. Constr. Eng. Manage., 135(5), 360–367.

Hwang, S. (2010). “Cross-validation of short-term productivity forecasting methodologies.” J. Constr. Eng. Manage., 136(9), 1037–1046.

Hwang, S., and Liu, L. Y. (2010). “Contemporaneous time series and forecasting methodologies for predicting short-term productivity.” J. Constr. Eng. Manage., 136(9), 1047–1055.

Khosrowshahi, F., and Alani, A. M. (2003). “A model for smoothing time-series data in construction.” Constr. Manage. Econ., 21(5), 483–494.

Korea Price Information (KPI). (2011). “Construction material prices services.” ⟨http://www.kpi.or.kr⟩ (May 31, 2011) (in Korean).

Lu, T., and AbouRizk, S. M. (2009). “Automated Box-Jenkins forecasting modelling.” Autom. Constr., 18(5), 547–558.

Ng, S. T., Cheung, S. O., Skitmore, M., and Wong, T. C. (2004). “An integrated regression analysis and time series model for construction tender price index forecasting.” Constr. Manage. Econ., 22(5), 483–493.

Ranasinghe, M. (1996). “Total project cost: A simplified model for decision makers.” Constr. Manage. Econ., 14(6), 497–505.

Shane, J. S., Molenaar, K. R., Anderson, S., and Schexnayder, C. (2009). “Construction project cost escalation factors.” J. Manage. Eng., 25(4), 221–229.

Sonmez, R. (2008). “Parametric range estimating of building costs using regression models and bootstrap.” J. Constr. Eng. Manage., 134(12), 1011–1016.

Touran, A. (2003). “Probabilistic model for cost contingency.” J. Constr. Eng. Manage., 129(3), 280–284.

Trost, S., and Oberlender, G. (2003). “Predicting accuracy of early cost estimates using factor analysis and multivariate regression.” J. Constr. Eng. Manage., 129(2), 198–204.

Williams, T. P. (1994). “Predicting changes in construction cost indexes using neural networks.” J. Constr. Eng. Manage., 120(2), 306–320.

Wong, J., Chan, A., and Chiang, Y. (2005). “Time series forecasts of construction labour market in Hong Kong: The Box-Jenkins approach.” Constr. Manage. Econ., 23(9), 979–991.


Copyright of Journal of Construction Engineering & Management is the property of American Society of Civil Engineers and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.