
Investigation into Explainable Regression Trees for Construction Engineering Applications

Serhii Naumets1 and Ming Lu, M.ASCE2

Abstract: The logic of an artificial intelligence (AI) model derived from machine learning algorithms and domain-specific data is analogous to an expert’s perception of a complex problem. Human insight based on know-how and experience also provides the best clue to verify such analytical models generalized from data. To facilitate the acceptance and implementation of AI by industry professionals, we explored the least complicated form of model that still is sufficient to represent the complexities of real-world problems. This research established a framework to apply the M5P model tree in the context of producing explainable AI for practical applications. The explanatory information derived from M5P (a decision tree with linear regressions at leaf nodes) is instrumental in explaining how the more complicated AI model reasons for the same problem, illuminating the sufficiency of problem definition and data quality, and distinguishing valid submodels from invalid ones in the obtained model tree. A steel fabrication labor cost–estimating case and a concrete strength development case were given for method validation and application demonstration. DOI: 10.1061/(ASCE)CO.1943-7862.0002083. © 2021 American Society of Civil Engineers.

Introduction

The most challenging and crucial undertaking in construction engineering and management is planning the execution of an activity in the field within limited budgets while maximizing gains in productivity and cost-efficiency (Halpin and Riggs 1992). Planning entails the prediction of crew performance in terms of time and cost, which is subject to activity-specific contexts and constraints. Thus, regression-type model development for input–output mapping based on historical data in a particular application problem is warranted in support of decision making in construction engineering. Conventional regression techniques and emerging artificial intelligence (AI), such as artificial neural networks (ANNs), have been applied to generalize hidden patterns and implicit relationships from historical data, ultimately resulting in a valid prediction model to assist the planner in the analysis of new cases in the problem domain. In construction engineering applications, in addition to a sufficiently accurate prediction result, the decision maker also demands the revelation of an AI model’s reasoning logic in order to cross-check it against personal experience and gut feeling. In addition, the historical data used for AI modeling in reality are almost certain to contain noise (i.e., incomplete or inconsistent data collected from the real world due to human errors or system errors).

In practice, the construction plan (i.e., a cost budget for a given scope of work) ultimately is presented to the operations personnel as a baseline to control field execution. To turn over the plan to the operations personnel, the planner (i.e., the estimator) needs to explain and communicate (1) how the plan was derived, i.e., how the AI model reasoned in relating the input factors to the predicted output, (2) how much noise was present in the learning data that could lower the prediction performance of the AI model, and (3) given a specific context of the problem, the trustworthiness of a particular prediction compared with the overall averaged accuracy of the AI model. Therefore, without a doubt, the explainability of an AI model in support of decision making in construction engineering and management is important for its acceptance and successful implementation.

Explainable artificial intelligence (XAI) is formalized in computing research in an attempt to develop a second model to explain the precise logic in the problem domain based on learning, while still delivering sufficient predictive accuracy as the primary AI model (Gunning 2016). It is a new AI application paradigm in construction engineering and management.

Classification models are used more frequently in application fields such as medicine, justice, social media, and advertisement (Rudin 2019). In the computer vision field, in which AI has found successful applications, explanations of derived AI models usually are presented in the form of saliency maps to identify the parts of an image that significantly impact the image classification. Currently, the majority of XAI frameworks have been developed to enable pattern recognition and classification, whereas XAI for regression-type models has yet to be addressed. In the construction field, opportunities abound for the use of AI algorithms to address regression-type problems such as predicting the labor cost of steel fabrication or the compressive strength of concrete in curing (which are the two applications addressed in this research). To explore the explainability of AI and to promote its applications in the construction field, we enhanced the model tree made of regressions in order to interpret the performance of each submodel and validate the model’s reasoning logic.

The steel fabrication labor hour prediction problem was identified jointly with an industry partner as a practical application case, which warranted the implementation of the proposed XAI framework. Over the last 3 years, data set preparation and model verification and validation were conducted through a collaborative research effort engaging the partner company. The research showed that the enhanced model tree algorithm provided effective

1Ph.D. Candidate, Dept. of Civil and Environmental Engineering, Univ. of Alberta, 116 St. and 85 Ave., Edmonton, AB, Canada T6G 2R3. ORCID: https://orcid.org/0000-0001-8653-0667. Email: [email protected]

2Professor, Dept. of Civil and Environmental Engineering, Univ. of Alberta, 116 St. and 85 Ave., Edmonton, AB, Canada T6G 2R3 (corresponding author). ORCID: https://orcid.org/0000-0002-8191-8627. Email: [email protected]

Note. This manuscript was submitted on June 16, 2020; approved on January 22, 2021; published online on May 31, 2021. Discussion period open until October 31, 2021; separate discussions must be submitted for individual papers. This paper is part of the Journal of Construction Engineering and Management, © ASCE, ISSN 0733-9364.

J. Constr. Eng. Manage., 2021, 147(8): 04021084

Downloaded from ascelibrary.org by Arab Academy For Science & Tech on 06/06/21. Copyright ASCE. For personal use only; all rights reserved.


XAI for practical applications such as structural steel labor-cost estimating in the bidding stage.

To further demonstrate the application of the developed scheme, a second construction engineering problem (concrete strength prediction during the curing process) is presented. The data set was taken from the University of California, Irvine machine learning repository (UCI 2020), which is well established for benchmarking AI algorithm performance. The model tree was calibrated by applying the proposed framework to serve as the XAI for interpreting the models previously developed from computing research.

The remainder of this paper is structured as follows. The section “Literature Review” first delves into the mechanisms of four predictive methods and further discusses trends in AI applications. The following section describes performance metrics selected for evaluating regressions and AI models. The section “How M5P Works” explains the model tree algorithm in layman’s terms. The section “Steel Fabrication Labor-Cost Estimating” describes the data collection process for the case study, attribute selection, and the interpretability of each evaluated model, and elaborates on the three-color enhancement scheme. The section “Concrete Strength Prediction: Case Study” is provided next, followed by the Conclusion. Appendixes I and II contain seven samples of the steel fabrication data set.

Literature Review

Predictive Methods

Rosenblatt (1961) pioneered the research of analytically modeling the human brain as a perceptron, a machine designed for image recognition (the term perceptron can be compared with neurons, i.e., the unit cells of nerves). Since then, researchers in a wide range of scientific fields have adopted this artificial neural network approach for teaching a machine to recognize the output based on a set of inputs. A significant departure from Rosenblatt’s perceptron occurred when Vapnik and Cortes (1995) combined Vapnik’s optimal hyperplanes developed in 1965 (Vapnik and Kotz 2006) with an ANN design into the concept of the support vector machine (SVM). ANNs and SVMs usually perform better than simpler models such as multiple linear regression or decision trees. For example, SVM defines categories in a high-dimensional space (simply put, SVM clusters points in unlimited dimensions). In contrast, ANNs are adept at distinguishing data that are not linearly separable (Kantardzic 2011). Fig. 1 illustrates an ANN function and a SVM function. To a certain degree, attempting to explain how these neural nets reason is analogous to trying to explain the mechanisms of thought process and consciousness in the human brain.

Breiman et al. (1984) developed analytical algorithms of the decision tree model for classification and regression (CART). This model acts like an upside-down tree, growing its branches from the root node down to the leaf nodes at the bottom. Each split in a branch represents a numeric or categorical condition. The expansion of the tree ends at leaf nodes (Fig. 2). The interpretability of this model is high, but it has some serious drawbacks. According to Tibshirani and Friedman (2008), “trees have one aspect that prevents them from being the ideal tool for predictive learning, namely inaccuracy.”

As an enhanced version of a decision tree, the random forest (RF) was developed by Ho (1995). This algorithm builds as many random trees as possible. A RF model arbitrarily categorizes data points using decision trees and simple yes/no conditions (Fig. 3). After all trees are grown, each is evaluated using the data reserved for testing. Based on this evaluation, the RF model chooses the most accurate tree as the final solution. The forest is trained on a bootstrapped data set which is of the same size as the original data set but consists of randomly selected samples from the original data set and their duplicates. The duplicates usually account for up to 30% of the training data set, which essentially creates an unnatural data set (Fan and Zhang 2009).

Another parallel endeavor to embellish the decision tree resulted in its integration with regression algorithms. The M5P model tree was designed by Quinlan (1992) and then enhanced by Wang and Witten (1997). M5P grows a decision tree like CART; however, instead of providing one value at a leaf node, it builds a linear regression based on the instances of data that reach the particular node (Fig. 4).

Current AI Trend

Machine learning algorithms for data-driven predictive analytics, including ANN, SVM, and RF, have been utilized widely by researchers in the last couple of decades. According to the recent discussion by Emmert-Streib et al. (2020), such statistical models and machine learning methods have been introduced due to the lack of general theories outside of physics. In nearly every applied domain, historical data are abundant, and big data is emerging due to technological

Fig. 1. Abstract illustration of ANN and SVM: ANN function is the result of learning from training data analogous to nonlinear regression; SVM function clusters data into classes using hyper plane (star—class A, circle—class B).


Fig. 2. Abstract illustration of decision tree: Clustering based on splitting attributes and certain criteria, e.g., yes/no in the current example (star—class A, circle—class B, square—class C).

Fig. 3. Abstract illustration of random forest: Clustering based on random splitting attributes and certain criteria, e.g., yes/no in the current example (star—class A, circle—class B, square—class C, triangle—class D).


advancements. AI has been applied widely as an effective computing means to extract hidden signals and uncover nonobvious patterns from the data, while also making reliable predictions of the outcome expected in unseen cases. Adeli (2001) conducted a review of the journal Computer-Aided Civil and Infrastructure Engineering from 1989 (the first publication on the ANN topic) to 2000 and found over 180 ANN use cases, not counting alternative standalone algorithms such as decision tree or fuzzy logic. A review by Kulkarni et al. (2017) covered over 70 ANN applications in construction management alone. Although scholars keep widening the boundaries of what machines can learn, AI applications remain rare in practice. Many of the best-performing methods feature highly complex mathematical algorithms, thus prohibiting a straightforward explanation of the obtained results in simple terms (Emmert-Streib et al. 2020). Nonetheless, for professionals who make high-stakes decisions, these explanations are worth their weight in gold. The user’s trust in a computer program is analogous to trust in a co-worker: if there is no understanding, there is no trust; if there is no trust, there is no cooperation.

Widman and Loparo (1990) pointed out that the credibility of an AI program frequently depends on its ability to explain its conclusions. Dhar and Stein (1997) argued that because neural network (NN) algorithms such as the back-propagation NN are nonlinear, high-dimensional functional equations featuring parallel distributed data processing, it is hard to interpret explicitly which parameters cause which behavior in the NN model. Although mathematical and operational methods do exist for the analysis of neural networks, the methods are fairly convoluted and are less than satisfying because of their theoretical assumptions. Dhar and Stein (1997) stated that “unlike most statistical methods, it can be difficult to say, even in general, which variables are significant in what respect.” Research to decipher those nonexplainable AI models has made inroads in specific application domains. For example, Lu et al. (2001) created a tornado-like sensitivity graph that was able to analyze the sensitivity of ANN input parameters and measure their impact on the output. Domain experts could use this visual aid to interpret and validate the ANN model based on their experience and common sense. Ruping (2006) investigated how to interpret a SVM and how to measure the interpretability of the machine learning algorithm itself. Ruping argued that in order for a model to be comprehensible to the user, it must be accurate and efficient so that interpretability does not become a performance bottleneck.

The fact that explainability often is mandatory for an AI model to be of practical value led the United States Defense Advanced Research Projects Agency (DARPA) to initiate a new field called explainable artificial intelligence (Gunning 2016). The computing research community recently has developed multiple explainability techniques, including model simplification approaches, feature relevance estimators, text explanations, local explanations, and model visualizations (Arrieta et al. 2019). Most XAI endeavors have focused on problems and data sets relevant to the areas of sociology and image, text, or sound recognition, or at the corporate level have been used to explain to users how the black box of AI software functions (Rudin 2019). Nevertheless, developing and validating a second non-black-box model, which is built to interpret the black box of the primary model, presents a special dilemma: if the explanation is completely faithful to the computation of the primary model, one would not need the primary model, but only the explanation resulting from the XAI model (Rudin 2019). From the perspective of applied research, XAI is a new AI application paradigm in construction engineering. It remains unclear which algorithm can be the proper fit in delivering XAI in a general sense or whether this XAI paradigm is practically feasible.

Efforts to hybridize decision trees with artificial neural networks to enhance the interpretability of the latter were made well before DARPA coined the term XAI. Ivanova and Kubat (1995) used decision trees to initialize the weights, hidden layers, and the neurons in these layers, with the goal of making ANN setup less trial-and-error and more systematic. Boz (2000) thought that the understandability problem of neural networks could be addressed by extracting decision rules or decision trees from the trained neural network, thus increasing the valuation of the algorithm. In the field of deep learning, Wan et al. (2020) created neural-backed decision trees that break down image classification into a sequence of intermediate decisions. This sequence of decisions then can be mapped to more-interpretable concepts to reveal informative hierarchical structures in the underlying classes. In addition to efforts to explain neural networks by means of decision trees, Lundberg et al. (2019) studied the explainability of decision-based trees themselves. They argued that by combining many local explanations of feature importance and feature interactions with separate samples, the global structure of the model could be revealed. Their TreeExplainer used the SHapley Additive exPlanation (SHAP) value (Lundberg and Lee 2017) to extract such local explanations and to monitor the model. All the aforementioned endeavors were attempts to explain the classification mechanism of prediction models. To our best knowledge, no application frameworks are available for developing regression-type XAI models for commonly encountered data-driven prediction problems in the domain of construction engineering and management. This research

Fig. 4. Abstract illustration of model tree (M5P): Clustering based on splitting attributes and certain criteria, e.g., yes/no in the current example; a regression function is built for each class (star—class A, circle—class B, square—class C).


proposes such a framework leading to the generation of XAI models based on regression trees.

Model Performance Metrics

Effective and straightforward metrics are selected based on those commonly applied to evaluate regression models. For the researcher, it is not very important which evaluation metrics are used because in most practical situations the best numeric prediction method still is the best no matter which error measure is used (Witten and Frank 2011). On the other hand, for practitioners, these metrics need to indicate whether the model is valid and acceptable. Thus, selecting proper metrics for model accuracy evaluation is vital. This section elaborates on three general types of errors for evaluating regression or classification algorithms: absolute or mean errors, relative errors, and correlation coefficients.

Absolute Errors

Absolute errors are the most intuitive. For example, mean absolute error is an average of the differences between actual and predicted values. Mean absolute percentage error indicates by how much, on average, the model under- or overpredicts the target value. In practical applications, the percentage error usually is avoided because it tends to be distorted by outliers.
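The two absolute-error metrics can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the three sample values are hypothetical, and the third record is deliberately a small-valued outlier to show how a single instance can dominate the percentage error.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error expressed as a percentage of each actual value."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical values; the third actual is a small-valued outlier.
actual = [100.0, 200.0, 2.0]
predicted = [110.0, 190.0, 6.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(f"MAE = {mae:.2f}, MAPE = {mape:.2f}%")
```

In this made-up sample the absolute error of the outlier is the smallest of the three, yet it contributes most of the percentage error, which is the distortion noted above.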

Relative Errors

Relative errors can be good metrics to compare AI algorithms. The error is normalized by the error of the simple predictor (the differences between actual values and the mean of actuals) that always predicts the mean. Furthermore, squared relative error and root relative squared error often result in higher numerical values than absolute errors. A complete set of equations and their interpretation was given in Section 5.8 of Witten and Frank (2011).
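As a minimal sketch of this normalization, the Python below computes a relative absolute error and a root relative squared error against the mean predictor. The function names and sample values are hypothetical, and the formulas follow the usual definitions rather than any code from this study.

```python
import math


def relative_absolute_error(actual, predicted):
    """Total absolute error divided by that of a predictor that always returns the mean."""
    mean_actual = sum(actual) / len(actual)
    model = sum(abs(a - p) for a, p in zip(actual, predicted))
    baseline = sum(abs(a - mean_actual) for a in actual)
    return model / baseline


def root_relative_squared_error(actual, predicted):
    """Square root of the squared error normalized by the mean predictor's squared error."""
    mean_actual = sum(actual) / len(actual)
    model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    baseline = sum((a - mean_actual) ** 2 for a in actual)
    return math.sqrt(model / baseline)


# Hypothetical sample values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

rae = relative_absolute_error(actual, predicted)
rrse = root_relative_squared_error(actual, predicted)
print(f"RAE = {rae:.3f}, RRSE = {rrse:.3f}")
```

A value below 1 means the model beats the mean predictor; a value above 1 means it does worse.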

R-squared

The R-squared is a widely used metric to estimate the accuracy of a model. Ironically, this coefficient often is confusing and can be misused. In statistics, R-squared refers to the coefficient of determination, and is simply the square of the Pearson correlation coefficient (PCC). The PCC measures the linear correlation between two variables, and ranges between −1 and +1. Mathematicians square the PCC and derive Eq. (1) to explain the percentage of variation between two variables. Eq. (1) is given only to facilitate the interpretation of R-squared and should not be used to calculate the Pearson correlation coefficient (Witte and Witte 2017)

$$R^2 = \frac{\text{Variance}_{\text{mean}} - \text{Variance}(\text{actual}, \text{predicted})}{\text{Variance}_{\text{mean}}} \tag{1}$$

In machine learning, R-squared also refers to the coefficient of determination, which indicates how much variation of the target value is explained by the predicted value [Eq. (2)]. In other words, if R² is equal to 0.78, we can say that the model accounts for only 78% of the variation, and 22% remains hidden

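$$R^2 = \frac{\sum_i \text{baseline error}_i^2 - \sum_i \text{error}_i^2}{\sum_i \text{baseline error}_i^2} \tag{2}$$

where:

$$\text{error}_i = \text{actual}_i - \text{predicted}_i \tag{3}$$

$$\text{baseline error}_i = \text{actual}_i - \text{actual}_{\text{mean}} \tag{4}$$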

Eqs. (1) and (2) essentially are identical. The other interpretation of Eq. (2) can be put in the following way: if R² is equal to 0.78, then the model performs 78% better than a zero-rule predictor (a model that always predicts the mean); or if R² is equal to −0.11, one can suggest that the model performs 11% worse than a zero-rule predictor. This version of the R-squared definition can be negative (R² ∈ (−∞, 1]) in the case of poor prediction performance (error is much higher than baseline error). Hence, this metric can be of great value to the user for evaluating model performance. The name, however, can be changed to coefficient of explained variation to avoid confusion.
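The behavior of Eq. (2), including the negative range, can be checked with a short Python sketch. The function name `explained_variation` and the sample values are hypothetical and serve only to illustrate the three regimes: a useful model, the zero-rule predictor, and a poor model.

```python
def explained_variation(actual, predicted):
    """Coefficient of explained variation following Eqs. (2)-(4)."""
    mean_actual = sum(actual) / len(actual)
    # Squared baseline errors: actual minus the mean of actuals, Eq. (4).
    baseline_sq = sum((a - mean_actual) ** 2 for a in actual)
    # Squared model errors: actual minus predicted, Eq. (3).
    error_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return (baseline_sq - error_sq) / baseline_sq


# Hypothetical sample values.
actual = [10.0, 20.0, 30.0, 40.0]

r2_good = explained_variation(actual, [12.0, 18.0, 33.0, 37.0])
r2_zero_rule = explained_variation(actual, [25.0, 25.0, 25.0, 25.0])
r2_poor = explained_variation(actual, [40.0, 30.0, 20.0, 10.0])
print(r2_good, r2_zero_rule, r2_poor)
```

The zero-rule predictor scores exactly zero by construction, so any negative value signals a model that is worse than always predicting the mean.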

Pearson Correlation Coefficient

As mentioned previously, the Pearson correlation coefficient measures the statistical correlation between two variables, and is denoted R ∈ [−1, 1]. In other words, this coefficient can tell whether the dependency between two parameters is weak (R → 0) or strong (R → −1 or R → 1)

$$R = \frac{\text{Covariance}_{\text{actual},\text{predicted}}}{\sqrt{\text{Variance}_{\text{actual}}} \cdot \sqrt{\text{Variance}_{\text{predicted}}}} \tag{5}$$

Despite the great value of PCC to statisticians, it causes confusion in machine learning. This metric is scale-invariant, meaning that if we multiply all predicted values by any positive constant and leave the actual values intact, the correlation remains the same [Figs. 5(b and c)].

This implies the possibility that if an algorithm consistently underperforms on all the predictions by a considerable margin, the correlation coefficient can remain high. An intuitive indicator of ideal prediction accuracy is the correlation line intersecting the x- and y-axes at the origin with a 45° tilt angle [Fig. 5(a)]. Thus, it is advisable to apply the correlation coefficient to justify a model’s prediction performance only if it is supported by graphical visualization of the tilt angle of the correlation line.
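This limitation is easy to reproduce. The sketch below implements Eq. (5) directly and compares a set of predictions with the same predictions halved; all names and values are hypothetical. The correlation is unchanged while the mean absolute percentage error grows substantially.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient following Eq. (5)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    var_a = sum((a - mean_a) ** 2 for a in actual) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    return cov / (math.sqrt(var_a) * math.sqrt(var_p))


def mape(actual, predicted):
    """Mean absolute percentage error."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sample values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
halved = [0.5 * p for p in predicted]  # consistent underprediction by half

r_full, r_half = pearson_r(actual, predicted), pearson_r(actual, halved)
mape_full, mape_half = mape(actual, predicted), mape(actual, halved)
print(f"R: {r_full:.4f} vs {r_half:.4f}; MAPE: {mape_full:.2f}% vs {mape_half:.2f}%")
```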

This study used (1) correlation coefficient (R), (2) coefficient of explained variation (R²), and (3) mean absolute percentage error as AI model performance evaluation metrics.

How M5P Works

Ensemble top-down trees usually are grown to the maximum size and then pruned backward, replacing poor-performing subtrees with leaves (Fig. 9). Then the smoothing procedure adjusts the performance of each leaf node to compensate for sharp discontinuities that inevitably would occur between adjacent linear models (Wang and Witten 1997). These internal mechanisms are employed to achieve the highest feasible prediction accuracy for the M5P model as a whole.

Fig. 5. Illustration of the potential limitation of Pearson correlation coefficient in checking regression quality.


Growing Initial Tree

To build the upside-down tree, M5P uses a splitting criterion [Eq. (6)] to find the attribute and the value at which to begin growing branches

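$$\text{Splitting criterion} = \mathrm{SD}(\text{output}) - \mathrm{SD}(\text{output}_{\text{split}})_{\text{weighted}} \tag{6}$$

where

$$\mathrm{SD}(\text{output}_{\text{split}})_{\text{weighted}} = \mathrm{SD}(\text{output}_{\text{split 1}}) \times \frac{\sum |\text{output}_{\text{split 1}}|}{\text{number}_{\text{instances split 1}}} + \mathrm{SD}(\text{output}_{\text{split 2}}) \times \frac{\sum |\text{output}_{\text{split 2}}|}{\text{number}_{\text{instances split 2}}} \tag{7}$$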

The algorithm evaluates all possible splits and measures the magnitude by which the standard deviation (SD) of the output is reduced. The reduction is represented as the sum of the weighted standard deviations of the output values of evaluated splits. For example, if we have 2 attributes (one input and one output) and 12 instances, M5P would sort values for each attribute and find an average between adjacent points (potential splitting values) (Fig. 6). Then, Eq. (6) is calculated for each possible split (in this example, 11 splits), and a splitting value associated with the largest value of the splitting criterion is chosen.
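The split search for a single input attribute can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this study: it uses the population standard deviation and the standard M5 weighting, in which each side's SD is weighted by its share of the instances. The function name `best_split` and the 12 sample records are hypothetical.

```python
import statistics


def best_split(inputs, outputs):
    """Return ((criterion, split_value), number_of_candidates) for one input attribute."""
    pairs = sorted(zip(inputs, outputs))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(ys)
    sd_all = statistics.pstdev(ys)
    candidates = []
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # identical input values cannot be separated
        split_value = (xs[i - 1] + xs[i]) / 2  # average of adjacent points
        left, right = ys[:i], ys[i:]
        # Assumption: each side is weighted by its share of the instances.
        weighted_sd = (len(left) / n) * statistics.pstdev(left) \
            + (len(right) / n) * statistics.pstdev(right)
        candidates.append((sd_all - weighted_sd, split_value))
    return max(candidates), len(candidates)


# Hypothetical data: 12 instances with a clear jump in the output after input 6.
inputs = list(range(1, 13))
outputs = [10, 11, 9, 10, 12, 8, 50, 52, 49, 51, 48, 50]

(criterion, split_value), n_candidates = best_split(inputs, outputs)
print(f"{n_candidates} candidate splits; best at {split_value} (SD reduction {criterion:.2f})")
```

With 12 distinct input values there are 11 candidate splitting values, matching the example in the text.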

This procedure continues until the tree is grown to the maximum size and the stopping condition is met. In the case of M5P, the tree stops growing when the leaf node has fewer than three instances or the standard deviation of the leaf’s output is less than 5% of the standard deviation of the output of the entire set

$$\mathrm{SD}(\text{output}_{\text{leaf}}) < 0.05 \cdot \mathrm{SD}(\text{output}) \tag{8}$$
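The stopping rule can be expressed as a small predicate. The sketch below is illustrative: the function name, parameter names, and sample values are hypothetical, and the population standard deviation is assumed.

```python
import statistics


def should_stop(node_outputs, all_outputs, min_instances=3, sd_fraction=0.05):
    """True when a node should become a leaf under the two conditions described above."""
    if len(node_outputs) < min_instances:
        return True  # fewer than three instances reach the node
    # Eq. (8): node SD is below 5% of the SD of the entire set.
    return statistics.pstdev(node_outputs) < sd_fraction * statistics.pstdev(all_outputs)


# Hypothetical output values for the whole training set.
all_outputs = [10, 11, 9, 10, 12, 8, 50, 52, 49, 51, 48, 50]

stop_small = should_stop([10, 11], all_outputs)                  # too few instances
stop_spread = should_stop([10, 11, 9, 10, 12, 8], all_outputs)   # spread still too large
stop_tight = should_stop([50, 50, 50.5], all_outputs)            # SD below the 5% threshold
print(stop_small, stop_spread, stop_tight)
```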

Pruning

After the tree is grown, M5P builds multiple linear regressions for each leaf as well as each subtree using standard regression and greedy search attribute selection. Then the algorithm tests each instance (training process) and averages the difference between predicted and actual values (expected error) for each leaf and subtree. The error of every entity then is multiplied by a compensation factor [Eq. (9)] to account for the fact that the model is not tested on unseen cases. The lower the number of instances, the more the error is expected to increase

$$\text{Compensation factor} = \frac{\text{number}_{\text{instances}} + \text{number}_{\text{attributes}}}{\text{number}_{\text{instances}} - \text{number}_{\text{attributes}}} \tag{9}$$

The pruning itself is a process of comparing the expected error of the lower leaf nodes with the expected error of the upper subtree. If regression in the subtree performs better than the regressions in the leaf nodes, those leaf nodes are pruned and the subtree becomes a leaf node (bottom-up pruning).
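A hedged Python sketch of the pruning comparison follows. The compensation factor is taken from Eq. (9); combining the leaf errors by an instance-weighted average is an assumption made for this illustration, and all names and numbers are hypothetical. The sketch also assumes that each model has more instances than attributes, so the denominator in Eq. (9) stays positive.

```python
def compensated_error(mean_abs_error, n_instances, n_attributes):
    """Training error inflated by the compensation factor of Eq. (9)."""
    factor = (n_instances + n_attributes) / (n_instances - n_attributes)
    return mean_abs_error * factor


def should_prune(subtree_model, leaf_models):
    """Compare the subtree's own regression with its leaves.

    Each model is a tuple (mean_abs_error, n_instances, n_attributes).
    Assumption: leaf errors are combined by an instance-weighted average.
    """
    subtree_error = compensated_error(*subtree_model)
    total = sum(n for _, n, _ in leaf_models)
    leaves_error = sum(n * compensated_error(e, n, v) for e, n, v in leaf_models) / total
    return subtree_error < leaves_error


# Hypothetical subtree: its regression uses 2 attributes on 30 instances.
subtree = (5.0, 30, 2)
# Hypothetical leaves: lower raw error, but few instances per regression.
leaves = [(4.0, 10, 3), (4.5, 20, 3)]

small_leaf_error = compensated_error(4.0, 10, 3)
prune = should_prune(subtree, leaves)
print(f"compensated leaf error = {small_leaf_error:.3f}, prune = {prune}")
```

In this made-up case the leaves fit the training data slightly better, but after compensation the simpler subtree regression is expected to perform better, so the leaves are pruned.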

Smoothing

Finally, smoothing is employed to calibrate the predicted value of the leaf by propagating it to higher subtrees and eventually to the root node. Eq. (10) is calculated at each level of the tree (from leaf to subtree, from subtree to next-level subtree, and so forth, to the root node). The goal is to combine the prediction power of the leaf with the prediction power of subtrees

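$$\text{Predicted value}_{\text{upper node}} = \frac{\text{Predicted value}_{\text{node}} \times \text{number}_{\text{instances}} + \text{Predicted value}_{\text{lower node}} \times k}{\text{number}_{\text{instances}} + k} \tag{10}$$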

where k is a constant, which in M5P is equal to 15; and numberinstances refers to the number of instances (training records) associated with the subtree which is denoted node (Fig. 7).
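The smoothing pass can be sketched as a loop from the leaf to the root. The code below follows Eq. (10) as written in this section, with k = 15; the function names, the path representation, and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def smooth_step(pred_node, pred_lower, n_instances, k=15):
    """One application of Eq. (10) at a single level of the tree."""
    return (pred_node * n_instances + pred_lower * k) / (n_instances + k)


def smooth_along_path(leaf_prediction, path_to_root, k=15):
    """Propagate a leaf prediction upward.

    path_to_root lists (node_prediction, n_instances) for each subtree,
    ordered from the leaf's parent up to the root.
    """
    value = leaf_prediction
    for node_prediction, n_instances in path_to_root:
        value = smooth_step(node_prediction, value, n_instances, k)
    return value


# Hypothetical case: the leaf predicts 130; its parent's regression predicts 100
# (30 instances) and the root's regression predicts 90 (60 instances).
one_level = smooth_step(100.0, 130.0, 30)
smoothed = smooth_along_path(130.0, [(100.0, 30), (90.0, 60)])
print(one_level, smoothed)
```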

Steel Fabrication Labor Cost: Case Study

Data Collection

Throughout the 3 years of joint industry–academia research efforts, we collected 935 separate pre-bid estimates of steel fabrication projects. Each estimate contained a take-off of the steel profiles (length, weight, and quantity) listed in a project. Additionally, we identified relevant project attributes as follows: location of fabrication (six different locations), sector (oil and gas, industrial, commercial, or infrastructure), scope (supply and erection, or supply only) and complexity (light, medium, heavy, or very heavy). The complexity feature was defined based on expert knowledge. For example, light-complexity projects are found in the commercial sector, and they use hollow structural steel (so-called stick build).

Fig. 6. Demonstration of how model tree (M5P) establishes the input factor splitting criteria and setting the split value.


A heavy-complexity project generally refers to a massive structure made of plates, thus requiring considerable handling and welding operations. A very heavy-complexity project most likely is a bridge or an oil rig. The totals for weight, quantity, and length of all the pieces were calculated as a sum of all profiles (for example, total weight = hollow structural steel weight + wide flange weight + C-shape weight + ⋯). All the data were extracted from the company’s shared folders using Visual Basic version 7.1 code embedded in a master spreadsheet in Excel. Table 1 lists the input features and the number of distinct values for each feature in the dataset.

The data cleaning process was executed using VBA version 7.1 code in a master spreadsheet and consisted of filtering out the instances that were insufficient (created by mistake or not properly filled out), deleting the instances that contained disproportional profile attributes (for example, plate weight = 1,000 kg, length = 2 m, width = 3 m), and removing duplicates. The common practice in data mining is to clean the data set to its ideal condition, getting rid of all the noise and outliers. However, some valuable information could be lost in this cleaning process. Dealing with the outliers in practical applications is a critical task, because these outliers can be minority representatives of the population that do not land in the sampled data set, or they may indicate certain errors inherent in the data. A decision was made to leave as many instances as practically feasible while relying on the proposed algorithms to identify noise and discern signals in the data set analytically. This research used 218 instances to build a model. Of 8,284 data values, 3,505 were absent and 5 were missing. Absent values were expressed as 0, whereas missing values were left blank. Table 2 summarizes the steel fabrication data set characteristics; the number of zeros after cleaning was significant due to the fact that a single project usually does not contain all the steel profiles, and zeros were used to denote the profiles that were not present in a project.

Attribute Selection

As mentioned previously, model tree modeling employs a greedy search that selects the most valuable attributes for each regression. To make the comparison fair, we used the wrapper subset evaluator described by Kohavi and John (1997) to perform attribute selection for the ANN, SVM, and RF models. The wrapper subset evaluator identifies the relevant or useful features in a data set, and only the streamlined subset is presented to a machine learning or AI algorithm.

A greedy (or hill-climbing) algorithm is used to perform forward and/or backward search over the space of attribute subsets. It may start with no attributes, with all attributes, or from an arbitrary point in the space. The algorithm terminates when the addition or deletion of any remaining attribute results in a decrease in evaluation performance (Frank et al. 2016). In this way, the best feature subset found by the search is specific to the particular algorithm applied.
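The forward variant of such a greedy search can be sketched in Python as follows. This is a simplified illustration and not WEKA's implementation: the learner is replaced by ordinary least squares, the evaluation measure is a cross-validated mean absolute error, the search stops as soon as no addition improves the error, and the function names and synthetic data are assumptions.

```python
import numpy as np


def cv_error(X, y, cols, folds=5):
    """Mean absolute error of a least-squares fit on columns `cols`, estimated by k-fold CV."""
    n = len(y)
    errors = []
    for test_idx in np.array_split(np.arange(n), folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        A_train = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, cols]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errors.append(np.abs(A_test @ coef - y[test_idx]).mean())
    return float(np.mean(errors))


def greedy_forward_selection(X, y):
    """Add one attribute at a time; stop when no addition lowers the CV error."""
    selected, best = [], float("inf")
    remaining = list(range(X.shape[1]))
    while remaining:
        trial = {j: cv_error(X, y, selected + [j]) for j in remaining}
        j_best = min(trial, key=trial.get)
        if trial[j_best] >= best:  # no remaining attribute improves the evaluation
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = trial[j_best]
    return selected, best


# Synthetic data: only columns 0 and 2 carry signal; the other four are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
selected, error = greedy_forward_selection(X, y)
print(selected, round(error, 3))
```

Because the subset is scored with the learner itself, a different learner can end up with a different subset, which is the sense in which the result is specific to the algorithm applied.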

To better understand the impact of selected attributes, the total proportional correlation was calculated using Eq. (11), where $R_i$ is the correlation coefficient between the output and each selected attribute

$$R_{\text{proportional}} = \frac{\sum_i R_i}{\text{number of attributes}} \qquad (11)$$
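As a worked check of Eq. (11), the short Python sketch below averages the rounded attribute correlations listed in Table 3 for each model. The function name is an assumption. Because the inputs are the rounded percentages from the table, the results can differ from the reported values by a fraction of a percentage point.

```python
def proportional_correlation(correlations):
    """Eq. (11): sum of attribute-output correlations divided by the number of attributes."""
    return sum(correlations) / len(correlations)


# Correlations (%) of the attributes selected for each model, copied from Table 3.
selected_correlations = {
    "ANN": [74, 60, 58, 48, 41, 29, 26, 12, 12, 7, -3, -5],
    "SVM": [74, 58, 56, 48, 41, 32, 25, 25, 12, 8, 8, 7],
    "RF": [74, 8, 8, 8, 7],
    "M5P": [74, 60, 58, 56, 54, 48, 41, 41, 36, 27, 26, 12, 7],
}
for model, values in selected_correlations.items():
    print(model, round(proportional_correlation(values), 1))
```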

Table 1. Overview of the steel fabrication labor cost dataset: attributes and the number of distinct values for each in the dataset

| No. | Attribute | Number of distinct values |
|---|---|---|
| 1 | Scope of work | 2 |
| 2 | Sector | 5 |
| 3 | Location | 6 |
| 4 | Complexity | 4 |
| 5 | Hollow steel weight | 130 |
| 6 | Hollow steel quantity | 107 |
| 7 | Hollow steel length | 128 |
| 8 | Wide flange weight | 165 |
| 9 | Wide flange quantity | 133 |
| 10 | Wide flange length | 161 |
| 11 | C-shape weight | 126 |
| 12 | C-shape quantity | 93 |
| 13 | C-shape length | 123 |
| 14 | L-shape weight | 164 |
| 15 | L-shape quantity | 135 |
| 16 | L-shape length | 161 |
| 17 | Plate weight | 201 |
| 18 | Plate quantity | 186 |
| 19 | Plate length | 194 |
| 20 | Round bar weight | 25 |
| 21 | Round bar quantity | 28 |
| 22 | Round bar length | 28 |
| 23 | Miscellaneous weight | 9 |
| 24 | Miscellaneous quantity | 9 |
| 25 | Miscellaneous length | 9 |
| 26 | S-shape weight | 37 |
| 27 | S-shape quantity | 29 |
| 28 | S-shape length | 36 |
| 29 | Wide T-shape weight | 50 |
| 30 | Wide T-shape quantity | 42 |
| 31 | Wide T-shape length | 50 |
| 32 | Pipe weight | 34 |
| 33 | Pipe quantity | 32 |
| 34 | Pipe length | 31 |
| 35 | Total weight of pieces | 214 |
| 36 | Total quantity of pieces | 205 |
| 37 | Total length of pieces | 210 |
| 38 | Total labor hours (output) | 214 |

Table 2. Steel data set characteristics

| Characteristic | Value |
|---|---|
| No. of attributes | 38 |
| No. of instances initial | 935 |
| No. of missing values initial | 22,435 |
| No. of instances after cleansing | 218 |
| No. of missing values after cleansing | 5 |
| No. of zeros after cleansing | 3,505 |

Fig. 7. Smoothing process demonstration: Calibrating the predicted value of the leaf by propagating it to higher subtrees and eventually to the root node.

J. Constr. Eng. Manage., 2021, 147(8): 04021084


Some attributes were selected by most of the models (Table 3): total weight and complexity were selected by all four models, and plate quantity, plate weight, and hollow steel quantity were selected by three of the four models. To a certain extent, the analytical results indicate that these attributes play a crucial part in the labor-hours prediction.

Performance of Each Model

The performance of each model is given in Table 4 in terms of prediction metrics. The correlation graphs based on the testing data are provided for each model in Fig. 8. We used 10-fold cross-validation to estimate the accuracy of the learning schemes. In 10-fold cross-validation, the data set is randomly split into 10 mutually exclusive subsets (the folds) of approximately equal size. The algorithm is trained and tested 10 times; each time, nine folds (90% of the data set) are used for training and the remaining fold (10%) is used for testing. The cross-validation estimate of accuracy is the average over the 10 tests (Kohavi 1995).
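The splitting procedure can be sketched in Python as follows. This is an illustration rather than WEKA's implementation: the function names, the fixed random seed, and the generic `evaluate(train, test)` callback are assumptions.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Randomly partition instance indices into k mutually exclusive folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_instances, evaluate, k=10):
    """Average evaluate(train, test) over the k train/test splits."""
    folds = k_fold_indices(n_instances, k)
    scores = []
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k


folds = k_fold_indices(218)  # 218 instances, as in the steel data set
print(sorted(len(fold) for fold in folds))
```

With 218 instances the folds hold 21 or 22 instances each, and every instance is used for testing exactly once.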

Model Interpretability

This section examines the models that were obtained by running each algorithm on the same steel data set. We used publicly

Table 3. Results from attribute selection for ANN, SVM, RF, and M5P models: selected attributes and correlation (R) to output

| No. | ANN attribute | RANN (%) | SVM attribute | RSVM (%) | RF attribute | RRF (%) | M5P attribute | RM5P (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | Total w | 74 | Total w | 74 | Total w | 74 | Total w | 74 |
| 2 | Total q | 60 | Plate q | 58 | Round bar q | 8 | Total q | 60 |
| 3 | Plate q | 58 | Wide flange q | 56 | Round bar w | 8 | Plate q | 58 |
| 4 | Plate w | 48 | Plate w | 48 | S-shape q | 8 | Wide flange q | 56 |
| 5 | Hollow steel q | 41 | Hollow steel q | 41 | Complexity | 7 | Wide flange w | 54 |
| 6 | T-shape q | 29 | Plate l | 32 | — | — | Plate w | 48 |
| 7 | C-shape q | 26 | Wide flange l | 25 | — | — | Hollow steel q | 41 |
| 8 | Location | 12 | L-shape l | 25 | — | — | Hollow steel w | 41 |
| 9 | Sector | 12 | Sector | 12 | — | — | L-shape w | 36 |
| 10 | Complexity | 7 | Round bar w | 8 | — | — | C-shape w | 27 |
| 11 | S-shape w | −3 | S-shape q | 8 | — | — | C-shape q | 26 |
| 12 | S-shape l | −5 | Complexity | 7 | — | — | Sector | 12 |
| 13 | — | — | — | — | — | — | Complexity | 7 |
| Rproportional [Eq. (11)] | | 30 | — | 33 | — | 21 | — | 41 |

Note: w = weight; q = quantity; and l = length.

Table 4. Results from 10-fold cross-validation for ANN, SVM, RF, and M5P models

| Metric | ANN | SVM | RF | M5P |
|---|---|---|---|---|
| Absolute percentage error (want low) (%) | 150 | 62 | 50 | 69 |
| Coefficient of explained variation, R2 (want high) (%) | 89 | 56 | 87 | 74 |
| Correlation coefficient, R (want high) (%) | 94 | 87 | 95 | 88 |
| Correlation line tilt angle (want 45°) (degrees) | 45.9 | 33.5 | 39.9 | 39.1 |

Fig. 8. Correlation scatter-plot visualization: (a) ANN; (b) SVM; (c) random forest; and (d) M5P.


available software WEKA version 3.8.5 (Frank et al. 2016) to build the discussed models.

ANN

With R2 equal to 89%, this algorithm explained the most variation of the four models. On the other hand, its absolute percentage error was the highest, which could mean that the ANN had overlearned the data set by memorizing noise instead of generalizing patterns in the data. The correlation line was the closest to the ideal 45° tilt [Fig. 8(a)]. The subset of chosen attributes was a reasonable representation of the problem, having a total proportional correlation of 30%. ANN parameters used in model calibration were as follows: number of hidden layers = 1, number of nodes in the hidden layer = 5, transfer function = sigmoid, learning rate = 0.2, and momentum = 0.1. Weights connecting with each node were initialized randomly on [−1, 1]. With 5 nodes and 12 attributes, which became 24 after transforming some variables from nominal (location, sector, and complexity) to binary, the model had 5 × 24 = 120 coefficients (weights) plus 6 bias (threshold) weights. Those 126 neuron weights were initialized randomly.

SVM

SVM performed the worst of the four tested models, with 56% explained variation and a 33.5° correlation line tilt angle. Seven instances were located under the correlation line, dragging it away from the ideal tilt [Fig. 8(b)]. The settings for the SVM model were kernel (transfer function) = radial basis function (RBF), SVM type = ν-SVR (Chang and Lin 2019), and ν = 0.5 [ν ∈ (0, 1], a parameter that controls the number of support vectors]. Unfortunately, the resulting model is extremely difficult to interpret because radial kernel functions implicitly map data points into an infinite-dimensional space when evaluating their influence on each other.

RF

RF achieved the highest accuracy at 100 iterations, with the average size of each tree equal to 169 leaves. The RF model performed the best of the four tested machine learning models in terms of the lowest absolute percentage error (50%) and the highest correlation coefficient. The tilt angle of the correlation line was 39.9°, indicating an underpredicting trend. The attribute selection in the final model missed some critical factors (e.g., steel profiles such as wide flange, plate, and hollow steel), and hence was deemed insufficient. In our view, an expert would not rely on a model that does not conform to existing know-how or common sense.

Although the RF model can be explained by analyzing a single independent tree at a time, in practical applications, the forest consisting of many random trees leads to difficulty in interpretation of the model logic. Every tree in the forest is initiated randomly. The forest is trained on a bootstrapped data set which is of the same size as the original training set but includes randomly selected samples from the original data set and their duplicates. The duplicates usually account for 30% of the training data set, which essentially creates an unnatural data set (Fan and Zhang 2009).
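The share of repeated instances in a bootstrap sample can be checked with a small simulation. The Python sketch below is an illustration under stated assumptions (uniform sampling with replacement, 2,000 repetitions, a fixed seed), and the function name is hypothetical. For a sample of size n, the expected share of distinct original instances is 1 − (1 − 1/n)^n, which approaches about 63% as n grows, so roughly one-third of the positions are repeats, the same order of magnitude as the figure quoted above.

```python
import random


def distinct_fraction(n, rng):
    """Draw a bootstrap sample of size n and return the share of distinct original instances."""
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


n = 218  # number of instances in the steel data set
rng = random.Random(0)
simulated = sum(distinct_fraction(n, rng) for _ in range(2000)) / 2000
expected = 1 - (1 - 1 / n) ** n
print(round(simulated, 3), round(expected, 3))
```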

M5P

M5P did not stand out as the winner by any of the tested performance metrics. Of the four models evaluated, it outperformed only SVM. However, from the interpretability standpoint, this model has marked advantages. First, the selected attributes in the resulting model aligned well with experts’ know-how in the steel fabrication domain based on the presented case study in collaboration with our

Fig. 9. Illustration of the M5P tree calibrated from the steel fabrication labor cost dataset (weight in kilograms).


industry partner. The attribute selection is embedded in the algorithm and does not require additional computational workload. Second, the resulting model is fully transparent to the user, and the reasoning logic of the model can be intuitively validated (Fig. 9). Numerical variable-based splitting conditions in the M5P model are straightforward and self-explanatory. Nominal variable-based splitting conditions (such as complexity) can be interpreted as follows: if the project has heavy complexity (all three splits contain heavy complexity), then its binary value equals 1, and at the split it follows the “>0.5” condition. If the complexity is medium (none of the three nominal splits contain medium complexity), then its binary value is 0 and at the split it follows the “≤0.5” condition. In the regression formula, the same logic applies: if the sector is oil and gas or infrastructure, the multiplier equals 1; otherwise, the multiplier equals 0 (Fig. 9, second line of Linear regression 3). Therefore, if we had a new project with total weight = 6,000 kg, total quantity = 500 pieces, and complexity = heavy, we would use Regression 3 to predict the total amount of labor hours.

The preceding discussion is provided to justify the choice of M5P as an explainable AI model for practical application.

Three-Color Scheme as M5P Enhancement

Incorporating pruning and smoothing features to boost accuracy reduces the interpretability of the regressions and of the model as a whole. Another obstacle to interpretability is the way models are verified and validated. The performance results provided in Table 4 represent the average of 10 different models, none of which is the same as the model in Fig. 9. This implies that we do not know how well each linear regression performs, because the individual regressions are not tested in cross-validation. We know only the training performance for each leaf node (Table 5).

Although the overall training and testing accuracy was acceptable, a close examination of each linear regression leads to a different conclusion. From visual inspection (Fig. 10) and performance analysis (Table 5), only one leaf node (containing Linear regression 7) aligned well with experts’ know-how and common sense.

Because the accuracy of all seven leaves had been altered by smoothing, we decided to test each node separately using leave-one-out cross-validation, in which the number of folds was equal to the number of data points. By doing so, we ensured that every

Table 5. Metrics indicating M5P model tree training performance in full and at each leaf node along with number of instances involved in regression modeling

| Metric | Full | LR1 | LR2 | LR3 | LR4 | LR5 | LR6 | LR7 |
|---|---|---|---|---|---|---|---|---|
| Absolute percentage error (want low) (%) | 54 | 112 | 99 | 72 | 42 | 48 | 33 | 26 |
| Coefficient of explained variation, R2 (want high) (%) | 93 | −76 | −396 | 39 | 10 | 59 | 43 | 86 |
| Correlation coefficient, R (want high) (%) | 97 | 19 | 23 | 70 | 58 | 83 | 79 | 93 |
| Correlation line tilt angle (want 45°) (degrees) | 46.0 | 9.8 | 6.2 | 37.1 | 32.2 | 36.3 | 65.3 | 47.1 |
| Number of instances | 218 | 49 | 35 | 18 | 47 | 22 | 8 | 39 |

Fig. 10. M5P tree with correlation graphs for each node (dashed line denoting ideal correlation).


linear model was treated on the same grounds, regardless of the number of available instances. In this case, regression assessment was performed independently, without impact from the higher-level subtrees. Fig. 11 depicts the performance of each leaf using a three-color scheme. The interpretation is as follows: for R and R2, red represents values of 0% (negative values are as red as 0% to make R comparable to R2), yellow represents values of 50%, and green represents values of 100%. For absolute percentage error, the condition is the opposite: green indicates 0%, yellow indicates 50%, and red indicates 100%. Tilt angle becomes green at 45°, red at 0° and 90°, and yellow at the midpoint of the range between 0° and 45° or 45° and 90° (i.e., 22.5° and 67.5°, respectively). The colors blend gradually between the aforementioned thresholds, i.e., from green to yellow and from yellow to red.
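The color mapping described above can be sketched in Python as a piecewise-linear blend. This is an illustrative reading of the scheme, not the authors' implementation: the metric keys, the function names, the clamping of absolute percentage errors above 100% to red, and the use of pure RGB red, yellow, and green as anchor colors are assumptions.

```python
def quality_score(metric, value):
    """Map a metric value to [0, 1], where 0 = red, 0.5 = yellow, and 1 = green."""
    if metric in ("R", "R2"):
        score = value / 100.0  # 0% -> red, 50% -> yellow, 100% -> green
    elif metric == "APE":
        score = 1.0 - value / 100.0  # 0% -> green, 50% -> yellow, 100% -> red
    elif metric == "tilt":
        score = 1.0 - abs(value - 45.0) / 45.0  # 45 deg -> green, 0 or 90 deg -> red
    else:
        raise ValueError(f"unknown metric: {metric}")
    return min(1.0, max(0.0, score))  # negative R2 (or APE above 100%) is as red as 0


def score_to_rgb(score):
    """Blend linearly from red through yellow to green."""
    if score <= 0.5:
        return (255, round(510 * score), 0)  # red (255, 0, 0) up to yellow (255, 255, 0)
    return (round(510 * (1.0 - score)), 255, 0)  # yellow down to green (0, 255, 0)


# Training metrics of Linear regression 7 from Table 5, used only as sample inputs.
lr7 = {"APE": 26, "R2": 86, "R": 93, "tilt": 47.1}
for metric, value in lr7.items():
    print(metric, score_to_rgb(quality_score(metric, value)))
```

Mapping every metric onto a common 0-to-1 scale first keeps the color blend identical for all four metrics, so only the scaling rule differs between them.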

This framework is not exact, and in machine learning it is difficult to determine rigid boundaries for the minimum acceptable error. There is always uncertainty and risk in using predictive models. This color scheme provides the user with a visual aid to decide which regression is valid to use (i.e., a green leaf), which regression should be used with caution (i.e., a yellow leaf), and which regression should not be used at all (i.e., a red leaf). It is assumed that a green leaf must have at least three metrics satisfying the green condition and the remaining one satisfying at least the yellow condition. A yellow leaf has all four metrics satisfying either the yellow or the green condition, and if the leaf has at least one red flag on the four metrics, it is deemed a red leaf.
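The leaf-level rule can be written as a short Python function. The sketch assumes that each of the four metrics has already been assigned a discrete flag; how a blended color is converted to a flag is not specified here, so the flags are taken as given, and the function name is hypothetical.

```python
def classify_leaf(flags):
    """Combine four per-metric flags ('green', 'yellow', or 'red') into one leaf color."""
    if len(flags) != 4:
        raise ValueError("expected one flag per metric (APE, R2, R, tilt angle)")
    if "red" in flags:
        return "red"  # a single red flag disqualifies the leaf
    if flags.count("green") >= 3:
        return "green"  # at least three green metrics, the fourth at least yellow
    return "yellow"  # no red flag, but fewer than three green metrics


print(classify_leaf(["green", "green", "yellow", "green"]))
print(classify_leaf(["green", "yellow", "yellow", "green"]))
print(classify_leaf(["green", "green", "green", "red"]))
```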

In Fig. 11, Leaves 4 and 7 are green and Leaf 1 is yellow. The other leaves are red and should not be used for prediction. The revised model with a corresponding leaf color scheme applied is depicted in Fig. 12.

In short, the strength of the enhanced model tree algorithm does not lie in high prediction accuracy but in its transparent logic and the analytical distinction of signal from noise in connection with model prediction. The proposed color scheme denoting the quality of regression at each leaf node of the model tree is intuitive and effective in guiding practical application.

Concrete Strength Prediction: Case Study

To test the newly developed framework, we chose the publicly available and well-studied Concrete Compressive Strength data set extracted from the University of California, Irvine Machine Learning Repository (UCI 2020). The data set first was described by Yeh (1998). Experimental data of high-performance concrete (HPC)

Fig. 11. Three-color scheme visualization of regression performance at each leaf node of model tree by leave-one-out cross-validation: steel case study.

Fig. 12. Revised M5P tree with proper leaf color scheme applied.

Table 6. Overview of HPC data set attributes

| Role | Attribute | Minimum | Maximum | Average |
|---|---|---|---|---|
| Input | Cement (kg/m³) | 102 | 540 | 281.2 |
| Input | Blast furnace slag (kg/m³) | 0 | 359.4 | 73.9 |
| Input | Fly ash (kg/m³) | 0 | 200.1 | 54.2 |
| Input | Water (kg/m³) | 121.8 | 247 | 181.6 |
| Input | Superplasticizer (kg/m³) | 0 | 32.2 | 6.2 |
| Input | Coarse aggregate (kg/m³) | 801 | 1,145 | 972.9 |
| Input | Fine aggregate (kg/m³) | 594 | 992.6 | 773.6 |
| Input | Age (days) | 1 | 365 | 45.7 |
| Output | Compressive strength (MPa) | 2.33 | 82.6 | 35.8 |


mix proportions and corresponding compressive strength were collected from 17 sources to build an ANN model. The components of the data set are described in Table 6.

Yeh (1998) achieved a coefficient of explained variance (R2) of 91.4% (fourfold cross-validation), which is considered acceptable for practical application.

Using the same concrete data set, we built the M5P model (Fig. 13) utilizing the default smoothing and pruning features in WEKA. The performance of the tree is listed in Table 7.

Fig. 13. M5P model tree calibrated using the UCI HPC dataset.

Table 7. Concrete case study: metrics for M5P model training and cross-validation performance

| Metric for M5P | Training | 10-fold | 4-fold | 4-fold (Yeh 1998) |
|---|---|---|---|---|
| Absolute percentage error (%) | 13 | 15 | 15 | — |
| Coefficient of explained variation, R2 (%) | 89 | 86 | 86 | 91 |
| Correlation coefficient, R (%) | 94 | 92 | 92 | — |
| Correlation line tilt angle (degrees) | 45.6 | 45.4 | 45.4 | — |

Fig. 14. Three-color scheme visualization of regression performance at each leaf node of model tree by leave-one-out cross-validation: concrete case study.


Next, the accuracy of each leaf node in the resulting tree was tested independently using leave-one-out cross-validation (Fig. 14). Leaves 7, 8, and 9 were associated with poor performance measures and were deemed red. All other nodes had acceptable results and were colored green for practical use. If the red leaves had been removed and only the green leaves kept, the accuracy would have improved significantly (Table 8).
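Leave-one-out testing of a single leaf regression can be sketched in Python as follows. The sketch makes several assumptions: the leaf model is refitted by ordinary least squares with an intercept, R2 is computed as 1 − SSE/SST (a form that can become negative for a poor model), the data are synthetic, and the function names are hypothetical. It is not the WEKA procedure itself.

```python
import numpy as np


def loo_predictions(X, y):
    """Leave-one-out predictions of an ordinary least-squares model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # train on every instance except the i-th
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef  # predict the single held-out instance
    return preds


def r_squared(y, preds):
    """1 - SSE/SST; negative when the model is worse than predicting the mean."""
    sse = np.sum((y - preds) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst


# Synthetic stand-in for the instances that fall into one leaf.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(30, 2))
y = 5.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=30)
preds = loo_predictions(X, y)
print(round(r_squared(y, preds), 3))
```

Because each leaf is refitted and scored only on its own instances, a leaf with few instances is assessed on the same footing as a large one.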

The following facts regarding concrete compressive strength generally are accepted in concrete mix design and construction: (1) the lower the ratio between water and binder, the higher the strength; and (2) as concrete cures over time, the strength increases. In the model tree in Fig. 13, eight of the nine splitting attributes used to cluster the data across the tree were concrete age, cement portion, or water portion. Kadri et al. (2012) tested the compressive strength of high-performance concrete with different ratios of water to cement with silica fume additive. Based on the derived model tree, the value of the first splitting attribute (age = 21 days) is superimposed on their graphs to visualize how the XAI model separated two different stages of curing by learning from the data. The first stage demonstrated a rapid increase in strength, whereas the second stage had a lower increasing trend (Fig. 15). The 21-day curing period is commonly accepted as the critical threshold of curing concrete in construction practice. Therefore, this case study

Table 8. Concrete case study: metrics for M5P model in consideration of only green leaves

| Metric for M5P | Leave-one-out |
|---|---|
| Absolute percentage error (want low) | 14% |
| Coefficient of explained variation, R2 (want high) | 98% |
| Correlation coefficient, R (want high) | 94% |
| Correlation line tilt angle (want 45°) | 44.6° |

Fig. 15. Strength development of concretes at different water-cementitious materials ratios: w = water, c = cement, and sf = silica fume. The 21-day threshold splitting the concrete curing duration was obtained from the M5P model. [Reprinted from Kadri et al. (2012), under Creative Commons-BY-3.0 license (https://creativecommons.org/licenses/by/3.0/).]


validated the research hypothesis that M5P can build a highly interpretable model that aligns properly with common knowledge in a specific problem domain.

Conclusion

Artificial intelligence methods for data-driven predictive analytics have been utilized widely by researchers in the last couple of decades. However, AI features highly complex mathematical algorithms packed in a black box, prohibiting a straightforward explanation of the obtained results in simple terms. This research established a framework to apply the M5P model tree in the context of producing explainable AI for construction industry professionals. The hypothesis was to use the M5P model tree algorithm as an explainable AI model and to substitute it for nonexplainable AI.

In collaboration with a steel fabrication company in western Canada, we presented a real-world case study of implementing M5P, along with three commonly applied AI algorithms, to predict project labor hours based on prebid estimate data. We selected (1) correlation coefficient, (2) coefficient of explained variation, and (3) mean absolute percentage error as metrics for evaluating model performance. We not only calibrated the model tree to the lowest feasible error, but also created a model representing estimators’ know-how, making it accessible to the company’s newcomers and professionals in training. Furthermore, hidden patterns and decision rules encoded in the M5P model were revealed. Valid submodels were distinguished from invalid ones by using identified model performance metrics of the regressions at each leaf node of the tree.

Further, we applied the newly developed framework to a publicly available and well-studied Concrete Compressive Strength data set, resulting in a highly interpretable model that aligns properly with common knowledge in a specific problem domain. Based on the two case studies, the research hypothesis was validated: M5P provides a reasonable XAI solution, which builds a highly interpretable model that aligns with common knowledge in a specific problem domain. The enhancement in the form of a three-color performance scheme of the M5P algorithm has no analogous concepts and functions in any existing software; hence it is considered to be an academic contribution of this research.

Appendix I. Steel Data Set Samples 1–4

| Attribute | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
|---|---|---|---|---|
| Scope | Supply and erection | Supply | Supply and erection | Supply |
| Sector | Industrial | Industrial | Infrastructure | Industrial |
| Location | Edmonton | Saskatchewan | Edmonton | Winnipeg |
| Complexity | Medium | Medium | Very heavy | Medium |
| Hollow steel weight (kg) | 0 | 0 | 0 | 0 |
| Hollow steel quantity | 0 | 0 | 0 | 0 |
| Hollow steel length (m) | 0 | 0 | 0 | 0 |
| Wide flange weight (kg) | 6,538 | 0 | 4,661 | 1,346 |
| Wide flange quantity | 42 | 0 | 27 | 9 |
| Wide flange length (m) | 678 | 0 | 58 | 146 |
| C-shape weight (kg) | 258 | 0 | 0 | 95 |
| C-shape quantity | 9 | 0 | 0 | 1 |
| C-shape length (m) | 14 | 30 | 0 | 66 |
| L-shape weight (kg) | 2,038 | 0 | 2,624 | 270 |
| L-shape quantity | 44 | 0 | 40 | 5 |
| L-shape length (m) | 258 | 0 | 97 | 94 |
| Plate weight (kg) | 1,496 | 15,192 | 6,289 | 84,114 |
| Plate quantity | 177 | 1,776 | 410 | 239 |
| Plate length (m) | 245 | 84 | 103 | 1,600 |
| Round bar weight (kg) | 0 | 0 | 0 | 0 |
| Round bar quantity | 0 | 0 | 0 | 0 |
| Round bar length (m) | 0 | 0 | 0 | 0 |
| Miscellaneous weight (kg) | 0 | 0 | 0 | 0 |
| Miscellaneous quantity | 0 | 0 | 0 | 0 |
| Miscellaneous length (m) | 0 | 0 | 0 | 0 |
| S-shape weight (kg) | 0 | 0 | 0 | 0 |
| S-shape quantity | 0 | 0 | 0 | 0 |
| S-shape length (m) | 0 | 0 | 0 | 0 |
| T-shape weight (kg) | 0 | 0 | 0 | 43 |
| T-shape quantity | 0 | 0 | 0 | 2 |
| T-shape length (m) | 0 | 0 | 0 | 14 |
| Pipe weight (kg) | 0 | 19,693 | 0 | 0 |
| Pipe quantity | 0 | 3,108 | 0 | 0 |
| Pipe length (m) | 0 | 118 | 0 | 0 |
| Total weight (kg) | 10,330 | 34,885 | 13,575 | 85,869 |
| Total quantity | 272 | 4,884 | 477 | 256 |
| Total length (m) | 1,325 | 201 | 258 | 1,920 |
| Labor hours | 323 | 1,332 | 3,022 | 2,490 |
| Hours, predicted | 213 | 1,175 | 1,904 | 2,074 |

Appendix II. Steel Data Set Samples 5–7

| Attribute | Sample 5 | Sample 6 | Sample 7 |
|---|---|---|---|
| Scope | Supply and erection | Supply | Supply |
| Sector | Commercial | Industrial | Infrastructure |
| Location | Saskatchewan | Saskatchewan | Vancouver |
| Complexity | Light | Heavy | Medium |
| Hollow steel weight (kg) | 65,709 | 103,477 | 477 |
| Hollow steel quantity | 302 | 490 | 66 |
| Hollow steel length (m) | 4,066 | 7,325 | 90 |
| Wide flange weight (kg) | 267,183 | 11,768 | 81,463 |
| Wide flange quantity | 610 | 77 | 102 |
| Wide flange length (m) | 16,921 | 396 | 98 |
| C-shape weight (kg) | 14,953 | 7,550 | 6,422 |
| C-shape quantity | 116 | 46 | 160 |
| C-shape length (m) | 726 | 194 | 210 |
| L-shape weight (kg) | 34,911 | 2,907 | 47 |
| L-shape quantity | 1,577 | 134 | 16 |
| L-shape length (m) | 4,612 | 74 | 4 |
| Plate weight (kg) | 33,120 | 47,044 | 575,571 |
| Plate quantity | 2,703 | 577 | 2,508 |
| Plate length (m) | 2,470 | 2,193 | 2,552 |
| Round bar weight (kg) | 969 | 0 | 0 |
| Round bar quantity | 2,129 | 0 | 0 |
| Round bar length (m) | 322 | 0 | 0 |
| Miscellaneous weight (kg) | 0 | 0 | 0 |
| Miscellaneous quantity | 0 | 0 | 0 |
| Miscellaneous length (m) | 0 | 0 | 0 |
| S-shape weight (kg) | 0 | 0 | 1,299 |
| S-shape quantity | 0 | 0 | 57 |
| S-shape length (m) | 0 | 0 | 155 |
| T-shape weight (kg) | 136 | 14,243 | 10,106 |
| T-shape quantity | 1 | 18 | 80 |
| T-shape length (m) | 66 | 1,788 | 143 |
| Pipe weight (kg) | 3,558 | 57 | 0 |
| Pipe quantity | 127 | 12 | 0 |
| Pipe length (m) | 556 | 16 | 0 |
| Total weight (kg) | 420,539 | 187,046 | 675,385 |


Appendix II. (Continued.)

| Attribute | Sample 5 | Sample 6 | Sample 7 |
|---|---|---|---|
| Total quantity | 7,565 | 1,354 | 2,989 |
| Total length (m) | 29,740 | 11,986 | 3,252 |
| Labor hours | 5,678 | 9,269 | 13,372 |
| Hours, predicted | 4,452 | 11,312 | 13,244 |

Data Availability Statement

The used data and models that support the findings of this study are available from the corresponding authors upon reasonable request.

Acknowledgments

The research was funded by the Natural Sciences and Engineering Research Council (NSERC) and Supreme Group through a Collaborative Research and Development Grant (CRDPJ-501012-16). We express sincere appreciation to Kevin Guile, Dave Senio, and Arash Mohsenijam for their contribution and support. We are thankful to the journal editor and reviewers for constructive comments that helped us considerably in improving both the content and the organization of this paper.

References

Adeli, H. 2001. “Neural networks in civil engineering: 1989–2000.” Comput.-Aided Civ. Infrastruct. Eng. 16 (2): 126–142. https://doi.org/10.1111/0885-9507.00219.

Arrieta, A. B., et al. 2019. “Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI.” Preprint, submitted October 22, 2019. http://arxiv.org/abs/1910.10045.

Boz, O. 2000. “Converting a trained neural network to a decision tree DecText—Decision tree extractor.” In Proc., Int. Conf. on Machine Learning and Applications, 12. Bethlehem, PA: Lehigh Univ., Div. of Computer and Information Science.

Breiman, L., J. Friedman, R. Olshen, and C. Stone. 1984. Classification and regression trees. Boca Raton, FL: CRC Press.

Chang, C.-C., and C.-J. Lin. 2019. “LIBSVM: A library for support vector machines.” ACM Trans. Intell. Syst. Technol. 2 (3): 39. https://doi.org/10.1145/1961189.1961199.

Dhar, V., and R. Stein. 1997. Intelligent decision support systems: The science of knowledge work. Upper Saddle River, NJ: Prentice Hall.

Emmert-Streib, F., O. Yli-Harja, and M. Dehmer. 2020. “Explainable artificial intelligence and machine learning: A reality rooted perspective.” Preprint, submitted January 26, 2020. http://arxiv.org/abs/2001.09464.

Fan, W., and K. Zhang. 2009. Encyclopedia of database systems. New York: Springer.

Frank, E., M. A. Hall, and I. H. Witten. 2016. The WEKA workbench. Online appendix for “Data mining: Practical machine learning tools and techniques”. 4th ed. San Francisco: Morgan Kaufmann.

Gunning, D. 2016. “Explainable artificial intelligence (XAI).” Defense Advanced Research Projects Agency (DARPA). Accessed May 15, 2020. https://www.darpa.mil/attachments/DARPA-BAA-16-53.pdf.

Halpin, D., and L. Riggs. 1992. Planning and analysis of construction operations. New York: Wiley.

Ho, K. 1995. “Random decision forests.” In Proc., 3rd Int. Conf. on Document Analysis and Recognition, 278–282. Piscataway, NJ: IEEE.

Ivanova, I., and M. Kubat. 1995. “Initialization of neural networks by means of decision trees.” Knowledge Based Syst. 8 (6): 333–344. https://doi.org/10.1016/0950-7051(96)81917-4.

Kadri, E. H., S. Aggoun, S. Kenai, and A. Kaci. 2012. “The compressive strength of high-performance concrete and ultrahigh-performance.” Adv. Mater. Sci. Eng. 2012: 1–7. https://doi.org/10.1155/2012/361857.

Kantardzic, M. 2011. Data mining: Concepts, models, methods, and algorithms. 2nd ed. Hoboken, NJ: John Wiley & Sons.

Kohavi, R. 1995. “A study of cross-validation and bootstrap for accuracy estimation and model selection.” In Proc., Int. Joint Conf. on Artificial Intelligence, 7. San Francisco: Morgan Kaufmann Publishers.

Kohavi, R., and G. H. John. 1997. “Wrappers for feature selection.” Artif. Intell. 97 (1): 273–324. https://doi.org/10.1016/S0004-3702(97)00043-X.

Kulkarni, P., S. Londhe, and M. Doe. 2017. “Artificial neural networks for construction management: A review.” J. Soft Comput. Civ. Eng. 1 (2): 70–88. https://doi.org/10.22115/SCCE.2017.49580.

Lu, M., S. M. AbouRizk, and U. H. Hermann. 2001. “Sensitivity analysis of neural networks in spool fabrication productivity studies.” J. Comput. Civ. Eng. 15 (4): 299–308. https://doi.org/10.1061/(ASCE)0887-3801(2001)15:4(299).

Lundberg, S. M., G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee. 2019. “Explainable AI for trees: From local explanations to global understanding.” Preprint, submitted May 11, 2019. http://arxiv.org/abs/1905.04610.

Lundberg, S. M., and S.-I. Lee. 2017. “A unified approach to interpreting model predictions.” In Proc., 31st Conf. on Neural Information Processing Systems, 10. Red Hook, NY: Curran Associates.

Quinlan, J. R. 1992. “Learning with continuous classes.” In Proc., 5th Australian Joint Conference on Artificial Intelligence, 6. Tuck Link, Singapore: World Scientific Publishing.

Rosenblatt, F. 1961. Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Buffalo, NY: Cornell Aeronautical Lab.

Rudin, C. 2019. “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.” Preprint, submitted November 26, 2018. http://arxiv.org/abs/1811.10154.

Ruping, S. 2006. “Learning interpretable models.” Ph.D. thesis, Dept. of Artificial Intelligence, der Universität Dortmund am Fachbereich Informatik von, Dortmund.

Tibshirani, S., and H. Friedman. 2008. The elements of statistical learning. 2nd ed. New York: Springer.

UCI (University of California, Irvine). 2020. “Concrete compressive strength data set.” Accessed June 2, 2020. https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength.

Vapnik, V., and C. Cortes. 1995. “Support-vector networks.” Mach. Learn. 20 (3): 273–297. https://doi.org/10.1007/BF00994018.

Vapnik, V., and S. Kotz. 2006. Estimation of dependences based on empirical data: Empirical inference science. New York: Springer.

Wan, A., L. Dunlap, D. Ho, J. Yin, S. Lee, H. Jin, S. Petryk, S. A. Bargal, and J. E. Gonzalez. 2020. “NBDT: Neural-backed decision trees.” Preprint, submitted April 01, 2020. http://arxiv.org/abs/2004.00221.

Wang, Y., and I. H. Witten. 1997. “Inducing model trees for continuous classes.” In Proc., 9th European Conf. on Machine Learning Poster Papers, 10. Berlin, Heidelberg: Springer-Verlag.

Widman, L. E., and K. A. Loparo. 1990. “Artificial intelligence, simulation, and modeling.” Interfaces 20 (2): 48–66. https://doi.org/10.1287/inte.20.2.48.

Witte, R. S., and J. S. Witte. 2017. Statistics. 11th ed. Hoboken, NJ: John Wiley & Sons.

Witten, I., and E. Frank. 2011. Data mining: Practical machine learning tools and techniques. 2nd ed. Amsterdam, Netherlands: Elsevier.

Yeh, I.-C. 1998. “Modeling of strength of high-performance concrete using artificial neural networks.” Cem. Concr. Res. 28 (12): 1797–1808. https://doi.org/10.1016/S0008-8846(98)00165-3.
