Jauhar, Sunil Kumar
Harinath, Susmitha
Krishnaswamy, Venkataraghavan
Paul, Sanjoy Kumar https://orcid.org/0000-0001-9523-179X
Funding for this research was provided by:
University of Technology Sydney
Article History
Received: 25 July 2023
Accepted: 7 October 2024
First Online: 26 October 2024
Declarations
Use of generative AI: During the preparation of this work, the author(s) used Grammarly and Paperpal to improve the manuscript’s grammar. After using these tools, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.
Conflict of interest: The authors have no conflicts of interest to declare.
Ethical approval: This article contains no studies involving human participants or animals performed by any of the authors.
See Table .
Hierarchical clustering
Clustering methods belong to the category of unsupervised machine learning. Clustering aims to identify patterns and group the data using its features. Based on their approach, clustering techniques can be classified into partitioning, hierarchical, density-based, and grid-based methods. This study used k-means (a partitioning method) and hierarchical clustering (a hierarchical method). Partitioning algorithms divide the given “n” data points into “k” partitions or, in other words, “k” clusters. The objects are clustered such that objects within a cluster are similar to one another and dissimilar to objects in other clusters. Hierarchical clustering algorithms create top-down or bottom-up decompositions of data points. In the top-down approach, a data cluster is progressively decomposed into smaller clusters, whereas in the bottom-up approach, individual data points are progressively agglomerated into larger clusters. As shown in Appendix A, agglomerative clustering was used in the configuration. Both approaches are sketched in the first code example below.
Random forest
Random forests belong to the class of ensemble models used in machine learning. Ensemble models use a set of weak learners to build a robust model. A random forest is an ensemble of multiple decision trees, where each decision tree is trained on a randomly chosen subset of data features. The outcomes of the decision trees are aggregated to arrive at the final outcome. The random forest model used in this study had a maximum depth of 11 and 500 estimators (see the second code example below).
LSTM
LSTMs are a class of recurrent neural networks (RNNs) that use memory cells to handle long-term dependencies. An LSTM consists of input, output, and forget gates to selectively retain, modify, and discard long-term information, respectively. The LSTM processes the inputs sequentially, and the operations at each time step are as follows (traced in the third code example below):
Forget gate: Determines the parts of the memory that have to be forgotten, using the previous hidden state and the current input. The forget gate vector f<sub>t</sub> is given by f<sub>t</sub> = σ(W<sub>f</sub>⋅[h<sub>t−1</sub>,x<sub>t</sub>] + b<sub>f</sub>), where σ, W<sub>f</sub>, x<sub>t</sub>, h<sub>t−1</sub>, and b<sub>f</sub> are the sigmoid activation function, the weight matrix of the forget gate, the current input, the previous hidden state, and the bias term of the forget gate, respectively.
Input gate: Determines the information used to update the current cell state. It computes the input gate vector i<sub>t</sub> and the candidate cell state C̃<sub>t</sub>. These are given by i<sub>t</sub> = σ(W<sub>i</sub>⋅[h<sub>t−1</sub>,x<sub>t</sub>] + b<sub>i</sub>) and C̃<sub>t</sub> = tanh(W<sub>c</sub>⋅[h<sub>t−1</sub>,x<sub>t</sub>] + b<sub>c</sub>), where tanh, W<sub>i</sub>, W<sub>c</sub>, b<sub>i</sub>, and b<sub>c</sub> are the hyperbolic tangent activation function, the weight matrices, and the bias terms, respectively.
Cell state update: Information from the forget and input gates is used to update the cell state. The new cell state C<sub>t</sub> is given by C<sub>t</sub> = f<sub>t</sub> ⊙ C<sub>t−1</sub> + i<sub>t</sub> ⊙ C̃<sub>t</sub>.
Output gate: Determines the hidden state h<sub>t</sub> for the next time step. The hidden state h<sub>t</sub> is given by h<sub>t</sub> = O<sub>t</sub> ⊙ tanh(C<sub>t</sub>) and O<sub>t</sub> = σ(W<sub>o</sub>⋅[h<sub>t−1</sub>,x<sub>t</sub>] + b<sub>o</sub>), where W<sub>o</sub> and b<sub>o</sub> are the weight matrix and the bias term of the output gate, respectively.
As shown in Appendix A, a single-layer LSTM configuration was used.
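A minimal sketch of the two clustering approaches described above, assuming scikit-learn; the feature matrix X is a synthetic placeholder, not the study’s data:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))  # placeholder feature matrix

# Partitioning method: divide the n points into k clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X)

# Hierarchical method: bottom-up (agglomerative) merging of points.
agglo = AgglomerativeClustering(n_clusters=4, linkage="ward")
agglo_labels = agglo.fit_predict(X)
```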
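For the random forest configuration reported above (500 estimators, maximum depth 11), a sketch using scikit-learn; the regression setup and the data are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # placeholder features
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)  # placeholder target

# Hyperparameters as stated in the appendix: 500 estimators, max depth 11.
rf = RandomForestRegressor(n_estimators=500, max_depth=11, random_state=0)
rf.fit(X, y)
print(rf.predict(X[:3]))
```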
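The gate equations above can be traced in a small NumPy implementation of a single LSTM time step; the weights, dimensions, and inputs here are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # Concatenate previous hidden state and current input: [h_{t-1}, x_t].
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate: f_t
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate: i_t
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state: C̃_t
    c_t = f_t * c_prev + i_t * c_tilde      # cell state update: C_t
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate: O_t
    h_t = o_t * np.tanh(c_t)                # new hidden state: h_t
    return h_t, c_t

n_in, n_hidden = 3, 4
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```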
SVR
Support vector machines belong to the family of boundary separation methods used in machine learning. SVMs can be used for regression (SVR) and classification (SVC). SVR aims to identify the optimal hyperplane within a threshold ε for any given y. SVR can be applied to linear and nonlinear regression, supporting nonlinear regression through popular kernel functions such as RBF, polynomial, and sigmoid. In this study, we applied the RBF kernel while implementing the SVR (see the SVR sketch below).
Autoregression
Autoregression (AR) is the regression of a variable on values of the same variable measured at earlier periods. This difference in period is termed the lag. The equation of AR with lag p, i.e., AR(p), can be represented as y<sub>t</sub> = c + φ<sub>1</sub>y<sub>t−1</sub> + φ<sub>2</sub>y<sub>t−2</sub> + … + φ<sub>p</sub>y<sub>t−p</sub> + ε<sub>t</sub>, where c is a constant, φ<sub>1</sub>, …, φ<sub>p</sub> are the lag coefficients, and ε<sub>t</sub> is white noise. The number of lags was identified from the partial autocorrelation plots. In this study, we used an autoregression model with lag 2 (see the AR sketch below).
Vector autoregression
Unlike autoregression, a VAR model uses multiple time series variables. Each variable in the system is regressed on past values of all the variables in the system. We modeled the VAR with the following variables: MntWines, MntTotal, NumWebVisitsMonth, NumPurchases, Age, Children, Family_Size, Education_Graduate, Education_Postgraduate, Living_With_Alone, and Living_With_Partner (see the VAR sketch below).
Feature importance using SHAP
The Shapley value quantifies the impact of a player in a coalition in a cooperative game. In machine learning, the model is a coalition of features and the output is the payoff. The SHAP values are calculated as the difference between the actual and average predicted outputs for coalitions (subsets of features) with and without a feature; this difference provides the marginal contribution of the feature across all coalitions. A positive/negative SHAP value indicates that the feature positively/negatively affects the prediction, and the magnitude of the SHAP value indicates the strength of the feature’s impact (see the SHAP sketch below).
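A minimal sketch of SVR with the RBF kernel, assuming scikit-learn; the values of C, epsilon, and the data are illustrative choices, not the study’s settings:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # noisy nonlinear target

# epsilon defines the insensitive tube around the regression hyperplane.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(X, y)
print(svr.predict(X[:3]))
```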
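The AR(2) model can be sketched with statsmodels’ AutoReg; the simulated series and its coefficients are assumptions for illustration:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(3)
y = np.zeros(300)
for t in range(2, 300):  # simulate an AR(2) process
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

model = AutoReg(y, lags=2).fit()          # fit y_t on y_{t-1} and y_{t-2}
print(model.params)                       # constant, phi_1, phi_2
print(model.predict(start=300, end=304))  # out-of-sample forecast
```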
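A sketch of a VAR over the variables listed above, assuming statsmodels; the DataFrame carries the study’s column names but synthetic values:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

cols = ["MntWines", "MntTotal", "NumWebVisitsMonth", "NumPurchases",
        "Age", "Children", "Family_Size", "Education_Graduate",
        "Education_Postgraduate", "Living_With_Alone", "Living_With_Partner"]
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(300, len(cols))), columns=cols)

# Each variable is regressed on past values of all variables in the system.
results = VAR(df).fit(maxlags=2, ic="aic")  # lag order chosen by AIC, up to 2
print(results.summary())
```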
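Finally, a sketch of SHAP-based feature importance for a tree ensemble, assuming the shap package and reusing the hypothetical rf and X from the random forest sketch above:

```python
import shap

explainer = shap.TreeExplainer(rf)       # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)   # one value per feature per sample
# Sign gives the direction of a feature's effect; magnitude gives its strength.
shap.summary_plot(shap_values, X)
```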