Automated Machine Learning in Analytics Plus
Automated machine learning in Analytics Plus provides a code-free experience to train, verify, and build custom machine learning models with high efficiency. The simple and user-friendly design makes it possible for people with a minimal level of expertise to create ML models easily and make smart business choices.
Users can build models from their own historical IT data. For example, network operations data can be used to build a model that predicts the next downtime or clusters problematic devices together based on recent activity, and help desk ticket data can be used to build a model that predicts the probability of a ticket being escalated.
- Best practices
- Steps involved in building and launching an Auto ML Model
- Machine learning models in Analytics Plus
Best practices
Clearly define the business problem for which you intend to build an ML model. Ensure you select the relevant fields and columns that have a direct correlation with the expected results. While AutoML can be a powerful tool, its accuracy depends on the information made available to build the model.
Building an ML model in Analytics Plus comprises two high-level steps:
- Select the input dataset for training and pick the model that fits your use case. You need sample or historical data to build an ML model. For example, to build a model that predicts the next downtime, you need historical performance data and records of past outages for the model to learn from.
- Assess the model performance and deploy the model to a new dataset.
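Analytics Plus performs training and assessment without any code. Purely as an illustration of what assessing a model means, the following Python sketch holds back part of a historical dataset, learns a stand-in rule from the remaining rows, and measures accuracy on the held-back rows. The ticket data, the threshold rule, and the function names are hypothetical and are not a description of how Analytics Plus works internally.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# Hypothetical help desk tickets: (reopen_count, was_escalated)
tickets = [(0, "no"), (1, "no"), (4, "yes"), (5, "yes"),
           (0, "no"), (3, "yes"), (1, "no"), (6, "yes")]

train, test = train_test_split(tickets)

# Stand-in "model": predict escalation when reopen_count exceeds the
# mean reopen_count seen in the training rows.
threshold = sum(count for count, _ in train) / len(train)
predicted = ["yes" if count > threshold else "no" for count, _ in test]
actual = [label for _, label in test]

print(f"held-out accuracy: {accuracy(actual, predicted):.2f}")
```

The point of the sketch is that accuracy is measured on rows the model did not see during training, which is the kind of figure to review before deploying a model to a new dataset.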
Steps involved in building and launching an Auto ML Model
Select the data for training
- Once you have decided on the use case for which you need to build an ML model, click the Create New icon on the side navigation panel.
- Choose Auto ML from the drop-down menu.
- Give a suitable Analysis Name and Description.
- Select the Prediction Type that should be applied. Refer to the Machine learning models in Analytics Plus section for more information.
- Select the Training Table and the Target Column.
- Click Create to start training the ML model.
Note: Currently, columns of the Date data type cannot be used for model training.
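If a date carries useful signal, one common workaround outside the product is to derive numeric columns from it before importing the data. The Python sketch below shows the idea. It is an assumption, not a documented Analytics Plus feature, and the column names and sample row are hypothetical.

```python
from datetime import date


def date_features(iso_date):
    """Turn an ISO date string (YYYY-MM-DD) into plain numeric columns."""
    d = date.fromisoformat(iso_date)
    return {
        "year": d.year,
        "month": d.month,
        "day_of_week": d.isoweekday(),        # 1 = Monday ... 7 = Sunday
        "is_weekend": int(d.isoweekday() >= 6),
    }


# Hypothetical outage record with a date column.
rows = [{"device": "router-01", "outage_date": "2024-03-16", "downtime_minutes": 42}]

# Replace the date column with numeric columns derived from it.
prepared = []
for row in rows:
    features = {k: v for k, v in row.items() if k != "outage_date"}
    for name, value in date_features(row["outage_date"]).items():
        features[f"outage_{name}"] = value
    prepared.append(features)

print(prepared[0])
```

Whether such derived columns improve a given model is something to verify against the accuracy reported after training.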
Model information
Once the training is completed, the model will be saved and will be listed in the Analysis tab. Click the model name to get additional details such as the Algorithm Name, Accuracy and Training Time of the model.
Deploy the ML model
Once you have assessed the quality of the ML model, you can deploy it on a production dataset to get results.
- Click the Deploy Now button on the top.
- Select the Input table to which the prediction model should be applied.
- Select the Output table, which is the table where the result of the model will be stored.
- Select the Schedule Time.
- Click Deploy Now.
A new table will be created; you can then create visualizations on top of it.
Machine learning models in Analytics Plus
The quality of the output generated by the AutoML framework depends on choosing the right machine learning model from the available list of options. Ensure you select the model that is most appropriate to the dataset at hand and the result that is expected.
Regression Model
Regression is a supervised learning method used to determine the relationship between the dependent and independent variables. The regression model is primarily used for predictive analysis.
Random Forest Regression
Random forest regression is a supervised machine learning algorithm that uses a combination (ensemble) of decision trees for prediction. Each decision tree is built on a random subset of the training data, and the outputs of all the trees are averaged to produce a single prediction value.
The random forest model is best suited for predicting continuous values, as in time series forecasting and price prediction. Because the algorithm averages the output of many decision trees, its predictions are generally more accurate and less prone to overfitting than those of a single decision tree.
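The idea of sampling the training data for each tree and averaging the results can be sketched in a few lines of Python. This is a simplified illustration and not the Analytics Plus implementation: depth-one trees (stumps) on a single input column stand in for full decision trees, and the readings and function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows):
    """Fit a depth-one regression tree on (x, y) rows: choose the split on x
    that minimizes the squared error around the two leaf means."""
    xs = sorted({x for x, _ in rows})
    overall = mean(y for _, y in rows)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        split = (lo + hi) / 2
        left = [y for x, y in rows if x <= split]
        right = [y for x, y in rows if x > split]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if error < best[0]:
            best = (error, split, left_mean, right_mean)
    return best[1:]  # (split, left_mean, right_mean)


def predict_stump(stump, x):
    split, left_mean, right_mean = stump
    if split is None:  # the sample had a single x value, so no split exists
        return left_mean
    return left_mean if x <= split else right_mean


def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Average the predictions of all trees into a single value."""
    return mean(predict_stump(stump, x) for stump in forest)


# Hypothetical readings: (cpu_load_percent, response_time_ms)
readings = [(10, 100), (20, 110), (30, 105), (40, 120),
            (60, 300), (70, 310), (80, 320), (90, 305)]

forest = fit_forest(readings)
print(predict_forest(forest, 15), predict_forest(forest, 95))
```

A full random forest also considers a random subset of the input columns at each split and grows deeper trees; the sketch leaves both out to keep the averaging step visible.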
Classification Model
Classification is a supervised machine learning method that predicts the category or type to which an observation or data point belongs. For instance, a classification model can label emails as spam, social, or primary.
Random Forest Classification
Random forest classification is a supervised machine learning method that combines the votes of multiple decision trees to arrive at a conclusion. This method is best suited for predicting discrete (categorical) values.
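How multiple trees arrive at one conclusion can be illustrated with a majority vote. The Python sketch below is a simplified illustration under stated assumptions, not the Analytics Plus implementation: each tree is a depth-one stump on a single column, and the ticket data and function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Depth-one classification tree on (x, label) rows: choose the threshold
    on x whose two leaves misclassify the fewest rows."""
    xs = sorted({x for x, _ in rows})
    fallback = majority([label for _, label in rows])
    best = (len(rows) + 1, None, fallback, fallback)
    for lo, hi in zip(xs, xs[1:]):
        split = (lo + hi) / 2
        left = [label for x, label in rows if x <= split]
        right = [label for x, label in rows if x > split]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if errors < best[0]:
            best = (errors, split, left_label, right_label)
    return best[1:]  # (split, left_label, right_label)


def predict_stump(stump, x):
    split, left_label, right_label = stump
    if split is None:  # the sample had a single x value, so no split exists
        return left_label
    return left_label if x <= split else right_label


def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Each tree votes for a class; the class with the most votes wins."""
    return majority([predict_stump(stump, x) for stump in forest])


# Hypothetical tickets: (reopen_count, was_escalated)
tickets = [(0, "no"), (1, "no"), (1, "no"), (2, "no"),
           (4, "yes"), (5, "yes"), (6, "yes"), (7, "yes")]

forest = fit_forest(tickets)
print(predict_forest(forest, 0), predict_forest(forest, 8))
```

The only structural difference from the regression case is the final step: class votes are counted instead of numeric predictions being averaged.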
Clustering Model
Clustering is an unsupervised learning technique. This model identifies patterns and relationships within the data that are not immediately apparent and groups similar data points into clusters.
K-means
The K-means algorithm segregates a dataset into K distinct, non-overlapping clusters. This is an iterative process that assigns each data point to the cluster with the nearest center (centroid) and then recomputes each centroid as the mean of its assigned points, repeating until the assignments stop changing. This algorithm works effectively with quantitative data as it is based on calculating distances between data points.
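The iterative process can be sketched in Python as follows. This is a minimal illustration of the standard K-means loop, not the Analytics Plus implementation. The device metrics are hypothetical, and the starting centroids are passed in by hand to keep the run deterministic, whereas practical implementations usually choose them randomly.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points and updating centroids until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroid
            for members, centroid in zip(clusters, centroids)
        ]
        if updated == centroids:  # nothing moved, so the result is stable
            break
        centroids = updated
    return centroids, clusters


# Hypothetical device metrics: (cpu_percent, memory_percent)
devices = [(10, 20), (12, 22), (11, 19), (80, 85), (82, 88), (79, 90)]

centroids, clusters = kmeans(devices, centroids=[devices[0], devices[3]])
print(centroids)
print(clusters)
```

Because the method relies on distances, columns measured on very different scales are usually normalized first so that no single column dominates the result.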
K-modes
The K-modes algorithm is used for grouping categorical data, like segmentation based on demographics. Each cluster is represented by its mode, that is, the most frequent value of each attribute among the records in the cluster.
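A minimal Python sketch of this idea follows. It measures dissimilarity as the number of attributes on which two records differ and represents each cluster by its per-attribute mode. It is an illustration of the general K-modes technique rather than the Analytics Plus implementation, and the ticket records and starting modes are hypothetical.

```python
from collections import Counter


def mismatches(a, b):
    """Dissimilarity for categorical records: count of differing attributes."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(members):
    """Most frequent value of each attribute across the cluster members."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


def kmodes(records, modes, max_iter=100):
    modes = [tuple(m) for m in modes]
    for _ in range(max_iter):
        # Assign each record to the mode it differs from the least.
        clusters = [[] for _ in modes]
        for r in records:
            nearest = min(range(len(modes)), key=lambda i: mismatches(r, modes[i]))
            clusters[nearest].append(r)
        # Recompute each mode; an empty cluster keeps its previous mode.
        updated = [cluster_mode(m) if m else old for m, old in zip(clusters, modes)]
        if updated == modes:
            break
        modes = updated
    return modes, clusters


# Hypothetical tickets: (category, priority, channel)
tickets = [
    ("network", "high", "email"),
    ("network", "high", "phone"),
    ("network", "medium", "email"),
    ("software", "low", "portal"),
    ("software", "low", "email"),
    ("hardware", "low", "portal"),
]

modes, clusters = kmodes(tickets, modes=[tickets[0], tickets[3]])
print(modes)
```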
K-Prototypes
The K-Prototypes algorithm is an extension of the K-means algorithm used for clustering datasets containing both numerical and categorical features. It combines the distance-based approach of K-means for numerical features with the mismatch-based approach of K-modes for categorical features.
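The way the two kinds of features are combined can be sketched in Python. In the commonly described formulation, the dissimilarity is the squared distance over the numerical features plus a weight (often called gamma) times the number of mismatched categorical features. The sketch below assumes that formulation. The records, the gamma value, and the function names are hypothetical, and nothing here is specific to Analytics Plus.

```python
from collections import Counter


def mixed_distance(record, prototype, numeric_idx, categorical_idx, gamma=1.0):
    """Squared Euclidean distance on numeric columns plus gamma times the
    number of mismatches on categorical columns."""
    numeric_part = sum((record[i] - prototype[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(record[i] != prototype[i] for i in categorical_idx)
    return numeric_part + gamma * categorical_part


def update_prototype(members, numeric_idx, categorical_idx):
    """New prototype: mean of each numeric column, mode of each categorical one."""
    prototype = [None] * len(members[0])
    for i in numeric_idx:
        prototype[i] = sum(m[i] for m in members) / len(members)
    for i in categorical_idx:
        prototype[i] = Counter(m[i] for m in members).most_common(1)[0][0]
    return tuple(prototype)


# Hypothetical records: (cpu_load, memory_load, category, priority),
# with the numeric columns already scaled to the 0-1 range.
NUMERIC, CATEGORICAL = [0, 1], [2, 3]
record = (0.2, 0.9, "network", "high")
prototypes = [(0.25, 0.8, "network", "low"), (0.9, 0.1, "network", "high")]

distances = [mixed_distance(record, p, NUMERIC, CATEGORICAL, gamma=0.5)
             for p in prototypes]
nearest = distances.index(min(distances))
print(distances, nearest)
```

The weight controls how much a categorical mismatch counts relative to numeric distance, so its value affects which cluster a record joins.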