Machine Learning Interview Questions

1) What do you mean when you say “machine learning”? 

Machine learning is a branch of Artificial Intelligence that works with system programming and data analysis, allowing computers to learn and respond without being explicitly programmed.

2) What is the difference between inductive and deductive learning? 

In inductive learning, the model learns by example from a set of observed instances and draws a generalised conclusion. In deductive learning, on the other hand, the model starts from an established conclusion or rule and applies it to specific cases.

  • Inductive learning is the process of drawing inferences from observations. 
  • Deductive learning is the process of forming observations based on inferences.

3) How do you distinguish between data mining and machine learning? 

Data mining is a process that attempts to extract knowledge or interesting, previously unknown patterns from structured data. Machine learning algorithms are often used in this process.

The study, design, and development of algorithms that allow processors to learn without being explicitly programmed is known as machine learning.

4) What does “overfitting” in machine learning mean? 

In machine learning, overfitting occurs when a statistical model describes random error or noise rather than the underlying relationship. It is likely to occur when a model is excessively complex, that is, when it has too many parameters relative to the amount of training data. An overfitted model performs well on the training data but poorly on new, unseen data.

5) What causes overfitting? 

Overfitting becomes possible when the criteria used to train the model differ from the criteria used to judge the model’s efficacy.

6) What is the procedure for avoiding overfitting? 

Overfitting tends to happen when a model tries to learn from a small dataset, and it can often be avoided by using a large amount of data. However, if we only have a small dataset and must build a model on it, we can use a technique called cross-validation. In this approach, a model is typically given a dataset of known data on which to train and a dataset of unseen data against which it is tested. The goal of cross-validation is to set aside a dataset to “test” the model during the training phase. When enough data is available, ‘Isotonic Regression’ is also sometimes used to help avoid overfitting.
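The splitting step behind cross-validation can be sketched in a few lines. This is a minimal illustration of k-fold splitting using only the standard library; the function name `k_fold_indices` and the toy sizes (10 samples, 3 folds) are assumptions made for the example, not part of any particular library.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal, contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in the test fold exactly once, so every data point is used for both training and testing across the k rounds. In practice the data is usually shuffled before splitting.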

7) Establish a distinction between supervised and unsupervised machine learning. 

  • In supervised machine learning, the system is trained using labelled data. After that, new data is fed into the trained model, which uses what it learned from the labelled examples to predict an output. For example, before doing classification, we must first label the data that is used to train the model. 
  • In unsupervised machine learning, the system is not trained with labelled data, and the algorithms find structure in the inputs without any corresponding output variables.

8) What is the difference between Machine Learning and Deep Learning? 

  • Machine learning is made up of algorithms that analyse data, learn from it, and then apply what they’ve learned to make more informed decisions. 
  • Deep learning is a type of machine learning that is based on the human brain’s structure and is particularly useful in feature detection.

9) What distinguishes KNN from k-means? 

The KNN algorithm, also known as K nearest neighbours, is a supervised classification technique. In KNN, a test sample is assigned the class held by the majority of its nearest neighbours. K-means, on the other hand, is an unsupervised technique that is primarily used for clustering. K-means clustering requires only a set of unlabeled points and the number of clusters, k. The algorithm then groups the unlabeled points into clusters by repeatedly assigning each point to its nearest cluster centre and recomputing each centre as the mean of its points.
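The contrast between the two algorithms can be sketched as follows. This is a minimal pure-Python illustration, not a production implementation; the function names, the toy points, and the hand-picked starting centroids are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: return the majority label among the k nearest labelled points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_means(points, centroids, n_iter=10):
    """Unsupervised: alternate between assigning points and recomputing means."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["a", "a", "a", "b", "b", "b"]

predicted = knn_predict(points, labels, query=(2.0, 2.0), k=3)  # uses the labels
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])  # ignores them
```

Note that `knn_predict` needs the labels while `k_means` never sees them, which is the supervised versus unsupervised distinction in practice.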

10) What are the different types of Algorithm methods in Machine Learning?

The different types of algorithm methods in machine learning are:

  • Supervised Learning
  • Semi-supervised Learning
  • Unsupervised Learning
  • Transduction
  • Reinforcement Learning

11) What do you mean when you say “reinforcement learning”? 

Reinforcement learning is a machine learning technique in which an agent interacts with its environment by performing actions and observing the resulting errors or rewards. Software and machines use reinforcement learning to find the most suitable behaviour or path to take in a given situation. The agent normally learns by receiving a reward or a penalty for each action it performs.

12) What is the trade-off between bias and variance? 

Bias and variance are both sources of error. Bias is error introduced by erroneous or overly simplistic assumptions in the learning algorithm. It may cause the model to under-fit the data, making it difficult to achieve high predictive accuracy and to generalise knowledge from the training set to the test set. 

Variance is error caused by too much complexity in the learning algorithm. As a result, the algorithm is highly sensitive to small variations in the training data, which can cause the model to overfit the data.

13) What Should You Do If a Dataset Has Missing or Corrupted Data? 

One of the simplest ways to deal with missing or corrupted data is to drop the affected rows or columns, or to replace the affected values with another value. 

In Pandas, there are three useful methods: 

  • isnull() and dropna() help you locate missing data and drop the affected rows or columns. 
  • fillna() replaces the missing values with a placeholder value.

14) What are the five popular algorithms we use in Machine Learning?

Five popular algorithms are:

  • Decision Trees
  • Probabilistic Networks
  • Neural Networks
  • Support Vector Machines
  • Nearest Neighbor

15) What exactly do you mean when you say ensemble learning? 

Ensemble learning is the process of building and combining several models, such as classifiers, to solve a particular computational problem. Ensemble methods are also known as committee-based learning or learning multiple classifier systems. They train several hypotheses to solve the same problem. Random forests are a good example of ensemble modelling, since they combine numerous decision trees to predict outcomes. Ensemble learning is used to improve the classification, function approximation, prediction, and other aspects of a model.
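One simple way to combine several classifiers is a majority vote. The sketch below is only an illustration: the three rule-based "models" and the message features (`links`, `caps_ratio`, `length`) are hypothetical stand-ins for trained classifiers.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by most of the models for input x."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical weak classifiers, each looking at a different feature.
models = [
    lambda x: "spam" if x["links"] > 3 else "ham",
    lambda x: "spam" if x["caps_ratio"] > 0.5 else "ham",
    lambda x: "spam" if x["length"] < 20 else "ham",
]

message = {"links": 5, "caps_ratio": 0.7, "length": 120}
prediction = majority_vote(models, message)  # two of the three models say "spam"
```

A random forest follows the same idea, with the rules replaced by decision trees trained on different random samples of the data.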

16) In Machine Learning, what is a model selection? 

Model selection is the process of choosing a model from among several mathematical models that are used to describe the same data. Model selection is applied in the fields of statistics, data mining, and machine learning.

17) In machine learning, what are the three steps of developing hypotheses or models? 

In machine learning, there are three stages of building hypotheses or models: 

  • Building the model: An appropriate algorithm is selected and trained to meet the problem’s requirements. 
  • Testing the model: The model’s accuracy is checked using test data. 
  • Applying the model: After testing, the necessary changes are made and the final model is applied.

18) What is the standard approach to supervised learning? 

The conventional strategy in supervised learning is to divide the set of examples into the training set and the test set. 

19) Explain the terms “training set” and “test set.” 

In many areas of machine learning, a set of data known as the ‘Training Set’ is used to discover potentially predictive relationships. The training set is the set of examples given to the learner. The ‘Test Set,’ in turn, is used to check the accuracy of the hypotheses generated by the learner. It is the set of examples withheld from the learner. As a result, the training set and the test set are distinct.
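Holding out a test set can be sketched as below. This is a minimal standard-library illustration; the function name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions chosen for the example.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction as the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


examples = list(range(20))
train_set, test_set = train_test_split(examples)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

The learner only ever sees `train_set`; `test_set` is kept aside to estimate how well the learned hypothesis generalises.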

20) What are some of the most frequent approaches to dealing with missing data in a dataset? 

Missing data is a common occurrence when working with data, and handling it is regarded as one of the bigger challenges that data analysts face. Missing values can be handled in a variety of ways. Popular approaches include deleting the affected rows, replacing missing values with the mean, median, or mode, predicting the missing values, assigning them to a unique category, and using algorithms that support missing values.
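Two of these approaches, deleting rows and replacing with the mean or median, can be sketched in plain Python. The helper names, the use of `None` to mark a missing value, and the toy age/salary rows are assumptions made for this example.

```python
from statistics import mean, median


def drop_missing(rows):
    """Keep only the rows that have no missing (None) values."""
    return [row for row in rows if None not in row]


def impute_column(rows, col, strategy=mean):
    """Replace None in one column with a statistic of the observed values."""
    observed = [row[col] for row in rows if row[col] is not None]
    fill = strategy(observed)
    return [row[:col] + [fill if row[col] is None else row[col]] + row[col + 1:]
            for row in rows]


# Hypothetical [age, salary] records with gaps.
rows = [[25, 50000], [32, None], [None, 61000], [41, 72000]]

complete_rows = drop_missing(rows)
filled = impute_column(impute_column(rows, 0, median), 1, mean)
```

Dropping rows is simplest but discards information, while imputation keeps every row at the cost of introducing estimated values.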

21) What exactly do you mean by ILP? 

ILP stands for Inductive Logic Programming. It is a subfield of machine learning that uses logic programming. Its goal is to look for patterns in data that can be used to build predictive models. In this approach, logic programs are used as the hypotheses. 

22) What are the steps in a Machine Learning Project that must be completed? 

When working on a Machine Learning project, there are several steps that must be followed in order to build a good working model. These steps include data collection, data preparation, model training, model evaluation, parameter tuning, and prediction.

23) What is the difference between precision and recall? 

Precision and recall are two measures used in information retrieval to assess how well a retrieval system returns the relevant data requested by the user. 

Precision is also known as positive predictive value. It is the proportion of relevant instances among all the instances retrieved. 

Recall, on the other hand, is the proportion of relevant instances that have been retrieved out of the total number of relevant instances. Sensitivity is another term for recall.
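Both definitions reduce to simple set arithmetic. The following sketch assumes a retrieval setting; the function name and the document identifiers (`d1` to `d5`) are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall of about 0.67).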

24) What do you mean when you say “decision tree” in the context of machine learning? 

A decision tree is a supervised machine learning method in which data is repeatedly split according to a specific parameter. It builds classification or regression models in a tree-like structure, breaking datasets down into smaller and smaller subsets as the tree grows. The tree is defined by two kinds of entities: decision nodes and leaves. The decision nodes are where the data is split, and the leaves represent the decisions or outcomes. Decision trees can handle both categorical and numerical data.

25) What are the functions of supervised learning? 

  • Classification
  • Speech Recognition
  • Regression
  • Time Series Prediction
  • String Annotation

26) What are the functions of unsupervised learning?

  • Identifying data clusters 
  • Finding low-dimensional data representations 
  • Identifying interesting directions in data 
  • Finding novel observations and cleaning up the database 
  • Identifying interesting coordinates and correlations

27) What does “algorithm independent machine learning” mean to you? 

Machine learning with mathematical foundations that are independent of any particular classifier or learning algorithm is known as algorithm independent machine learning. 

28) Explain how a machine learning classifier works. 

A classifier is a type of hypothesis or discrete-valued function that assigns class labels to specific data points. It’s a system that takes a set of discrete or continuous feature values as input and produces a single discrete value, the class.

29) What do you mean by Genetic Programming?

Genetic Programming (GP) is a type of Evolutionary Algorithm, which is a subset of machine learning. Genetic programming systems implement an algorithm that uses random mutation, a fitness function, crossover, and multiple generations of evolution to solve a user-defined task. The genetic programming model is based on testing candidates and choosing the best option among a set of results.

30) What is SVM in machine learning? What are the classification methods that SVM can handle?

SVM stands for Support Vector Machine. SVMs are supervised learning models with associated learning algorithms that analyse data for classification and regression.

The classification methods that SVM can handle are:

  • Combining binary classifiers
  • Modifying binary to incorporate multiclass learning

31) How will you explain a linked list and an array?

An array is a datatype which is widely implemented as a default type, in almost all the modern programming languages. It is used to store data of a similar type.

But there are many use-cases where we don’t know the quantity of data to be stored. For such cases, advanced data structures are required, and one such data structure is linked list.

There are some points which explain how the linked list is different from an array:

An array is a group of elements of a similar data type. Linked List is an ordered group of elements of the same type, which are connected using pointers.

An Array supports Random Access. It means that the elements can be accessed directly using their index value, like arr[0] for 1st element, arr[5] for 6th element, etc.

As a result, accessing elements in an array is fast with constant time complexity of O(1).

A Linked List supports Sequential Access. It means that to reach a given element, we have to traverse the list node by node from the head until we arrive at that element. As a result, accessing an element in a linked list takes linear time, O(n).
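The difference in access pattern can be sketched as follows. This is an illustrative example only: a Python list stands in for the array, and the `Node` class and `linked_list_get` helper are hypothetical names for a minimal singly linked list.

```python
class Node:
    """One element of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def linked_list_get(head, index):
    """Sequential access: follow `index` pointers starting from the head, O(n)."""
    node = head
    for _ in range(index):
        if node is None:
            raise IndexError("index out of range")
        node = node.next
    if node is None:
        raise IndexError("index out of range")
    return node.value


arr = [10, 20, 30, 40]       # array-like: random access by index, O(1)
head = None
for value in reversed(arr):  # build the list 10 -> 20 -> 30 -> 40
    head = Node(value, head)

print(arr[2], linked_list_get(head, 2))
```

Both lookups return the same element, but the array jumps straight to it while the linked list walks through the two nodes in front of it first.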

32) What do you understand by the Confusion Matrix?

A confusion matrix is a table which is used for summarizing the performance of a classification algorithm. It is also known as the error matrix.

The four cells of the matrix are abbreviated as follows:

  • TN = True Negative
  • TP = True Positive
  • FN = False Negative
  • FP = False Positive

33) Explain True Positive, True Negative, False Positive, and False Negative in Confusion Matrix with an example.

True Positive: When a model correctly predicts the positive class, it is said to be a true positive.

For example, Umpire gives a Batsman NOT OUT when he is NOT OUT.

True Negative: When a model correctly predicts the negative class, it is said to be a true negative.

For example, Umpire gives a Batsman OUT when he is OUT.

False Positive: When a model incorrectly predicts the positive class, it is said to be a false positive. It is also known as ‘Type I’ error.

For example, Umpire gives a Batsman NOT OUT when he is OUT.

False Negative: When a model incorrectly predicts the negative class, it is said to be a false negative. It is also known as ‘Type II’ error.

For example, Umpire gives a Batsman OUT when he is NOT OUT.
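The four counts can be tallied directly from paired actual and predicted labels. The sketch below reuses the umpire example and treats NOT OUT as the positive class, as the examples above do; the function name and the five sample decisions are assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP and FN for a binary classification problem."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


# What really happened versus what the umpire ruled.
actual = ["NOT OUT", "OUT", "OUT", "NOT OUT", "OUT"]
predicted = ["NOT OUT", "OUT", "NOT OUT", "OUT", "OUT"]

counts = confusion_counts(actual, predicted, positive="NOT OUT")
print(counts)
```

With these five decisions the umpire is right three times (one true positive, two true negatives) and wrong twice (one false positive, one false negative).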

34) Which do you think is more important: model accuracy or model performance? 

Model accuracy is a subset of model performance, since accuracy is one of the measures by which performance is judged. The two are directly related: the more accurate the model’s predictions are, the better its performance. 

35) What is the difference between bagging and boosting? 

Bagging is an ensemble learning technique for improving unstable estimation or classification algorithms. Boosting methods, in contrast, train models sequentially in order to reduce the bias of the combined model.

36) What are the similarities and differences between bagging and boosting in Machine Learning?

Similarities of Bagging and Boosting

  • Both are ensemble methods that build N learners from a single base learner.
  • Both generate several training data sets with random sampling.
  • Both produce the final result by combining the N learners, for example by averaging or voting.
  • Both reduce variance and provide higher stability.

Differences between Bagging and Boosting

  • In Bagging the models are built independently, whereas Boosting tries to add new models that perform well where previous models fail.
  • Only Boosting assigns weights to the data to tip the scales in favour of the most challenging cases.
  • Only Boosting tries to reduce bias. Bagging, in contrast, may reduce the problem of over-fitting, while Boosting can increase it.

37) Explain Cluster Sampling? 

Cluster sampling is the practice of randomly selecting intact groups that share similar characteristics from a defined population. A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of elements. 

For example, if we want to sample the managers across a group of companies, the managers are the elements and the companies are the clusters.
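The managers-and-companies example can be sketched as below. This is a minimal illustration of one-stage cluster sampling; the company and manager names, the function name, and the fixed seed are all hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters and return every element inside them."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    chosen = rng.sample(sorted(clusters), n_clusters)
    elements = [element for name in chosen for element in clusters[name]]
    return chosen, elements


# Hypothetical companies (clusters) and their managers (elements).
companies = {
    "Acme": ["Ana", "Bo"],
    "Birch": ["Cy", "Di", "Ed"],
    "Cobalt": ["Flo"],
    "Delta": ["Gus", "Hal"],
}

chosen_companies, sampled_managers = cluster_sample(companies, n_clusters=2)
print(chosen_companies, sampled_managers)
```

The randomness is applied to the companies, not to individual managers: once a company is chosen, all of its managers enter the sample.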

38) How well do you understand Bayesian Networks? 

Bayesian Networks, also known as ‘belief networks’ or ‘causal networks,’ are graphical models that represent the probabilistic relationships among a set of variables. 

For example, a Bayesian network can be used to describe the probabilistic relationships between diseases and symptoms. Given the symptoms, the network can then calculate the probabilities that various diseases are present. 

Efficient algorithms exist that perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (such as speech signals or protein sequences) are called dynamic Bayesian networks.
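For the disease and symptom example, the smallest possible network has two nodes, Disease → Symptom, and inference reduces to Bayes' rule. The sketch below is illustrative only, and the probabilities are invented numbers, not medical data.

```python
def posterior_disease_given_symptom(p_disease,
                                    p_symptom_given_disease,
                                    p_symptom_given_healthy):
    """P(disease | symptom) for a two-node network Disease -> Symptom."""
    joint_disease = p_disease * p_symptom_given_disease
    joint_healthy = (1 - p_disease) * p_symptom_given_healthy
    return joint_disease / (joint_disease + joint_healthy)


# Assumed numbers: 1% prevalence, symptom appears in 90% of sick
# patients and in 5% of healthy ones.
posterior = posterior_disease_given_symptom(0.01, 0.9, 0.05)
print(f"P(disease | symptom) = {posterior:.3f}")
```

Even with a symptom that is common among the sick, the posterior stays low (about 15%) because the disease itself is rare. Larger networks chain many such conditional tables together.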

39) What are the Bayesian logic program’s two components? 

A Bayesian logic programme is made up of two parts: 

  • Logical: It contains a set of Bayesian Clauses that capture the domain’s qualitative structure. 
  • Quantitative: It’s used to represent quantitative data about a domain. 

40) Explain how machine learning uses dimension reduction. 

Dimension reduction is a technique for reducing the number of random variables under consideration. It can be divided into feature selection and feature extraction.
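Feature selection, the simpler of the two, can be sketched with a variance filter that drops columns carrying almost no information. This is one possible selection criterion among many; the function name, the threshold, and the toy rows are assumptions made for the example.

```python
from statistics import pvariance


def select_by_variance(rows, threshold):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced


rows = [
    [1.0, 5.0, 0.1],
    [1.0, 7.0, 0.4],
    [1.0, 6.0, 0.9],
    [1.0, 9.0, 0.2],
]

kept, reduced = select_by_variance(rows, threshold=0.05)
print("kept columns:", kept)
```

The first column is constant, so it is removed and the data goes from three dimensions to two. Feature extraction methods instead build new variables from combinations of the original ones.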

41) Why is the instance-based learning algorithm also known as the Lazy learning algorithm? 

Lazy learning is a machine learning method in which induction and generalisation are postponed until classification is actually performed. Because an instance-based learning algorithm has this same property, it is also known as a lazy learning algorithm. 

42) What does the F1 score mean to you? 

The F1 score is a metric for evaluating a model’s performance. It is the harmonic mean of a model’s precision and recall. Scores close to 1 are considered the best, while those close to 0 are considered the worst. It is useful in classification tests where true negatives aren’t as important.
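The F1 score is commonly computed as the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch, with the function name and the sample values chosen purely for illustration:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


score = f1_score(precision=0.5, recall=1.0)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is pulled towards the smaller value, a model with perfect recall but only 50% precision scores about 0.67 rather than 0.75.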

43) How do you prune a decision tree? 

Pruning is what happens in decision trees when branches with weak predictive power are removed in order to reduce the model’s complexity and improve its predictive accuracy. Pruning can be done from the bottom up or from the top down, using techniques like reduced error pruning and cost complexity pruning. 

Reduced error pruning is the simplest version: each node is replaced in turn with its most popular class, and if the replacement does not decrease predictive accuracy, the node stays pruned. Although simple, it usually comes quite close to a method that optimises for maximum accuracy.

44) What are Recommender Systems? 

A recommender system is a subclass of information filtering systems. It predicts the preferences or ratings that a user would give to a product, and makes recommendations to the user based on those preferences. Recommender systems are widely used for movies, news, research articles, products, social tags, and music. 

45) What do you mean when you say “underfitting”? 

Underfitting occurs when the model has a high error on both the training set and the testing set, because it is too simple to capture the underlying pattern. Some algorithms that are easy to interpret fall short when it comes to making accurate predictions.

46) When does regularisation in machine learning become necessary? 

Regularisation becomes necessary when the model begins to overfit. It is a technique that adds a cost term to the objective function for bringing in more features. As a result, it tries to push the coefficients of many variables towards zero and so reduce the cost term. It helps to reduce model complexity so that the model becomes better at predicting (generalising).

47) What is Regularization? What kind of problems does regularization solve?

Regularisation is a form of regression in which the coefficient estimates are constrained, regularised, or shrunk towards zero. To reduce the risk of overfitting, it discourages learning a more complex or flexible model. It lowers the model’s variance without significantly increasing its bias. 

To address overfitting, regularisation penalises the loss function by adding a multiple of the L1 (LASSO) or L2 (Ridge) norm of the weight vector w.
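The shrinking effect of an L2 penalty is easiest to see with a single weight. For the loss sum((y - w·x)²) + λ·w², setting the derivative to zero gives w = Σxy / (Σx² + λ). The sketch below uses that closed form; the function name, the toy data, and the penalty value are assumptions for the example.

```python
def fit_slope(xs, ys, l2_penalty=0.0):
    """Minimise sum((y - w*x)**2) + l2_penalty * w**2 for a single weight w."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + l2_penalty
    return numerator / denominator


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, no noise

w_plain = fit_slope(xs, ys)                   # no penalty
w_ridge = fit_slope(xs, ys, l2_penalty=14.0)  # penalty shrinks the weight
print(w_plain, w_ridge)
```

The penalty appears only in the denominator, so a larger value always pulls the weight closer to zero. An L1 penalty behaves similarly but can set weights exactly to zero.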

48) Why do we need to convert categorical variables into factor? Which functions are used to perform the conversion?

Most machine learning algorithms require numerical input. To obtain numerical values, we convert categorical variables into factors, which also means we don’t have to create dummy variables by hand. 

In R, the functions factor() and as.factor() are used to convert variables into factors.
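What the conversion does can be sketched in Python: each distinct category becomes a level, and every value is stored as the integer code of its level. This only mimics the general idea behind a factor with sorted levels and 1-based codes; the function name `to_factor` and the sample values are hypothetical.

```python
def to_factor(values):
    """Return (levels, codes): sorted unique categories and 1-based integer codes."""
    levels = sorted(set(values))
    code_of = {level: i + 1 for i, level in enumerate(levels)}
    codes = [code_of[v] for v in values]
    return levels, codes


levels, codes = to_factor(["low", "high", "medium", "low"])
print(levels, codes)
```

The codes follow the alphabetical order of the levels, not any meaningful ranking, so they should not be read as magnitudes unless the levels are explicitly ordered.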

49) Do you think that treating a categorical variable as a continuous variable would result in a better predictive model?

A categorical variable can be treated as a continuous variable for a better predictive model only when it is ordinal in nature.

50) How is machine learning used in day-to-day life?

Most people now use machine learning in their daily lives. When you use the internet, you express your preferences, likes, and dislikes through your searches. This information is collected by cookies on your computer, and your behaviour is analysed on that basis. The results make it easier to navigate the internet by offering similar suggestions. 

A navigation system is another example, where machine learning is used to compute the distance between two places using optimisation techniques. Machine learning will undoubtedly become more popular in the near future.

