Hello Shouters!! Today we will create an airfare prediction model using machine learning.
If this is your first time on these blogs, I suggest you go through the previous blog first, as this blog is a continuation of it.
In the previous blog, we finished the data cleaning part for this dataset in Google Colab. In this blog, we continue with the model building part, which is the most interesting part of machine learning.
If you don’t know anything about Google Colab, visit the introduction to Google Colab.
Several models are provided directly by various Python libraries, so we don’t need to code every algorithm ourselves. The algorithms are already written in a library; we only need to import them and use them.
Before applying any algorithm to our dataset, we need to decide which machine learning algorithm is best suited to it. This can be a tedious process, but it mostly requires common sense. Expertise in picking the best algorithm for a given dataset comes with experience.
To get a better idea of how an algorithm (model) is selected for a given dataset, we need a good understanding of how the model we are trying to use actually works. There are various algorithms like linear regression, clustering, SVM, Apriori, etc. Once we know how these algorithms work, it becomes easier to choose the best model for our dataset.
We will create blogs for each of these algorithms and discuss them in detail in the future. We can also choose the model by simple trial and error and see which one works best for us. For that purpose, we measure the error in each model’s predictions, and the model that gives the least error wins.
For the current dataset, we will be using linear regression. It is one of the most famous algorithms. You can also choose some other model, but as a beginner, I feel you should stick with this blog so that you don’t get stuck on an error.
First, we will split our dataset into a training set and a test set. If you don’t know why we split the dataset and how to do that, click here.
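Here is a minimal sketch of the split, assuming scikit-learn's train_test_split is used. The cleaned airfare data from the previous blog is not reproduced here, so the small DataFrame below is synthetic and its column names (duration_minutes, total_stops, price) are made up for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned airfare data (hypothetical column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "duration_minutes": rng.integers(60, 600, size=100),
    "total_stops": rng.integers(0, 4, size=100),
})
df["price"] = 3000 + 8 * df["duration_minutes"] + 1500 * df["total_stops"]

X = df.drop(columns="price")  # features the model learns from
y = df["price"]               # target we want to predict

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

The model only ever sees X_train and y_train during training, so the error measured on X_test tells us how it behaves on tickets it has never seen.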
Linear regression is found in scikit-learn (imported as sklearn). We will import it from there and make a linear regression object. After that, we fit (train) the model and make predictions. After making predictions, we will calculate the mean absolute error.
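The steps above can be sketched roughly as follows. This is not the exact notebook code from this blog: the data is synthetic (three made-up numeric features and a made-up price), so the printed error will not match the numbers reported here. Only the scikit-learn calls (LinearRegression, fit, predict, mean_absolute_error) are the real ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the airfare dataset (assumption, not the real data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 4000 + 500 * X[:, 0] - 200 * X[:, 1] + rng.normal(0, 300, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()           # make the linear regression object
model.fit(X_train, y_train)          # fit (train) on the training set
predictions = model.predict(X_test)  # predict prices for the unseen test set

# Average absolute gap between the true and the predicted prices.
mae = mean_absolute_error(y_test, predictions)
print(f"Mean absolute error: {mae:.2f}")
```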
We can see in the figure that the mean absolute error is 2295.74, which is quite high. This means that our predictions are off by Rs. 2295.74 per ticket on average. The error can go in either direction: some predicted fares are too high and some are too low. Thus it is clear that this model is not good enough.
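To see what the mean absolute error actually measures, here is a tiny hand calculation in plain Python. The four fares below are invented for illustration; they are not from the dataset.

```python
# Made-up actual and predicted fares in Rs. (illustration only).
actual = [5000, 7200, 10400, 3900]
predicted = [5600, 6900, 9000, 4100]

# Signed errors show the direction: negative means the prediction was too high.
signed_errors = [a - p for a, p in zip(actual, predicted)]
print(signed_errors)  # [-600, 300, 1400, -200]

# The absolute value drops the sign, so over- and under-predictions count equally.
absolute_errors = [abs(e) for e in signed_errors]
mae = sum(absolute_errors) / len(absolute_errors)
print(mae)  # 625.0
```

So an error of 625 here does not mean every prediction is Rs. 625 too high; it is the average size of the miss, whichever way it goes.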
Now let’s try another model, XGBoost.
With this model, we get an error of 1250.33. Thus it is clear that this model is better than the previous one. Similarly, you can find the best model for your dataset. Maybe some other model will give better results than XGBoost.
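Below is a rough sketch of how the two models could be compared side by side. It assumes the xgboost package and its XGBRegressor class; if xgboost is not installed, the sketch falls back to scikit-learn's GradientBoostingRegressor, which is a different implementation of the same boosted-trees idea. The data is synthetic and the parameter values are arbitrary, so the errors printed will differ from the ones in this blog.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBRegressor as BoostedModel
except ImportError:
    # Fallback (assumption): scikit-learn's gradient boosting instead of XGBoost.
    from sklearn.ensemble import GradientBoostingRegressor as BoostedModel

# Synthetic data with a non-linear price pattern (not the real airfare data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))
y = 4000 + 300 * X[:, 0] ** 2 + 2000 * (X[:, 1] > 5) + rng.normal(0, 300, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear regression": LinearRegression(),
    "boosted trees": BoostedModel(n_estimators=200, learning_rate=0.1, random_state=0),
}

# Train each model on the same split and record its mean absolute error.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")

# The model with the least error wins.
best = min(scores, key=scores.get)
print("Best model:", best)
```

You can add more entries to the models dictionary to try other algorithms on the same split.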
In this blog, we have built our first machine learning model, and you are now ready to move on to some more advanced machine learning concepts. I will cover the concepts that I have kept for future discussion in upcoming blogs. These blogs will strengthen your understanding of machine learning, so don’t forget to check them out.
If you want to understand machine learning from scratch, I suggest you visit a practical approach to machine learning, and for starting with pandas, visit the introduction to pandas. We hope you will now be able to create an airfare prediction model using machine learning by yourself.
For placement preparation questions and technical interview preparation, check the Instagram account: https://www.instagram.com/shoutcoders/
Frequently Asked Questions –
After applying different models, the model that gives the lowest mean absolute error will be the best.
There are various hyperparameters that have to be given when defining the model. To choose the best parameters, you must know how the algorithm works.
It is completely based on your experience and common sense. For better understanding, you must know how the algorithm works.
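For the hyperparameters mentioned in the answers above, one common option is to let scikit-learn try several combinations for you. The sketch below assumes GridSearchCV with a RandomForestRegressor. The model, the parameter grid and the synthetic data are all chosen only for illustration and are not from this blog.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the airfare dataset (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 4000 + 300 * X[:, 0] ** 2 + rng.normal(0, 300, size=200)

# Candidate values to try for each hyperparameter (arbitrary choices).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],
}

# scikit-learn maximises scores, so the error is used in its negated form.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Cross-validated MAE:", -search.best_score_)
```

A grid search only saves you the manual trial and error. Knowing how the algorithm works still helps you decide which parameters and which ranges are worth trying.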