Zomato Bangalore Restaurant Analysis and Rating Prediction

purnasai gudikandula
Jul 14, 2019

Data cleaning, EDA and Model Building

In this blog, we walk through an end-to-end project on the Zomato Bangalore restaurant dataset: first descriptive analytics, then predicting the rating of each restaurant from the facilities and features it provides.

You can download the dataset from here, and download the IPython notebook/code from here or view it here.

Problem:-

The basic idea of analyzing the Zomato dataset is to get a fair idea of the factors affecting the establishment of different types of restaurants at different places in Bengaluru, and of the aggregate rating of each restaurant. Bengaluru has more than 12,000 restaurants, serving dishes from all over the world. New restaurants open every day, yet the industry has not saturated and demand keeps increasing. In spite of this increasing demand, it has become difficult for new restaurants to compete with established ones, since most of them serve the same food. Bengaluru is the IT capital of India, and most people here depend mainly on restaurant food because they don’t have time to cook for themselves.

Opportunity:-

From all the data available, we can draw out some neat insights or conclusions, such as:

  • Which franchise has the highest number of Restaurants?
  • How many Restaurants are accepting online orders?
  • How many have a book table facility?
  • Which location has the highest number of Restaurants?
  • How many Restaurant types are there?
  • What is the most liked Restaurant type?
  • What is the Average cost for 2 persons?
  • What is the most liked Dish type?

And so on.

Tools used:-

  • Python 3.6
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Data science
  • Machine learning

Let’s start.

The notebook/code here covers importing the libraries, loading the data, and its info and summary statistics before cleaning and preprocessing.

We have 51,717 entries/records with 17 columns, including URL, Address, Name, Online_order, Book_table, Rating, Phone number, Location, Restaurant type, Dish liked, Cuisines, Average cost for 2 persons, Reviews_list, Menu and more.

All the preprocessing and the cleaning of each feature/column is done in the notebook, so please have a look. Let’s now explore each column or feature.

Restaurant Name:-

We have a feature/column called Name that lists all the restaurants that have partnered with Zomato in Bangalore. A few examples are Cafe Coffee Day, Onesta, Just Bake, Kranti sweets and more.

Bar plot for top 20 restaurants with their count

From the above graph, we can see that the Cafe Coffee Day franchise has almost 100 restaurants.
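A count like this can be reproduced with a value count on the name column. The snippet below is a minimal sketch on a tiny made-up sample, not the notebook's actual code; the column name `name` is an assumption about how the dataset labels it.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset.
# The column name "name" is an assumption, not taken from the notebook.
df = pd.DataFrame({
    "name": [
        "Cafe Coffee Day", "Onesta", "Cafe Coffee Day",
        "Just Bake", "Onesta", "Cafe Coffee Day",
    ]
})

# Count the outlets per restaurant name, largest first, and keep the top 20.
top_names = df["name"].value_counts().head(20)
print(top_names)
```

With Matplotlib installed, `top_names.plot(kind="bar")` would turn this Series into a bar plot.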

Online orders:-

Of all the restaurants registered on Zomato, how many accept online orders and how many do not? From the graph below, you can see that almost 30,000 restaurants in Bangalore accept online orders through Zomato and almost 20,000 do not.

Count of restaurants accepting and not accepting online orders

Book table:-

Similarly, some restaurants offer a book-table facility and others don't. After analysing the data, we find that out of all the 51,000 registered restaurants, only around 10,000 offer the book-table facility.

Count of restaurants with book-table facility
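Both Yes/No counts (online orders and table booking) come down to counting the values in one column. Here is a minimal sketch on made-up rows; the lower-case column names `online_order` and `book_table` are assumptions.

```python
import pandas as pd

# Made-up rows; the column names are assumptions about the dataset.
df = pd.DataFrame({
    "online_order": ["Yes", "Yes", "No", "Yes", "No"],
    "book_table": ["No", "No", "Yes", "No", "No"],
})

# Absolute counts of Yes/No for online orders.
online_counts = df["online_order"].value_counts()

# Share (fraction of rows) of Yes/No for table booking.
table_share = df["book_table"].value_counts(normalize=True)

print(online_counts)
print(table_share)
```

Passing `normalize=True` returns fractions rather than counts, which is handy when comparing facilities across datasets of different sizes.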

Location of Restaurants:-

The 51,000 restaurants are spread across 93 locations. Let’s see which locations have the most restaurants. A pie graph always adds up to 100%, and the one below shows only the top 10 locations out of the 93.

Only the top 10 locations are shown here.

The graph above shows only the top 10 locations, while we have 93 in total. Below is a bar graph of the same Location column.

Top 10 locations as a bar graph

So the BTM location has the most restaurants.

Restaurant Types:-

We have several restaurant types, such as Quick Bites, Casual Dining, Cafe, Delivery, Dessert Parlors, Bar, Food Court, Pubs, Lounge, Sweet Shop and so on. The pie graph below shows only the top 6 restaurant types.

Top 6 Restaurant types

So Quick Bites is the top restaurant type.

Average cost for 2 persons:-

Costs for 2 people range from 300 to 4000, depending on the restaurant type and on the dishes and cuisines people order. Let’s look at the donut graph of the average cost for 2 people across all the Bangalore restaurants on Zomato.

Average cost for 2 people in Bangalore Restaurants

The most common average cost for 2 people is around 300 Indian Rupees.

Dishes That Bangalore liked:-

We have a feature/column called Dishes_liked, which lists the dishes that people in Bangalore like, such as Pasta, Burgers, Pizza, Biryani, Sandwiches, Paratha and so on. Let’s look at the graph to find out which food Bangalore likes most.

Top 10 Dishes that Bangalore liked

So the dish that people in Bangalore like most is Pasta.
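Counting liked dishes takes one extra step if each row holds several dishes in a single string. The sketch below assumes a comma-separated text column named `dish_liked`; both the column name and the format are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows. The column name "dish_liked" and its comma-separated
# format are assumptions about the dataset.
df = pd.DataFrame({
    "dish_liked": [
        "Pasta, Burgers",
        "Pizza, Pasta",
        None,  # restaurants with no liked dishes recorded
        "Biryani, Pasta, Pizza",
    ]
})

dishes = (
    df["dish_liked"]
    .dropna()        # skip rows without any dish
    .str.split(",")  # one list of dishes per restaurant
    .explode()       # one row per (restaurant, dish) pair
    .str.strip()     # remove spaces left over from the split
)

# The ten most frequently liked dishes.
top_dishes = dishes.value_counts().head(10)
print(top_dishes)
```

Without the `str.strip()` step, "Pasta" and " Pasta" would be counted as two different dishes.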

Rating:-

We also have a Rating column, which holds the average rating given to each restaurant by the people who ordered from or visited it. After analysing it, we find that the overall rating across all restaurants comes to 3.9, which suggests that restaurants in Bangalore are generally good to order from or eat at.

Cuisines:-

Restaurants in Bangalore serve several cuisines, as the people there come from different parts of the country and from different cultures.

We have cuisines such as North Indian, Chinese, Continental, Cafe, Fast Food and several others. From the graph, you can see that North Indian is the most liked cuisine.

Model Building:-

From here on, we talk about machine learning and the models used to predict the rating of restaurants in Bangalore.

The Online order and Book table columns/features are categorical variables, and machine learning models need numerical inputs. Hence we use label encoding on these 2 features, which encodes the Yes/No values as the integers 0 and 1.
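As a minimal sketch of this step, scikit-learn's `LabelEncoder` can be applied column by column. The column names and rows here are made up for illustration, and the notebook may do this differently.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows; the column names are assumptions.
df = pd.DataFrame({
    "online_order": ["Yes", "No", "Yes"],
    "book_table": ["No", "No", "Yes"],
})

# LabelEncoder sorts the classes alphabetically, so "No" becomes 0
# and "Yes" becomes 1.
for col in ["online_order", "book_table"]:
    df[col] = LabelEncoder().fit_transform(df[col])

print(df)
```

The same loop would work for other text columns, though the integer codes it assigns there carry no natural order.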

Drop the unnecessary columns that won't play much of a role in deciding the rating of the restaurants. Also apply label encoding to features like “Location”, “Restaurant type”, and “Cuisines”.

After encoding, split the dataset into X and Y variables, and then split again into train and test sets of 70% and 30%. Apply standardisation to the dataset, as different features have different scale ranges. Standard scaling brings all the values to a common range, which is easier for the model to work with and makes computation faster.
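The split and scaling steps could look like the sketch below. It uses random numbers in place of the real features, and fitting the scaler on the training set only is a choice made here, not something the post specifies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random stand-in data: 100 rows, 3 features on a wide scale,
# and a rating-like target between 1 and 5.
rng = np.random.default_rng(0)
X = rng.uniform(0, 4000, size=(100, 3))
y = rng.uniform(1, 5, size=100)

# 70% train / 30% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the scaler on the training data only (an assumption of this sketch),
# then apply the same transformation to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting on the training rows alone keeps information about the test set out of the preprocessing step.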

After applying several regression models (Linear Regression, Ridge Regression, Lasso Regression and Random Forest Regression), Random Forest Regression gave the best accuracy of all the models, at 90%.

You can also compare predicted vs actual values to see how well our model predicts the ratings of the restaurants. You can see below how close they are.
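A comparison like this can be sketched as a loop over models. The data below is synthetic, the hyperparameters are arbitrary, and the sketch assumes the reported "accuracy" is the R² value returned by scikit-learn's `score` method for regressors, which the post does not confirm.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data with a partly non-linear target, standing in for the
# real encoded features and ratings.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.5 + 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hyperparameters are arbitrary choices for illustration.
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# For regressors, .score() returns R^2 on the given data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Scoring on the held-out test set, rather than on the training data, gives a fairer picture of how each model generalises.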

Actual vs Predicted ratings

You can improve the accuracy of all the models by applying techniques such as hyperparameter optimization, ensemble methods and cross-validation.

You can see or download the notebook/code from here.

This blog is co-authored by Mohammad Roufa. You can check his LinkedIn here.

Thank you. Please clap or share this blog.

Comment down your thoughts; we will include them in the blog.
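Hyperparameter optimization and cross-validation can be combined in one step with scikit-learn's `GridSearchCV`. This is a minimal sketch on synthetic data; the parameter grid is an arbitrary example, not a recommended setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=120)

# A small, arbitrary grid of random forest settings to try.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

# Each combination is evaluated with 3-fold cross-validation on R^2.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated R² of the best combination, so it is usually a more cautious estimate than a single train/test split.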
