In 2013, over one thousand young people in Slovakia were asked to answer nearly 150 questions about their lives, hobbies, preferences, fears and much more. This data set, available on the Kaggle platform, turns out to be the perfect source of information about young people, just waiting there for deeper analysis. And guess what, I found it interesting too!
This data set is very rich and full of information. I decided to focus on a few points that I found most fascinating. I started with an analysis of movie preferences, as watching movies is one of my favorite ways of spending free time.
My analysis of movie preferences was, technically speaking, hypothesis testing. Respondents were asked 12 questions, starting with “do you enjoy watching movies?” and then going through the most popular movie types with the same question, “how much do you like/dislike this type?”, where 5 means “enjoy very much” and 1 means “don’t enjoy at all”.
My questions were simple: do young people in general like watching movies, and what are the main differences between the preferences of men and women?
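The second question can be framed as a hypothesis test: is the gap between men’s and women’s average rating larger than random chance would produce? Below is a minimal sketch of one such test, a permutation test, assuming the 1-to-5 ratings for each gender sit in plain Python lists. This is a technique swapped in for illustration, not necessarily how the figures below were produced; `permutation_test` is a hypothetical helper and the ratings are invented.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    Returns the observed difference (mean of group_a minus mean of group_b)
    and an approximate p-value: the share of random relabelings whose
    absolute difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the group (gender) labels
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Counting the observed labeling itself keeps the p-value above zero
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Invented 1-5 ratings of war movies (NOT taken from the survey)
male = [5, 4, 4, 5, 3, 4, 5, 4]
female = [2, 1, 3, 2, 2, 1, 3, 2]
diff, p = permutation_test(male, female)
print(f"difference in means: {diff:.2f}, approximate p-value: {p:.4f}")
```

A small p-value means a gap this large rarely shows up when the gender labels are shuffled at random. With 12 movie questions, each type would be tested separately, so a few “significant” results could appear by chance alone.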
Let’s take a look at some diagrams.
Watching movies in general is loved by nearly everyone: the score here comes close to the maximum. When it comes to movie types, comedies are definitely number one, whereas horror and westerns lag far behind. Now, are there any differences between men’s and women’s choices?
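A ranking like this comes down to averaging the answers to each question. Here is a minimal sketch, assuming each question’s 1-to-5 answers are stored in a plain Python list with `None` marking a skipped question. The column names and numbers are made up for illustration, and `average_scores` is a hypothetical helper, not the code behind the diagram.

```python
from statistics import fmean

# Invented answers (1 = don't enjoy at all, 5 = enjoy very much);
# None marks a skipped question. NOT survey data.
ratings = {
    "Movies": [5, 5, 4, 5, 5],
    "Comedy": [5, 4, 5, 5, 4],
    "Horror": [2, 1, None, 3, 2],
    "Western": [1, 2, 2, None, 1],
}


def average_scores(table):
    """Mean rating per question, ignoring missing answers, highest first."""
    averages = {}
    for question, answers in table.items():
        answered = [a for a in answers if a is not None]
        averages[question] = fmean(answered)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for question, score in average_scores(ratings):
    print(f"{question:<8} {score:.2f}")
```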
The left figure lists all movie types, divided into two subgroups: blue for men and orange for women. At first glance you can easily notice some differences, but the right diagram makes the details much easier to see. There are two things on which gender has no influence: men and women enjoy watching movies equally, and their top choice is comedy. And now some facts that probably anyone could have guessed without deeper analysis: war, action and westerns are definitely men’s territory, whereas romance, fairy tales and animation belong to women. I know… who could have expected it? But anyway, now it’s confirmed by the data!
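One way to put numbers on these differences is to subtract the male average from the female average for each movie type: negative values lean male, positive values lean female, and values near zero mean gender makes little difference. A minimal sketch, assuming each respondent is a dict with a hypothetical `gender` key holding "male" or "female" and complete ratings; the four rows are invented, and `gender_gap` is a hypothetical helper.

```python
from statistics import fmean

# Invented respondents: gender plus 1-5 ratings (NOT real survey rows)
respondents = [
    {"gender": "male", "Comedy": 5, "War": 5, "Romantic": 2},
    {"gender": "male", "Comedy": 4, "War": 4, "Romantic": 3},
    {"gender": "female", "Comedy": 5, "War": 2, "Romantic": 5},
    {"gender": "female", "Comedy": 4, "War": 1, "Romantic": 4},
]


def gender_gap(rows, questions):
    """Female minus male mean rating per question, most male-leaning first."""
    gaps = {}
    for question in questions:
        by_gender = {}
        for row in rows:
            by_gender.setdefault(row["gender"], []).append(row[question])
        gaps[question] = fmean(by_gender["female"]) - fmean(by_gender["male"])
    return sorted(gaps.items(), key=lambda item: item[1])


print(gender_gap(respondents, ["Comedy", "War", "Romantic"]))
```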
This survey produced a huge dataset, so it can be difficult to know where to start looking for patterns or dependencies. A good first step is probably to check whether the answers are correlated. I’m not a sociologist and, honestly, I have no idea which categories could potentially correlate. So I decided to take a closer look at two groups that I found fascinating.
The first one: demographics and spending habits. Could gender, the size of one’s town or the number of siblings have an impact on spending habits?
Looking at the matrix above, a few things stand out. In general, there are some correlations within each category. For example, spending on looks correlates with enjoying large shopping centers, and spending on gadgets correlates with caring about branded items.
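A correlation matrix is just Pearson’s r computed for every pair of questions. The sketch below spells that out in plain Python, assuming complete answers stored as equally long lists; the column names and values are invented for illustration. In pandas, `DataFrame.corr()` computes the same Pearson matrix by default and excludes missing values.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long rating lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations: matrix[a][b] = r(a, b)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented answers from five respondents (NOT survey data)
spending = {
    "Spending on looks": [1, 2, 3, 4, 5],
    "Shopping centres": [2, 2, 3, 5, 5],
    "Spending on gadgets": [3, 1, 4, 1, 5],
}
matrix = correlation_matrix(spending)
for row, values in matrix.items():
    print(f"{row:<20}", " ".join(f"{r:+.2f}" for r in values.values()))
```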
Between demographic data and spending habits, the only clearly visible correlations involve gender: women are more likely to enjoy shopping centers or to spend money on their appearance, whereas men are more likely to spend money on gadgets.
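Gender is recorded as text, so it has to be turned into a number before it can appear in a correlation matrix. A common approach is a 0/1 code; Pearson’s r against such a binary variable is known as the point-biserial correlation. This sketch assumes gender strings "female" and "male" and uses invented answers. `gender_correlations` is a hypothetical helper, and `pearson` is repeated so the block runs on its own.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))


def gender_correlations(genders, columns, coded_as_one="female"):
    """Correlate a 0/1 gender code with every numeric column.

    With 1 = coded_as_one, a positive r means higher answers are more typical
    of that group; a negative r means they are more typical of the other group.
    """
    code = [1 if g == coded_as_one else 0 for g in genders]
    return {name: pearson(code, values) for name, values in columns.items()}


# Invented respondents (NOT survey data)
genders = ["female", "male", "female", "male", "female", "male"]
spending = {
    "Spending on looks": [5, 2, 4, 3, 5, 1],
    "Spending on gadgets": [2, 5, 1, 4, 2, 5],
}
for name, r in gender_correlations(genders, spending).items():
    print(f"{name:<20} {r:+.2f}")
```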
After looking at the data above, I wanted to search for less obvious correlations. That is why I decided to compare two more subsets: phobias and health habits.
The second one: could health habits, such as smoking or drinking alcohol, correlate with the fear of spiders, snakes and so on?
Here is the data subset showing the answers of 1009 respondents in these two categories. Once again, 1 means ‘not afraid’ or ‘never’ and 5 means ‘afraid a lot’ or ‘always’.
And here is the correlation matrix:
Once again we see strong correlations within each category. Generally, if someone is afraid of one thing, they are likely to have other fears too: take a look at the connections between fear of heights, spiders, snakes, rats and… darkness. Similarly, on the health side, the more often someone drinks, the more likely they are to smoke as well. There are no obvious correlations between health habits and fears. The strongest correlation is between fear of rats and fear of snakes, as the diagram below additionally shows: the more someone is afraid of snakes, the more they are also afraid of rats. Between alcohol and smoking the pattern is similar.
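Picking out the strongest correlation by eye gets harder as the matrix grows. A minimal sketch of doing it programmatically, assuming complete answers in plain lists: compare every pair of columns once and keep the pair with the largest absolute r. The column names and values are invented purely to exercise the code, and `strongest_pair` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))


def strongest_pair(columns):
    """Return (name_a, name_b, r) for the pair of columns with the largest |r|."""
    pairs = (
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)  # each unordered pair exactly once
    )
    return max(pairs, key=lambda item: abs(item[2]))


# Invented 1-5 fear ratings from six respondents (NOT survey data)
fears = {
    "Snakes": [1, 2, 3, 4, 5, 5],
    "Rats": [1, 2, 3, 5, 4, 5],
    "Heights": [2, 1, 4, 2, 5, 3],
}
print(strongest_pair(fears))
```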
In this article, while analyzing the Young People Survey, I took a closer look at three aspects:
* movie preferences
* correlation between demographic data and spending habits
* correlation between health habits and fears
I found out whether we enjoy watching movies at all, which movie types each gender likes or dislikes, and where the differences between us lie. I also found that the only correlations between demographics and spending habits involve gender, and that there we can observe some patterns. Finally, I saw a clear connection between smoking and drinking habits, as well as a pattern where someone who has one phobia probably has others too.
But what I really found out is how amazing data analysis is. This is my first time coding in this field and my first time writing a blog post about it. I hope my journey with data science will not end here and will become an important part of my future!