
I want to start this piece off by admitting something important: I know absolutely nothing about wine.

As a guy in his late twenties, I’ve spent the better part of the last five years sniffing and swirling my way through dates, social occasions and business outings without ever really knowing what the heck I was smelling or looking for in the glass of wine I was holding.

“Should we go with the 2009 Malbec?” my date would say. “Absolutely, big fan of the Malbec, and 2009 was a great year in France,” I would answer. As far as I was concerned, every year was a great year for wine, and no matter what the other person suggested, I was a huge fan! I could narrow it down to the fact that I liked red and wasn’t a huge fan of white, but talking intelligently beyond that was deep water for me.

That’s when I turned to Predictive Analytics. I came across a study:

“Modeling wine preferences by data mining from physicochemical properties,” in which the authors took a number of wine samples and had wine experts rate each one on a scale from 0 (very bad) to 10 (excellent). Along with the quality rating, they listed a number of other attributes of each wine sample, such as fixed acidity, alcohol level and sulphates.

Other than alcohol level, these terms were new to me, but with Predictive Analytics we could analyze the data to see which attributes correlate most strongly with the quality grade. In other words, if a wine has a high density, does that necessarily mean it is a really good wine? By using advanced algorithms we could measure how strongly each of these attributes correlates with quality and find out what truly makes a good wine.
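To make "how strongly an attribute correlates with quality" a bit more concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient for two attributes. The handful of rows is made up purely for illustration and is not taken from the study; the helper name `pearson` is also my own.

```python
from math import sqrt

# Made-up illustrative rows, NOT values from the study:
# (alcohol %, volatile acidity, expert quality score)
samples = [
    (9.0, 0.95, 4),
    (9.4, 0.70, 5),
    (9.8, 0.88, 5),
    (10.5, 0.50, 6),
    (11.2, 0.36, 7),
    (12.0, 0.30, 7),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

alcohol = [row[0] for row in samples]
volatile_acidity = [row[1] for row in samples]
quality = [row[2] for row in samples]

# A value near +1 means the attribute rises with quality,
# a value near -1 means it falls as quality rises.
r_alcohol = pearson(alcohol, quality)
r_acidity = pearson(volatile_acidity, quality)
print(f"alcohol vs quality:          r = {r_alcohol:+.2f}")
print(f"volatile acidity vs quality: r = {r_acidity:+.2f}")
```

On the real data set the same function would simply be run once per attribute column against the quality column.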

First I plotted each of the attributes against quality in a basic scatter plot. A scatter plot represents every wine as a dot, positioned by its quality score and its value for the attribute. With all the dots together we can start to see larger trends, and by relating each attribute back to the quality score we get a quick picture of which attributes affect the quality of the wine.

The charts make the trends easy to see. In the first chart below (which maps Alcohol level along the Y axis and Quality along the X axis), the Quality of the wine trends slightly upwards as the Alcohol content increases. In the second chart (which maps Volatile Acidity along the Y axis and Quality along the X axis) we see the opposite: the quality of the wine trends downwards as the volatile acidity increases.

[Figure: Alcohol vs. Quality]

[Figure: Volatile Acidity vs. Quality]
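The upward and downward trends in two charts like these can be put into numbers by fitting a straight line through the dots. The sketch below is one way to do that in Python with an ordinary least-squares line; the data points are invented for illustration (they are not the study's values), and `trend_line` is a hypothetical helper name. Quality goes on the X axis to match the charts.

```python
def trend_line(xs, ys):
    """Least-squares line y = intercept + slope * x through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented example points, NOT values from the study.
quality = [4, 5, 5, 6, 7, 7]                            # X axis
alcohol = [9.0, 9.4, 9.8, 10.5, 11.2, 12.0]             # Y axis, first chart
volatile_acidity = [0.95, 0.70, 0.88, 0.50, 0.36, 0.30] # Y axis, second chart

alcohol_slope, _ = trend_line(quality, alcohol)
acidity_slope, _ = trend_line(quality, volatile_acidity)

# A positive slope is an upward trend, a negative slope a downward one.
print(f"alcohol slope per quality point:          {alcohol_slope:+.3f}")
print(f"volatile acidity slope per quality point: {acidity_slope:+.3f}")
```

The sign of each slope is the "trend" you would read off the chart by eye.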

So what does this mean? For starters, it means that, in this data, wines with more alcohol generally earned higher quality scores. (If this were the only measuring stick, I would have been a wine connoisseur years ago.)

But another important trend is the volatile acidity: the higher a wine’s volatile acidity, the worse its quality tends to be. The volatile acidity of a wine refers to the level of acetic acid present in the wine, and it’s what leads to that “vinegar”-like taste if the level is too high. (I obviously looked this up because, as we know, I have not the slightest clue about wine.)

While we have been able to identify perhaps the two most obvious attributes that separate the good wines from the bad, we can take this analysis much further. Predictive Analysis allows us to run a number of more advanced analyses, such as clustering, decision trees and outlier detection. But for this specific data we can use a Regression Algorithm to analyze the attribute data and ultimately come up with a pattern for the quality of the wine.

By using a Multiple Linear Regression, Predictive Analytics looks at all of the wines and estimates how each attribute, taken together with the others, contributes to the final score. Essentially, it does what we did with the alcohol level and volatile acidity, but with all of the attributes at once and in a much more exact manner. The end result is a predicted value of what each wine’s score should be, and in some cases the predicted value is very close to the actual score.
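For readers curious what a multiple linear regression does under the hood, here is a small, dependency-free Python sketch. It is not the tool's actual implementation; it fits ordinary least squares by solving the normal equations, uses only two attributes, and runs on made-up rows rather than the study's data. All function names are my own.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.
    Assumes the system has a unique solution."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.
    Returns [intercept, coefficient_1, ..., coefficient_k]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, row):
    """Predicted score: intercept plus the weighted sum of the attributes."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

# Made-up rows, NOT values from the study: (alcohol %, volatile acidity)
rows = [(9.0, 0.95), (9.4, 0.70), (9.8, 0.88), (10.5, 0.50), (11.2, 0.36), (12.0, 0.30)]
quality = [4, 5, 5, 6, 7, 7]

weights = fit_linear_regression(rows, quality)
for row, actual in zip(rows, quality):
    print(f"attributes={row}  actual={actual}  predicted={predict(weights, row):.2f}")
```

With the full data set, each row would carry every measured attribute instead of just two, and the fit would return one coefficient per attribute.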

Why is this important? Well, if the predicted values match the actual scores fairly closely, the model essentially becomes our grading machine: it can place a grade on any wine without anyone tasting it, as long as we know all of the wine’s attributes. We no longer need the so-called “wine experts” to taste, sniff and swirl their wine around; we can just put the numbers into the computer and see if the wine is a 2 or an 8.
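Here is a rough Python sketch of that "grading machine" idea, assuming a regression has already produced an intercept and one coefficient per attribute. The coefficients, the wines and their expert scores below are all hypothetical numbers chosen for illustration, not results from the study. The sketch also shows one simple way to check how close the predictions are: the mean absolute error.

```python
# Hypothetical coefficients for illustration only; a real model would get
# these from fitting a regression to the full data set.
INTERCEPT = 2.0
COEFFICIENTS = {"alcohol": 0.4, "volatile_acidity": -1.5}

def predict_quality(wine):
    """Raw predicted score from a wine's measured attributes."""
    return INTERCEPT + sum(COEFFICIENTS[name] * wine[name] for name in COEFFICIENTS)

def to_grade(score):
    """Round the raw prediction to a whole-number grade."""
    return round(score)

def mean_absolute_error(predicted, actual):
    """Average distance between predicted and actual scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Made-up wines with made-up expert scores.
wines = [
    {"alcohol": 10.0, "volatile_acidity": 0.6},
    {"alcohol": 12.0, "volatile_acidity": 0.3},
    {"alcohol": 9.0, "volatile_acidity": 0.9},
]
expert_scores = [5, 7, 4]

predictions = [predict_quality(w) for w in wines]
grades = [to_grade(p) for p in predictions]
mae = mean_absolute_error(predictions, expert_scores)

print("predicted grades:", grades)
print(f"mean absolute error vs experts: {mae:.2f}")
```

A small error on wines the model has never seen is what would justify trusting it as a stand-in for the tasting panel.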

Of course, this is simplifying the art of wine tasting just a bit, but the bigger picture here is that companies are using these exact same techniques to predict much more important things like customer behavior, inventory levels and financial trends. Imagine being able to do the same kind of prediction that we made with wine, and using it to predict exactly what certain customers will purchase.

In truth, this is already being done; Predictive Analytics is opening up new doors for companies that want to understand their business processes and is changing the business analytics game. Just a few years ago, companies were focused much more on what had happened or, for the more advanced users, what was currently happening. But now mature business intelligence companies that have mastered those two views are starting to look at the bigger picture and how they can truly optimize their business for the future.

So what did I gain from my adventure into the world of wine analytics? Well, I still couldn’t tell you whether 2008, 2009 or 2010 were necessarily “good years.” But you can be sure the next time I’m out to dinner I’ll be the guy saying things like “Oh wow, the volatile acidity on this is a bit high. I am not a real fan of this 2009 Malbec.”


2 Comments


  1. Former Member

    Hey Drew,

    Great post.  What I found quite interesting also was this piece of information from your data link:

    Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    I’m sure if the data was more extensive and included the above variables, it would be very revealing and might make more than a few wineries unhappy! 🙂

    Cheers, Josh

    1. Former Member Post author

      Thanks Josh!

      I totally agree, it would be very interesting to see what we are in fact paying for and what is truly the best bang for our buck.

      Cheers, Drew
