Predicting Wine Quality with Machine Learning

Summary

This project explores how machine learning can predict wine quality from physicochemical properties. Using the UCI Wine Quality dataset, I trained and evaluated two models, a Decision Tree and a Random Forest, to determine which predicts wine quality ratings more accurately. I also identified the features that most influence wine quality.

Workflow

Data Exploration and Preprocessing – The project begins with loading and exploring the dataset. Because the data was already clean, minimal preprocessing was needed before modeling.
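The exploration step can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes pandas and the semicolon-separated layout of the UCI winequality CSV files, and the four rows in SAMPLE_CSV are made-up placeholder values so that the example runs without downloading anything. The explore helper is a hypothetical name introduced only for illustration.

```python
import io

import pandas as pd

# Made-up placeholder rows in the column layout of the UCI winequality CSV
# files (semicolon-separated). These are NOT real measurements.
SAMPLE_CSV = """\
fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality
7.0;0.50;0.10;2.0;0.080;10;40;0.9970;3.30;0.60;10.0;5
8.0;0.40;0.20;2.5;0.070;15;50;0.9965;3.25;0.65;10.5;6
6.5;0.60;0.05;1.8;0.090;12;45;0.9975;3.40;0.55;9.5;5
7.5;0.30;0.30;2.2;0.060;18;55;0.9960;3.20;0.75;11.5;7
"""


def explore(df, target="quality"):
    """Collect the basic checks worth running before any modeling."""
    return {
        "shape": df.shape,
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "class_counts": df[target].value_counts().sort_index().to_dict(),
    }


# With the real data, replace the StringIO object with the path to the CSV file.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")
report = explore(df)
print(report)
```

If the missing-value and duplicate counts come back at or near zero, as they would for an already clean dataset, the only preprocessing left is separating the feature columns from the quality column.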

Model Training: Decision Tree vs. Random Forest – My hypothesis was that a Random Forest, by aggregating the predictions of many decision trees, would generalize better and achieve higher accuracy than a single Decision Tree. I trained both a Decision Tree and a Random Forest classifier and compared them on accuracy.
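A comparison of this kind can be set up roughly as shown below. This is a sketch under stated assumptions rather than the project's exact code: it assumes scikit-learn, uses a synthetic 11-feature dataset from make_classification as a stand-in for the wine data, and picks an 80/20 stratified split, 200 trees, and a fixed random seed purely for illustration. The accuracies it prints describe the synthetic data only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 11 numeric features, like the 11
# physicochemical inputs, and 3 classes instead of real quality ratings.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=42
)

# Hold out 20% of the rows for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model on the same training split and score it on the same test split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
```

Using the same split and seed for both models keeps the comparison fair: any difference in accuracy then comes from the models rather than from which rows ended up in the test set.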

Model Evaluation – The Random Forest achieved higher accuracy than the Decision Tree, supporting the initial hypothesis.

Feature Importance Analysis – I then extracted feature importances from the Random Forest model to understand which physicochemical variables had the strongest impact on predicted wine quality. A horizontal bar chart and a ranked table were used to present the results.
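The extraction and ranking step might look like the sketch below. It assumes scikit-learn's impurity-based feature_importances_ attribute, pandas for the ranked table, and matplotlib for the chart. The data is again a synthetic stand-in, so the UCI column names are attached to random columns and the ranking printed here says nothing about real wine. The plot_importances helper is a hypothetical name.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Column names of the UCI Wine Quality inputs, used here only as labels.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# Synthetic stand-in data (assumption): the values are unrelated to wine.
X, y = make_classification(
    n_samples=600, n_features=len(FEATURES), n_informative=6,
    n_classes=3, random_state=42,
)
X = pd.DataFrame(X, columns=FEATURES)

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Ranked table: one importance score per feature, largest first.
ranked = pd.Series(forest.feature_importances_, index=FEATURES)
ranked = ranked.sort_values(ascending=False)
print(ranked.round(3).to_string())


def plot_importances(ranked):
    """Draw a horizontal bar chart with the most important feature on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for the chart

    ordered = ranked.sort_values()  # barh draws from the bottom up
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.barh(ordered.index, ordered.values)
    ax.set_xlabel("Importance (mean decrease in impurity)")
    ax.set_title("Random Forest feature importances")
    fig.tight_layout()
    return fig
```

Calling plot_importances(ranked) returns a matplotlib figure that can be shown or saved. Impurity-based importances sum to 1 across features, so each score can be read as a share of the total.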

Conclusion

This project demonstrates my ability to build, compare, and interpret machine learning models on a real-world dataset and to visualize which features drive model predictions. The Random Forest model achieved higher accuracy than the Decision Tree, supporting the hypothesis that ensemble models can generalize and perform better. Key factors influencing wine quality included alcohol content, volatile acidity, and sulphates.

Python code for Predicting Wine Quality with Machine Learning

Wine_quality