UCI wine quality (red and white) dataset: field descriptions and data import walkthrough
Contents
#Data field descriptions
#Importing the data
#Data field descriptions
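The wines fall into two categories: white and red.
This section covers the white wine data; adding a type field yields a combined dataset containing both white and red wines.
When the two datasets are examined separately, the set of quality classes present in each may differ slightly.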
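Fixed acidity: most acids involved with wine are fixed, that is, nonvolatile (they do not evaporate readily)
Volatile acidity: the amount of acetic acid in the wine; at too high a level it produces an unpleasant vinegar taste
Citric acid: present in small quantities, it can add freshness and flavor to the wine
Residual sugar: the amount of sugar remaining after fermentation stops; wines with less than 1 gram per liter are rare, and wines with more than 45 grams per liter are considered sweet
Chlorides: the amount of salt in the wine
Free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and oxidation of the wine
Total sulfur dioxide: the amount of free and bound forms of SO2; at low concentrations SO2 is mostly undetectable in wine, but at free SO2 concentrations above 50 ppm it becomes evident in the nose and taste
Density: the density of wine is close to that of water, depending on the percent alcohol and sugar content
pH: describes how acidic or basic the wine is on a scale from 0 (very acidic) to 14 (very basic); most wines have a pH between 3 and 4
Sulphates: a wine additive that can contribute to sulfur dioxide gas (SO2) levels and acts as an antimicrobial and antioxidant
Alcohol: the percent alcohol content of the wine
Quality: the output variable (based on sensory data, scored from 0 to 10); the ratings come from dedicated professionals such as wine tasters and blenders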
The White Wine Quality dataset is a tidy data set. It contains 4,898 white wines with 11 variables quantifying the chemical properties of each wine. At least 3 wine experts rated the quality of each wine, providing a rating between 0 (very bad) and 10 (excellent).
**fixed acidity**: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
**volatile acidity**: the amount of acetic acid in wine, which at too high a level can lead to an unpleasant vinegar taste
**citric acid**: found in small quantities, citric acid can add 'freshness' and flavor to wines
**residual sugar**: the amount of sugar remaining after fermentation stops; it is rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet
**chlorides**: the amount of salt in the wine
**free sulfur dioxide**: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
**total sulfur dioxide**: the amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
**density**: the density of wine is close to that of water, depending on the percent alcohol and sugar content
**pH**: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale
**sulphates**: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
**alcohol**: the percent alcohol content of the wine
**quality**: output variable (based on sensory data, score between 0 and 10)
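Several of the descriptions above mention numeric thresholds (45 g/L of residual sugar, 50 ppm of free SO2, a pH between 3 and 4). A minimal sketch of how those thresholds could be turned into boolean flags is shown below. The sample values are made up for illustration, and the sketch assumes the dataset's column units can be read as g/L and ppm respectively.

```python
import pandas as pd

# Tiny illustrative sample (made-up values, not rows copied from the dataset)
sample = pd.DataFrame({
    "residual sugar": [1.2, 20.7, 50.0],
    "free sulfur dioxide": [30.0, 45.0, 60.0],
    "pH": [3.0, 3.26, 3.5],
})

# Thresholds taken from the field descriptions; units are an assumption
SWEET_SUGAR_G_PER_L = 45      # above this a wine is considered sweet
NOTICEABLE_FREE_SO2_PPM = 50  # above this SO2 becomes evident in nose and taste

flags = pd.DataFrame({
    "is_sweet": sample["residual sugar"] > SWEET_SUGAR_G_PER_L,
    "so2_noticeable": sample["free sulfur dioxide"] > NOTICEABLE_FREE_SO2_PPM,
    # between() is inclusive on both ends by default
    "ph_typical": sample["pH"].between(3, 4),
})
print(flags)
```

The same comparisons would apply unchanged to the full DataFrame once the CSV has been loaded.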
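Wine certification includes physicochemical tests such as the determination of density, pH, alcohol content, and fixed and volatile acidity.
The dataset can be found in the UCI Machine Learning Repository.
This analysis covers white wine and is based on the 12 variables/features in the dataset (11 physicochemical inputs plus quality):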
Fixed acidity
Volatile acidity
Citric acid
Residual sugar
Chlorides
Free sulfur dioxide
Total sulfur dioxide
Density
pH
Sulphates
Alcohol
Quality
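The variable list above can be checked against the file header. The sketch below parses a short inline string laid out like the semicolon-separated CSV (the single data row is illustrative) and separates the 11 physicochemical inputs from the `quality` output. The inline text is a stand-in so the example runs without the file.

```python
import io
import pandas as pd

# Inline stand-in for the CSV: quoted header plus one illustrative data row
csv_text = (
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6\n"
)

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Everything except the output column is an input feature
features = [col for col in df.columns if col != "quality"]
print(len(df.columns), "columns,", len(features), "input features")
```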
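#Importing the data
import pandas as pd

# The CSV file uses ';' as the field separator
df = pd.read_csv('winequality-white.csv', sep=';')
df.head()  # first five rows
df.tail()  # last five rows
# df.sample(5)  # five random rows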
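Since the quality classes present in the red and white data may differ slightly, it can help to compare them directly after loading. The sketch below uses two small made-up frames as stand-ins; with the real data, `white` would be the `df` loaded above and `red` would come from the red-wine file (assumed here to be a separate CSV with the same layout).

```python
import pandas as pd

# Made-up stand-ins for the two tables; only the quality column matters here
white = pd.DataFrame({"quality": [6, 6, 5, 7, 9, 3]})
red = pd.DataFrame({"quality": [5, 5, 6, 7, 4, 8]})

# tolist() converts NumPy integers to plain Python ints
white_classes = set(white["quality"].unique().tolist())
red_classes = set(red["quality"].unique().tolist())

only_white = sorted(white_classes - red_classes)
only_red = sorted(red_classes - white_classes)
print("only in white:", only_white)
print("only in red:", only_red)

# Number of wines per quality score, ordered by score
white_counts = white["quality"].value_counts().sort_index()
print(white_counts)
```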
The combined UCI white and red wine dataset adds a type field indicating whether each wine is white or red:
## 'data.frame': 3000 obs. of 15 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## $ type : Factor w/ 2 levels "Red","White": 1 1 1 1 1 1 1 1 1 1 ...
## $ quality.bucket : Factor w/ 3 levels "Low","Medium",..: 2 2 2 2 2 2 2 3 3 2 ...
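A combined table with `type` and `quality.bucket` columns like the one shown above could be built in pandas roughly as follows. This is a sketch under assumptions: the two input frames are made-up stand-ins, and the bucket cut points (up to 4 is Low, 5 to 6 is Medium, 7 and above is High) are hypothetical, chosen only to be consistent with the sample rows in the output, where 5 and 6 map to Medium and 7 maps to High.

```python
import pandas as pd

# Made-up stand-ins for the red and white tables (two columns only)
red = pd.DataFrame({"alcohol": [9.4, 9.8, 10.0], "quality": [5, 5, 7]})
white = pd.DataFrame({"alcohol": [8.8, 9.5, 10.1], "quality": [6, 4, 8]})

# Tag each table with its wine type, then stack them
red["type"] = "Red"
white["type"] = "White"
wines = pd.concat([red, white], ignore_index=True)
wines["type"] = wines["type"].astype("category")

# Hypothetical cut points: [0, 4] Low, (4, 6] Medium, (6, 10] High
wines["quality.bucket"] = pd.cut(
    wines["quality"],
    bins=[0, 4, 6, 10],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)
print(wines)
```

Storing `type` as a categorical column mirrors the two-level factor in the R output; a plain string column would work equally well for filtering.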
Reference: Kaggle, Predicting White Wine Quality
Reference: UCI
Reference: Titanic dataset (Kaggle) | Titanic survival analysis (field introduction)
Reference: Red and White Wine Quality