Women's Clothing Data Analysis (E-commerce Data), Version 1

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()

data = pd.read_csv('Womens_Clothing.csv')
# Inspect the data structure
data

	Unnamed: 0	Clothing ID	Age	Title	Review Text	Rating	Recommended IND	Positive Feedback Count	Division Name	Department Name	Class Name
0	0	767	33	NaN	Absolutely wonderful - silky and sexy and comf...	4	1	0	Initmates	Intimate	Intimates
1	1	1080	34	NaN	Love this dress! it's sooo pretty. i happene...	5	1	4	General	Dresses	Dresses
2	2	1077	60	Some major design flaws	I had such high hopes for this dress and reall...	3	0	0	General	Dresses	Dresses
3	3	1049	50	My favorite buy!	I love, love, love this jumpsuit. it's fun, fl...	5	1	0	General Petite	Bottoms	Pants
4	4	847	47	Flattering shirt	This shirt is very flattering to all due to th...	5	1	6	General	Tops	Blouses
...	...	...	...	...	...	...	...	...	...	...	...
23481	23481	1104	34	Great dress for many occasions	I was very happy to snag this dress at such a ...	5	1	0	General Petite	Dresses	Dresses
23482	23482	862	48	Wish it was made of cotton	It reminds me of maternity clothes. soft, stre...	3	1	0	General Petite	Tops	Knits
23483	23483	1104	31	Cute, but see through	This fit well, but the top was very see throug...	3	0	1	General Petite	Dresses	Dresses
23484	23484	1084	28	Very cute dress, perfect for summer parties an...	I bought this dress for a wedding i have this ...	3	1	2	General	Dresses	Dresses
23485	23485	1104	52	Please make more like this one!	This dress in a lovely platinum is feminine an...	5	1	22	General Petite	Dresses	Dresses

23486 rows × 11 columns

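From the output above:

The dataset has 23486 rows and 10 feature variables (the eleventh column, Unnamed: 0, is just the saved row index). Each row corresponds to one customer review and contains the following variables:

**Clothing ID:** integer categorical variable that identifies the specific item being reviewed.
**Age:** positive integer, the reviewer's age.
**Title:** string, the review title.
**Review Text:** string, the review body.
**Rating:** ordinal integer, the product score given by the customer, from 1 (worst) to 5 (best).
**Recommended IND:** binary variable, 1 if the customer recommends the product and 0 if not.
**Positive Feedback Count:** non-negative integer, the number of other customers who found the review positive.
**Division Name:** categorical name of the product's high-level division.
**Department Name:** categorical name of the product's department.
**Class Name:** categorical name of the product's class.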

Chinese name	Column name
服装ID	Clothing ID
年龄	Age
标题	Title
评论文本	Review Text
评分	Rating
推荐的IND	Recommended IND
积极的反馈计数	Positive Feedback Count
高级部门名称	Division Name
部门名称	Department Name
类名称	Class Name
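
The `Unnamed: 0` column in the first table is most likely the row index that was written into the CSV when it was saved. The sketch below uses a hypothetical three-row CSV (not the real file) to show how `index_col=0` in `pd.read_csv` keeps that column out of the features.

```python
import io
import pandas as pd

# Hypothetical three-row CSV that mimics the layout of Womens_Clothing.csv:
# the first, unnamed column is the row index that was saved with the file.
csv_text = """,Clothing ID,Age,Rating
0,767,33,4
1,1080,34,5
2,1077,60,3
"""

# Default read: the unnamed first column becomes a regular column 'Unnamed: 0'.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the index instead.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```

With the real file, the equivalent call would be `pd.read_csv('Womens_Clothing.csv', index_col=0)`, assuming the first column really is the saved index.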

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23486 entries, 0 to 23485
Data columns (total 11 columns):
Unnamed: 0                 23486 non-null int64
Clothing ID                23486 non-null int64
Age                        23486 non-null int64
Title                      19676 non-null object
Review Text                22641 non-null object
Rating                     23486 non-null int64
Recommended IND            23486 non-null int64
Positive Feedback Count    23486 non-null int64
Division Name              23472 non-null object
Department Name            23472 non-null object
Class Name                 23472 non-null object
dtypes: int64(6), object(5)
memory usage: 2.0+ MB

# Inspect missing values
# data.isnull()
# Drop rows with missing values
df = data.dropna()
df

	Unnamed: 0	Clothing ID	Age	Title	Review Text	Rating	Recommended IND	Positive Feedback Count	Division Name	Department Name	Class Name
2	2	1077	60	Some major design flaws	I had such high hopes for this dress and reall...	3	0	0	General	Dresses	Dresses
3	3	1049	50	My favorite buy!	I love, love, love this jumpsuit. it's fun, fl...	5	1	0	General Petite	Bottoms	Pants
4	4	847	47	Flattering shirt	This shirt is very flattering to all due to th...	5	1	6	General	Tops	Blouses
5	5	1080	49	Not for the very petite	I love tracy reese dresses, but this one is no...	2	0	4	General	Dresses	Dresses
6	6	858	39	Cagrcoal shimmer fun	I aded this in my basket at hte last mintue to...	5	1	1	General Petite	Tops	Knits
...	...	...	...	...	...	...	...	...	...	...	...
23481	23481	1104	34	Great dress for many occasions	I was very happy to snag this dress at such a ...	5	1	0	General Petite	Dresses	Dresses
23482	23482	862	48	Wish it was made of cotton	It reminds me of maternity clothes. soft, stre...	3	1	0	General Petite	Tops	Knits
23483	23483	1104	31	Cute, but see through	This fit well, but the top was very see throug...	3	0	1	General Petite	Dresses	Dresses
23484	23484	1084	28	Very cute dress, perfect for summer parties an...	I bought this dress for a wedding i have this ...	3	1	2	General	Dresses	Dresses
23485	23485	1104	52	Please make more like this one!	This dress in a lovely platinum is feminine an...	5	1	22	General Petite	Dresses	Dresses

19662 rows × 11 columns
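
`dropna()` removes every row that has a missing value in any column, which is how 23486 rows shrink to 19662. According to `data.info()`, Title is the column with the most gaps (19676 non-null values). The sketch below, on a small hypothetical sample, shows how to count the gaps per column first and how `subset` limits the drop to the columns an analysis actually needs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same kind of gaps as the review data.
sample = pd.DataFrame({
    'Title': ['Nice', np.nan, 'Too small', np.nan],
    'Review Text': ['Love it', 'Great', np.nan, 'Runs large'],
    'Rating': [5, 4, 2, 3],
})

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# dropna() drops every row that has at least one missing value.
cleaned = sample.dropna()
print(len(sample), '->', len(cleaned))

# Alternative: only require the columns the analysis uses.
text_only = sample.dropna(subset=['Review Text'])
print(len(text_only))
```

Whether to keep reviews that lack only a title is a design choice; the original analysis drops them.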

Analysis

# 1. Visualize the age distribution of reviewers
plt.hist(df['Age'], color=color[1], label='age')
plt.legend()
plt.xlabel('age')
plt.ylabel('count')
plt.title('age of commentator')
print('\n figure 01')

 figure 01

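[Figure 01: histogram of reviewer age]

Conclusion

Figure 01 shows that most reviewers are between 25 and 45 years old, so young and middle-aged adults dominate.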
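
The share of reviewers in that age range can also be computed directly instead of being read off the histogram. This is a minimal sketch on hypothetical ages; with the real data the series would be `df['Age']`.

```python
import pandas as pd

# Hypothetical ages; with the real data this would be df['Age'].
ages = pd.Series([22, 27, 31, 34, 38, 41, 44, 47, 52, 60])

# between() is inclusive on both ends by default.
in_range = ages.between(25, 45)

# The mean of a boolean series is the fraction of True values.
share = in_range.mean()
print(f'{share:.0%} of reviewers are aged 25 to 45')
```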

# 2. Box plot of reviewer age for each rating
plt.figure(figsize=(10, 8))
sns.boxplot(x='Rating', y='Age', data=df)
plt.title('age of rating')
print('\n figure 02')

 figure 02

[Figure 02: box plot of age by rating]

Conclusion

Figure 02 shows that the age distribution is similar across all ratings.
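
A box plot draws the quartiles of each group, so the same comparison can be made numerically. The sketch below computes the age quartiles per rating on a hypothetical sample; with the real data, `sample` would be `df`.

```python
import pandas as pd

# Hypothetical sample; column names match the dataset.
sample = pd.DataFrame({
    'Rating': [1, 1, 3, 3, 5, 5, 5],
    'Age':    [30, 50, 28, 44, 35, 41, 47],
})

# Quartiles of Age within each rating; unstack() turns the quantile
# level into columns (0.25, 0.5, 0.75), one row per rating.
age_by_rating = (
    sample.groupby('Rating')['Age']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(age_by_rating)
```

If the medians and quartiles are close across ratings, that supports the reading of the box plot.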

3. Which clothing is recommended in each department?
Check the unique values of Division Name, Department Name, and Class Name.

print('Division Name', df['Division Name'].unique())
print()
print('Department Name', df['Department Name'].unique())
print()
print('Class Name', df['Class Name'].unique())

Division Name ['General' 'General Petite' 'Initmates']

Department Name ['Dresses' 'Bottoms' 'Tops' 'Intimate' 'Jackets' 'Trend']

Class Name ['Dresses' 'Pants' 'Blouses' 'Knits' 'Intimates' 'Outerwear' 'Lounge'
 'Sweaters' 'Skirts' 'Fine gauge' 'Sleep' 'Jackets' 'Swim' 'Trend' 'Jeans'
 'Shorts' 'Legwear' 'Layering' 'Casual bottoms' 'Chemises']

Split the data by Recommended IND into recommended (1) and not recommended (0).

# recommend  not_recommend
recommend = df[df['Recommended IND'] == 1]
not_recommend = df[df['Recommended IND'] == 0]
# recommend.head()
not_recommend.head()

	Unnamed: 0	Clothing ID	Age	Title	Review Text	Rating	Positive Feedback Count	Division Name	Department Name	Class Name
2	2	1077	60	Some major design flaws	I had such high hopes for this dress and reall...	3	0	General	Dresses	Dresses
5	5	1080	49	Not for the very petite	I love tracy reese dresses, but this one is no...	2	4	General	Dresses	Dresses
10	10	1077	53	Dress looks like it's made of cheap material	Dress runs small esp where the zipper area run...	3	14	General	Dresses	Dresses
22	22	1077	31	Not what it looks like	First of all, this is not pullover styling. th...	2	7	General	Dresses	Dresses
25	25	697	31	Falls flat	Loved the material, but i didnt really look at...	3	0	Initmates	Intimate	Lounge

# 4. Overlaid histograms of recommended and not recommended reviews by department
plt.figure(figsize=(12,8))
plt.hist(recommend['Department Name'], color=color[2], alpha=0.5, label='recommend')
plt.hist(not_recommend['Department Name'], color=color[4], alpha=0.5, label='not_recommend')
plt.legend()
plt.xticks(rotation=45)
plt.title('Department recommend and not_recommend')
print('\n figure 03')

 figure 03

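[Figure 03: overlaid histograms of recommend and not_recommend by department]

Conclusion

In figure 03 the green (recommend) area is larger than the not_recommend area, which indicates that in most departments reviewers recommend the products.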
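
Comparing areas in overlaid histograms is approximate. A count table gives the exact numbers behind such a plot. The sketch below builds one with `pd.crosstab` on a hypothetical sample; the column labels `not_recommend` and `recommend` are chosen here to match the legend.

```python
import pandas as pd

# Hypothetical sample; column names match the dataset.
sample = pd.DataFrame({
    'Department Name': ['Tops', 'Tops', 'Tops', 'Dresses', 'Dresses', 'Bottoms'],
    'Recommended IND': [1, 1, 0, 1, 0, 1],
})

# Rows: departments, columns: Recommended IND values (0 and 1).
counts = pd.crosstab(sample['Department Name'], sample['Recommended IND'])
counts = counts.rename(columns={0: 'not_recommend', 1: 'recommend'})
print(counts)
```

With matplotlib installed, `counts.plot(kind='bar')` should draw the two counts side by side for each department, which is usually easier to read for categorical data than a histogram.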

# Overlaid histograms of recommended and not recommended reviews by product class
plt.figure(figsize=(12,8))
plt.hist(recommend['Class Name'], color=color[1], alpha=0.5, label='recommend')
plt.hist(not_recommend['Class Name'], color=color[5], alpha=0.5, label='not_recommend')
plt.legend()
plt.xticks(rotation=45)
plt.title('Class recommend and not_recommend')
print('\n figure 04')

 figure 04

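[Figure 04: overlaid histograms of recommend and not_recommend by class]

Conclusion

Figure 04 shows that the best-selling Knits class does not have the highest recommendation rate.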
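
A recommendation rate is the mean of the 0/1 column Recommended IND within each class, so it can be ranked directly. This is a sketch on hypothetical data; the output names `reviews` and `recommend_rate` are illustrative.

```python
import pandas as pd

# Hypothetical sample; column names match the dataset.
sample = pd.DataFrame({
    'Class Name': ['Knits', 'Knits', 'Knits', 'Knits',
                   'Jeans', 'Jeans',
                   'Dresses', 'Dresses', 'Dresses'],
    'Recommended IND': [1, 1, 0, 0, 1, 1, 1, 1, 0],
})

# Number of reviews and share of recommending reviews per class,
# sorted so the class with the highest rate comes first.
summary = (
    sample.groupby('Class Name')['Recommended IND']
    .agg(reviews='count', recommend_rate='mean')
    .sort_values('recommend_rate', ascending=False)
)
print(summary)
```

Keeping the review count next to the rate matters because a class with only a handful of reviews can show a very high or very low rate by chance.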

# Which age groups write what kind of reviews about which clothes
# Work on an explicit copy so that adding a column does not trigger SettingWithCopyWarning
df = df.copy()
df['Review Length'] = df['Review Text'].astype(str).apply(len)
df

	Unnamed: 0	Clothing ID	Age	Title	Review Text	Rating	Recommended IND	Positive Feedback Count	Division Name	Department Name	Class Name	Review Length
2	2	1077	60	Some major design flaws	I had such high hopes for this dress and reall...	3	0	0	General	Dresses	Dresses	500
3	3	1049	50	My favorite buy!	I love, love, love this jumpsuit. it's fun, fl...	5	1	0	General Petite	Bottoms	Pants	124
4	4	847	47	Flattering shirt	This shirt is very flattering to all due to th...	5	1	6	General	Tops	Blouses	192
5	5	1080	49	Not for the very petite	I love tracy reese dresses, but this one is no...	2	0	4	General	Dresses	Dresses	488
6	6	858	39	Cagrcoal shimmer fun	I aded this in my basket at hte last mintue to...	5	1	1	General Petite	Tops	Knits	496
...	...	...	...	...	...	...	...	...	...	...	...	...
23481	23481	1104	34	Great dress for many occasions	I was very happy to snag this dress at such a ...	5	1	0	General Petite	Dresses	Dresses	131
23482	23482	862	48	Wish it was made of cotton	It reminds me of maternity clothes. soft, stre...	3	1	0	General Petite	Tops	Knits	223
23483	23483	1104	31	Cute, but see through	This fit well, but the top was very see throug...	3	0	1	General Petite	Dresses	Dresses	208
23484	23484	1084	28	Very cute dress, perfect for summer parties an...	I bought this dress for a wedding i have this ...	3	1	2	General	Dresses	Dresses	427
23485	23485	1104	52	Please make more like this one!	This dress in a lovely platinum is feminine an...	5	1	22	General Petite	Dresses	Dresses	110

19662 rows × 12 columns

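# Plot the distribution of the single variable Review Length.
# sns.distplot() drew a histogram with a fitted kernel density estimate (KDE) by default;
# it is deprecated since seaborn 0.11, so histplot(kde=True) is used here instead.
fig = plt.figure(figsize=(12, 8))
ax = sns.histplot(df['Review Length'], kde=True, stat='density', color=color[3])
ax.set_title("Length of Reviews")
print('\n figure 05')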

 figure 05

[Figure 05: distribution of review length]

Conclusion

Figure 05 shows that most reviews are close to 500 characters long.
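
Summary statistics make a statement about typical review length checkable. The sketch below uses three hypothetical reviews; with the real data the series would be `df['Review Text']`. For non-missing strings, `.str.len()` gives the same result as `.astype(str).apply(len)`.

```python
import pandas as pd

# Hypothetical reviews; with the real data this would be df['Review Text'].
reviews = pd.Series([
    'Love it',
    'Runs small, returned it',
    'Perfect fit and very soft fabric',
])

# Character count of each review.
lengths = reviews.str.len()

# count, mean, std, min, quartiles, and max in one call.
print(lengths.describe())

# Selected quantiles, for example the median and the 90th percentile.
print(lengths.quantile([0.5, 0.9]))
```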

# Box plot of review length for each age
plt.figure(figsize=(18,8))
sns.boxplot(x='Age', y='Review Length', data=df)
print('\n figure 06')

 figure 06

[Figure 06: box plot of review length by age]

# Rating versus positive feedback count
plt.figure(figsize=(12,8))
sns.boxplot(x = 'Rating', y = 'Positive Feedback Count', data = df)
print('\n figure 07')

 figure 07

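[Figure 07: box plot of positive feedback count by rating]

Conclusion

Figure 07 shows that reviews rated 3 or higher receive larger positive feedback counts.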
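
Feedback counts tend to be dominated by a few heavily upvoted reviews, so it can help to compare the mean, median, and maximum per rating. This is a sketch on hypothetical numbers; with the real data, `sample` would be `df`.

```python
import pandas as pd

# Hypothetical sample; column names match the dataset.
sample = pd.DataFrame({
    'Rating': [1, 1, 3, 3, 5, 5, 5],
    'Positive Feedback Count': [0, 2, 1, 9, 0, 0, 30],
})

# Mean, median, and maximum feedback count for each rating.
stats = (
    sample.groupby('Rating')['Positive Feedback Count']
    .agg(['mean', 'median', 'max'])
)
print(stats)
```

In this made-up sample, rating 5 has the highest mean but a median of 0, which is the kind of gap between outliers and typical reviews that a box plot alone can hide.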

Word cloud visualization of reviews

# 1. Data cleaning
import re
from wordcloud import WordCloud, STOPWORDS

def clean_data(text):
    # Replace punctuation, digits, and other non-letter characters with spaces
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    return " ".join(words)

# Garment names appear in almost every review, so exclude them as well
stopwords = set(STOPWORDS) | {'skirt', 'blouse', 'dress', 'sweater', 'shirt', 'bottom',
                              'pant', 'pants', 'jean', 'jeans', 'jacket', 'top', 'dresse'}

def create_cloud(rating):
    # Join all cleaned reviews for one rating into a single string
    text = ' '.join(rating)
    cloud = WordCloud(background_color='white', width=1600, height=800,
                      max_words=100, stopwords=stopwords).generate(text)
    plt.figure(figsize=(15, 7.5))
    plt.axis('off')
    plt.imshow(cloud)
    plt.show()

# Word cloud for rating 5
rating5= df[df['Rating']==5]['Review Text'].apply(clean_data)
create_cloud(rating5)

[Word cloud of rating 5 reviews]

# Word cloud for rating 4
rating4= df[df['Rating']==4]['Review Text'].apply(clean_data)
create_cloud(rating4)

[Word cloud of rating 4 reviews]

# Word cloud for rating 3
rating3= df[df['Rating']==3]['Review Text'].apply(clean_data)
create_cloud(rating3)

[Word cloud of rating 3 reviews]

# Word cloud for rating 2
rating2= df[df['Rating']==2]['Review Text'].apply(clean_data)
create_cloud(rating2)

[Word cloud of rating 2 reviews]

# Word cloud for rating 1
rating1= df[df['Rating']==1]['Review Text'].apply(clean_data)
create_cloud(rating1)

[Word cloud of rating 1 reviews]
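
A word cloud sizes words roughly by how often they occur, so a plain frequency list carries similar information in a form that is easier to compare across ratings. The sketch below uses only the standard library; `top_words` and the small stopword set are hypothetical helpers for illustration, not part of the wordcloud package.

```python
import re
from collections import Counter

# Small illustrative stopword set (an assumption, far shorter than STOPWORDS).
EXTRA_STOPWORDS = {'the', 'and', 'is', 'it', 'this', 'i', 'a'}

def top_words(texts, n=3, stopwords=EXTRA_STOPWORDS):
    """Return the n most frequent non-stopword words across all texts."""
    counts = Counter()
    for text in texts:
        # Same cleaning idea as clean_data: letters only, lower case.
        words = re.sub('[^a-zA-Z]', ' ', text).lower().split()
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Hypothetical reviews standing in for one rating group.
reviews = [
    'Love the fabric, love the fit!',
    'The fit is perfect.',
    'Soft fabric and great fit',
]
print(top_words(reviews))
```

With the real data, something like `top_words(rating1, n=20)` and `top_words(rating5, n=20)` could be compared side by side, assuming the stopword set is extended in the same way as in the word cloud code.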
