本文介绍pa100k数据集,是从GitHub上paddleCLA工程中提供的路径下载,针对该数据集进行介绍:
01 - 图像信息
- 训练集 - 80000张图像
- 验证集 - 10000张图像
- 测试集 - 10000张图像
02 - 标签信息
· 属性1:有无帽子 - [0]
· 属性2:有无眼镜 - [1]
· 属性3:上衣类型 - [2-7]
· 属性4:下衣类型 - [8-13]
· 属性5:是否有鞋子 - [14]
· 属性6:包的类型 - [15-17]
· 属性7:有否手持物品在前 - [18]
· 属性8:年龄 - [19-21]
· 属性9: 是否为女性 - [22]
· 属性10:人体朝向 - [23-25]
0 = Hat - 帽子:0无1有
1 = Glasses - 眼镜:0无1有
2 = ShortSleeve - 短袖
3 = LongSleeve - 长袖
4 = UpperStride - 有条纹
5 = UpperLogo - 印有logo/图案
6 = UpperPlaid - 撞色衣服(多种颜色)
7 = UpperSplice - 格子衫
8 = LowerStripe - 有条纹
9 = LowerPattern - 印有图像
10 = LongCoat - 长款大衣
11 = Trousers - 长裤
12 = Shorts - 短裤
13 = Skirt&Dress - 裙子/连衣裙
14 = boots - 鞋子
15 = HandBag - 手提包
16 = ShoulderBag - 单肩包
17 = Backpack - 背包
18 = HoldObjectsInFront - 手持物品
19 = AgeOver60 - 大于60
20 = Age18-60 - =18~60
21 = AgeLess18 - 小于18
22 = Female - 0:男性; 1:女性
23 = Front - 人体朝前
24 = Side - 人体朝侧
25 = Back - 人体朝后
针对val图像,根据属性对图像分类:
实现代码如下:
import os
import tqdm
import shutildef mkdir(path):if not os.path.exists(path):os.mkdir(path)def find_indices(lst, value):"""# - 返回所有值 = value的索引"""return [i for i, x in enumerate(lst) if x == value]def pa100k():"""# - 由val_list.txt, 根据属性分文件夹""""01 - 读取标签"labels = ["Hat", "Glasses", "ShortSleeve", "LongSleeve", "UpperStride", "UpperLogo", "UpperPlaid", "UpperSplice", "LowerStripe", "LowerPattern", "LongCoat", "Trousers", "Shorts", "Skirt", "boots", "HandBag", "ShoulderBag", "Backpack", "HoldObjectsInFront", "AgeOver60", "Age18-60", "AgeLess18", "Female","Front", "Side", "Back"]filep = "./val_list.txt"fr = open(filep, 'r')lines = fr.readlines()fr.close()"02 - 保存结果" src_img_p = "./"savep = "./属性分析/"mkdir(savep)for index, attr in enumerate(labels):mkdir(savep + str(index) + '_' + attr)"03 - 数据处理"for line in tqdm.tqdm(lines):list_ = line.split('\t')name = list_[0]label = list_[1].strip('\n').split(',') label_index = find_indices(label, value='1')for index in label_index:save_folder = str(index) + '_' + labels[index]shutil.copyfile(src_img_p + name, savep + save_folder + '/' + name.split('/')[-1])if __name__ == '__main__':pa100k()