牛客网SQL进阶128：未完成试卷数大于1的有效用户

官网链接：

未完成试卷数大于1的有效用户_牛客题霸_牛客网现有试卷作答记录表exam_record（uid用户ID, exam_id试卷ID, st。题目来自【牛客题霸】https://www.nowcoder.com/practice/46cb7a33f7204f3ba7f6536d2fc04286?tpId=240&tqId=2183007&ru=%2Fpractice%2F45a87639110841b6950ef6a12d20175f&qru=%2Fta%2Fsql-advanced%2Fquestion-ranking&sourceUrl=

0 问题描述

基于试卷作答记录表exam_record、试卷信息表examination_info ，统计2021年每个未完成试卷作答数大于1的有效用户的数据（有效用户指完成试卷作答数至少为1且未完成数小于5）

1 数据准备

drop table if exists examination_info;
CREATE TABLE examination_info (id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',exam_id int UNIQUE NOT NULL COMMENT '试卷ID',tag varchar(32) COMMENT '类别标签',difficulty varchar(8) COMMENT '难度',duration int NOT NULL COMMENT '时长',release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;drop table if exists exam_record;
CREATE TABLE  exam_record (id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',uid int NOT NULL COMMENT '用户ID',exam_id int NOT NULL COMMENT '试卷ID',start_time datetime NOT NULL COMMENT '开始时间',submit_time datetime COMMENT '提交时间',score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES(9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),(9002, 'SQL', 'easy', 60, '2020-02-01 10:00:00'),(9003, '算法', 'medium', 80, '2020-08-02 10:00:00');INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', '2021-07-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-07-02 19:01:01', '2021-07-02 19:30:01', 82),
(1002, 9002, '2021-07-05 18:01:01', '2021-07-05 18:59:02', 90),
(1003, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1005, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1006, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89);

2 数据分析

step1: 获取各用户的tag,start_time及未完成标记和已完成标记，如果该作答记录交卷了则已完成标记为1，未完成标记为0，否则相反：if(submit_time is null, 1, null) as incomplete

代码如下：

select  er.uid,ei.tag,er.start_time,-- incomplete 未提交试卷的标记if(er.submit_time is null, 1, null) as incomplete,-- complete已提交试卷的标记if(er.submit_time is not null, 1,null) as complete
from exam_record er
left join examination_info ei on er.exam_id =ei.exam_id
where year(er.start_time)=2021

step2:

用户分组：group by uid，统计：未完成试卷作答数incomplete_cnt 、已完成试卷作答数complete_cnt_cnt
筛选出有效用户：having complete_cnt >= 1 and incomplete_cnt >1 and incomplete_cnt <5
对于每条作答tag，用符号 ":" 来拼接日期字段和tag字段：concat_ws(':', date(start_time), tag); 对于一个人（组内）的多条作答，需去重distinct concat_ws(':', date(start_time), tag);
group_concat ( 要连接的字段 [separator '分隔符'])

最终代码如下：

SELECT uid,-- 未完成试卷的作答数count(incomplete) as incomplete_cnt,-- 已完成试卷的作答数count(complete) as complete_cnt,-- distinct concat_ws(':', date(start_time), tag)  as cw-- group_concat ( cw  separator ';')group_concat(distinct concat_ws(':', date(start_time), tag) separator ';') as detail
from (SELECT er.uid,ei.tag,er.start_time,-- incomplete 未提交试卷的标记if(er.submit_time is null, 1, null) as incomplete,-- complete已提交试卷的标记if(er.submit_time is not null, 1,null) as completefrom exam_record erleft join examination_info  ei on er.exam_id =ei.exam_idwhere year(er.start_time)=2021
) as t1
group by uid
-- 有效用户：完成试卷的作答数至少为1 and 未完试卷的作答数小于5且大于1
having complete_cnt >= 1 and incomplete_cnt >1 and  incomplete_cnt <5
order by incomplete_cnt desc;

3 小结

本案例涉及到：if条件判断+count，即条件聚合。此外还涉及到date日期函数、concat_ws、group_concat函数的使用

concat_ws(带分隔符的字符串连接函数)

语法：concat_ws(string SEP, string A ,string B.......)
返回值：string
说明：返回输入字符串连接后的结果,SEP表示各个字符串的分隔符
举例：select concat_ws('|','ad','cv','op') ;---> ad|cv|op

group_concat函数

语法：group_concat([distinct] 要连接的字段 [order by 排序字段 asc/desc] [separator '分隔符'])
说明：将group by产生的同一个分组中的值连接起来，返回一个字符串结果
参数解释：distinct排除重复值；如果需要对结果中的值进行排序，可以使用order by子句；separator '分隔符'：是拼接符号，默认为逗号
返回值：string
sql举例：
#--- 对buyer字段进行分组，把去除重复冗余的spending字段的值打印在同一行，'+'加号分隔
select buyer,group_concat(distinct spending separator '+') 
from spend
group by buyer;