题目
现有用户登录记录表,已经按照用户日期进行去重处理。以用户登录的最早日期作为新增日期,请计算次日留存率是多少。
样例数据
+----------+-------------+
| user_id | login_date |
+----------+-------------+
| aaa | 2023-12-01 |
| bbb | 2023-12-01 |
| bbb | 2023-12-02 |
| ccc | 2023-12-02 |
| bbb | 2023-12-03 |
| ccc | 2023-12-03 |
| ddd | 2023-12-03 |
| ccc | 2023-12-04 |
| ddd | 2023-12-04 |
+----------+-------------+
样例结果
建表语句
CREATE TABLE t_login_040 (user_id VARCHAR(255) COMMENT '用户ID',login_date VARCHAR(255) COMMENT '登录日期'
) COMMENT '用户登录记录表';
insert into t_login_040(user_id,login_date)
values
('aaa','2023-12-01'),
('bbb','2023-12-01'),
('bbb','2023-12-02'),
('ccc','2023-12-02'),
('bbb','2023-12-03'),
('ccc','2023-12-03'),
('ddd','2023-12-03'),
('ccc','2023-12-04'),
('ddd','2023-12-04');
题解
指标定义:
- 次日留存用户:新增用户第二天登录(活跃)的用户;
- 次日留存率: t + 1 t+1 t+1日留存用户数 / t t t日新增用户;(注意新增可能为0,要先判断)
select
first_day,
-- 当天新增用户数可能为1
concat(if(count(case when date_diff = 0 then user_id end) =0,0,round(count(case when date_diff =1 then user_id end) /count(case when date_diff = 0 then user_id end),1))*100,'%') next_act_per
-- count(case when date_diff = 0 then user_id end) as new_cnt, -- 当天新增用户数
-- count(case when date_diff =1 then user_id end) as next_act_cnt -- 次日登录的用户数
from
(selectuser_id,login_date,
-- 计算出用户的最小登录时间作为新增日期first_day,然后计算当天日期和新增日期的时间差。min(login_date) over(partition by user_id order by login_date) first_day,datediff(login_date,min(login_date)over(partition by user_id order by login_date)) date_diff
from t_login_040)t
group by first_day
order by first_day;