Mapping SQL to pandas [a quick start with pandas for SQL users]

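This page gives examples of how to perform common SQL operations with pandas, to help SQL users get started with pandas quickly. The examples assume pandas is imported as pd and NumPy as np, and that data holds the tips dataset (244 restaurant bills with the columns total_bill, tip, sex, smoker, day, time and size).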

Contents

    • SQL syntax
      • I. SELECT
        • 1. Selecting columns
        • 2. Adding a computed column
      • II. JOIN ON
        • 1. Inner join
        • 2. Left outer join
        • 3. Right outer join
        • 4. Full outer join
      • III. Filtering with WHERE
        • 1. AND
        • 2. OR
        • 3. IS NULL
        • 4. IS NOT NULL
        • 5. BETWEEN
        • 6. LIKE
        • 7. CASE WHEN
      • IV. GROUP BY
        • 1. count()
        • 2. avg()
        • 3. sum(), max(), min()
      • V. HAVING
      • VI. ORDER BY
      • VII. LIMIT/OFFSET
        • 1. LIMIT
        • 2. Top N rows by a column
        • 3. OFFSET
      • VIII. UNION ALL/UNION
        • 1. UNION ALL
        • 2. UNION
      • IX. Window functions
        • 1. ROW_NUMBER()
        • 2. RANK()
        • 3. SUM()

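SQL syntax

SELECT [DISTINCT | ALL] column1, column2, …, aggregate_function(columnN), …
FROM table_name [AS alias]
[join_type JOIN table2_name [AS alias2] ON join_condition]
[join_type JOIN table3_name [AS alias3] ON join_condition …]
[WHERE condition]
[GROUP BY column1, column2, …]
[HAVING condition]
[UNION [ALL] SELECT …]  -- several UNION SELECT statements can be chained
[ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], …]  -- ORDER BY and LIMIT apply to the combined result
[LIMIT number [OFFSET offset]]

  1. DISTINCT: returns only unique rows. ALL (the default) returns every matching row, including duplicates.
  2. aggregate_function(): an aggregate function such as SUM(), AVG(), COUNT(), MAX() or MIN(), which computes a single value from a set of values.
  3. join_type: the kind of join, giving INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN and so on. ON join_condition defines how rows are matched.
  4. WHERE condition: filters rows, keeping only those that satisfy the condition.
  5. GROUP BY: groups the result by one or more columns; usually combined with aggregate functions.
  6. HAVING condition: filters the groups after grouping, keeping only the groups that satisfy the condition.
  7. ORDER BY: sorts the result. Several columns can be given, each with a direction (ASC ascending, the default, or DESC descending).
  8. LIMIT number [OFFSET offset]: limits the number of rows returned, optionally skipping offset rows first.
  9. UNION [ALL]: combines the result sets of two or more SELECT statements. UNION removes duplicate rows; UNION ALL keeps all rows.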

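I. SELECT

In SQL, a selection lists the columns to return, separated by commas (or uses * to return all columns). In pandas, pass a list of column names to the DataFrame's [] indexing operator.

1. Selecting columns

SQL:

SELECT total_bill, tip, smoker, time
FROM data;

pandas equivalent: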

In :data[["total_bill", "tip", "smoker", "time"]]
Out :
total_bill	tip	smoker	time
0	16.99	1.01	No	Dinner
1	10.34	1.66	No	Dinner
2	21.01	3.50	No	Dinner
3	23.68	3.31	No	Dinner
4	24.59	3.61	No	Dinner
...	...	...	...	...
239	29.03	5.92	No	Dinner
240	27.18	2.00	Yes	Dinner
241	22.67	2.00	Yes	Dinner
242	17.82	1.75	No	Dinner
243	18.78	3.00	No	Dinner
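One way to check a SQL-to-pandas translation yourself is to load the same DataFrame into an in-memory SQLite database with Python's built-in sqlite3 module and compare both results. A minimal sketch, assuming three made-up rows in the shape of the tips data (they are not taken from the dataset):

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows with the same column names as the tips data
data = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "smoker": ["No", "No", "Yes"],
    "time": ["Dinner", "Dinner", "Lunch"],
})

conn = sqlite3.connect(":memory:")
data.to_sql("data", conn, index=False)  # pandas accepts a plain sqlite3 connection

sql_result = pd.read_sql_query("SELECT total_bill, tip, smoker, time FROM data", conn)
pandas_result = data[["total_bill", "tip", "smoker", "time"]]
conn.close()

# Same column names and the same rows in the same order
same = (list(sql_result.columns) == list(pandas_result.columns)
        and sql_result.values.tolist() == pandas_result.values.tolist())
print(same)  # True
```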
2. Adding a computed column

SQL:

SELECT *, tip/total_bill AS tip_rate
FROM data;

pandas equivalent:

1) Use DataFrame.assign() to append the new column:

In :data = data.assign(tip_rate=data["tip"] / data["total_bill"])
In :data
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808
...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744

2) Or compute it with direct column assignment:

In :data['tip_rate2'] = data["tip"] / data["total_bill"]
In :data
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744
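The two forms differ in one respect: assign() returns a new DataFrame and leaves the original untouched (hence data = data.assign(...) above), while bracket assignment modifies the DataFrame in place. assign() also accepts a callable that receives the DataFrame being built, which helps in method chains. A sketch on two hypothetical rows:

```python
import pandas as pd

# Hypothetical two-row sample in the shape of the tips data
data = pd.DataFrame({"total_bill": [16.99, 10.34], "tip": [1.01, 1.66]})

# assign() returns a new DataFrame; `data` itself is not modified
with_rate = data.assign(tip_rate=data["tip"] / data["total_bill"])

# Bracket assignment adds the column to `data` in place
data["tip_rate2"] = data["tip"] / data["total_bill"]

# A callable receives the DataFrame being built, which suits method chains
chained = data.assign(tip_pct=lambda d: (d["tip_rate2"] * 100).round(1))
print(chained["tip_pct"].tolist())  # [5.9, 16.1]
```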

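II. JOIN ON

Build test data (np.random.randn draws random numbers, so your values will differ from the output below):

In :df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
In :df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})

1. Inner join

SQL:

SELECT *
FROM df1
INNER JOIN df2 ON df1.key = df2.key;

pandas equivalent (merge() performs an inner join by default; the overlapping value columns get the suffixes _x and _y):

In :pd.merge(df1, df2, on="key")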
Out :	
key	value_x	value_y
0	B	0.227232	1.011278
1	D	1.415853	-0.149207
2	D	1.415853	-0.608430
2. Left outer join

SQL:

SELECT *
FROM df1
LEFT OUTER JOIN df2 ON df1.key = df2.key;

pandas equivalent:

In :pd.merge(df1, df2, on="key", how="left")
Out :	
key	value_x	value_y
0	A	1.418532	NaN
1	B	0.227232	1.011278
2	C	-0.578408	NaN
3	D	1.415853	-0.149207
4	D	1.415853	-0.608430
3. Right outer join

SQL:

SELECT *
FROM df1
RIGHT OUTER JOIN df2 ON df1.key = df2.key;

pandas equivalent:

In :pd.merge(df1, df2, on="key", how="right")
Out :
key	value_x	value_y
0	B	0.227232	1.011278
1	D	1.415853	-0.149207
2	D	1.415853	-0.608430
3	E	NaN	1.437388
4. Full outer join

SQL:

SELECT *
FROM df1
FULL OUTER JOIN df2 ON df1.key = df2.key;

pandas equivalent:

In :pd.merge(df1, df2, on="key", how="outer")
Out :
key	value_x	value_y
0	A	1.418532	NaN
1	B	0.227232	1.011278
2	C	-0.578408	NaN
3	D	1.415853	-0.149207
4	D	1.415853	-0.608430
5	E	NaN	1.437388
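Because the random values above change on every run, here is a reproducible sketch with fixed, made-up values. It shows the row count each how= option produces for these keys, the suffixes= parameter for renaming the overlapping value columns, and indicator=True, which adds a _merge column recording whether each row matched on both sides:

```python
import pandas as pd

# Fixed, made-up values instead of np.random.randn(4), so the output is reproducible
df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1.0, 2.0, 3.0, 4.0]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [10.0, 20.0, 30.0, 40.0]})

# Row count per join type; D matches twice, so it appears twice whenever it is kept
counts = {
    how: len(pd.merge(df1, df2, on="key", how=how, suffixes=("_df1", "_df2")))
    for how in ["inner", "left", "right", "outer"]
}
print(counts)  # {'inner': 3, 'left': 5, 'right': 4, 'outer': 6}

# indicator=True adds a _merge column: "left_only", "right_only" or "both"
full = pd.merge(df1, df2, on="key", how="outer", indicator=True)
print(full[["key", "_merge"]])
```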

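III. Filtering with WHERE

In SQL, filtering is done with the WHERE clause; in pandas, with boolean indexing: a boolean Series inside [] keeps the rows where it is True.

SQL:

SELECT *
FROM data
WHERE total_bill > 10;

pandas equivalent: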

In :data[data["total_bill"] > 10]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744
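DataFrame.query() expresses the same filter as a string that reads closer to a WHERE clause, and @name inside the string refers to a Python variable. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; only total_bill matters for this filter
data = pd.DataFrame({"total_bill": [16.99, 8.77, 21.01], "tip": [1.01, 2.00, 3.50]})

mask_result = data[data["total_bill"] > 10]      # boolean indexing, as above
query_result = data.query("total_bill > 10")     # the same WHERE condition as a string

threshold = 10
param_result = data.query("total_bill > @threshold")  # @ reads a Python variable

print(mask_result.index.tolist())  # [0, 2]
```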
1. AND

In pandas, AND is written with the & operator.

SQL:

-- dinners with a tip over $5.00
SELECT *
FROM data
WHERE time = 'Dinner' AND tip > 5.00;

pandas equivalent:

In :data[(data["time"] == "Dinner") & (data["tip"] > 5.00)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
44	30.40	5.60	Male	No	Sun	Dinner	4	0.184211	0.184211
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
52	34.81	5.20	Female	No	Sun	Dinner	4	0.149382	0.149382
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
116	29.93	5.07	Male	No	Sun	Dinner	4	0.169395	0.169395
155	29.85	5.14	Female	No	Sun	Dinner	5	0.172194	0.172194
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
172	7.25	5.15	Male	Yes	Sun	Dinner	2	0.710345	0.710345
181	23.33	5.65	Male	Yes	Sun	Dinner	2	0.242177	0.242177
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
211	25.89	5.16	Male	Yes	Sat	Dinner	4	0.199305	0.199305
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
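The parentheses around each comparison are required: in Python, & binds more tightly than == and >. Python's and keyword cannot combine two boolean Series element by element and raises a ValueError, but inside query() the SQL-like and is accepted. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows with the two columns the filter uses
data = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Lunch"],
    "tip": [7.58, 1.01, 6.70],
})

# Each comparison in its own parentheses, combined element-wise with &
mask_result = data[(data["time"] == "Dinner") & (data["tip"] > 5.00)]

# In query() the SQL-like `and` is allowed
query_result = data.query("time == 'Dinner' and tip > 5.00")

# Python's `and` tries to turn a whole Series into one bool and fails
error_message = ""
try:
    data[(data["time"] == "Dinner") and (data["tip"] > 5.00)]
except ValueError as err:
    error_message = str(err)  # "The truth value of a Series is ambiguous. ..."
```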
2. OR

In pandas, OR is written with the | operator.

SQL:

-- parties of at least 5 diners, or bills over $45
SELECT *
FROM data
WHERE size >= 5 OR total_bill > 45;

pandas equivalent:

In :data[(data["size"] >= 5) | (data["total_bill"] > 45)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
125	29.80	4.20	Female	No	Thur	Lunch	6	0.140940	0.140940
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389
143	27.05	5.00	Female	No	Thur	Lunch	6	0.184843	0.184843
155	29.85	5.14	Female	No	Sun	Dinner	5	0.172194	0.172194
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
182	45.35	3.50	Male	Yes	Sun	Dinner	3	0.077178	0.077178
185	20.69	5.00	Male	No	Sun	Dinner	5	0.241663	0.241663
187	30.46	2.00	Male	Yes	Sun	Dinner	5	0.065660	0.065660
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
216	28.15	3.00	Male	Yes	Sat	Dinner	5	0.106572	0.106572
3. IS NULL

Build test data:

In :frame = pd.DataFrame({"col1": ["A", "B", np.nan, "C", "D"], "col2": ["F", np.nan, "G", "H", "I"]})

SQL:

SELECT *
FROM frame
WHERE col2 IS NULL;

pandas equivalent:

In :frame[frame["col2"].isna()]
Out :
col1	col2
1	B	NaN
4. IS NOT NULL

SQL:

SELECT *
FROM frame
WHERE col1 IS NOT NULL;

pandas equivalent:

In :frame[frame["col1"].notna()]
Out :
col1	col2
0	A	F
1	B	NaN
3	C	H
4	D	I
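isna() treats both np.nan and None as missing, similar to SQL NULL, and dropna(subset=[...]) is a shortcut for the IS NOT NULL filter. As with col = NULL in SQL, comparing with == np.nan never matches anything. A sketch, assuming a variant of frame with a None added:

```python
import numpy as np
import pandas as pd

# Variant of `frame` above with a None in col1 as well as np.nan
frame = pd.DataFrame({"col1": ["A", "B", np.nan, "C", None],
                      "col2": ["F", np.nan, "G", "H", "I"]})

is_null = frame[frame["col1"].isna()]           # WHERE col1 IS NULL
not_null = frame[frame["col1"].notna()]         # WHERE col1 IS NOT NULL
not_null_2 = frame.dropna(subset=["col1"])      # same rows as not_null

# Like col1 = NULL in SQL, comparing with == np.nan matches nothing
eq_nan = frame[frame["col1"] == np.nan]

print(is_null.index.tolist())  # [2, 4]
```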
5. BETWEEN

SQL:

SELECT *
FROM data
WHERE tip BETWEEN 5 AND 7;

pandas equivalent:

In :data[data['tip'].between(5, 7)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
11	35.26	5.00	Female	No	Sun	Dinner	4	0.141804	0.141804
39	31.27	5.00	Male	No	Sat	Dinner	3	0.159898	0.159898
44	30.40	5.60	Male	No	Sun	Dinner	4	0.184211	0.184211
46	22.23	5.00	Male	No	Sun	Dinner	2	0.224921	0.224921
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
52	34.81	5.20	Female	No	Sun	Dinner	4	0.149382	0.149382
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
73	25.28	5.00	Female	Yes	Sat	Dinner	2	0.197785	0.197785
83	32.68	5.00	Male	Yes	Thur	Lunch	2	0.152999	0.152999
85	34.83	5.17	Female	No	Thur	Lunch	4	0.148435	0.148435
88	24.71	5.85	Male	No	Thur	Lunch	2	0.236746	0.236746
116	29.93	5.07	Male	No	Sun	Dinner	4	0.169395	0.169395
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389
143	27.05	5.00	Female	No	Thur	Lunch	6	0.184843	0.184843
155	29.85	5.14	Female	No	Sun	Dinner	5	0.172194	0.172194
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799
172	7.25	5.15	Male	Yes	Sun	Dinner	2	0.710345	0.710345
181	23.33	5.65	Male	Yes	Sun	Dinner	2	0.242177	0.242177
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
185	20.69	5.00	Male	No	Sun	Dinner	5	0.241663	0.241663
197	43.11	5.00	Female	Yes	Thur	Lunch	4	0.115982	0.115982
211	25.89	5.16	Male	Yes	Sat	Dinner	4	0.199305	0.199305
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
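SQL BETWEEN includes both endpoints, and so does Series.between() by default. In pandas 1.3 and later the inclusive parameter takes "both", "neither", "left" or "right" to change that. A sketch on hypothetical tip values sitting on and around the bounds:

```python
import pandas as pd

# Hypothetical tips on and around the bounds 5 and 7
tip = pd.Series([4.99, 5.00, 6.50, 7.00, 7.01])

both = tip[tip.between(5, 7)]                           # default inclusive="both": 5 <= tip <= 7
neither = tip[tip.between(5, 7, inclusive="neither")]   # 5 < tip < 7

print(both.tolist())     # [5.0, 6.5, 7.0]
print(neither.tolist())  # [6.5]
```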
6. LIKE

Prefix and suffix matches can be written with str.startswith() and str.endswith().

SQL:

SELECT *
FROM data
WHERE time LIKE 'Di%';

pandas equivalent:

In :data[data['time'].str.startswith('Di')]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744

Substring matches can use str.contains(): na=False makes missing values count as non-matches, and case=False makes the match case-insensitive.

SQL:

SELECT *
FROM data
WHERE time LIKE '%inne%';

pandas equivalent:

In :data[data['time'].str.contains('inne', na=False, case=False)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744
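Two caveats: str.contains() treats its pattern as a regular expression by default (regex=False makes it a literal substring), and whether LIKE is case-sensitive depends on the database and its collation, which is what case= controls here. For an arbitrary LIKE pattern, one approach is to translate % and _ into a regex and use str.fullmatch(), since LIKE must match the whole value. like_to_regex below is a hypothetical helper, not a pandas API, and it ignores LIKE's ESCAPE clause:

```python
import re

import pandas as pd


def like_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a SQL LIKE pattern (no ESCAPE clause) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")            # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")             # _ matches exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is taken literally
    return "".join(parts)


time_col = pd.Series(["Dinner", "Lunch", None, "dinner"])

# fullmatch() because LIKE compares the whole value; na=False drops missing values
prefix = time_col[time_col.str.fullmatch(like_to_regex("Di%"), na=False)]
middle = time_col[time_col.str.fullmatch(like_to_regex("%inne%"), case=False, na=False)]

print(like_to_regex("Di%"))  # Di.*
print(prefix.tolist())       # ['Dinner']
print(middle.tolist())       # ['Dinner', 'dinner']
```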
7. CASE WHEN

SQL:

SELECT tip,
       CASE WHEN tip < 2 THEN 'LOW'
            WHEN tip BETWEEN 2 AND 3 THEN 'MID'
            WHEN tip > 3 THEN 'HIG'
       END AS flag
FROM data;

pandas equivalent:

In :data['flag'] = data['tip'].apply(lambda x: 'LOW' if x < 2 else ('MID' if 2 <= x <= 3 else 'HIG'))
In :data[['tip', 'flag']]
Out :
tip	flag
0	1.01	LOW
1	1.66	LOW
2	3.50	HIG
3	3.31	HIG
4	3.61	HIG
...	...	...
239	5.92	HIG
240	2.00	MID
241	2.00	MID
242	1.75	LOW
243	3.00	MID
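apply() with a lambda calls Python once per row. numpy.select() is a vectorized alternative that reads much like CASE WHEN: it checks the conditions in order, takes the choice of the first one that is true, and falls back to default, which plays the role of ELSE. A sketch on hypothetical tip values, with a missing value to show the fallback:

```python
import numpy as np
import pandas as pd

# Hypothetical tips, including a missing value
data = pd.DataFrame({"tip": [1.01, 2.00, 3.00, 3.50, np.nan]})

conditions = [
    data["tip"] < 2,              # WHEN tip < 2
    data["tip"].between(2, 3),    # WHEN tip BETWEEN 2 AND 3
    data["tip"] > 3,              # WHEN tip > 3
]
choices = ["LOW", "MID", "HIG"]

# The first true condition wins; default acts as ELSE (None ~ NULL when no ELSE is given)
data["flag"] = np.select(conditions, choices, default=None)
print(data["flag"].tolist()[:4])  # ['LOW', 'MID', 'MID', 'HIG']
```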

IV. GROUP BY

In pandas, SQL's GROUP BY corresponds to the similarly named groupby() method, followed by an aggregation.

1. count()

SQL:

SELECT sex, count(*)
FROM data
GROUP BY sex;

pandas equivalent:

In :data.groupby("sex").size()
Out :
sex
Female     87
Male      157
dtype: int64
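size() counts every row in each group, like COUNT(*). To mimic COUNT(column), which skips NULLs, use count() on that column. A sketch with a hypothetical missing tip:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with one missing tip
data = pd.DataFrame({"sex": ["Female", "Male", "Male", "Female"],
                     "tip": [1.01, 1.66, np.nan, 3.61]})

rows_per_group = data.groupby("sex").size()           # COUNT(*): Female 2, Male 2
tips_per_group = data.groupby("sex")["tip"].count()   # COUNT(tip): Female 2, Male 1 (NaN skipped)

print(rows_per_group)
print(tips_per_group)
```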
2. avg()

SQL:

SELECT day, AVG(tip), COUNT(*)
FROM data
GROUP BY day;

pandas equivalent:

In :data.groupby("day").agg({"tip": "mean", "day": "size"})
Out :
tip	day
day		
Fri	2.734737	19
Sat	2.993103	87
Sun	3.255132	76
Thur	2.771452	62
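Named aggregation (pandas 0.25 and later) gives each output column its own name, much like AVG(tip) AS avg_tip, and as_index=False keeps day as an ordinary column as in the SQL result. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows
data = pd.DataFrame({"day": ["Fri", "Sat", "Sat", "Sun"],
                     "tip": [3.00, 1.00, 3.00, 2.50]})

# output_column=(input_column, aggregation), like AVG(tip) AS avg_tip, COUNT(*) AS n
summary = data.groupby("day", as_index=False).agg(
    avg_tip=("tip", "mean"),
    n=("tip", "size"),
)
print(summary)
```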
3. sum(), max(), min()

SQL:

SELECT day, AVG(tip), SUM(tip), MAX(tip), MIN(tip), COUNT(*)
FROM data
GROUP BY day;

pandas equivalent:

In :data.groupby("day").agg({"tip": ["mean", "sum", "max", "min"],"day": "size"
}).reset_index()
Out :
day	tip	day
mean	sum	max	min	size
0	Fri	2.734737	51.96	4.73	1.00	19
1	Sat	2.993103	260.40	10.00	1.00	87
2	Sun	3.255132	247.39	6.50	1.01	76
3	Thur	2.771452	171.83	6.70	1.25	62

V. HAVING

SQL:

SELECT day, AVG(tip), SUM(tip), MAX(tip), MIN(tip), COUNT(*)
FROM data
GROUP BY day
HAVING SUM(tip) > 200;

pandas equivalent (aggregate first, then filter the aggregated rows):

In :result = data.groupby("day").agg({"tip": ["mean", "sum", "max", "min"],"day": "size"
}).reset_index()
In :result.columns = ['day', 'avg_tip', 'sum_tip', 'max_tip', 'min_tip', 'count_tips']
In :result[result['sum_tip'] > 200].reset_index()
Out :
index	day	avg_tip	sum_tip	max_tip	min_tip	count_tips
0	1	Sat	2.993103	260.40	10.0	1.00	87
1	2	Sun	3.255132	247.39	6.5	1.01	76
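Named aggregation produces flat column names directly, so the renaming step is unnecessary, and reset_index(drop=True) renumbers the rows without adding the extra index column seen above. If you want the original rows of the qualifying groups rather than the aggregates, groupby().filter() does that. A sketch on hypothetical tip amounts:

```python
import pandas as pd

# Hypothetical tip amounts per day
data = pd.DataFrame({"day": ["Fri", "Sat", "Sat", "Sun", "Sun"],
                     "tip": [50.0, 150.0, 110.0, 120.0, 60.0]})

# GROUP BY day with flat, SQL-style column names
result = data.groupby("day", as_index=False).agg(
    avg_tip=("tip", "mean"),
    sum_tip=("tip", "sum"),
    count_tips=("tip", "size"),
)

# HAVING SUM(tip) > 200, then renumber the rows without keeping the old index
having = result[result["sum_tip"] > 200].reset_index(drop=True)
print(having)  # one row: Sat, avg_tip 130.0, sum_tip 260.0, count_tips 2

# filter() instead returns the original rows of every group that passes the test
rows_in_big_groups = data.groupby("day").filter(lambda g: g["tip"].sum() > 200)
```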

VI. ORDER BY

SQL:

SELECT *
FROM data
ORDER BY tip;

pandas equivalent:

In :data.sort_values("tip")
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733	0.325733
236	12.60	1.00	Male	Yes	Sat	Dinner	2	0.079365	0.079365
92	5.75	1.00	Female	Yes	Fri	Dinner	2	0.173913	0.173913
111	7.25	1.00	Female	No	Sat	Dinner	1	0.137931	0.137931
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
...	...	...	...	...	...	...	...	...	...
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812

SQL:

SELECT *
FROM data
ORDER BY tip, total_bill;

pandas equivalent:

In :data.sort_values(["tip","total_bill"])
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733	0.325733
92	5.75	1.00	Female	Yes	Fri	Dinner	2	0.173913	0.173913
111	7.25	1.00	Female	No	Sat	Dinner	1	0.137931	0.137931
236	12.60	1.00	Male	Yes	Sat	Dinner	2	0.079365	0.079365
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
...	...	...	...	...	...	...	...	...	...
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812

SQL:

SELECT *
FROM data
ORDER BY tip ASC, total_bill DESC;

pandas equivalent:

In :data.sort_values(by=["tip", "total_bill"], ascending=[True, False])
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
236	12.60	1.00	Male	Yes	Sat	Dinner	2	0.079365	0.079365
111	7.25	1.00	Female	No	Sat	Dinner	1	0.137931	0.137931
92	5.75	1.00	Female	Yes	Fri	Dinner	2	0.173913	0.173913
67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733	0.325733
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
...	...	...	...	...	...	...	...	...	...
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
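Neither SQL nor the default sort_values() promises an order for tied rows. kind="stable" keeps tied rows in their current order. NaN values go last by default; na_position="first" matches databases such as MySQL and SQLite, which sort NULLs first in ascending order. A sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: three tied tips of 1.00 and one missing tip
data = pd.DataFrame({"tip": [1.00, np.nan, 1.00, 0.50, 1.00],
                     "total_bill": [12.60, 9.00, 5.75, 3.07, 7.25]})

# kind="stable" keeps tied rows in their current order
stable = data.sort_values("tip", kind="stable")
print(stable.index.tolist())  # [3, 0, 2, 4, 1]  (NaN last by default)

# na_position="first" puts missing values at the top, like NULLS FIRST
nulls_first = data.sort_values("tip", kind="stable", na_position="first")
print(nulls_first.index.tolist())  # [1, 3, 0, 2, 4]
```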

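VII. LIMIT/OFFSET

1. LIMIT

In pandas, use head(). Without ORDER BY, SQL does not guarantee which rows LIMIT returns; head() simply returns the first rows in the DataFrame's current order.

SQL:

SELECT *
FROM data
LIMIT 10;

pandas equivalent: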

In :data.head(10)
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
5	25.29	4.71	Male	No	Sun	Dinner	4	0.186240	0.186240
6	8.77	2.00	Male	No	Sun	Dinner	2	0.228050	0.228050
7	26.88	3.12	Male	No	Sun	Dinner	4	0.116071	0.116071
8	15.04	1.96	Male	No	Sun	Dinner	2	0.130319	0.130319
9	14.78	3.23	Male	No	Sun	Dinner	2	0.218539	0.218539
2. Top N rows by a column

SQL:

SELECT *
FROM data
ORDER BY tip DESC
LIMIT 10;

pandas equivalent:

In :data.nlargest(10, columns="tip")
or
In :data.sort_values(by="tip", ascending=False).head(10)
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
88	24.71	5.85	Male	No	Thur	Lunch	2	0.236746	0.236746
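nlargest() takes a keep parameter for ties at the cut-off: "first" (the default) and "last" break ties by row position, while "all" returns every tied row, so the result can be longer than n. A sketch on hypothetical tips:

```python
import pandas as pd

# Hypothetical tips with a tie at the cut-off (6.50 appears twice)
data = pd.DataFrame({"tip": [10.00, 6.50, 9.00, 6.50, 5.00]})

top3_first = data.nlargest(3, "tip")              # keep="first": the earlier row wins the tie
top3_all = data.nlargest(3, "tip", keep="all")    # every row tied with third place is kept

print(top3_first.index.tolist())  # [0, 2, 1]
print(len(top3_all))              # 4
```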
3. OFFSET

Skip the first 5 rows after sorting and return the next 10.

SQL:

SELECT *
FROM data
ORDER BY tip DESC
LIMIT 10 OFFSET 5;

pandas equivalent:

In :data.sort_values(by="tip", ascending=False).iloc[5:15]
Out :	
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
88	24.71	5.85	Male	No	Thur	Lunch	2	0.236746	0.236746
181	23.33	5.65	Male	Yes	Sun	Dinner	2	0.242177	0.242177
44	30.40	5.60	Male	No	Sun	Dinner	4	0.184211	0.184211
52	34.81	5.20	Female	No	Sun	Dinner	4	0.149382	0.149382
85	34.83	5.17	Female	No	Thur	Lunch	4	0.148435	0.148435
211	25.89	5.16	Male	Yes	Sat	Dinner	4	0.199305	0.199305
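LIMIT n OFFSET m is the positional slice iloc[m:m + n] of the sorted DataFrame. Sorting on a tie-breaker column keeps page boundaries reproducible; notice that nlargest() and sort_values() above order the tied rows 183 and 214 differently. limit_offset is a hypothetical helper used only for illustration:

```python
import pandas as pd


def limit_offset(df: pd.DataFrame, limit: int, offset: int = 0) -> pd.DataFrame:
    """Hypothetical helper: LIMIT limit OFFSET offset on an already sorted DataFrame."""
    return df.iloc[offset:offset + limit]


# Hypothetical rows; rows 2 and 3 tie on tip
data = pd.DataFrame({"tip": [5.0, 9.0, 6.5, 6.5, 1.0, 3.0],
                     "total_bill": [20.0, 48.3, 28.2, 23.2, 7.3, 18.8]})

# ORDER BY tip DESC, total_bill DESC LIMIT 2 OFFSET 1
ordered = data.sort_values(["tip", "total_bill"], ascending=[False, False])
page = limit_offset(ordered, limit=2, offset=1)
print(page.index.tolist())  # [2, 3]
```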

VIII. UNION ALL/UNION

In pandas, both are done with concat().

Build test data:

In :df1 = pd.DataFrame({"city": ["Chicago", "San Francisco", "New York City"], "rank": range(1, 4)})
In :df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"], "rank": [1, 4, 5]})

1. UNION ALL

SQL:

SELECT city, rank
FROM df1
UNION ALL
SELECT city, rank
FROM df2;

pandas equivalent:

In :pd.concat([df1, df2])
Out :
city	rank
0	Chicago	1
1	San Francisco	2
2	New York City	3
0	Chicago	1
1	Boston	4
2	Los Angeles	5
2. UNION

SQL:

SELECT city, rank
FROM df1
UNION
SELECT city, rank
FROM df2;

pandas equivalent:

In :pd.concat([df1, df2]).drop_duplicates()
Out :
city	rank
0	Chicago	1
1	San Francisco	2
2	New York City	3
1	Boston	4
2	Los Angeles	5
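concat() repeats the original index labels (0, 1, 2, 0, 1, 2 above); ignore_index=True renumbers the rows as a SQL result would have them. Note also that concat() aligns columns by name, whereas SQL UNION matches them by position. A sketch reusing the city data:

```python
import pandas as pd

df1 = pd.DataFrame({"city": ["Chicago", "San Francisco", "New York City"], "rank": [1, 2, 3]})
df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"], "rank": [1, 4, 5]})

# UNION ALL with a fresh 0..n-1 index
union_all = pd.concat([df1, df2], ignore_index=True)

# UNION: also drop rows duplicated across all columns, then renumber
union = pd.concat([df1, df2], ignore_index=True).drop_duplicates(ignore_index=True)

print(len(union_all), len(union))  # 6 5
print(union["city"].tolist())
# ['Chicago', 'San Francisco', 'New York City', 'Boston', 'Los Angeles']
```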

IX. Window functions

1. ROW_NUMBER()

Assigns a unique sequential number to each row within its partition: 1, 2, 3, 4, 5, …

SQL:

Get the two rows with the largest total_bill for each day:

SELECT *
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY day ORDER BY total_bill DESC) AS rn
      FROM data t) AS ranked
WHERE rn < 3
ORDER BY day, rn;

pandas equivalent:

In :(data.assign(rn=data.sort_values(["total_bill"], ascending=False)
                        .groupby(["day"])
                        .cumcount() + 1)
         .query("rn < 3")
         .sort_values(["day", "rn"]))
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2	rn
95	40.17	4.73	Male	Yes	Fri	Dinner	4	0.117750	0.117750	1
90	28.97	3.00	Male	Yes	Fri	Dinner	2	0.103555	0.103555	2
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812	1
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220	2
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799	1
182	45.35	3.50	Male	Yes	Sun	Dinner	3	0.077178	0.077178	2
197	43.11	5.00	Female	Yes	Thur	Lunch	4	0.115982	0.115982	1
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389	2
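The expression works because assign() aligns the new rn column on the index: cumcount() numbers the rows of each day in descending total_bill order, and the numbers are matched back to the original rows by index label. For "top N per group" alone, sorting and calling groupby().head(N) gives the same rows without an rn column. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows for two days
data = pd.DataFrame({
    "day": ["Fri", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [28.97, 40.17, 12.03, 50.81, 48.33],
})

ordered = data.sort_values("total_bill", ascending=False)

# ROW_NUMBER() OVER (PARTITION BY day ORDER BY total_bill DESC);
# the result is indexed like `ordered` and aligns back to `data` by index label
data["rn"] = ordered.groupby("day").cumcount() + 1
print(data["rn"].tolist())  # [2, 1, 3, 1, 2]

# Top 2 rows per day without an rn column
top2 = (ordered.groupby("day").head(2)
        .sort_values(["day", "total_bill"], ascending=[True, False]))
print(top2.index.tolist())  # [1, 0, 3, 4]
```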
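2. RANK()

Assigns a rank to each row within its partition; equal values share a rank and the following ranks are skipped, e.g. 1, 2, 2, 4, 5, 5, 5, 8, … In pandas this is rank(method="min"). method="first" instead breaks ties by row order, which behaves like ROW_NUMBER().

SQL:

Get the rows ranked 1 or 2 by total_bill for each day (ties can return more than two rows per day):

SELECT *
FROM (SELECT t.*,
             RANK() OVER (PARTITION BY day ORDER BY total_bill DESC) AS rnk
      FROM data t) AS ranked
WHERE rnk < 3
ORDER BY day, rnk;

pandas equivalent:

In :(data.assign(rnk=data.groupby(["day"])["total_bill"].rank(method="min", ascending=False))
         .query("rnk < 3")
         .sort_values(["day", "rnk"]))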
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2	rnk
95	40.17	4.73	Male	Yes	Fri	Dinner	4	0.117750	0.117750	1.0
90	28.97	3.00	Male	Yes	Fri	Dinner	2	0.103555	0.103555	2.0
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812	1.0
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220	2.0
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799	1.0
182	45.35	3.50	Male	Yes	Sun	Dinner	3	0.077178	0.077178	2.0
197	43.11	5.00	Female	Yes	Thur	Lunch	4	0.115982	0.115982	1.0
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389	2.0
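The common SQL ranking functions map to method values of Series.rank(); ascending=False gives rank 1 to the largest value, as ORDER BY ... DESC does, and rank() returns floats. A sketch with a tie:

```python
import pandas as pd

# Hypothetical totals with a tie for second place
total_bill = pd.Series([50.0, 48.0, 48.0, 40.0])

ranks = pd.DataFrame({
    "total_bill": total_bill,
    "row_number": total_bill.rank(method="first", ascending=False),  # ROW_NUMBER()
    "rank": total_bill.rank(method="min", ascending=False),          # RANK()
    "dense_rank": total_bill.rank(method="dense", ascending=False),  # DENSE_RANK()
})
print(ranks)
```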
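3. SUM()

SQL (a running total per day; SQL rows have no inherent order, so an ordering column is required, shown here as a hypothetical row_id that follows the DataFrame's row order):

SELECT t.*,
       SUM(total_bill) OVER (PARTITION BY day ORDER BY row_id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sn
FROM data t;

pandas equivalent (cumsum() accumulates in the current row order; without ORDER BY, SUM(total_bill) OVER (PARTITION BY day) would instead put the whole day's total on every row, which is groupby('day')['total_bill'].transform('sum')):

In :data['sn'] = data.groupby('day')['total_bill'].cumsum()
In :data
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2	sn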
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447	16.99
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542	27.33
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587	48.34
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780	72.02
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808	96.61
...	...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927	1710.73
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584	1737.91
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222	1760.58
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204	1778.40
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744	1096.33
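A small sketch contrasting the two window forms on made-up rows: cumsum() for the running total and transform("sum") for the whole-partition total broadcast back to every row:

```python
import pandas as pd

# Hypothetical rows; the row order defines the running total
data = pd.DataFrame({"day": ["Sun", "Sun", "Sat", "Sun", "Sat"],
                     "total_bill": [10.0, 20.0, 5.0, 30.0, 15.0]})

# Running total per day in row order (window with ORDER BY ... ROWS UNBOUNDED PRECEDING)
data["running_sum"] = data.groupby("day")["total_bill"].cumsum()

# Whole-day total repeated on every row: SUM(total_bill) OVER (PARTITION BY day)
data["day_total"] = data.groupby("day")["total_bill"].transform("sum")

print(data["running_sum"].tolist())  # [10.0, 30.0, 5.0, 60.0, 20.0]
print(data["day_total"].tolist())    # [60.0, 60.0, 20.0, 60.0, 20.0]
```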

This article was contributed by a user; the views expressed are the author's own. When reposting, please cite the source: http://www.rhkb.cn/news/1371.html
