窗口函数和CTE是SQL进阶的两大利器,能让复杂查询变得清晰优雅。本文以PostgreSQL为例,通过实际场景演示它们的用法。
1. 窗口函数基础
窗口函数在一组相关行("窗口")上执行计算,但不像GROUP BY那样折叠行。语法:
函数名() OVER (
[PARTITION BY 分区列]
[ORDER BY 排序列]
[ROWS/RANGE 窗口范围]
)
ROW_NUMBER / RANK / DENSE_RANK
假设有员工薪资表:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(50),
department VARCHAR(50),
salary NUMERIC
);
INSERT INTO employees (name, department, salary) VALUES
('Alice', 'Engineering', 95000),
('Bob', 'Engineering', 95000),
('Charlie', 'Engineering', 85000),
('Diana', 'Marketing', 78000),
('Eve', 'Marketing', 72000),
('Frank', 'Marketing', 72000);
三种排名函数的区别:
SELECT
name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;
结果:
name | department | salary | row_num | rank | dense_rank
--------|-------------|--------|---------|------|----------
Alice | Engineering | 95000 | 1 | 1 | 1
Bob | Engineering | 95000 | 2 | 1 | 1
Charlie | Engineering | 85000 | 3 | 3 | 2
Diana | Marketing | 78000 | 1 | 1 | 1
Eve | Marketing | 72000 | 2 | 2 | 2
Frank | Marketing | 72000 | 3 | 2 | 2
ROW_NUMBER:永远不重复,相同值也给不同编号RANK:相同值给相同排名,但下一个排名会跳过(1,1,3)DENSE_RANK:相同值给相同排名,下一个排名紧接(1,1,2)
LAG / LEAD
访问当前行的前一行或后一行:
-- 每个员工与同部门前一个人的薪资差
SELECT
name, department, salary,
LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS prev_salary,
salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS diff
FROM employees;
LAG(column, offset, default)可以指定偏移量和默认值,LEAD方向相反。
SUM / AVG OVER
聚合函数也可以用作窗口函数:
SELECT
name, department, salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total,
ROUND(salary * 100.0 / SUM(salary) OVER (PARTITION BY department), 1) AS pct
FROM employees;
这会计算每个部门的薪资总额,以及每人占部门总额的百分比——不需要子查询或JOIN。
累计求和:
SELECT
name, department, salary,
SUM(salary) OVER (
PARTITION BY department
ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM employees;
2. CTE(Common Table Expressions)
CTE通过WITH子句定义临时结果集,让复杂查询模块化:
WITH dept_stats AS (
SELECT
department,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
COUNT(*) AS headcount
FROM employees
GROUP BY department
)
SELECT
e.name,
e.salary,
e.department,
ds.avg_salary,
e.salary - ds.avg_salary AS diff_from_avg
FROM employees e
JOIN dept_stats ds ON e.department = ds.department
WHERE e.salary > ds.avg_salary;
比嵌套子查询可读性好得多。多个CTE可以逗号分隔链式定义。
3. 递归CTE
递归CTE用于处理层次结构数据(树、图)。语法:
WITH RECURSIVE cte_name AS (
-- 基础部分(anchor)
SELECT ...
UNION ALL
-- 递归部分
SELECT ... FROM cte_name WHERE ...
)
SELECT * FROM cte_name;
经典示例——组织架构树:
CREATE TABLE org (
id INT PRIMARY KEY,
name VARCHAR(50),
manager_id INT REFERENCES org(id)
);
INSERT INTO org VALUES
(1, 'CEO', NULL),
(2, 'CTO', 1),
(3, 'CFO', 1),
(4, 'Tech Lead', 2),
(5, 'Dev A', 4),
(6, 'Dev B', 4),
(7, 'Accountant', 3);
-- 查询CEO下所有层级的下属
WITH RECURSIVE subordinates AS (
SELECT id, name, manager_id, 0 AS depth
FROM org
WHERE name = 'CEO'
UNION ALL
SELECT o.id, o.name, o.manager_id, s.depth + 1
FROM org o
JOIN subordinates s ON o.manager_id = s.id
)
SELECT
REPEAT(' ', depth) || name AS org_tree,
depth
FROM subordinates
ORDER BY depth, name;
结果:
org_tree | depth
-----------------|------
CEO | 0
CFO | 1
CTO | 1
Accountant | 2
Tech Lead | 2
Dev A | 3
Dev B | 3
递归CTE一定要有终止条件(JOIN不到新行时自动停止),否则会无限递归。PostgreSQL默认有递归深度保护。
4. 实战:TopN查询
"每个部门薪资前2名"是经典面试题,窗口函数一行搞定:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 2;
CTE + 窗口函数的组合是处理分组排名问题的标准范式。
小结
窗口函数和CTE不是"高级语法糖",而是解决真实业务问题的核心工具。掌握这两个特性,SQL能力会有质的提升。