PostgreSQL进阶:窗口函数与CTE

窗口函数和CTE是SQL进阶的两大利器,能让复杂查询变得清晰优雅。本文以PostgreSQL为例,通过实际场景演示它们的用法。

1. 窗口函数基础

窗口函数在一组相关行("窗口")上执行计算,但不像GROUP BY那样折叠行。语法:

函数名() OVER (
    [PARTITION BY 分区列]
    [ORDER BY 排序列]
    [ROWS/RANGE 窗口范围]
)

ROW_NUMBER / RANK / DENSE_RANK

假设有员工薪资表:

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50),
    salary NUMERIC
);

INSERT INTO employees (name, department, salary) VALUES
('Alice', 'Engineering', 95000),
('Bob', 'Engineering', 95000),
('Charlie', 'Engineering', 85000),
('Diana', 'Marketing', 78000),
('Eve', 'Marketing', 72000),
('Frank', 'Marketing', 72000);

三种排名函数的区别:

SELECT
    name, department, salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
    RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;

结果:

name    | department  | salary | row_num | rank | dense_rank
--------|-------------|--------|---------|------|----------
Alice   | Engineering | 95000  | 1       | 1    | 1
Bob     | Engineering | 95000  | 2       | 1    | 1
Charlie | Engineering | 85000  | 3       | 3    | 2
Diana   | Marketing   | 78000  | 1       | 1    | 1
Eve     | Marketing   | 72000  | 2       | 2    | 2
Frank   | Marketing   | 72000  | 3       | 2    | 2
  • ROW_NUMBER:永远不重复,相同值也给不同编号
  • RANK:相同值给相同排名,但下一个排名会跳过(1,1,3)
  • DENSE_RANK:相同值给相同排名,下一个排名紧接(1,1,2)

LAG / LEAD

访问当前行的前一行或后一行:

-- 每个员工与同部门前一个人的薪资差
SELECT
    name, department, salary,
    LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS prev_salary,
    salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS diff
FROM employees;

LAG(column, offset, default)可以指定偏移量和默认值,LEAD方向相反。

SUM / AVG OVER

聚合函数也可以用作窗口函数:

SELECT
    name, department, salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total,
    ROUND(salary * 100.0 / SUM(salary) OVER (PARTITION BY department), 1) AS pct
FROM employees;

这会计算每个部门的薪资总额,以及每人占部门总额的百分比——不需要子查询或JOIN。

累计求和:

SELECT
    name, department, salary,
    SUM(salary) OVER (
        PARTITION BY department
        ORDER BY salary
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM employees;

2. CTE(Common Table Expressions)

CTE通过WITH子句定义临时结果集,让复杂查询模块化:

WITH dept_stats AS (
    SELECT
        department,
        AVG(salary) AS avg_salary,
        MAX(salary) AS max_salary,
        COUNT(*) AS headcount
    FROM employees
    GROUP BY department
)
SELECT
    e.name,
    e.salary,
    e.department,
    ds.avg_salary,
    e.salary - ds.avg_salary AS diff_from_avg
FROM employees e
JOIN dept_stats ds ON e.department = ds.department
WHERE e.salary > ds.avg_salary;

比嵌套子查询可读性好得多。多个CTE可以逗号分隔链式定义。

3. 递归CTE

递归CTE用于处理层次结构数据(树、图)。语法:

WITH RECURSIVE cte_name AS (
    -- 基础部分(anchor)
    SELECT ...
    UNION ALL
    -- 递归部分
    SELECT ... FROM cte_name WHERE ...
)
SELECT * FROM cte_name;

经典示例——组织架构树:

CREATE TABLE org (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    manager_id INT REFERENCES org(id)
);

INSERT INTO org VALUES
(1, 'CEO', NULL),
(2, 'CTO', 1),
(3, 'CFO', 1),
(4, 'Tech Lead', 2),
(5, 'Dev A', 4),
(6, 'Dev B', 4),
(7, 'Accountant', 3);

-- 查询CEO下所有层级的下属
WITH RECURSIVE subordinates AS (
    SELECT id, name, manager_id, 0 AS depth
    FROM org
    WHERE name = 'CEO'

    UNION ALL

    SELECT o.id, o.name, o.manager_id, s.depth + 1
    FROM org o
    JOIN subordinates s ON o.manager_id = s.id
)
SELECT
    REPEAT('  ', depth) || name AS org_tree,
    depth
FROM subordinates
ORDER BY depth, name;

结果:

org_tree         | depth
-----------------|------
CEO              | 0
  CFO            | 1
  CTO            | 1
    Accountant   | 2
    Tech Lead    | 2
      Dev A      | 3
      Dev B      | 3

递归CTE一定要有终止条件(JOIN不到新行时自动停止),否则会无限递归。PostgreSQL默认有递归深度保护。

4. 实战:TopN查询

"每个部门薪资前2名"是经典面试题,窗口函数一行搞定:

WITH ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 2;

CTE + 窗口函数的组合是处理分组排名问题的标准范式。

小结

窗口函数和CTE不是"高级语法糖",而是解决真实业务问题的核心工具。掌握这两个特性,SQL能力会有质的提升。