[TOC] 语法 ~~~ function(args) OVER (partition_by_clause order_by_clause [window_clause]) partition_by_clause ::= PARTITION BY expr [, expr ...] order_by_clause ::= ORDER BY expr [ASC | DESC] [, expr [ASC | DESC] ...] window_clause ::= ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ] ~~~ #### 7.2.5.1 PARTITION BY从句 Partition By从句和Group By类似。它把输入行按照指定的一列或多列分组,相同值的行会被分到一组。 #### 7.2.5.2 ORDER BY从句 Order By从句和外层的Order By基本一致。它定义了输入行的排列顺序,如果指定了Partition By,则Order By定义了每个Partition分组内的顺序。与外层Order By的唯一不同点是,OVER从句中的Order By n(n是正整数)相当于不做任何操作,而外层的Order By n表示按照第n列排序。 这个例子展示了在select列表中增加一个id列,它的值是1,2,3等等,顺序按照events表中的date\_and\_time列排序。 `SELECT  ` `row_number() OVER (ORDER BY date_and_time) AS id,  ` `c1, c2, c3, c4  ` `FROM events;` #### 7.2.5.3 WINDOW从句 Window从句用来为窗口函数指定运算范围:以当前行为准,前后若干行作为窗口函数运算的对象。Window从句支持的方法有:AVG(), COUNT(), FIRST\_VALUE(), LAST\_VALUE()和SUM()。对于 MAX()和MIN(),Window从句可以指定开始范围为UNBOUNDED PRECEDING。 语法: `ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ]` 边界的类型: `CURRENT ROW ---- 当前行` `M PRECEDING ---- 前M行` `UNBOUNDED PRECEDING ---- 没有上限,从分区第一行开始` `N FOLLOWING ---- 后N行` `UNBOUNDED FOLLOWING ---- 没有下限,直到分区最后一行结束` 举例,假设我们有如下的股票数据,股票代码是JDR,closing price是每天的收盘价。 `create table stock_ticker (stock_symbol string, closing_price decimal(8,2), closing_date timestamp);` `// ...load some data...` `select * from stock_ticker order by stock_symbol, closing_date;` `| stock_symbol | closing_price | closing_date |` `|--------------|---------------|--------------|` `| JDR | 12.86 | 2014-10-02 00:00:00 |` `| JDR | 12.89 | 2014-10-03 00:00:00 |` `| JDR | 12.94 | 2014-10-04 00:00:00 |` `| JDR | 12.55 | 2014-10-05 00:00:00 |` `| JDR | 14.03 | 2014-10-06 00:00:00 |` `| JDR | 14.75 | 2014-10-07 00:00:00 |` `| JDR | 13.98 | 2014-10-08 00:00:00 |` 下面这个查询使用窗口函数产生moving\_average这一列,它的值是3天的股票均价,即前一天、当前以及后一天三天的均价。第一天没有前一天的值,最后一天没有后一天的值,所以这两行只计算了两天的均值。这里Partition By没有起到作用,因为所有的数据都是JDR的数据,但如果还有其他股票信息,Partition By会保证窗口函数只作用在本Partition之内。 `select stock_symbol, closing_date, closing_price,    ` `avg(closing_price) over (partition by stock_symbol order by closing_date) as moving_average    ` `from stock_ticker;` `| stock_symbol | closing_date | closing_price | moving_average |` `|--------------|--------------|---------------|----------------|` `| JDR | 2014-10-02 00:00:00 | 12.86 | 12.87 |` `| JDR | 2014-10-03 00:00:00 | 12.89 | 12.89 |` `| JDR | 2014-10-04 00:00:00 | 12.94 | 12.79 |` `| JDR | 2014-10-05 00:00:00 | 12.55 | 13.17 |` `| JDR | 2014-10-06 00:00:00 | 14.03 | 13.77 |` `| JDR | 2014-10-07 00:00:00 | 14.75 | 14.25 |` `| JDR | 2014-10-08 00:00:00 | 13.98 | 14.36 |` <br> #### 7.2.5.4 函数 - avg **函数定义** `AVG([DISTINCT | ALL] *expression*) [OVER (*analytic_clause*)]` 计算窗口内数据的平均值。 **示例** 计算当前行和它前后各一行数据的x平均值。 ```sql select x, property,   avg(x) over( partition by property order by x rows between 1 preceding and 1 following ) as 'moving average'     from int_t where property in ('odd','even'); ``` ``` | x | property | moving average | |- --|----------|----------------| | 2 | even | 3 | | 4 | even | 4 | | 6 | even | 6 | | 8 | even | 8 | | 10 | even | 9 | | 1 | odd | 2 | | 3 | odd | 3 | | 5 | odd | 5 | | 7 | odd | 7 | | 9 | odd | 8 | ``` <br> #### 7.2.5.5 函数 - count **函数定义** `COUNT([DISTINCT | ALL] expression) [OVER (analytic_clause)]` 计算窗口内某值出现的次数。 **示例** 计算从当前行到第一行x出现的次数。 ```sql select x, property,   count(x) over(   partition by property     order by x     rows between unbounded preceding and current row ) as 'cumulative total'     from int_t where property in ('odd','even'); ``` ``` | x | property | cumulative count | |----|----------|------------------| | 2 | even | 1 | | 4 | even | 2 | | 6 | even | 3 | | 8 | even | 4 | | 10 | even | 5 | | 1 | odd | 1 | | 3 | odd | 2 | | 5 | odd | 3 | | 7 | odd | 4 | | 9 | odd | 5 | ``` <br> #### 7.2.5.6 函数 - dense\_rank **函数定义** `DENSE_RANK() OVER(partition_by_clause order_by_clause)` DENSE\_RANK()函数用来表示排名,与RANK()不同的是,DENSE\_RANK()不会出现空缺排名编号。比如,如果出现了两个并列的1,DENSE\_RANK()的第三个数仍然是2,而RANK()的第三个数是3。 > 注:该函数不支持ROWS或RANGE定义的窗口子句。 **示例** 下例展示了按照x列分组对y列排名。 `select x, y, dense_rank() over(partition by x order by y) as rank from int_t;` `| x | y | rank |` `|---|---|------|` `| 1 | 1 | 1 |` `| 1 | 2 | 2 |` `| 1 | 2 | 2 |` `| 2 | 1 | 1 |` `| 2 | 2 | 2 |` `| 2 | 3 | 3 |` `| 3 | 1 | 1 |` `| 3 | 1 | 1 |` `| 3 | 2 | 2 |` #### 7.2.5.7 函数 - first\_value **函数定义** `FIRST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])` 函数FIRST\_VALUE()返回窗口范围内的第一个值。 **示例** 对有如下数据: `select name, country, greeting from mail_merge;` `|   name  | country |   greeting   |` `|---------|---------|--------------|` `| Pete    | USA | Hello        |` `| John    | USA | Hi           |` `| Boris   | Germany | Guten tag    |` `| Michael | Germany | Guten morgen |` `| Bjorn   | Sweden| Hej          |` `| Mats    | Sweden| Tja          |` 使用FIRST\_VALUE(),根据country分组,返回每个分组中第一个greeting的值: `select country, name,    ` `first_value(greeting) over (` `partition by country order by name, greeting` `) as greeting from mail_merge;` `| country | name|  greeting |` `|---------|---------|-----------|` `| Germany | Boris | Guten tag |` `| Germany | Michael | Guten tag |` `| Sweden  | Bjorn | Hej       |` `| Sweden  | Mats| Hej       |` `| USA     | John| Hi        |` `| USA     | Pete| Hi        |` <br> #### 7.2.5.8 函数 - lag **函数定义** `LAG (expr, offset, default) OVER (partition_by_clause order_by_clause)` LAG()方法用来计算当前行向前数若干行的值。 **示例** 计算前一天的股票收盘价。 ```sql select stock_symbol, closing_date, closing_price, lag(closing_price,1, 0) over (partition by stock_symbol order by closing_date) as "yesterday closing" from stock_ticker order by closing_date; ``` ``` | stock_symbol | closing_date | closing_price | yesterday closing | |--------------|--------------|---------------|-------------------| | JDR | 2014-09-13 00:00:00 | 12.86 | 0     | | JDR | 2014-09-14 00:00:00 | 12.89 | 12.86 | | JDR | 2014-09-15 00:00:00 | 12.94 | 12.89 | | JDR | 2014-09-16 00:00:00 | 12.55 | 12.94 | | JDR | 2014-09-17 00:00:00 | 14.03 | 12.55 | | JDR | 2014-09-18 00:00:00 | 14.75 | 14.03 | | JDR | 2014-09-19 00:00:00 | 13.98 | 14.75 | ``` <br> #### 7.2.5.9 函数 - last\_value **函数定义** `LAST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])` 函数LAST\_VALUE()返回窗口范围内的最后一个值。其作用与函数FIRST\_VALUE()相反。 **示例** 对FIRST\_VALUE()函数示例中的数据,根据country分组,返回每个分组中最后一个greeting的值: `select country, name,    ` `last_value(greeting)  ` `over (partition by country order by name, greeting) as greeting  ` `from mail_merge;` `| country | name|   greeting   |` `|---------|---------|--------------|` `| Germany | Boris | Guten morgen |` `| Germany | Michael | Guten morgen |` `| Sweden  | Bjorn | Tja          |` `| Sweden  | Mats| Tja          |` `| USA     | John| Hello        |` `| USA     | Pete| Hello        |` #### 7.2.5.10 函数 - lead **函数定义** `LEAD (expr, offset, default]) OVER (partition_by_clause order_by_clause)` LEAD()方法用来计算当前行向后数若干行的值。 **示例** 计算股票第二天的收盘价对比当天收盘价的走势,即第二天收盘价比当天高还是低。 `select stock_symbol, closing_date, closing_price,    ` `case  ` `(lead(closing_price,1, 0)  ` `over (partition by stock_symbol order by closing_date)-closing_price) > 0  ` `when true then "higher"  ` `when false then "flat or lower"    ` `end as "trending"  ` `from stock_ticker    ` `order by closing_date;` `| stock_symbol | closing_date | closing_price | trending |` `|--------------|--------------|---------------|----------|` `| JDR | 2014-09-13 00:00:00 | 12.86 | higher |` `| JDR | 2014-09-14 00:00:00 | 12.89 | higher |` `| JDR | 2014-09-15 00:00:00 | 12.94 | flat or lower |` `| JDR | 2014-09-16 00:00:00 | 12.55 | higher |` `| JDR | 2014-09-17 00:00:00 | 14.03 | higher |` `| JDR | 2014-09-18 00:00:00 | 14.75 | flat or lower |` `| JDR | 2014-09-19 00:00:00 | 13.98 | flat or lower |` #### 7.2.5.11 函数 - max **函数定义** `MAX([DISTINCT | ALL] expression) [OVER (analytic_clause)]` 计算窗口内数据的最大值。 > 注:该函数暂不支持滑动窗口:rows between M preceding and N following。 **示例** 计算从第一行到当前行之后一行的最大值。 `select x, property,  ` `max(x) over (  ` `order by property, x    ` `rows between unbounded preceding and 1 following` `) as 'local maximum'    ` `from int_t where property in ('prime','square');` `| x | property | local maximum |` `|---|----------|---------------|` `| 2 | prime | 3 |` `| 3 | prime | 5 |` `| 5 | prime | 7 |` `| 7 | prime | 7 |` `| 1 | square | 7 |` `| 4 | square | 9 |` `| 9 | square | 9 |` #### 7.2.5.12 函数 - min **函数定义** `MIN([DISTINCT | ALL] expression) [OVER (analytic_clause)]` 计算窗口内数据的最小值。 > 注:该函数暂不支持滑动窗口:rows between M preceding and N following。 **示例** 计算从第一行到当前行之后一行的最小值。 `select x, property,  ` `min(x) over (    ` `order by property, x desc    ` `rows between unbounded preceding and 1 following` `) as 'local minimum'  ` `from int_t where property in ('prime','square');` `| x | property | local minimum |` `|---|----------|---------------|` `| 7 | prime | 5 |` `| 5 | prime | 3 |` `| 3 | prime | 2 |` `| 2 | prime | 2 |` `| 9 | square | 2 |` `| 4 | square | 1 |` `| 1 | square | 1 |` #### 7.2.5.13 函数 - rank **函数定义** `RANK() OVER(partition_by_clause order_by_clause)` RANK()函数用来表示排名,与DENSE\_RANK()不同的是,RANK()会出现空缺排名编号。比如,如果出现了两个并列的1, RANK()的第三个数就是3,而不是2。 > 注:该函数不支持ROWS或RANGE定义的窗口子句。 **示例** 按照x列分组对y列排名。 `select x, y, rank() over (partition by x order by y) as rank from int_t;` `| x | y | rank |` `|---|---|------|` `| 1 | 1 | 1 |` `| 1 | 2 | 2 |` `| 1 | 2 | 2 |` `| 2 | 1 | 1 |` `| 2 | 2 | 2 |` `| 2 | 3 | 3 |` `| 3 | 1 | 1 |` `| 3 | 1 | 1 |` `| 3 | 2 | 3 |` #### 7.2.5.14 函数 - row\_number **函数定义** `ROW_NUMBER() OVER(partition_by_clause order_by_clause)` 为每个Partition的每一行返回一个从1开始连续递增的整数。与RANK()和DENSE\_RANK()不同的是,ROW\_NUMBER()返回的值不会重复也不会出现空缺,是连续递增的。 > 注:该函数不支持ROWS或RANGE定义的窗口子句。 **示例** `select x, y, row_number() over(partition by x order by y) as rank from int_t;` `| x | y | rank |` `|---|---|------|` `| 1 | 1 | 1 |` `| 1 | 2 | 2 |` `| 1 | 2 | 3 |` `| 2 | 1 | 1 |` `| 2 | 2 | 2 |` `| 2 | 3 | 3 |` `| 3 | 1 | 1 |` `| 3 | 1 | 2 |` `| 3 | 2 | 3 |` <br> #### 7.2.5.15 函数 - sum **函数定义** `SUM([DISTINCT | ALL] expression) [OVER (analytic_clause)]` 计算窗口内数据的和。 **示例** 按照property进行分组,在组内计算当前行以及前后各一行的x列的和。 ```sql select x, property, sum(x) over( partition by property order by x   rows between 1 preceding and 1 following ) as 'moving total'     from int_t where property in ('odd','even'); ``` ``` | x | property | moving total | |----|----------|--------------| | 2 | even | 6 | | 4 | even | 12 | | 6 | even | 18 | | 8 | even | 24 | | 10 | even | 18 | | 1 | odd | 4 | | 3 | odd | 9 | | 5 | odd | 15 | | 7 | odd | 21 | ```