In todays article I will deal with an extension of the classical Group By clause. The Group By clause is used to aggregate data to a specific level of granularity. Let’s say the data in the database is daily data and you need a report on monthly level. Then you have to aggregate the data, i.e. you have to sum up all lines that belong to the same month. The result will be that you have one line per month. For that you would introduce a GROUP BY for defining the needed level of granularity (in this case: monthly level). Furthermore you would use the SUM() aggregation function to sum up the kpi.
That all works fine so far but it has one limitation: You can only have one GROUP BY per SQL statement and therefore you can only aggregate the data to one level of granularity within one query. What if you need monthly data and quarterly data in one query? Then you would use two queries and the set operation UNION ALL. Or after this article you will use the Group By extension ROLLUP. This has two advantages:
- One additional clause vs. another query + UNION ALL
In the first approach (second query + UNION ALL) the tables are accessed several times (depending on how many queries are used). That can lead to bad performance. Furthermore the sql will grow because you have several queries. When using the Group By extension you can see in the execution plan that the tables are only accessed once. And the SQL isn’t growing that much because all you have to do is to extent the GROUP BY clause.
Okay, let’s write some code. Given the following table (see the zip-file for a creation script):
Now we want to create a query with monthly data and a total over everything. Without the Group By extension it would look something like this:
|SELECT to_char(DAY,’MM/YYYY‘) AS MONTH, SUM(TURNOVER) AS TURNOVER FROM TBL_TEST GROUP BY to_char(DAY,’MM/YYYY‘) UNION ALL SELECT ‚TOTAL 2017‘ AS MONTH, SUM(TURNOVER) AS TURNOVER FROM TBL_TEST;|
As you can see we have two queries that are connected via UNION ALL. The first sql creates the monthly totals. The second one creates the query total.
With ROLLUP the same query would look like this:
|SELECT nvl(to_char(DAY,’MM/YYYY‘),’TOTAL 2017′) AS MONTH, SUM(TURNOVER) AS TURNOVER FROM TBL_TEST GROUP BY ROLLUP (to_char(DAY,’MM/YYYY‘))|
The result is exactly the same as in the first query. But this one is a little shorter than the first one. In the select you put all columns you need to distinguish the totals. In our case we have the month and the turnover total. The Group By is extended by ROLLUP ( …. ). Within the brackets you put one or more expressions. They have to be consistent to the SELECT clause. The ROLLUP clause is creating rollups along some kind of dimension. In this dimensions the levels are defined through the expressions. Totals over all is added automatically. This means in our case that we just put the expression for the month (the same as in the select) in the ROLLUP and the total over all is added automatically.
Now we want to create a more complex example. We are adding totals for the quarter and the year. That makes four different totals (month, quarter, year, query). Our query looks like this one:
|SELECT to_char(DAY,’YYYY‘) AS YEAR, ‚Q’||to_char(DAY,’Q/YYYY‘) AS QUARTER, to_char(DAY,’MM/YYYY‘) AS MONTH, SUM(TURNOVER) AS TURNOVER FROM TBL_TEST GROUP BY ROLLUP ( to_char(DAY,’YYYY‘), ‚Q’||to_char(DAY,’Q/YYYY‘), to_char(DAY,’MM/YYYY‘) )|
For each total we added one expression to define the level. The ROLLUP clause is processed from the right to the left, i.e. first all columns will be grouped by, then the very right one is removed and the group by only consists of quarter and year, then the next one is removed from the right and so on. In that way all the sums are created.
Sometimes you have the need to display two columns for a level cause they belong logically together, e.g. product no and product name. If you just add both to the ROLLUP you receive two total lines for the product. What you can do is to put these two columns in brackets within the ROLLUP clause like this: ROLLUP( bla, (Product_No, Product_Name), … )
We call this composite columns. If they are put in brackets they behave as one total line.
The ROLLUP is available on Oracle, MS SQL Server, DB2, Postgres, etc. Unfortunately it’s not available on MySQL. I hope you enjoyed today’s lesson. The examples used in this article can be found at the very end of this article as zip-file. I will continue this series soon and we will then deal with the CUBE statement.
If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!