Tag Archiv für Group By Extensions

Group By extensions IV: Functions

Hi,

This is the last part of my article series on SQL’s Group By extensions. In the first part we dealt with the ROLLUP statement, the second part then covered the CUBE statement and in the third part we discovered flexibility with the GROUPING SETS. Today we will take a look at two functions that are sometimes used to filter some of the auto-generated sums of ROLLUP and CUBE. The names of the functions are:

  • GROUPING(<expression>)
  • GROUPING_ID(<expression1>, <expression2>, …, <expression n>)

The examples are based on the test data of part 2, i.e. the table looks something like this:

Day Product No Product Name Turnover
01.01.2017 1 Product 1 20
2 Product 2 30
3 Product 3 40
02.01.2017 1 Product 1 30
2 Product 2 40
3 Product 3 50
….
30.04.2017 3 Product 3 330

 

The GROUPING() function is used to check whether an expression was used in the grouping or not. Here comes a small example for that:

SELECT  to_char(DAY, ‚YYYY-Q‘) AS QUARTER_ID,
              to_char(DAY, ‚YYYY-MM‘) AS MONTH_ID,
              SUM(TURNOVER) AS TURNOVER,
              GROUPING(to_char(DAY, ‚YYYY-Q‘)) AS GRP_QUARTER,
              GROUPING(to_char(DAY, ‚YYYY-MM‘)) AS GRP_MONTH
     
FROM TBL_TEST
GROUP BY ROLLUP(to_char(DAY, ‚YYYY-Q‘), to_char(DAY, ‚YYYY-MM‘))
ORDER BY QUARTER_ID, MONTH_ID;
 

Result:

BLOG_0023_PIC01_Result_Query_1

In this example you can see how the GROUPING() function basically works. For the most detailed level (I.e. month) both functions return 0 as the month_id and the quarter_id columns are grouped by. For the quarter sums the month_id is not grouped by and therefore the GROUPING() function returns 1 for this column. For the total sum both id-columns are not in the GROUP BY and that leads to a result of 1 for both GROUPIN() functions.

You can use these results to decide what sums you want to keep or remove. For that you usually need these functions when you use CUBE or ROLLUP. With GROUPING SETS you normally don’t have the need for these functions as you can exactly define what sums you need.

The GROUPING_ID() function is the combination of several GROUPING() functions in the way that if you take several GROUPING results (which are always 1 or 0 = binary) and put them in a row you receive a binary string which can be interpreted as a decimal number and that one is returned by GROUPING_ID. You receive a different ID for each different kind of sum. You then only have one column with the ID instead of several columns as with the GROUPING() function. Now here is an example for that:

SELECT  to_char(DAY, ‚YYYY-Q‘) AS QUARTER_ID,
              to_char(DAY, ‚YYYY-MM‘) AS MONTH_ID,
              SUM(TURNOVER) AS TURNOVER,
              GROUPING(to_char(DAY, ‚YYYY-Q‘)) AS GRP_QUARTER,
              GROUPING(to_char(DAY, ‚YYYY-MM‘)) AS GRP_MONTH,
              GROUPING_ID(to_char(DAY, ‚YYYY-Q‘), to_char(DAY, ‚YYYY-MM‘)) AS GRP_ID
 
FROM TBL_TEST
GROUP BY ROLLUP(to_char(DAY, ‚YYYY-Q‘), to_char(DAY, ‚YYYY-MM‘))
ORDER BY QUARTER_ID, MONTH_ID;

Result:

BLOG_0023_PIC02_Result_Query_2

 

I’ve take the first example and added only the last row in the SELECT. This is the GROUPING_ID() function which takes several expressions. In this example we put the expressions of the QUARTER_ID and MONTH_ID columns. The order of the expressions is important. It defines on what position of the binary string the result of the GROUPING of the column is put. If you take a look at row number 4. The GROUPING of Quarter is 0 and the GROUPING of Month is 1. As an binary string this would be 01 and if you transform it to a decimal number it’s: 0*2+1*2° = 0 + 1 = 1. If we check line no. 7 we have 11 as a binary string and decimal it’s: 1*2 + 1*2° = 2 + 1 = 3.

Now another example for the GROUPING():

SELECT  to_char(DAY, ‚YYYY-Q‘) AS QUARTER_ID,
              to_char(DAY, ‚YYYY-MM‘) AS MONTH_ID,
              DAY,
              SUM(TURNOVER) AS TURNOVER,
              GROUPING(to_char(DAY, ‚YYYY-Q‘)) AS GRP_QUARTER,
              GROUPING(to_char(DAY, ‚YYYY-MM‘)) AS GRP_MONTH,
              GROUPING(DAY) AS GRP_DAY,
              GROUPING_ID(to_char(DAY, ‚YYYY-Q‘), to_char(DAY, ‚YYYY-MM‘),
                                      DAY) AS GRP_ID     
 
FROM TBL_TEST
WHERE to_char(DAY,’DD‘) IN (’01‘, ’02‘, ’03‘)
GROUP BY ROLLUP(to_char(DAY, ‚YYYY-Q‘), to_char(DAY, ‚YYYY-MM‘), DAY)
ORDER BY QUARTER_ID, MONTH_ID, DAY;

Result:

BLOG_0023_PIC03_Result_Query_3

This example is based on the second example. In addition I’ve added the DAY column in the SELECT and ROLLUP. Also I’ve added a GROUPING() for DAY and added it to the GROUPING_ID() function. A filter in the WHERE clause just limit the rows in the result. It’s pretty much the same as in the second example. But you have one column more in this example. The daily sums have a GROUPING_ID of 0, the monthly SUMS of 1, the quarterly sums of 3 and the total sum a GROUPING_ID of 7. The 7 is calculated like this: 111 = 1*2² + 1*2 + 1*2° = 4 + 2 + 1 = 7.

I hope it became clear a little how the functions GROUPING() and GROUPING_ID() works and what they can be used for. These functions can be used in the HAVING clause to filter out some sums. Also these functions can be used with CUBE and also GROUPING SETS.

The examples used in this article can be found at the very end of this article as zip-file. I will continue this series soon and we will then deal with some useful functions you can use if you are working with the rollup or cube statement.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!

 

Downloads

BLOG_0023_Testdata_&_Examples.zip

Group By extensions I: ROLLUP

In todays article I will deal with an extension of the classical Group By clause. The Group By clause is used to aggregate data to a specific level of granularity. Let’s say the data in the database is daily data and you need a report on monthly level. Then you have to aggregate the data, i.e. you have to sum up all lines that belong to the same month. The result will be that you have one line per month. For that you would introduce a GROUP BY for defining the needed level of granularity (in this case: monthly level). Furthermore you would use the SUM() aggregation function to sum up the kpi.

That all works fine so far but it has one limitation: You can only have one GROUP BY per SQL statement and therefore you can only aggregate the data to one level of granularity within one query. What if you need monthly data and quarterly data in one query? Then you would use two queries and the set operation UNION ALL. Or after this article you will use the Group By extension ROLLUP. This has two advantages:

  1. Performance
  2. One additional clause vs. another query + UNION ALL

In the first approach (second query + UNION ALL) the tables are accessed several times (depending on how many queries are used). That can lead to bad performance. Furthermore the sql will grow because you have several queries. When using the Group By extension you can see in the execution plan that the tables are only accessed once. And the SQL isn’t growing that much because all you have to do is to extent the GROUP BY clause.

Okay, let’s write some code. Given the following table (see the zip-file for a creation script):

Day (DD.MM.YY)
TURNOVER
01.01.17 1000
1000
31.01.17 1000
01.02.17 1200
1200
28.02.17 1200
01.03.17 1500
1500
30.04.17 1500

 

Now we want to create a query with monthly data and a total over everything. Without the Group By extension it would look something like this:

SELECT  to_char(DAY,’MM/YYYY‘) AS MONTH,
             SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST
GROUP BY to_char(DAY,’MM/YYYY‘) 
 
UNION ALL
 
SELECT  ‚TOTAL 2017‘ AS MONTH,
              SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST;

Result:

BLOG_0019_PIC01_Result_Query1

As you can see we have two queries that are connected via UNION ALL. The first sql creates the monthly totals. The second one creates the query total.

With ROLLUP the same query would look like this:

SELECT  nvl(to_char(DAY,’MM/YYYY‘),’TOTAL 2017′) AS MONTH,
            SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST
GROUP BY ROLLUP (to_char(DAY,’MM/YYYY‘))

Result:

BLOG_0019_PIC01_Result_Query1

The result is exactly the same as in the first query. But this one is a little shorter than the first one. In the select you put all columns you need to distinguish the totals. In our case we have the month and the turnover total. The Group By is extended by ROLLUP ( …. ). Within the brackets you put one or more expressions. They have to be consistent to the SELECT clause. The ROLLUP clause is creating rollups along some kind of dimension. In this dimensions the levels are defined through the expressions. Totals over all is added automatically. This means in our case that we just put the expression for the month (the same as in the select) in the ROLLUP and the total over all is added automatically.

Now we want to create a more complex example. We are adding totals for the quarter and the year. That makes four different totals (month, quarter, year, query). Our query looks like this one:

SELECT  to_char(DAY,’YYYY‘) AS YEAR,
             ‚Q’||to_char(DAY,’Q/YYYY‘) AS QUARTER,
             to_char(DAY,’MM/YYYY‘) AS MONTH,
             SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST
GROUP BY ROLLUP ( to_char(DAY,’YYYY‘),
                                ‚Q’||to_char(DAY,’Q/YYYY‘),
                                 to_char(DAY,’MM/YYYY‘)
)

Result:

BLOG_0019_PIC02_Result_Query3

For each total we added one expression to define the level. The ROLLUP clause is processed from the right to the left, i.e. first all columns will be grouped by, then the very right one is removed and the group by only consists of quarter and year, then the next one is removed from the right and so on. In that way all the sums are created.

Sometimes you have the need to display two columns for a level cause they belong logically together, e.g. product no and product name. If you just add both to the ROLLUP you receive two total lines for the product. What you can do is to put these two columns in brackets within the ROLLUP clause like this: ROLLUP( bla, (Product_No, Product_Name), … )

We call this composite columns. If they are put in brackets they behave as one total line.

The ROLLUP is available on Oracle, MS SQL Server, DB2, Postgres, etc. Unfortunately it’s not available on MySQL. I hope you enjoyed today’s lesson. The examples used in this article can be found at the very end of this article as zip-file. I will continue this series soon and we will then deal with the CUBE statement.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!

Download

BLOG_0019_GroupBy_Ext_ROLLUP_examples.zip