Tag Archiv für PARTITION BY

Analytical Functions IV: RANGE vs. ROWS

Hi,

today’s article is the fourth part of my tutorial on analytical functions. Today I will deal with the differences between RANGE and ROWS windows. We already learned about the ROWS windows in Part I of this tutorial. Today we will take closer look at the RANGE windows and how they differ from ROWS.

I’ve prepared some test data (you can download them in the download section). The data looks like this:

BLOG_0031_PIC01_Result_Query

It contains DAYs from 01/01/2017 till 30/04/2017 and PRODUCT_NO from 1 to 3.

Now let’s take a look at the first example. We will create a query to calculate the monthly sums of the turnover. Then we want to compare the previous and the current month. We could do it for example with window functions.

SELECT MONTH,
            TURNOVER AS CUR_MONTH,
            SUM(TURNOVER) OVER (ORDER BY MONTH
                                                    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
                                                   ) AS PREV_MONTH
FROM (
          SELECT DISTINCT
                     to_char(DAY, ‚YYYY-MM‘) AS MONTH,
                     SUM(TURNOVER) OVER (PARTITION BY to_char(DAY, ‚YYYY-MM‘)
                                                        ) AS TURNOVER
          from tbl_test
     )

What we are doing here is calculating monthly values first. Then we are working with a ROWS window, which means that for each row (that is returned by the sub query) we calculate a separate window which is defined in this case as ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING. That means that we are just accessing the previous row. For that it is important that we first sort the data what is done by ORDER BY MONTH.

ROWS also means that the window definition is based on physical rows. That’s the reason why we need a sub query. First we have to create “physically” monthly rows. Then we can work with them. If we don’t do it we receive strange results as the physical level would still be on day level. So within the window daily data is accessed and shown.

Now if we use the RANGE instead of the ROWS we don’t need the subquery. I.e. we receive the same result with this:

SELECT DISTINCT
           to_char(DAY, ‚YYYY-MM‘) AS MONTH,
           SUM(TURNOVER) OVER (PARTITION BY to_char(DAY, ‚YYYY-MM‘)
                                             ) AS CUR_MONTH,
           SUM(TURNOVER) OVER (ORDER BY to_number(to_char(DAY, ‚YYYYMM‘))
                                               RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING
                                             ) AS PREV_MONTH
FROM tbl_test
 

Test it yourself. The results are the same. There are two differences. We are using RANGE instead of ROWS and we don’t have a sub query. We don’t need the sub query any longer because the RANGE windows are based on logical blocks or logical value changes. When you take a closer look at the ORDER BY clause in the last example you find that we are not using to_char(…..) but to_number(to_char(….)).

We do that as the range windows only works with dates or numbers in the ORDER BY clause. The expression you are using there is important for that logical value change. All rows with the same value in that expression are treated as one. They are summed up (because we are using SUM) first (that’s what we have done manually with the sub query in the first example) and afterwards the window is defined. If we say one block back we are stepping back one logical change. In our case it would mean that we step back one month and not one row.

I know that is a little difficult to understand. ROWS means we are defining our window based on physical row changes. RANGE means we are defining it by logical value change within the ORDER BY expression.

That also leads to another point. If you use ROWS and do your window on physical rows the database doesn’t really care if the previous month isn’t available. If we take April and March isn’t there the February is taken instead. With RANGE it is different. Here a NULL would be shown for March as the previous month of April doesn’t have a value. Which is correct then.

This was just a small introduction to the RANGE windows. If you are interested in further information on that just check out my BLOG. Maybe I will write another article on that with more examples. Or take a look at my new SQL book. I’ve added a free excerpt on this BLOG as well.

You find the example code for this article in the download section.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses is also available in English language. So check out my trainings (German) or my online courses (English)!

Download

BLOG_0031_ROWS_RANGE_Examples.zip

Oracle Analytic Functions – Part II

As this is the second part of my series about Oracle’s analytic functions I would like to step ahead with two more functions:

  • LEAD-function
  • LAG-function

Sometimes you need to access the preceding or following data row. In these cases you can use the above functions. Let’s use the following table to build some examples:

MONTH QTY TURNOVER
1 201211 3 125
2 201212 5 500
3 201301 1 100
4 201302 2 150
5 201303 3 250
6 201304 4 450

Fig.1: TBL_SALES

 

If you want to show the turnover of every month compared with the turnover of the month before you can use the Lag-Function. The general syntax for both functions is the following:

Lead/Lag (<Value>, <No of Rows>, <Default>) Over (Partition by … Order by ….)

Please note that you have to use the order by clause for these two functions. Lead returns the following row and Lag the preceding row. <Value> describes the Column of which the following or preceding value is returned. <No of rows> sets how many rows in front or back the system should look for and return a value. If there isn’t a value available (within the partition) then the <default> value is returned.

Let’s develop the first example:

       SELECT MONTH,
                TURNOVER,
                LAG(TURNOVER,1,-1) OVER (ORDER BY MONTH ASC) AS TURNOVER_2
FROM TBL_SALES

The result should look like this:

MONTH               TURNOVER            TURNOVER_2
——————————————————-
201211                125                   -1
201212                500                  125
201301                100                   500
201302                150                   100
201303                250                   150
201304                450                   250

As you can see the TURNOVER_2  column contains the value of the turnover column of the row before. The very first row of our little example TURNOVER_2 is -1 cause there is no row before and because of this a value can’t be found and the -1 is returned instead. If you wouldn’t define a default value then NULL is returned.

Now we change our statement a little:

       SELECT MONTH,
                TURNOVER,
                LAG(TURNOVER,1,-1) OVER (PARTITION BY substr(MONTH,1,4)
                                                      ORDER BY MONTH ASC)
                   AS TURNOVER_2
FROM TBL_SALES

The result looks this way:

MONTH               TURNOVER            TURNOVER_2
——————————————————
201211                 125                 -1
201212                 500                 125
201301                 100                 -1
201302                 150                 100

The difference is the partition by clause. That means that the function is only used within the partitions we have defined. In our case we have partitioned our data by years. To do so we use the function substr on the column MONTH. Now the DBMS uses the LAG function only within the corresponding year. That means for 2012 the first row in our table is the 201211. We have a -1 there. The next row is the second row within the partition. So we see the value of the row before there. The third row in our table is the 201301. It’s a new partition because substr(201301,1,4)=2013 and that is different from 2012. So it starts again and the first row of this partition (i.e. the third row in the table) is -1 again.

I hope the concept of partitioning became clearer with this issue too. Have fun experimenting with the two new functions.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!

Oracle Analytic Functions – Part I

In today’s issue I would like to address a very useful topic in an oracle environment called analytic functions. These functions are extremely useful to build complex calculations in an easy manner. In older oracle versions you had to create compley sub-queries and joins what can be now done by analytic functions.

To give you a brief introduction in this topic we will build YearToDate sums with these functions, that means for our monthly turnovers we want to calculate the YearToMonth turnover (turnover from January until each month) for each month with the help of analytic functions. For this our base table TBL_SALES looks like this:

MONTH QTY TURNOVER
1 201212 5 500
2 201301 1 100
3 201302 2 150
4 201303 3 250
5 201304 4 450

Fig.1: TBL_SALES

The general syntax fort he analytic functions is like this:

Function(….) Over( Partition by ….. Order by ….. Rows between …. And …..)

There are a lot of functions that can be used with this, but today we will focus on the SUM function, which is also known as aggregate function. The disadvantage of these aggregates is, that they can only work on the grouping of the whole query. But sometimes you need different groupings for different calculations. Maybe we want to list every month with its quantity and then the complete quantity. Without analytic functions one could create a sub query that sums all quantities, maybe like this:

SELECT MONTH,
QTY,
(SELECT SUM(QTY) FROM TBL_SALES) AS TOTAL_QTY
FROM TBL_SALES

 

With analytical functions the same would look like this:

SELECT MONTH,
                QTY,
                SUM(QTY) OVER() AS TOTAL_QTY
FROM TBL_SALES

The result in both cases is this:

MONTH               QTY   TOTAL_QTY
———————————————–
201212                  5             15
201304                  4             15
201303                  3             15
201301                  1             15
201302                  2             15

 

As this example is pretty simple the advantage isn’t that big. But the more complex the aggregates are, the higher is the advantage using analytic functions.

Back to the syntax of analytic functions – In our case above we didn’t use the PARTITION BY neither the ORDER BY clause. The reason is because we wanted to calculate a SUM over everything in the table. We can use the PARTITION BY clause to define on what level the function is working, i.e. on what level the total quantities are calculated. If we want to calculate them for each year then we would define a partition on the year like this:

SELECT MONTH,
                QTY,
                SUM(QTY) OVER(substr(MONTH,0,4) AS TOTAL_QTY_YEAR
FROM TBL_SALES

 

The result would be like this:

MONTH               QTY   TOTAL_QTY_YEAR
———————————————–
201212                  5             5
201304                  4             10
201303                  3             10
201301                  1             10
201302                  2             10

 

We now want to calculate the YearToMonth-Sum for each Month. For this we need to tell oracle that in Jan 2013 he has to summarize just the January qty, in Feb 2013 he has to sum the Jan and Feb quantity and so on. We can give this information with the ROWS BETWEEN …. AND …. Statement. With this we can define a dynamic window. But when we use it we need to order our datasets too. This is done with the ORDER BY clause within the Function. The complete Statement for the YearToMonth SUM would look like this:

 

SELECT  MONTH,
        QTY,
        SUM(QTY) OVER(PARTITION BY substr(MONTH,0,4)
                      ORDER BY MONTH ASC
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTM
FROM TBL_SALES

 

The result is this:

MONTH               QTY   TOTAL_QTY_YEAR
———————————————–
201212                  5             5
201301                  1             1
201302                  2             3
201303                  3             6
201304                  4             10

 

In this case UNBOUNDED PRECEDING means that all rows before the current row but within the partition are taken.

This was an example to show the basic principle how these functions work. You can easily do the same on partitions over several columns. And of course you can combine them all. For example you could combine the YTM Calculation and the Total_Qty_Year in one statement.

This article was the first part of the series on the analytic functions. In the next parts I will describe more functions and give you further examples on how to use them correctly.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!