Tag Archiv für Oracle

Analytical Functions IV: RANGE vs. ROWS

Hi,

today’s article is the fourth part of my tutorial on analytical functions. Today I will deal with the differences between RANGE and ROWS windows. We already learned about the ROWS windows in Part I of this tutorial. Today we will take closer look at the RANGE windows and how they differ from ROWS.

I’ve prepared some test data (you can download them in the download section). The data looks like this:

BLOG_0031_PIC01_Result_Query

It contains DAYs from 01/01/2017 till 30/04/2017 and PRODUCT_NO from 1 to 3.

Now let’s take a look at the first example. We will create a query to calculate the monthly sums of the turnover. Then we want to compare the previous and the current month. We could do it for example with window functions.

SELECT MONTH,
            TURNOVER AS CUR_MONTH,
            SUM(TURNOVER) OVER (ORDER BY MONTH
                                                    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
                                                   ) AS PREV_MONTH
FROM (
          SELECT DISTINCT
                     to_char(DAY, ‘YYYY-MM’) AS MONTH,
                     SUM(TURNOVER) OVER (PARTITION BY to_char(DAY, ‘YYYY-MM’)
                                                        ) AS TURNOVER
          from tbl_test
     )

What we are doing here is calculating monthly values first. Then we are working with a ROWS window, which means that for each row (that is returned by the sub query) we calculate a separate window which is defined in this case as ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING. That means that we are just accessing the previous row. For that it is important that we first sort the data what is done by ORDER BY MONTH.

ROWS also means that the window definition is based on physical rows. That’s the reason why we need a sub query. First we have to create “physically” monthly rows. Then we can work with them. If we don’t do it we receive strange results as the physical level would still be on day level. So within the window daily data is accessed and shown.

Now if we use the RANGE instead of the ROWS we don’t need the subquery. I.e. we receive the same result with this:

SELECT DISTINCT
           to_char(DAY, ‘YYYY-MM’) AS MONTH,
           SUM(TURNOVER) OVER (PARTITION BY to_char(DAY, ‘YYYY-MM’)
                                             ) AS CUR_MONTH,
           SUM(TURNOVER) OVER (ORDER BY to_number(to_char(DAY, ‘YYYYMM’))
                                               RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING
                                             ) AS PREV_MONTH
FROM tbl_test
 

Test it yourself. The results are the same. There are two differences. We are using RANGE instead of ROWS and we don’t have a sub query. We don’t need the sub query any longer because the RANGE windows are based on logical blocks or logical value changes. When you take a closer look at the ORDER BY clause in the last example you find that we are not using to_char(…..) but to_number(to_char(….)).

We do that as the range windows only works with dates or numbers in the ORDER BY clause. The expression you are using there is important for that logical value change. All rows with the same value in that expression are treated as one. They are summed up (because we are using SUM) first (that’s what we have done manually with the sub query in the first example) and afterwards the window is defined. If we say one block back we are stepping back one logical change. In our case it would mean that we step back one month and not one row.

I know that is a little difficult to understand. ROWS means we are defining our window based on physical row changes. RANGE means we are defining it by logical value change within the ORDER BY expression.

That also leads to another point. If you use ROWS and do your window on physical rows the database doesn’t really care if the previous month isn’t available. If we take April and March isn’t there the February is taken instead. With RANGE it is different. Here a NULL would be shown for March as the previous month of April doesn’t have a value. Which is correct then.

This was just a small introduction to the RANGE windows. If you are interested in further information on that just check out my BLOG. Maybe I will write another article on that with more examples. Or take a look at my new SQL book. I’ve added a free excerpt on this BLOG as well.

You find the example code for this article in the download section.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses is also available in English language. So check out my trainings (German) or my online courses (English)!

Download

BLOG_0031_ROWS_RANGE_Examples.zip

Oracle External Tables

Hi,

In today’s article I want to explain how to use Oracle’s external tables in order to access flat files in the filesystem. Since Oracle 9i it’s possible to mount files in the file system and make then accessible in the database. Those files appear as normal tables within the database and can be accessed in read-only mode. You cannot write to those files or use indexes on these tables. But you can read data from it and join it with other tables.

My example file (cities.txt) looks like this:

BLOG_0030_PIC01_Example_File

First your administrator (or you) has to create a new directory in the file system and make it accessible within the database. Then the user who should access the external table needs to have read + write rights to this directory. The following two lines are granting both:

CREATE OR REPLACE DIRECTORY ext_file_data AS ‘c:\’;
 
GRANT read, write ON DIRECTORY ext_file_data TO test;

With these two lines the user TEST will receive read and write access to c:\.

After you’ve created the directory you can define a table with four columns. When you create the table you have to enhance the external table definition and it looks like this:

CREATE TABLE tbl_cities_ext (
  city_name      VARCHAR2(25),
  population      NUMBER,
  country_name  VARCHAR2(25),
  country_code   VARCHAR2(3)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_file_data
 
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ‘,’
    MISSING FIELD VALUES ARE NULL
    (
        city_name      CHAR(25),
        population      CHAR(10),
        country_name  CHAR(25),
        country_code   CHAR(3)
    )
  )
  LOCATION (‘cities.txt’)
)
REJECT LIMIT UNLIMITED;

You can define a lot of things in the external table definition. Normally you use the oracle loader tool but it’s also possible to use data pump. With DEFAULT DIRECTORY you define the directory in which the files are located. Then you can define several access parameters. These parameters define how the data is stored in the files. In my case records are delimited by a new line and fields are terminated by ‘,’. If field values are missing they should be null. Then we have the column list. For each column within the file we have a column in the external table definition. You could also specify fixed column length and column positions and so on. The filenames itself are put in the LOCATION. You can also put more than one filename here separated by comma. This is just a brief introduction. If you need further information on it, just google oracle external files. You find a lot of tutorials there.

When everything was implemented like this then you can start accessing the table with normal SQL operations. This technique can be used to import data to the database.

You find the example code for this in the download section of this article.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!

Downloads

BLOG_0030_External_Tables_Examples.zip

How to create a database trigger

Hi,

In today’s article I will explain how you can create a trigger and what you can use it for. A trigger is a procedure within the database which is automatically executed by the database when an event takes place. I will focus on table triggers and there you have three events: INSERT, UPDATE, DELETE. That means when a row is inserted an event takes places and if you have a trigger defined for this event it is executed. What do you need this for? You can use this for several things. For example you want to fill some columns based on other columns. Or you want to log something in another table.

In my following example I will create two tables. One with the data itself and some kind of logging table. If I insert a new row to the table I will also add a row to the logging table with a timestamp and the action performed.

First we need the two tables:

CREATE TABLE TBL_TRIGGER_TEST(
    ROW_ID NUMBER,
    ROW_VALUE NUMBER,
    ROW_COMMENT VARCHAR2(1000)
);
 
CREATE TABLE TBL_TRIGGER_TEST_LOG(
    ROW_ID NUMBER,
    ROW_VALUE NUMBER,
    ROW_COMMENT VARCHAR2(1000),
    ROW_TS DATE,
    ROW_ACTION VARCHAR2(1)
);

The table TBL_TRIGGER_TEST is the main table. We have a ROW_ID, a ROW_VALUE and a ROW_COMMENT, just to have some columns. In the table TBL_TRIGGER_TEST_LOG we will log all actions on the main table. In the case of the insertion of a new record we will insert a copy of it with date & time to the logging table. In the ROW_ACTION column we will insert an I (= INSERT).

Now we need the trigger:

CREATE OR REPLACE TRIGGER TRG_TEST_INS BEFORE INSERT
ON TBL_TRIGGER_TEST FOR EACH ROW
BEGIN
  INSERT INTO TBL_TRIGGER_TEST_LOG
  VALUES(:new.ROW_ID, :new.ROW_VALUE, :new.ROW_COMMENT, sysdate, ‘I’);
END;

A trigger is created via the CREATE OR REPLACE TRIGGER statement. If a trigger with the same name already exists it is overwritten. The next important thing is BEFORE. That defines when the trigger should be executed before the DML operation takes place or after. In the case of after you just write AFTER instead of BEFORE. INSERT is the kind of DML event. You can choose between INSERT, DELETE and UPDATE. When you select UPDATE you can also specify a column. Then you defined with ON followed by the table name on what table this trigger should be implemented. The FOR EACH ROW option defines that the trigger should be executed for each row that is inserted and not only once for all rows within the transaction.

After BEGIN the code itself starts. We are just inserting a new into the logging table. The first three columns will be filled with the values that are to be inserted in the main table. You can access them via :new. This is something like a pseudo row. Beside :new you can use :o ld to access the old values. That is important when you delete a row or update it.

You are not allowed to put a COMMIT in a trigger. The trigger is commited by the transaction that inserted a row in the main table. That’s why it is not needed here.

Now you can test the trigger. Just insert a new row to the main table and commit it. Afterwards you can show all rows from the logging table. You should find an exact copy of the row you’ve inserted to the main table and the timestamp should match the date/time when you’ve inserted it.

As a task you can try to implement a trigger for the delete event. When a row is deleted you want a copy in the logging table. The action flag should contain a ‘D’. The examples and the result to this task can be found in the download area. Have fun!

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!

Downloads

BLOG_0029_Create_Trigger_examples_&_task.zip

Group By extensions IV: Functions

Hi,

This is the last part of my article series on SQL’s Group By extensions. In the first part we dealt with the ROLLUP statement, the second part then covered the CUBE statement and in the third part we discovered flexibility with the GROUPING SETS. Today we will take a look at two functions that are sometimes used to filter some of the auto-generated sums of ROLLUP and CUBE. The names of the functions are:

  • GROUPING(<expression>)
  • GROUPING_ID(<expression1>, <expression2>, …, <expression n>)

The examples are based on the test data of part 2, i.e. the table looks something like this:

Day Product No Product Name Turnover
01.01.2017 1 Product 1 20
2 Product 2 30
3 Product 3 40
02.01.2017 1 Product 1 30
2 Product 2 40
3 Product 3 50
….
30.04.2017 3 Product 3 330

 

The GROUPING() function is used to check whether an expression was used in the grouping or not. Here comes a small example for that:

SELECT  to_char(DAY, ‘YYYY-Q’) AS QUARTER_ID,
              to_char(DAY, ‘YYYY-MM’) AS MONTH_ID,
              SUM(TURNOVER) AS TURNOVER,
              GROUPING(to_char(DAY, ‘YYYY-Q’)) AS GRP_QUARTER,
              GROUPING(to_char(DAY, ‘YYYY-MM’)) AS GRP_MONTH
     
FROM TBL_TEST
GROUP BY ROLLUP(to_char(DAY, ‘YYYY-Q’), to_char(DAY, ‘YYYY-MM’))
ORDER BY QUARTER_ID, MONTH_ID;
 

Result:

BLOG_0023_PIC01_Result_Query_1

In this example you can see how the GROUPING() function basically works. For the most detailed level (I.e. month) both functions return 0 as the month_id and the quarter_id columns are grouped by. For the quarter sums the month_id is not grouped by and therefore the GROUPING() function returns 1 for this column. For the total sum both id-columns are not in the GROUP BY and that leads to a result of 1 for both GROUPIN() functions.

You can use these results to decide what sums you want to keep or remove. For that you usually need these functions when you use CUBE or ROLLUP. With GROUPING SETS you normally don’t have the need for these functions as you can exactly define what sums you need.

The GROUPING_ID() function is the combination of several GROUPING() functions in the way that if you take several GROUPING results (which are always 1 or 0 = binary) and put them in a row you receive a binary string which can be interpreted as a decimal number and that one is returned by GROUPING_ID. You receive a different ID for each different kind of sum. You then only have one column with the ID instead of several columns as with the GROUPING() function. Now here is an example for that:

SELECT  to_char(DAY, ‘YYYY-Q’) AS QUARTER_ID,
              to_char(DAY, ‘YYYY-MM’) AS MONTH_ID,
              SUM(TURNOVER) AS TURNOVER,
              GROUPING(to_char(DAY, ‘YYYY-Q’)) AS GRP_QUARTER,
              GROUPING(to_char(DAY, ‘YYYY-MM’)) AS GRP_MONTH,
              GROUPING_ID(to_char(DAY, ‘YYYY-Q’), to_char(DAY, ‘YYYY-MM’)) AS GRP_ID
 
FROM TBL_TEST
GROUP BY ROLLUP(to_char(DAY, ‘YYYY-Q’), to_char(DAY, ‘YYYY-MM’))
ORDER BY QUARTER_ID, MONTH_ID;

Result:

BLOG_0023_PIC02_Result_Query_2

 

I’ve take the first example and added only the last row in the SELECT. This is the GROUPING_ID() function which takes several expressions. In this example we put the expressions of the QUARTER_ID and MONTH_ID columns. The order of the expressions is important. It defines on what position of the binary string the result of the GROUPING of the column is put. If you take a look at row number 4. The GROUPING of Quarter is 0 and the GROUPING of Month is 1. As an binary string this would be 01 and if you transform it to a decimal number it’s: 0*2+1*2° = 0 + 1 = 1. If we check line no. 7 we have 11 as a binary string and decimal it’s: 1*2 + 1*2° = 2 + 1 = 3.

Now another example for the GROUPING():

SELECT  to_char(DAY, ‘YYYY-Q’) AS QUARTER_ID,
              to_char(DAY, ‘YYYY-MM’) AS MONTH_ID,
              DAY,
              SUM(TURNOVER) AS TURNOVER,
              GROUPING(to_char(DAY, ‘YYYY-Q’)) AS GRP_QUARTER,
              GROUPING(to_char(DAY, ‘YYYY-MM’)) AS GRP_MONTH,
              GROUPING(DAY) AS GRP_DAY,
              GROUPING_ID(to_char(DAY, ‘YYYY-Q’), to_char(DAY, ‘YYYY-MM’),
                                      DAY) AS GRP_ID     
 
FROM TBL_TEST
WHERE to_char(DAY,’DD’) IN (’01′, ’02′, ’03′)
GROUP BY ROLLUP(to_char(DAY, ‘YYYY-Q’), to_char(DAY, ‘YYYY-MM’), DAY)
ORDER BY QUARTER_ID, MONTH_ID, DAY;

Result:

BLOG_0023_PIC03_Result_Query_3

This example is based on the second example. In addition I’ve added the DAY column in the SELECT and ROLLUP. Also I’ve added a GROUPING() for DAY and added it to the GROUPING_ID() function. A filter in the WHERE clause just limit the rows in the result. It’s pretty much the same as in the second example. But you have one column more in this example. The daily sums have a GROUPING_ID of 0, the monthly SUMS of 1, the quarterly sums of 3 and the total sum a GROUPING_ID of 7. The 7 is calculated like this: 111 = 1*2² + 1*2 + 1*2° = 4 + 2 + 1 = 7.

I hope it became clear a little how the functions GROUPING() and GROUPING_ID() works and what they can be used for. These functions can be used in the HAVING clause to filter out some sums. Also these functions can be used with CUBE and also GROUPING SETS.

The examples used in this article can be found at the very end of this article as zip-file. I will continue this series soon and we will then deal with some useful functions you can use if you are working with the rollup or cube statement.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!

 

Downloads

BLOG_0023_Testdata_&_Examples.zip

Group By extensions I: ROLLUP

In todays article I will deal with an extension of the classical Group By clause. The Group By clause is used to aggregate data to a specific level of granularity. Let’s say the data in the database is daily data and you need a report on monthly level. Then you have to aggregate the data, i.e. you have to sum up all lines that belong to the same month. The result will be that you have one line per month. For that you would introduce a GROUP BY for defining the needed level of granularity (in this case: monthly level). Furthermore you would use the SUM() aggregation function to sum up the kpi.

That all works fine so far but it has one limitation: You can only have one GROUP BY per SQL statement and therefore you can only aggregate the data to one level of granularity within one query. What if you need monthly data and quarterly data in one query? Then you would use two queries and the set operation UNION ALL. Or after this article you will use the Group By extension ROLLUP. This has two advantages:

  1. Performance
  2. One additional clause vs. another query + UNION ALL

In the first approach (second query + UNION ALL) the tables are accessed several times (depending on how many queries are used). That can lead to bad performance. Furthermore the sql will grow because you have several queries. When using the Group By extension you can see in the execution plan that the tables are only accessed once. And the SQL isn’t growing that much because all you have to do is to extent the GROUP BY clause.

Okay, let’s write some code. Given the following table (see the zip-file for a creation script):

Day (DD.MM.YY)
TURNOVER
01.01.17 1000
1000
31.01.17 1000
01.02.17 1200
1200
28.02.17 1200
01.03.17 1500
1500
30.04.17 1500

 

Now we want to create a query with monthly data and a total over everything. Without the Group By extension it would look something like this:

SELECT  to_char(DAY,’MM/YYYY’) AS MONTH,
             SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST
GROUP BY to_char(DAY,’MM/YYYY’) 
 
UNION ALL
 
SELECT  ‘TOTAL 2017′ AS MONTH,
              SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST;

Result:

BLOG_0019_PIC01_Result_Query1

As you can see we have two queries that are connected via UNION ALL. The first sql creates the monthly totals. The second one creates the query total.

With ROLLUP the same query would look like this:

SELECT  nvl(to_char(DAY,’MM/YYYY’),’TOTAL 2017′) AS MONTH,
            SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST
GROUP BY ROLLUP (to_char(DAY,’MM/YYYY’))

Result:

BLOG_0019_PIC01_Result_Query1

The result is exactly the same as in the first query. But this one is a little shorter than the first one. In the select you put all columns you need to distinguish the totals. In our case we have the month and the turnover total. The Group By is extended by ROLLUP ( …. ). Within the brackets you put one or more expressions. They have to be consistent to the SELECT clause. The ROLLUP clause is creating rollups along some kind of dimension. In this dimensions the levels are defined through the expressions. Totals over all is added automatically. This means in our case that we just put the expression for the month (the same as in the select) in the ROLLUP and the total over all is added automatically.

Now we want to create a more complex example. We are adding totals for the quarter and the year. That makes four different totals (month, quarter, year, query). Our query looks like this one:

SELECT  to_char(DAY,’YYYY’) AS YEAR,
             ‘Q’||to_char(DAY,’Q/YYYY’) AS QUARTER,
             to_char(DAY,’MM/YYYY’) AS MONTH,
             SUM(TURNOVER) AS TURNOVER
FROM TBL_TEST
GROUP BY ROLLUP ( to_char(DAY,’YYYY’),
                                ‘Q’||to_char(DAY,’Q/YYYY’),
                                 to_char(DAY,’MM/YYYY’)
)

Result:

BLOG_0019_PIC02_Result_Query3

For each total we added one expression to define the level. The ROLLUP clause is processed from the right to the left, i.e. first all columns will be grouped by, then the very right one is removed and the group by only consists of quarter and year, then the next one is removed from the right and so on. In that way all the sums are created.

Sometimes you have the need to display two columns for a level cause they belong logically together, e.g. product no and product name. If you just add both to the ROLLUP you receive two total lines for the product. What you can do is to put these two columns in brackets within the ROLLUP clause like this: ROLLUP( bla, (Product_No, Product_Name), … )

We call this composite columns. If they are put in brackets they behave as one total line.

The ROLLUP is available on Oracle, MS SQL Server, DB2, Postgres, etc. Unfortunately it’s not available on MySQL. I hope you enjoyed today’s lesson. The examples used in this article can be found at the very end of this article as zip-file. I will continue this series soon and we will then deal with the CUBE statement.

If you are interested in this or other advanced Cognos/SQL topics you can also attend training on my corporate homepage. At the moment my open courses are only available in German. Inhouse training and online courses are also available in English language. So check out my open trainings (German) or my online courses (English)!

Download

BLOG_0019_GroupBy_Ext_ROLLUP_examples.zip