1. Home
  2. Knowledge Base
  3. Database Administration
  4. Oracle 12c new feature APPROX_COUNT_DISTINCT and FETCH FIRST ROWS

Oracle 12c new feature APPROX_COUNT_DISTINCT and FETCH FIRST ROWS

If we are dealing with large amounts of data and have to use queries to find count of distinct values of some particular column or group of columns, the new 12.1.0.2 feature APPROX_COUNT_DISTINCT function can be significantly faster than using the traditional COUNT (DISTINCT expr) function with marginal differences in the exact result.

The examples below also show less optimizer cost overhead as well as temporary tablespace space usage as well.

We will also see how the traditional Top n queries we used to run has changed significantly in Oracle 12c with the FETCH FIRST n ROWS and OFFSET commands available in SQL*PLUS.

Let us run the query in the traditional way to determine the Top 5 products in terms of the number of individual customers.

Note the execution plan and the fact that sort operation has used some temporary space.

SQL> conn sh/sh
Connected.


SQL> select * from
  2  (select prod_id, count(distinct cust_id) from sales
  3  group by prod_id  order by 2 desc )
  4   where rownum < 6;


   PROD_ID COUNT(DISTINCTCUST_ID)
---------- ----------------------
        30                   6154
        48                   6010
        40                   5972
        31                   5586
       130                   5428


Execution Plan
----------------------------------------------------------
Plan hash value: 1492996759

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |     5 |   130 |       |  2654   (2)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY            |           |       |       |       |            |          |       |       |
|   2 |   VIEW                    |           |    72 |  1872 |       |  2654   (2)| 00:00:01 |       |       |
|*  3 |    SORT ORDER BY STOPKEY  |           |    72 |  1224 |       |  2654   (2)| 00:00:01 |       |       |
|   4 |     HASH GROUP BY         |           |    72 |  1224 |       |  2654   (2)| 00:00:01 |       |       |
|   5 |      VIEW                 | VM_NWVW_1 |   359K|  5966K|       |  2637   (2)| 00:00:01 |       |       |
|   6 |       HASH GROUP BY       |           |   359K|  3158K|    17M|  2637   (2)| 00:00:01 |       |       |
|   7 |        PARTITION RANGE ALL|           |   918K|  8075K|       |   514   (1)| 00:00:01 |     1 |    28 |
|   8 |         TABLE ACCESS FULL | SALES     |   918K|  8075K|       |   514   (1)| 00:00:01 |     1 |    28 |

Now we run the same query using the 12c new features - APPROX_COUNT_DISTINCT and FETCH FIRST n ROWS

Note the difference in the execution plan and optimizer cost. We see two new operations - WINDOW SORT PUSHED RANK and HASH GROUP BY APPROX

The results are almost 98% accurate with the APPROX_COUNT_DISTINCT function in use.


SQL> select prod_id, APPROX_COUNT_DISTINCT (cust_id)  from sales
  2  group by prod_id  order by 2 desc
  3  FETCH FIRST 5 ROWS ONLY;

   PROD_ID APPROX_COUNT_DISTINCT(CUST_ID)
---------- ------------------------------
        30                           6134
        48                           5883
        40                           5858
        31                           5463
       130                           5336


Execution Plan
----------------------------------------------------------
Plan hash value: 723256909

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       |     5 |   260 |   559   (9)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY            |       |     5 |   260 |   559   (9)| 00:00:01 |       |       |
|*  2 |   VIEW                    |       |     5 |   260 |   558   (9)| 00:00:01 |       |       |
|*  3 |    WINDOW SORT PUSHED RANK|       |    72 |   648 |   558   (9)| 00:00:01 |       |       |
|   4 |     HASH GROUP BY APPROX  |       |    72 |   648 |   558   (9)| 00:00:01 |       |       |
|   5 |      PARTITION RANGE ALL  |       |   918K|  8075K|   514   (1)| 00:00:01 |     1 |    28 |
|   6 |       TABLE ACCESS FULL   | SALES |   918K|  8075K|   514   (1)| 00:00:01 |     1 |    28 |


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=5)
   3 - filter(ROW_NUMBER() OVER ( ORDER BY APPROX_COUNT_DISTINCT("CUST_ID") DESC )<=5)

We can also use the OFFSET clause to now do some pagination and obtain the next 5 rows (after the top 5)

SQL> select prod_id, APPROX_COUNT_DISTINCT (cust_id)  from sales
  2   group by prod_id  order by 2 desc
  3  OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY;

   PROD_ID APPROX_COUNT_DISTINCT(CUST_ID)
---------- ------------------------------
        33                           5278
       120                           5135
       128                           5135
       133                           5114
        23                           5104

We can also use the PERCENT ROWS clause to return the top percentage of rows we desire

SQL>  select prod_id, count(distinct cust_id) from sales
  2   group by prod_id  order by 2 desc
  3  FETCH FIRST 10 PERCENT ROWS ONLY;

   PROD_ID COUNT(DISTINCTCUST_ID)
---------- ----------------------
        30                   6154
        48                   6010
        40                   5972
        31                   5586
       130                   5428
        33                   5389
       120                   5224
       133                   5201

8 rows selected.

When we ran all these queries these were the runtime statistics – almost the same in every case.

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       1623  consistent gets
       1607  physical reads
          0  redo size
        732  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          5  rows processed

Since the database is 12.1.0.2 and we have enabled IN MEMORY option in the database let us what happens to the same query when we use the In Memory column store.

Note the difference in the optimizer cost, execution plan and 0 physical reads – In Memory option in operation which requires no change in the application or query!

SQL> alter table sales inmemory;

Table altered.

SQL> select prod_id, APPROX_COUNT_DISTINCT (cust_id)  from sales
  2   group by prod_id  order by 2 desc
  3  FETCH FIRST 5 ROWS ONLY;

   PROD_ID APPROX_COUNT_DISTINCT(CUST_ID)
---------- ------------------------------
        30                           6134
        48                           5883
        40                           5858
        31                           5463
       130                           5336



Execution Plan
----------------------------------------------------------
Plan hash value: 723256909

---------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |       |     5 |   260 |   154  (33)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                  |       |     5 |   260 |   154  (33)| 00:00:01 |       |       |
|*  2 |   VIEW                          |       |     5 |   260 |   153  (33)| 00:00:01 |       |       |
|*  3 |    WINDOW SORT PUSHED RANK      |       |    72 |   648 |   153  (33)| 00:00:01 |       |       |
|   4 |     HASH GROUP BY APPROX        |       |    72 |   648 |   153  (33)| 00:00:01 |       |       |
|   5 |      PARTITION RANGE ALL        |       |   918K|  8075K|   110   (6)| 00:00:01 |     1 |    28 |
|   6 |       TABLE ACCESS INMEMORY FULL| SALES |   918K|  8075K|   110   (6)| 00:00:01 |     1 |    28 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=5)
   3 - filter(ROW_NUMBER() OVER ( ORDER BY APPROX_COUNT_DISTINCT("CUST_ID") DESC )<=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
        732  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          5  rows processed


Updated on June 2, 2021

Was this article helpful?

Related Articles

Leave a Comment