Discngine

View Original

How to create an Oracle materialized view that performs well on fast refresh ?

Recently, working for a client, we faced an interesting challenge using Oracle database. It's been quite painful to gather scattered piece of information to get to the goal, so I thought I could summarize and share it here.

Let's imagine you have 20 million addresses, and you want to enrich them with the country they belong to. You have a country look up table. And you want to keep track of how the country has been enriched, so you need a dedicated table to store addresses' country (or you would violate normalization rules).

The core problem have been this: how to allow displaying AND modifying data stored in quite big tables (milions of records) with acceptable performance for the end user.

Let's go into it and create our model first!

We create and populate the address table

See this content in the original post

Then we create and populate the country look up table:

See this content in the original post

Finally we create and populate the enrichment table:

See this content in the original post

Then, selecting data is a bit slow:

See this content in the original post

Takes ~9s with my set up. 

Important remark: all addresses may not have been enriched, but you want to see them all, because the whole point of showing addresses in this context is to enrich the ones that have not been enriched yet. So you have to use outer joins in your select. Moreover, in real life, we have more columns to select and more tables to join so the performance is far from being acceptable.

So we can take advantage of materialized views:

See this content in the original post

Which takes ~25s with my personnal setup.

But for selecting:

See this content in the original post

50 records are fetched within 0.026s.

When the enrichment process is automated and executes overnight, it's perfectly acceptable. But in our case, we need to have a web application that allows user to manually enrich addresses, which is the reason why we want to display and modify data with good performance. Then 25s to refresh the materialized view is even worse than 5s to show the 50 first records.

Luckily for us, Oracle implemented a "fast refresh" mode for materialized views. In order to activate fast refresh, we have to create materialized view logs on the underlying tables. Let's create them:

See this content in the original post

Then we can enable fast refresh: 

See this content in the original post

But then, when we try to refresh:

See this content in the original post

This error is thrown:

See this content in the original post

Since we created the logs, why can't they be used ?

The reason is "the rowids of all the detail tables must appear in the SELECT list of the materialized view query definition", so we have to add them:

See this content in the original post

Which leads to another error:

See this content in the original post

This one has been a bit tricky to investigate ! This is due to the fact that fast-refresh does not support the Ansi join syntax (as shown on Ask Tom for instance). So we have to re-write our materialized view creation statement:

See this content in the original post

Now, let's update one record and refresh the materialized view:

See this content in the original post

 The refresh actions takes ~11 seconds on my laptop. Better, but still not satisfying.

Let's have a look at what happened. When looking at the trace we can see that the statement is this one:

See this content in the original post

And the explain plan says:

See this content in the original post

We'll address the full table access on the table country_adrdress later, but first we need to add indexes on enriched_addresses:

See this content in the original post

And then we update a value again and refresh the materialized view:

See this content in the original post

~5s. Better, but still too much.

The statement is the same, but the explain plan became:

See this content in the original post

This is the other weird behaviour which took me a while to understand. The Oracle knowledge database says "This problem is caused by unpublished Bug 4196039, "Inefficient plan causing poor incremental refresh for Materialized View", and for those not having access, the Ask Tom version, Alberto Dell’Era’s blog version or Dominic Brooks’ one.

So we gather materialized view stats while it’s empty and lock them:

See this content in the original post

And being connected as sys, we set the _mv_refresh_use_stats parameter to true:

See this content in the original post

Let's update data again:

See this content in the original post

Which finally takes 0.099s. Here we are ! 🚀

Becareful with the "_mv_refresh_use_stats”, apparently it can lead to problems in some cases, as mentionned on Dominique Brook’s blog.