PK will contain values that are poorly clustered: PK values of 1, 2, and 3 are adjacent in the primary key index but one million rows apart in the table. FK will contain repeating values. For the first round of testing, the column FK will be indexed with a simple b-tree index.
I include it here because it allows us to compare the cost of context switches to the cost of the updates themselves. Update-wise, it should perform the same as the Explicit Cursor Loop; the difference is that the Implicit Cursor internally performs bulk fetches, which should be faster than the Explicit Cursor because of the reduced context switching.
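As a minimal sketch, the Implicit Cursor Loop looks like this (I am assuming a source table TEST1 that supplies replacement FK values for matching TEST rows; the column roles are my assumption):

```sql
BEGIN
    -- Implicit cursor: since 10g, Oracle array-fetches 100 rows per context switch
    FOR rec IN (SELECT pk, fk FROM test1) LOOP
        UPDATE test
        SET    fk = rec.fk
        WHERE  pk = rec.pk;   -- still one context switch per UPDATE
    END LOOP;
    COMMIT;
END;
/
```

The fetches are batched, but each single-row UPDATE still switches between the PL/SQL and SQL engines, which is why the update cost dominates.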
I generally recommend against it for high-volume updates because the SET sub-query is nested, meaning it is performed once for each row updated.
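The correlated-update shape being described is something like the following (a sketch; the WHERE EXISTS clause is an assumption, included so that unmatched rows are not set to NULL):

```sql
UPDATE test t
SET    t.fk = (SELECT t1.fk            -- nested sub-query: executed once
               FROM   test1 t1         -- for every row updated
               WHERE  t1.pk = t.pk)
WHERE  EXISTS (SELECT NULL
               FROM   test1 t1
               WHERE  t1.pk = t.pk);
```

Because the SET sub-query is evaluated row by row, a large update degenerates into millions of indexed single-row lookups.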
The biggest drawback to this method is readability. It needs a unique index on TEST1.PK in order to enforce key preservation. We are using the update-only version here to keep the playing field level when comparing to the other methods, which also perform primary key lookups on the target table. A Hash join may or may not be faster; that's not the point - I could increase the size of the target TEST table to M rows and Hash would be slower for sure.
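The Updateable Join View version, as a sketch (table and column names follow the setup above; which column is updated is my assumption):

```sql
UPDATE (SELECT t.fk, t1.fk AS new_fk
        FROM   test  t
        JOIN   test1 t1 ON t1.pk = t.pk)  -- TEST1.PK must be unique, so that
SET    fk = new_fk;                       -- TEST is key-preserved in the view
```

Without the unique index on TEST1.PK, Oracle raises ORA-01779 because it cannot guarantee each TEST row is updated at most once.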
This is a feature of Oracle's parallel-enabled functions: they will apportion the rows of a single Ref Cursor amongst many parallel slaves, with each slave running over a different subset of the input data set. Here is the statement that calls the Parallel Enabled Table Function: Yeah, I know, it's nasty. There are 3 runs. In the first, the buffer cache is flushed and about 1 hour of unrelated statistics gathering has been used to age out the disk cache.
In the second, the buffer cache is flushed and the disk cache has been aged out with indexed reads; timings indicate that the disk cache is still partially populated with blocks used by the query. In the third, the buffer cache is pre-salted with the table and index blocks it will need, so it should perform very little disk IO. [Results table: Explicit Cursor Loop, Implicit Cursor Loop, Updateable Join View, ...; timing figures lost in extraction.] Amongst the non-parallel methods, context switches only make a significant and noticeable difference with cached data.
With uncached data, the cost of disk reads so far outweighs the context switches that they are barely noticeable. Context switching - whilst important - is not really a game-changer.
There were no significant differences between the 6 non-parallel methods; however, this is not to suggest that it does not matter which one you choose. All of these benchmarks perform primary key lookups of the updated table, but it is possible to run methods 5 and 6 as hash joins with full table scans.
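Steering the MERGE towards a hash join with full scans looks something like this (a sketch; hint names and placement are assumptions, and the optimizer will often choose the hash join unaided when enough of the table is touched):

```sql
MERGE /*+ USE_HASH(t) FULL(t) */ INTO test t
USING (SELECT pk, fk FROM test1) s
ON    (t.pk = s.pk)
WHEN MATCHED THEN UPDATE SET t.fk = s.fk;
```

The full scan reads the target with efficient multi-block IO instead of one single-block read per row.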
If the proportion of blocks updated is high enough, the hash join can make an enormous difference in the run time. See Appendix 1 for an example. Parallel updates are a game changer.
The reason for this is disk latency: a serial process spends most of its elapsed time waiting for single-block disk reads, so it uses only a small proportion of the available IO resources. If there are no other jobs running then we get poor utilisation. The parallel methods (7 and 8) allow us to tap into these under-utilised resources.
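Method 7 uses Oracle's Parallel DML, which must be enabled explicitly at the session level; a sketch (the degree of 8 is an arbitrary choice for illustration):

```sql
ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(t 8) */ INTO test t
USING (SELECT pk, fk FROM test1) s
ON    (t.pk = s.pk)
WHEN MATCHED THEN UPDATE SET t.fk = s.fk;

COMMIT;  -- a parallel DML transaction must be committed before the
         -- session can query the modified table again
```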
Instead of issuing the disk IO requests one after the other, these methods allow a number of parallel threads to each perform a fraction of the IO requests concurrently. Method 8, which is the equivalent of running many concurrent versions of Method 4 with different data, is consistently faster than Oracle's Parallel DML. This is worth exploring: since the non-parallel equivalents (Methods 4 and 6) show no significant performance difference, it is reasonable to expect that parallelising these two methods would yield similar results.
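For reference, the shape of Method 8, as far as it can be reconstructed, is a pipelined parallel-enabled function performing the update in an autonomous transaction, with the input ref cursor partitioned across the slaves (all names, the collection type, and the degree are my assumptions; the article's actual code did not survive extraction):

```sql
CREATE TYPE num_tab AS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION parallel_update (p_cur IN SYS_REFCURSOR)
RETURN num_tab
PIPELINED
PARALLEL_ENABLE (PARTITION p_cur BY ANY)  -- each slave gets a disjoint slice
AS
    PRAGMA AUTONOMOUS_TRANSACTION;        -- required for DML inside a function called from SQL
    v_pk  test.pk%TYPE;
    v_fk  test.fk%TYPE;
    v_cnt NUMBER := 0;
BEGIN
    LOOP
        FETCH p_cur INTO v_pk, v_fk;
        EXIT WHEN p_cur%NOTFOUND;
        UPDATE test SET fk = v_fk WHERE pk = v_pk;
        v_cnt := v_cnt + 1;
    END LOOP;
    COMMIT;
    PIPE ROW (v_cnt);                     -- report rows updated by this slave
    RETURN;
END;
/
-- The "nasty" calling statement then looks something like:
SELECT SUM(column_value)
FROM   TABLE(parallel_update(CURSOR(
           SELECT /*+ PARALLEL(t1 8) */ pk, fk FROM test1 t1)));
```

The number of slaves follows the parallelism of the input cursor, which is what makes this the parallel equivalent of running many Method 4 sessions at once.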
I ran some traces (see Appendix 2) and found that the Parallel Merge was creating too many parallel threads and suffering from latch contention; the lesson here is that too much parallelism is a bad thing. For brevity, this time we'll just flush the buffer cache and run about 5 minutes worth of indexed reads to cycle the disk cache. It looks as though there is a small premium associated with checking the foreign key, although it does not appear to be significant.
It's worth noting that the parent table in this case is very small and quickly cached. A very large parent table would result in a considerably greater number of cache misses and resultant disk IO. Foreign keys are often blamed for bad performance; whilst they can be limiting in some circumstances, these results suggest the cost of maintaining them can be quite modest.
The two things I hear most about Bitmap indexes are that: they are inappropriate for tables that undergo concurrent updates, and they are slow to update. Surely no comparison of update methods could possibly be complete without a test of Bitmap index maintenance.
The Deadlock error raised by Method 8 occurred because bitmap indexes are locked at the block level, not the row level. With hundreds of rows represented by each block in the index, the chances of two sessions attempting to lock the same block are quite high. The very clear lesson here: do not perform concurrent updates on a table with a bitmap index. Although not shown here, this effect is magnified with each additional bitmap index. Parallelize large updates for a massive performance improvement.
Tune the number of parallel query servers used by looking for latch contention and thread startup waits. Don't rashly drop Foreign Keys without benchmarking; they may not be costing very much to maintain.
This makes it a good candidate for hash joins and full scans to out-perform indexed nested loops. Of course, as you decrease the percentage of blocks updated, the balance will swing in favour of Nested Loops; but this trace demonstrates that MERGE definitely has its place in high-volume updates. [Trace excerpt: the hash join run waits mostly on 'db file scattered read' (37 waits), the indexed run on 'db file sequential read'; timing figures lost in extraction.]
Although the number of physical disk blocks read and Current Mode Gets are about the same in each test, the Hash Join method performs multi-block reads, resulting in fewer visits to the disk. If the proportion of updated blocks increases, then the average cost of finding those rows decreases; the exercise becomes one of tuning the data access rather than tuning the update. To shed some light, here are some traces. Below we see the trace from the Parallel Coordinator session of Method 7: [Trace excerpt: a single 'db file sequential read' wait; timing figures lost in extraction.]
[Trace excerpt, Parallel Coordinator: waits on 'RO - fast object reuse', 'Join ACK' (7), 'Parse Reply' (15) and 'Execute Reply'; timing figures lost in extraction.] And here are the wait events for just ONE of the parallel threads from the same test case: [Trace excerpt, parallel slave: waits on 'cursor:' events, 'Execution Msg', 'Msg Fragment', 'Table Q Normal', 'reliable message', 'Join ACK', 'Parse Reply', 'Signal ACK' and 'PS - contention'; timing figures lost in extraction.]
Whatever Oracle is doing here, there is certainly a significant cost to opening parallel threads. In theory, we should be able to reduce the cost of thread startup and also reduce contention by reducing the number of parallel threads. Knowing from the results above how much faster the parallel methods were than the non-parallel methods, I suspect that the benefits of parallelism diminish after no more than 32 parallel threads. First the Parallel Co-ordinator: [Trace excerpt: waits on 'Join ACK' (9), 'Parse Reply' (18), 'Execution Msg' (34) and 'Table Q Normal' (4); timing figures lost in extraction.]
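Capping the degree of parallelism, per the theory above, can be done in the statement hint or at the table level; a sketch (the degree of 16 is an arbitrary choice for illustration):

```sql
-- Cap parallel DML at a fixed degree via the hint ...
MERGE /*+ PARALLEL(t 16) */ INTO test t
USING (SELECT pk, fk FROM test1) s
ON    (t.pk = s.pk)
WHEN MATCHED THEN UPDATE SET t.fk = s.fk;

-- ... or set a default degree on the table itself
ALTER TABLE test PARALLEL 16;
```

Fewer slaves means fewer thread startups and less contention on shared latches, at the cost of less IO concurrency; the sweet spot has to be found by benchmarking.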