Oracle write consistency bug and multi-thread de-queuing

Here is a quick test I did after encountering an abnormal behavior in write consistency and before finding some references to a bug on StackOverflow (yes, write consistency questions on StackOverflow!) and AskTOM. And a bug opened by Tom Kyte in 2011, that is still there in 18c.

The original issue was with a task management system to run jobs. Here is the simple table where all rows have a 'NEW' status and the goal is to have several threads processing them by updating them to the 'HOLDING' status' and adding the process name.

 

set echo on
drop table DEMO;
create table DEMO (ID primary key,STATUS,NAME,CREATED)
 as select rownum,cast('NEW' as varchar2(10)),cast(null as varchar2(10)),sysdate+rownum/24/60 from xmltable('1 to 10')
/

 

Now here is the query that selects the 5 oldest rows in status 'NEW' and updates them to the 'HOLDING' status:

 

UPDATE DEMO SET NAME = 'NUMBER1', STATUS = 'HOLDING' 
WHERE ID IN (
 SELECT ID FROM (
  SELECT ID, rownum as counter 
  FROM DEMO 
  WHERE STATUS = 'NEW' 
  ORDER BY CREATED
 ) 
WHERE counter <= 5) 
;

 

Note that the update also sets the name of the session which has processed the rows, here 'NUMBER1'.

Once the query started, and before the commit, I've run the same query from another session, but with 'NUMBER2'.

 

UPDATE DEMO SET NAME = 'NUMBER2', STATUS = 'HOLDING' 
WHERE ID IN (
 SELECT ID FROM (
  SELECT ID, rownum as counter 
  FROM DEMO 
  WHERE STATUS = 'NEW' 
  ORDER BY CREATED
 ) 
WHERE counter <= 5) 
;

 

Of course, this waits on row lock from the first session as it has selected the same rows. Then I commit the first session, and check, from the first session what has been updated:

 

commit;
set pagesize 1000
select versions_operation,versions_xid,DEMO.* from DEMO versions between scn minvalue and maxvalue order by ID,2;

V VERSIONS_XID             ID STATUS     NAME       CREATED        
- ---------------- ---------- ---------- ---------- ---------------
U 0500110041040000          1 HOLDING    NUMBER1    27-SEP-18 16:48
                            1 NEW                   27-SEP-18 16:48
U 0500110041040000          2 HOLDING    NUMBER1    27-SEP-18 16:49
                            2 NEW                   27-SEP-18 16:49
U 0500110041040000          3 HOLDING    NUMBER1    27-SEP-18 16:50
                            3 NEW                   27-SEP-18 16:50
U 0500110041040000          4 HOLDING    NUMBER1    27-SEP-18 16:51
                            4 NEW                   27-SEP-18 16:51
U 0500110041040000          5 HOLDING    NUMBER1    27-SEP-18 16:52
                            5 NEW                   27-SEP-18 16:52
                            6 NEW                   27-SEP-18 16:53
                            7 NEW                   27-SEP-18 16:54
                            8 NEW                   27-SEP-18 16:55
                            9 NEW                   27-SEP-18 16:56
                           10 NEW                   27-SEP-18 16:57

 

I have used flashback query to see all versions of the rows. All 10 have been created and the the first 5 of them have been updated by NUMBER1.

Now, my second session continues, updating to NUMBER2. I commit and look at the row versions again:

 

commit;
set pagesize 1000
select versions_operation,versions_xid,DEMO.* from DEMO versions between scn minvalue and maxvalue order by ID,2;
V VERSIONS_XID             ID STATUS     NAME       CREATED        
- ---------------- ---------- ---------- ---------- ---------------
U 04001B0057030000          1 HOLDING    NUMBER2    27-SEP-18 16:48
U 0500110041040000          1 HOLDING    NUMBER1    27-SEP-18 16:48
                            1 NEW                   27-SEP-18 16:48
U 04001B0057030000          2 HOLDING    NUMBER2    27-SEP-18 16:49
U 0500110041040000          2 HOLDING    NUMBER1    27-SEP-18 16:49
                            2 NEW                   27-SEP-18 16:49
U 04001B0057030000          3 HOLDING    NUMBER2    27-SEP-18 16:50
U 0500110041040000          3 HOLDING    NUMBER1    27-SEP-18 16:50
                            3 NEW                   27-SEP-18 16:50
U 04001B0057030000          4 HOLDING    NUMBER2    27-SEP-18 16:51
U 0500110041040000          4 HOLDING    NUMBER1    27-SEP-18 16:51
                            4 NEW                   27-SEP-18 16:51
U 04001B0057030000          5 HOLDING    NUMBER2    27-SEP-18 16:52
U 0500110041040000          5 HOLDING    NUMBER1    27-SEP-18 16:52
                            5 NEW                   27-SEP-18 16:52
                            6 NEW                   27-SEP-18 16:53
                            7 NEW                   27-SEP-18 16:54
                            8 NEW                   27-SEP-18 16:55
                            9 NEW                   27-SEP-18 16:56
                           10 NEW                   27-SEP-18 16:57

 

This is not what I expected. I wanted my second session to process the other rows, but here it seems that it has processed the same rows as the first one. What has been done by the NUMBER1 has been lost and overwritten by NUMBER2. This is inconsistent, violates ACID properties, and should not happen. An SQL statement must ensure write consistency: either by locking all the rows as soon as they are read (for non-MVCC databases where reads block writes), or re-starting the update when a mutating row is encountered. Oracle default behaviour is in the second case, NUMBER2 query reads the rows 1 to 5, because the changes by NUMBER1, not committed yet, are invisible from NUMBER2. But the execution should keep track of the columns referenced in the where clause. When attempting to update a row, now that the concurrent change is visible, the update is possible only if the WHERE clause used to select the rows still selects this row. If not, the database should raise an error (this is what happens in serializable isolation level) or re-start the update when in the default statement-level consistency.

Here, probably because of the nested subquery, the write consistency is not guaranteed and this is a bug.

One workaround is not to use subqueries. However, as we need to ORDER BY the rows in order to process the oldest first, we cannot avoid the subquery. The workaround for this is to add STATUS = 'NEW' in the WHERE clause of the update, so that the update restart works correctly.

However, the goal of multithreading those processes is to be scalable, and multiple update restarts may finally serialize all those updates.

The preferred solution for this is to ensure that the updates do not attempt to touch the same rows. This can be achieved by a SELECT ... FOR UPDATE SKIP LOCKED. As this cannot be added directly to the update statement, we need a cursor. Something like this can do the job:

 

declare counter number:=5;
begin
 for c in (select /*+ first_rows(5) */ ID FROM DEMO 
           where STATUS = 'NEW' 
           order by CREATED
           for update skip locked)
 loop
  counter:=counter-1;
  update DEMO set NAME = 'NUMBER1', STATUS = 'HOLDING'  where ID = c.ID and STATUS = 'NEW';
  exit when counter=0;
 end loop;
end;
/
commit;

 

This can be optimized further but just gives an idea of what is needed for a scalable solution. Waiting for locks is not scalable.

Comments

Interesting.

Just playing with a couple of options that you'd be well aware of anyway - add STATUS or NAME into the nested selects.

Or partition the table by status with row movement enabled which might have been a worthwhile consideration for such a table anyway?

Add new comment

You are here