Friday, December 14, 2012

Suspending RI, At Least For A Little While

Question: How do I temporarily disable referential integrity (RI) checking in SQL Anywhere so I can insert rows out of parent-child order? SQL Server lets me use ALTER TABLE NOCHECK CONSTRAINT.

Answer: The easiest, safest and most efficient technique is to use the WAIT_FOR_COMMIT option to defer all foreign key checking until the next COMMIT. This technique is available in SQL Anywhere but not SQL Server.

The next best technique (fairly safe, not so easy, and really inefficient for large tables) is to ALTER TABLE DROP CONSTRAINT on the foreign key constraints that are getting in your way, and ALTER TABLE ADD CONSTRAINT when all the data has been inserted. This technique is available in both SQL Anywhere and SQL Server.

The least-safe technique uses ALTER TABLE NOCHECK CONSTRAINT to disable (not defer) foreign key checking until the work is done, then use ALTER TABLE CHECK CONSTRAINT to enable checking. This technique is available in SQL Server, but it is unlikely ever to be supported in SQL Anywhere (more on this later).

Compare And Contrast, By Example

Here's what a referential integrity violation looks like in SQL Anywhere, when a child row is inserted before the parent row exists:
CREATE TABLE parent (
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT parent_pkey PRIMARY KEY ( parent_key ) );

CREATE TABLE child (
   child_key    INTEGER NOT NULL,
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT child_pkey PRIMARY KEY ( child_key ),
   CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key ) );

INSERT child VALUES ( 99, 1, 'child' );
-----
Could not execute statement.
No primary key value for foreign key 'fkey' in table 'child'
SQLCODE=-194, ODBC 3 State="23000"
Line 29, column 1
INSERT child VALUES ( 99, 1, 'child' )
Here's what the problem looks like in SQL Server... same referential integrity violation, different message:
CREATE TABLE parent (
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT parent_pkey PRIMARY KEY ( parent_key ) )
GO

CREATE TABLE child (
   child_key    INTEGER NOT NULL,
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT child_pkey PRIMARY KEY ( child_key ),
   CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key ) )
GO

INSERT child VALUES ( 99, 1, 'child' )
GO

-----

Msg 547, Level 16, State 1, Server ENVY, Line 2
The INSERT statement conflicted with the FOREIGN KEY constraint "fkey". The
conflict occurred in database "test", table "dbo.parent", column 'parent_key'.
The statement has been terminated.
Here's how setting the WAIT_FOR_COMMIT option at the connection level solves the problem in SQL Anywhere; referential integrity checking is deferred until the COMMIT is executed, and by that time a matching row in the parent table has been inserted:
SET TEMPORARY OPTION WAIT_FOR_COMMIT = 'ON';
INSERT child VALUES ( 99, 1, 'child' );
INSERT parent VALUES ( 1, 'parent' );
COMMIT;

SET TEMPORARY OPTION WAIT_FOR_COMMIT = 'OFF';

SELECT * FROM parent;
SELECT * FROM child;

-----

parent_key,data
1,'parent'

child_key,parent_key,data
99,1,'child'
Here's what happens in SQL Anywhere when a referential integrity violation still exists when the COMMIT is issued; in other words, the data's bad and no amount of waiting is going to change that fact:
SET TEMPORARY OPTION WAIT_FOR_COMMIT = 'ON';
INSERT child VALUES ( 99, 2, 'child' );
INSERT parent VALUES ( 1, 'parent' );
COMMIT;

-----

Could not execute statement.
No primary key value for foreign key 'fkey' in table 'child'
SQLCODE=-194, ODBC 3 State="23000"
Line 32, column 1
COMMIT
Here's how ALTER TABLE DROP and ADD CONSTRAINT statements can be used to temporarily disable foreign key checking in SQL Anywhere, to allow a child row to be inserted before the corresponding parent:
CREATE TABLE parent (
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT parent_pkey PRIMARY KEY ( parent_key ) );

CREATE TABLE child (
   child_key    INTEGER NOT NULL,
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT child_pkey PRIMARY KEY ( child_key ),
   CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key ) );

ALTER TABLE child DROP CONSTRAINT fkey;

INSERT child VALUES ( 99, 1, 'child' );
INSERT parent VALUES ( 1, 'parent' );
COMMIT;

ALTER TABLE child ADD CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key );
The same ALTER TABLE DROP and ADD CONSTRAINT technique works in SQL Server:
CREATE TABLE parent (
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT parent_pkey PRIMARY KEY ( parent_key ) )
GO

CREATE TABLE child (
   child_key    INTEGER NOT NULL,
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT child_pkey PRIMARY KEY ( child_key ),
   CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key ) )
GO

ALTER TABLE child DROP CONSTRAINT fkey
GO

INSERT child VALUES ( 99, 1, 'child' )
GO
INSERT parent VALUES ( 1, 'parent' )
GO

ALTER TABLE child ADD CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key )
GO
SQL Anywhere checks all the rows when the ALTER TABLE ADD CONSTRAINT is executed. This doesn't prevent bad data from being inserted

... and committed

... and saved forever,

but it does raise a red flag by preventing the referential integrity constraint from being redefined... that's what "fairly safe" meant in the introduction:
ALTER TABLE child DROP CONSTRAINT fkey;

INSERT child VALUES ( 99, 2, 'child' );
INSERT parent VALUES ( 1, 'parent' );
COMMIT;

ALTER TABLE child ADD CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key );

-----

Could not execute statement.
No primary key value for foreign key 'fkey' in table 'child'
SQLCODE=-194, ODBC 3 State="23000"
Line 35, column 1
ALTER TABLE child ADD CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key )
The fact that ALTER TABLE ADD CONSTRAINT checks all the existing rows explains the earlier comment about ALTER TABLE ADD CONSTRAINT being "really inefficient for large tables": if you use this technique to insert a single row in a million-row table, the ALTER TABLE ADD CONSTRAINT will check all million rows, something the WAIT_FOR_COMMIT technique avoids.

The same behavior is true for SQL Server; the ALTER TABLE ADD CONSTRAINT doesn't allow the constraint to be redefined if any existing data fails the test... fairly safe, and possibly inefficient:
ALTER TABLE child DROP CONSTRAINT fkey
GO

INSERT child VALUES ( 99, 2, 'child' )
GO
INSERT parent VALUES ( 1, 'parent' )
GO

ALTER TABLE child ADD CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key )
GO

-----

Msg 547, Level 16, State 1, Server ENVY, Line 2
The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "fkey".
The conflict occurred in database "test", table "dbo.parent", column
'parent_key'.
And that brings us to SQL Server's ALTER TABLE CHECK and NOCHECK CONSTRAINT technique; yes, it allows valid data to be inserted out of order:
CREATE TABLE parent (
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT parent_pkey PRIMARY KEY ( parent_key ) )
GO

CREATE TABLE child (
   child_key    INTEGER NOT NULL,
   parent_key   INTEGER NOT NULL,
   data         VARCHAR ( 10 ) NOT NULL,
   CONSTRAINT child_pkey PRIMARY KEY ( child_key ),
   CONSTRAINT fkey FOREIGN KEY ( parent_key ) REFERENCES parent ( parent_key ) )
GO

ALTER TABLE child NOCHECK CONSTRAINT fkey
GO

INSERT child VALUES ( 99, 1, 'child' )
GO
INSERT parent VALUES ( 1, 'parent' )
GO

ALTER TABLE child CHECK CONSTRAINT fkey
GO

SELECT * FROM parent
GO
SELECT * FROM child
GO

-----

1> SELECT * FROM parent
2> GO
 parent_key  data
 ----------- ----------
           1 parent

(1 row affected)
1> SELECT * FROM child
2> GO
 child_key   parent_key  data
 ----------- ----------- ----------
          99           1 child

(1 row affected)

Sadly, it also allows bad data to be inserted, on a permanent basis, with no indication there's a problem; the ALTER TABLE CHECK CONSTRAINT just turns checking back on, it does not check existing rows for validity:
ALTER TABLE child NOCHECK CONSTRAINT fkey
GO

INSERT child VALUES ( 99, 2, 'child' )
GO
INSERT parent VALUES ( 1, 'parent' )
GO

ALTER TABLE child CHECK CONSTRAINT fkey
GO

SELECT * FROM parent
GO
SELECT * FROM child
GO

-----

1> SELECT * FROM parent
2> GO
 parent_key  data
 ----------- ----------
           1 parent

(1 row affected)
1> SELECT * FROM child
2> GO
 child_key   parent_key  data
 ----------- ----------- ----------
          99           2 child
That's why ALTER TABLE NOCHECK CONSTRAINT is unlikely to be introduced in SQL Anywhere: it's . . . just . . . not . . . safe.

When folks asked to temporarily disable referential integrity checking so they could insert rows in any order they wanted, they got WAIT_FOR_COMMIT and they were happy.

Since then, nobody's been asking for the ability to insert bad data... that happens often enough without new features to encourage bad behavior.


For another discussion of ALTER TABLE NOCHECK CONSTRAINT see Can you trust your constraints?

3 comments:

Justin Willey said...

There's no doubt that the WAIT_FOR_COMMIT is the best approach - the effect is confined to the job in hand, I'd be very nervous dropping and recreating constraints. However it's interesting to note the more helpful error message that SQLServer provides when you violate the constraint - specifying the conflicting table and column - it would be great if SQLA could be as (or more) helpful. Eg
You have tried to insert a row in to table "child", however the foreign key constraint "fkey" requires that the value for column "parent_key" exists in column "parent_key" of the table "parent"

Anonymous said...

WAIT_FOR_COMMIT still makes my skin crawl a little bit. Just sayin'.

Breck Carter said...

@jpo: IMO there is no reason to fear WAIT_FOR_COMMIT... it is not possible for an integrity violation to persist past the end of a transaction. Consider this: If a transaction consists of two SQL updates, by definition there is a "consistency violation" between the first and second update, so you are already living with temporary violations whether or not you use WAIT_FOR_COMMIT.