Total Pageviews

Saturday, April 6, 2013

SQL Server 2012 AlwaysON: "Automatic Page Repair" - how does it work?

SQL Server 2012 brought us a new HADR (high availability disaster recovery) feature and declares that database mirroring will be deprecated in future versions. Like database mirroring, AlwaysON has a similar feature: “Automatic Page Repair”. In the “AlwaysON Availability Groups”, if a participating database in any replica became a victim of page level issue (logical consistency) then that page will be automatically repaired by a non- affected page from another replica. A background service does this repair task once SQL Server Database Engine encounters an error similar to SQL Server detected a logical consistency-based I/O error.

In this article, we will be testing “Automatic Page Repair” functionality of SQL Server 2012 in the Windows 2012 environment. To test this feature we have the following:

1.       The entire environment is running inside VMware ESXi 5.0
2.       We have two Nodes Windows 2012 Datacenter Failover Cluster (WSFC)
3.       Two nodes SQL Server 2012 instance participating in WSFC.
4.       One is read-write "primary replica" and another is readable "secondary replica".
5.       Failover mode is “Automatic” and data synchronization mode is “synchronous”

In our testing, we will manually corrupt a primary read-write database to see how “Automatic Page Repair” functionality helps us to recover our database. To corrupt our test database we will be using a HexEditor XVI32 which can be downloaded from http://www.chmaas.handshake.de/.

A brief about our AlwaysON:

1.         ”BlogHAGroup” is the AlwaysON Availability Groups.
2.         SQL12K1 is the primary replica.
3.         Our “TestDB” is the only database participating in AlwaysON Availability Groups and resides in SQL12K1.
4.         SQL12K2 is the secondary replica and where the “TestDB” is read only.

Screenshots of our AlwaysON Setup:


Windows 2012 Failover Cluser with AlwaysON Availability Groups:

AlwaysON Availability Groups in SSMS 2012:



Let’s Get Started:

1.       Let’s create a table “Country” in the primary replica and insert some data into this table.

CREATE TABLE [Country]
    (
      [xID] [int] IDENTITY(1, 1)
                  NOT NULL ,
      [sCountry] [varchar](50) NOT NULL ,
      [sAbbrev] [varchar](5) NOT NULL
    )

INSERT  INTO Country
        ( sCountry, sAbbrev )
VALUES  ( 'Canada', 'CA' ),
        ( 'United States', 'US' ),
        ( 'Afghanistan', 'AF' ),
        ( 'Albania', 'AL' ),
        ( 'Algeria', 'DZ' ),
        ( 'Andorra', 'AD' ),
        ( 'Angolia', 'AO' )


2.       Query the primary and secondary replica. Both table schema and the data have been synchronized (figure #1).

3.       To simulate a database disaster and  to corrupt the database we need to do the following:
(a)    We stop SQL Server Service in SQL12K1. Please note that in AlwaysON mode, a participating database in an “availability group” can’t be offline.
(b)   Go to the database folder and provide the appropriate OS permission. And then use HexEditor to open the data file (figure #2).
(c)    Let’s search the string “Andorra” and on the left hand side, type any number and then save the changes.
(d)   Start SQL Server Service on SQL12K1 which is our primary read-write replica.

4.       Execute a SELECT statement on primary replica (SQL12K1) and we will receive the following error:

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x2f2c9f8a; actual: 0xb9b4fc2b). It occurred during a read of page (1:231) in database ID 5 at offset 0x000000001ce000 in file 'H:\HADB\TestDB.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

5.       Query the following table and DMV, and then we will see the following page related issue recorded in the table (figure #3).
     select * from msdb..suspect_pages
     select * from sys.dm_hadr_auto_page_repair
 
6.       Wait for a few seconds to let the SQL Server Engine repair the corrupt page then execute the SELECT statement against “country” table again. The corrupted page has been rectified!

Figure # 1: Table and data


Figure # 2: Manually damaging a page with HexEditor

Figure # 3: Suspect pages
 
Summary:
As we saw the AlwaysON service is an excellent feature in SQL Server 2012 which provides maximum benefits for “High Availability” than ever before and “Automatic Page Repair” protects the database in instantly.

1 comment:

  1. I have couple of questions.
    1.In primary server ,when i executed DBCC CHECKDB,it returned some errors.Will the secondary have the same errors?(will there be any sync between primary and secondary related to corruption)?
    2.In always on configured,When this errors are fixed in Primary,will this sync in secondary and automatically fix ?
    Thanks in advance
    Ravi

    ReplyDelete