RFI Example

Here is an example of recovery using RFI's page level features:

During recovery of a database, five pages were marked suspect. The DBA examined the pages and determined that three of them are index pages on a single allpages-locked (APL) table, and that the other two marked suspect are data pages belonging to different tables. The database has been marked as read_only and while users can query the database, no changes can currently take place.

First the DBA onlines the pages with the sa_on option. The DBA then immediately dumps the transaction log to ensure the ability to recover to this point should something else go wrong. Recovery would involve loading a database dump and all subsequent transaction dumps.

Before RFI, a dump of a suspect database was not possible. With RFI, the DBA can make a dump of the slightly corrupt database in case it is needed later. Often a recent dump with a few problems is preferable to an older dump with no problems. This is purely a safety measure as the DBA hopes to be able to repair the database, which is currently partly unrecovered.

Next the DBA runs dbcc indexalloc on the index containing the three offline pages. indexalloc reveals errors, and it is decided that the best thing to do is to rebuild the index. If the index was a nonclustered index, or a data-only-locked (DOL) clustered (placement) index, it could simply be dropped and recreated. However, this is an APL clustered index and any time the clustered index is suspect, the table is suspect as well. The DBA runs dbcc checktable to examine the integrity of the data pages. dbcc checktable always checks the data page linkage before checking the index structures. (Keep in mind that a DOL table's data and non-leaf index pages do not maintain sibling links that can be followed by dbcc pglinkage type of checks.) By looking at the output of dbcc checktable, the DBA determines that the data page linkage is intact. This means that it is safe to drop the clustered index.

NoteIf the data page linkage also showed corruption, the DBA would have to resort to backups or find another way to restore the table (an offline bcp copy, for example).

Looking at the data pages for the other two objects, it is found that the first object is a static reference table, and an offline copy of this table's data exists. The DBA decides to truncate the table and bcp in a new copy. For the second object, an APL clustered table, the data page linkage is found to be broken, but the clustered index is still intact. With this information the DBA is able to locate all of the rows, bcp them out, truncate the table and bcp them back in.

Once all of these tasks are complete, the question of possible incomplete logical changes to the tables due to incomplete transactions still remains. The only way to test for data integrity is to use user-written queries and reports that expose inconsistencies. After doing this, the DBA can determine if those inconsistencies can be tolerated, or repaired, or if backups are the best option.

The final step is to detect and fix any allocation inconsistencies that may exist due to recovery having only partially completed. The DBA can run dbcc checkalloc to check the entire database, or dbcc tablealloc and dbcc indexalloc can be run on the suspect objects.

From this example it is clear that Recovery Fault Isolation makes many more choices available to the DBA. With database-only granularity, the DBA has no way to examine the extent of the corruption and make a decision as to what the best solution to the failed recovery might be.