Рабочая база в состоянии suspect, основано на реальных событиях

К счастью это обнаружилось в выходной день при удаленном подключении к рабочему месту. Рабочая база была недоступна и была помечена SQL сервером как подозрительная(suspect).

база в состоянии suspectбаза в состоянии suspect

Первым делом смотрим логи SQL сервера:


01/12/2013 00:00:44,spid17s,Unknown,This instance of SQL Server has been using a process ID of 1468 since 1/8/2013 10:38:53 PM (local) 1/8/2013 6:38:53 PM (UTC). This is an informational message only; no user action is required.
01/12/2013 01:03:31,spid53,Unknown,Error: 824 Severity: 24 State: 2.
01/12/2013 01:03:31,spid53,Unknown,SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7dcf9c24; actual: 0x7dcf1c24). It occurred during a read of page (1:28792043) in database ID 5 at offset 0x000036ea9d6000 in file 'C:\sql_data\Li######nie.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information see SQL Server Books Online.
01/12/2013 01:04:17,spid53,Unknown,During undoing of a logged operation in database 'Li######nie' an error occurred at log record ID (790584:69950:1). Typically the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup or repair the database.
01/12/2013 01:04:17,spid53,Unknown,During undoing of a logged operation in database 'Li######nie' an error occurred at log record ID (790620:84343:258). Typically the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup or repair the database.
01/12/2013 01:04:17,spid53,Unknown,Error: 3314 Severity: 21 State: 4.
01/12/2013 01:04:17,spid53,Unknown,Error: 3314 Severity: 21 State: 5.
01/12/2013 01:04:17,spid53,Unknown,Error: 824 Severity: 24 State: 2.
01/12/2013 01:04:17,spid53,Unknown,Error: 9001 Severity: 21 State: 1.
01/12/2013 01:04:17,spid53,Unknown,SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7dcf9c24; actual: 0x7dcf1c24). It occurred during a read of page (1:28792043) in database ID 5 at offset 0x000036ea9d6000 in file 'C:\sql_data\Li######nie.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information see SQL Server Books Online.
01/12/2013 01:04:17,spid53,Unknown,The log for database 'Li######nie' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
01/12/2013 01:04:20,spid22s,Unknown,SQL Server has encountered 1 occurrence(s) of cachestore flush for the 'Object Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
01/12/2013 01:04:25,spid22s,Unknown,SQL Server has encountered 1 occurrence(s) of cachestore flush for the 'Bound Trees' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
01/12/2013 01:04:25,spid22s,Unknown,SQL Server has encountered 1 occurrence(s) of cachestore flush for the 'SQL Plans' cachestore (part of plan cache) due to some database maintenance or reconfigure operations.
01/12/2013 01:04:26,spid22s,Unknown,Starting up database 'Li######nie'.
01/12/2013 01:04:30,spid22s,Unknown,Analysis of database 'Li######nie' (5) is 100% complete (approximately 0 seconds remain). This is an informational message only. No user action is required.
01/12/2013 01:04:30,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 0% complete (approximately 9593 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
01/12/2013 01:04:50,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 0% complete (approximately 5851 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
01/12/2013 01:04:58,Logon,Unknown,Error: 18456 Severity: 14 State: 16.
01/12/2013 01:04:58,Logon,Unknown,Login failed for user 'sa'. [CLIENT: 10.27.4.1]
01/12/2013 01:05:10,Logon,Unknown,Error: 18456 Severity: 14 State: 16.
01/12/2013 01:05:10,Logon,Unknown,Login failed for user 'sa'. [CLIENT: 10.27.4.1]
01/12/2013 01:05:10,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 0% complete (approximately 5840 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
01/12/2013 01:05:31,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 1% complete (approximately 5843 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
01/12/2013 01:05:51,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 1% complete (approximately 5826 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
01/12/2013 01:06:11,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 1% complete (approximately 5814 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
01/12/2013 01:06:12,spid22s,Unknown,2 transactions rolled forward in database 'Li######nie' (5). This is an informational message only. No user action is required.
01/12/2013 01:06:12,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 1% complete (approximately 5813 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
01/12/2013 01:06:13,spid22s,Unknown,Recovery of database 'Li######nie' (5) is 1% complete (approximately 5813 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
01/12/2013 01:06:14,spid22s,Unknown,An error occurred during recovery preventing the database 'Li######nie' (database ID 5) from restarting. Diagnose the recovery errors and fix them or restore from a known good backup. If errors are not corrected or expected contact Technical Support.
01/12/2013 01:06:14,spid22s,Unknown,During undoing of a logged operation in database 'Li######nie' an error occurred at log record ID (790620:84343:258). Typically the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup or repair the database.
01/12/2013 01:06:14,spid22s,Unknown,Error: 3314 Severity: 21 State: 1.
01/12/2013 01:06:14,spid22s,Unknown,Error: 3414 Severity: 21 State: 2.
01/12/2013 01:06:14,spid22s,Unknown,Error: 824 Severity: 24 State: 2.
01/12/2013 01:06:14,spid22s,Unknown,SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7dcf9c24; actual: 0x7dcf1c24). It occurred during a read of page (1:28792043) in database ID 5 at offset 0x000036ea9d6000 in file 'C:\sql_data\Li######nie.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information see SQL Server Books Online.
01/12/2013 01:08:43,Logon,Unknown,Error: 18456 Severity: 14 State: 16.
01/12/2013 01:08:43,Logon,Unknown,Login failed for user 'sa'. [CLIENT: 10.27.4.1]
01/12/2013 01:10:26,Logon,Unknown,Error: 18456 Severity: 14 State: 16.
01/12/2013 01:10:26,Logon,Unknown,Login failed for user 'sa'. [CLIENT: 10.27.4.1]
01/12/2013 01:13:42,Logon,Unknown,Error: 18456 Severity: 14 State: 16.
01/12/2013 01:13:42,Logon,Unknown,Login failed for user 'sa'. [CLIENT: 10.27.4.1]
01/12/2013 01:18:45,Logon,Unknown,Error: 18456 Severity: 14 State: 16.
01/12/2013 01:18:45,Logon,Unknown,Login failed for user 'sa'. [CLIENT: 10.27.4.1]


Где-то в 1:03 произошла ошибка, далее сервер попытался ее исправить, но у него, видимо, ничего не получилось и в 1:08 база стала недоступной. Тут появляется надежда на бакап, т.к. у нас они создаются ежедневно в 22:00. И, забегая вперед, именно восстановление бакапа было запущено с этого момента и именно этот вариант обхода проблемы был использован. Но вернемся к подозрительной БД.

Эта ошибка зафиксирована также в журнале ОС сервера:

журнал серверной ОСжурнал серверной ОС

Переводим базу в состояние Emergency:


ALTER DATABASE Li####enie SET EMERGENCY

Emergency modeEmergency mode

Устанавливаем монопольный режим:


ALTER DATABASE Li####enie SET SINGLE_USER WITH ROLLBACK IMMEDIATE


Запускаем проверку(на 200Гб базе она работала 28 минут):


DBCC CHECKDB ('Li####enie')


а вот и результаты:


Msg 8977, Level 16, State 1, Line 1
Table error: Object ID 1113471441, index ID 4, partition ID 72057686612705280, alloc unit ID 72057686504243200 (type In-row data). Parent node for page (1:2378173) was not encountered.

...

Table error: Object ID 1113471441, index ID 4, partition ID 72057686612705280, alloc unit ID 72057686504243200 (type In-row data), page (1:28792043). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 79693833 and -4.
Msg 8976, Level 16, State 1, Line 1
Table error: Object ID 1113471441, index ID 4, partition ID 72057686612705280, alloc unit ID 72057686504243200 (type In-row data). Page (1:28792043) was not seen in the scan although its parent (1:28646011) and previous (1:29261471) refer to it. Check any previous errors.

...

Msg 8978, Level 16, State 1, Line 1
Table error: Object ID 1113471441, index ID 4, partition ID 72057686612705280, alloc unit ID 72057686504243200 (type In-row data). Page (1:29277944) is missing a reference from previous page (1:28792043). Possible chain linkage problem.
Msg 8951, Level 16, State 1, Line 1
Table error: table '_AccumRg20041' (ID 1113471441). Data row does not have a matching index row in the index '_Accum20041_ByProperty20064_RTRN' (ID 5). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:2427358:19) identified by (_Period = '2012-01-31 23:59:59.000' and _RecorderTRef = 0x000001E9 and _RecorderRRef = 0x8AB318A905522C0811E14FB16253C15F and _LineNo = 1108890.) with index values '_Fld20059RRef = 0xAF8FD74FE72282924EBDCC337C30F511 and _Period = '2012-01-31 23:59:59.000' and _RecorderTRef = 0x000001E9 and _RecorderRRef = 0x8AB318A905522C0811E14FB16253C15F and _LineNo = 1108890.'.
There are 60595529 rows in 2634616 pages for object "_AccumRg20041".
CHECKDB found 0 allocation errors and 68 consistency errors in table '_AccumRg20041' (object ID 1113471441).
.....
CHECKDB found 0 allocation errors and 68 consistency errors in database 'Li#######enie'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Li#######enie).
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


Найдено 68 ошибок, все в таблице _AccumRg20041 это РегистрНакопления.ЗатратыНаВыпускПродукции

Запускаем проверку с исправлением, разрешаем потерю данных, время выполнения 1 час 20 минут:


DBCC CHECKDB ('Li####enie', REPAIR_ALLOW_DATA_LOSS)


Все ошибки были исправлены.

Переводим базу в многопользовательский режим, после этого она заработала, но целостность данных гаранитровать сложно.


ALTER DATABASE Li####enie SET MULTI_USER


Время ремонта базы соствило не менее 2 часов и нет гарантии, что данные в регистре ЗатратыНаВыпускПродукции корректны. Восстановление последнего бакапа более надежный ход как мне кажется. Сейчас используем стратегию резервного копирования, которая позволяет быстро восстановиться с минимальными потерями данных:
1) в 7 утра снимается полный бакап (~200Гб)
2) каждый час снимается дифференциальный бэкап (к концу дня он занимает ~3Гб).

Имея утренний бэкап и последний дифференциальный мы сможем быстро откатиться на начало текущего часа в случае проблем. Если утренний бэкап сразу развернуть на сервере в NO RECOVERY, то время простоя может еще сократиться. Но данные последнего часа, введенные пользователями, будут потеряны.

Похожие материалы

Справочники

Документы

Отчеты

Обработки