How to Troubleshoot a Runaway Transaction Log File
Part of being a DBA is dealing with transaction logs that grow uncontrolled for some reason. This often gets raised to the DBA in the form of a drive space issue on the disk that contains the transaction log. Troubleshooting this can go a few different directions depending on the environment.
Is the database in full or simple recovery, are they using Availability Groups, Replication, or Change Data Capture? Each can have an impact on why the log is growing out of control.
A quick script to determine the recovery model and what is going on with the transaction log is below:
SELECT
name,
recovery_model_desc,
log_reuse_wait_desc,
log_reuse_wait,
is_cdc_enabled
FROM sys.databases
WHERE name = 'YourDatabaseName';
The log_reuse_wait_desc tells you exactly what is blocking truncation. This is the fastest way to diagnose the root cause. If it says ‘NOTHING’, the log is just actively growing due to heavy write activity and hasn’t hit a truncation point yet.
The official list of values for log_reuse_wait_desc is below. Most of these are self explanatory, however I’ll dig into the more common scenarios.
| Value | log_reuse_wait_desc | Meaning / When it Happens |
| 0 | NOTHING | Normal/good state. Reusable VLFs (Virtual Log Files) are available. Log can be truncated. |
| 1 | CHECKPOINT | No checkpoint has occurred since the last truncation, or the log head hasn’t moved past a VLF. Common and usually transient (especially in SIMPLE recovery). |
| 2 | LOG_BACKUP | Database is in FULL or BULK_LOGGED recovery model and needs a transaction log backup to truncate. Most common reason in FULL recovery. |
| 3 | ACTIVE_BACKUP_OR_RESTORE | A full/differential backup or restore operation is (or was) running. |
| 4 | ACTIVE_TRANSACTION | There is a long-running or open transaction (including deferred transactions). Also common in tempdb. |
| 5 | DATABASE_MIRRORING | Database mirroring is paused or the mirror is significantly behind (Full recovery only). |
| 6 | REPLICATION | Transactional replication has undelivered transactions to the distribution database (Full recovery only). |
| 7 | DATABASE_SNAPSHOT_CREATION | A database snapshot is being created. Usually brief. |
| 8 | LOG_SCAN | An active scan of the transaction log is occurring. Usually brief/transient. |
| 9 | AVAILABILITY_REPLICA | A secondary replica in an Always On Availability Group is still applying log records (Full recovery only). |
| 13 | OLDEST_PAGE | With indirect checkpoints enabled, the oldest dirty page in the buffer pool is delaying truncation. |
| 16 | XTP_CHECKPOINT | In-Memory OLTP (memory-optimized tables) needs a checkpoint (log grew > ~1.5 GB since last one). |
| 10-12, 14 | (Internal) | Reserved for internal use or transient conditions not currently exposed. |
Common scenarios
Recovery Model + Log Backups
Database is in FULL or BULK_LOGGED and the transaction log backups are not happening for some reason. A common situation is the backup disk is full and log backups are failing. Another is a database was put in the FULL recovery model and a log backup has never been taken. Check for when the last log backup was taken.
SELECT
database_name,
MAX(backup_finish_date) AS LastLogBackup
FROM msdb.dbo.backupset
WHERE type = 'L'
AND database_name = 'YourDatabaseName'
GROUP BY database_name;
Active / Long-Running / Uncommitted Transactions
Run the following to see if there are any open transactions for your database:
DBCC OPENTRAN('YourDatabaseName');
Or you can run the following to check for any active transactions on the instance
SELECT
at.transaction_begin_time,
DATEDIFF(MINUTE, at.transaction_begin_time, GETDATE()) AS MinutesOpen,
es.session_id,
es.host_name,
es.program_name,
es.login_name,
st.text AS MostRecentQuery
FROM sys.dm_tran_active_transactions at
INNER JOIN sys.dm_tran_session_transactions tst ON at.transaction_id = tst.transaction_id
INNER JOIN sys.dm_exec_sessions es ON tst.session_id = es.session_id
LEFT JOIN sys.dm_exec_connections ec ON es.session_id = ec.session_id
OUTER APPLY sys.dm_exec_sql_text(ec.most_recent_sql_handle) st
ORDER BY at.transaction_begin_time ASC;
Anything > 5-10 minutes is worth investigating, but anything over >30-60 minutes is very likely to be the cause of the runaway transaction log.
High-Availability / Replication Features
Common log_reuse_wait_desc values include:
AVAILABILITY_REPLICA – Always On Availability Group secondary is lagging, disconnected, or suspended
REPLICATION – Transactional replication, CDC (Change Data Capture), or log-shipping. If using CDC, ensure the jobs are running and the latency is low.
DATABASE_MIRRORING – Issue with mirroring lagging behind
Conculsion
Knowing the various scenarios and being able to quickly triage and remediate the full transaction log is key to being able to restore customer functionality. Things happen, and this is a very common situation. Monitoring for some of these conditions can be put into place, however in many cases, the rapid growth of the transaction log may happen before the alert can trigger support to take action before an outage occurs.
Leave a Reply