How to Troubleshoot a Runaway Transaction Log File

Part of being a DBA is dealing with transaction logs that grow uncontrolled for some reason. This often gets raised to the DBA in the form of a drive space issue on the disk that contains the transaction log. Troubleshooting this can go a few different directions depending on the environment.

Is the database in full or simple recovery, are they using Availability Groups, Replication, or Change Data Capture? Each can have an impact on why the log is growing out of control.

A quick script to determine the recovery model and what is going on with the transaction log is below:

SELECT 
    name,
    recovery_model_desc,
    log_reuse_wait_desc,
    log_reuse_wait, 
	is_cdc_enabled
FROM sys.databases 
WHERE name = 'YourDatabaseName';

The log_reuse_wait_desc tells you exactly what is blocking truncation. This is the fastest way to diagnose the root cause. If it says ‘NOTHING’, the log is just actively growing due to heavy write activity and hasn’t hit a truncation point yet.

The official list of values for log_reuse_wait_desc is below. Most of these are self explanatory, however I’ll dig into the more common scenarios.

Valuelog_reuse_wait_descMeaning / When it Happens
0NOTHINGNormal/good state. Reusable VLFs (Virtual Log Files) are available. Log can be truncated.
1CHECKPOINTNo checkpoint has occurred since the last truncation, or the log head hasn’t moved past a VLF. Common and usually transient (especially in SIMPLE recovery).
2LOG_BACKUPDatabase is in FULL or BULK_LOGGED recovery model and needs a transaction log backup to truncate. Most common reason in FULL recovery.
3ACTIVE_BACKUP_OR_RESTOREA full/differential backup or restore operation is (or was) running.
4ACTIVE_TRANSACTIONThere is a long-running or open transaction (including deferred transactions). Also common in tempdb.
5DATABASE_MIRRORINGDatabase mirroring is paused or the mirror is significantly behind (Full recovery only).
6REPLICATIONTransactional replication has undelivered transactions to the distribution database (Full recovery only).
7DATABASE_SNAPSHOT_CREATIONA database snapshot is being created. Usually brief.
8LOG_SCANAn active scan of the transaction log is occurring. Usually brief/transient.
9AVAILABILITY_REPLICAA secondary replica in an Always On Availability Group is still applying log records (Full recovery only).
13OLDEST_PAGEWith indirect checkpoints enabled, the oldest dirty page in the buffer pool is delaying truncation.
16XTP_CHECKPOINTIn-Memory OLTP (memory-optimized tables) needs a checkpoint (log grew > ~1.5 GB since last one).
10-12, 14(Internal)Reserved for internal use or transient conditions not currently exposed.

Common scenarios

Recovery Model + Log Backups

Database is in FULL or BULK_LOGGED and the transaction log backups are not happening for some reason. A common situation is the backup disk is full and log backups are failing. Another is a database was put in the FULL recovery model and a log backup has never been taken. Check for when the last log backup was taken.

SELECT 
    database_name,
    MAX(backup_finish_date) AS LastLogBackup
FROM msdb.dbo.backupset
WHERE type = 'L' 
  AND database_name = 'YourDatabaseName'
GROUP BY database_name;

Active / Long-Running / Uncommitted Transactions

Run the following to see if there are any open transactions for your database:

DBCC OPENTRAN('YourDatabaseName');

Or you can run the following to check for any active transactions on the instance

SELECT 
    at.transaction_begin_time,
     DATEDIFF(MINUTE, at.transaction_begin_time, GETDATE()) AS MinutesOpen,
    es.session_id,
    es.host_name,
    es.program_name,
    es.login_name,
    st.text AS MostRecentQuery
FROM sys.dm_tran_active_transactions at
INNER JOIN sys.dm_tran_session_transactions tst ON at.transaction_id = tst.transaction_id
INNER JOIN sys.dm_exec_sessions es ON tst.session_id = es.session_id
LEFT JOIN sys.dm_exec_connections ec ON es.session_id = ec.session_id
OUTER APPLY sys.dm_exec_sql_text(ec.most_recent_sql_handle) st
ORDER BY at.transaction_begin_time ASC;

Anything > 5-10 minutes is worth investigating, but anything over >30-60 minutes is very likely to be the cause of the runaway transaction log.

High-Availability / Replication Features

Common log_reuse_wait_desc values include:

AVAILABILITY_REPLICA – Always On Availability Group secondary is lagging, disconnected, or suspended

REPLICATION – Transactional replication, CDC (Change Data Capture), or log-shipping. If using CDC, ensure the jobs are running and the latency is low.

DATABASE_MIRRORING – Issue with mirroring lagging behind

Conculsion

Knowing the various scenarios and being able to quickly triage and remediate the full transaction log is key to being able to restore customer functionality. Things happen, and this is a very common situation. Monitoring for some of these conditions can be put into place, however in many cases, the rapid growth of the transaction log may happen before the alert can trigger support to take action before an outage occurs.

Leave a Reply

Your email address will not be published. Required fields are marked *