Why Choose Always On Availability Groups?
Always On Availability Groups (AG) are Microsoft’s native high‑availability and disaster‑recovery (HADR) solution for SQL Server. They provide:
- Automatic failover with synchronous‑commit replicas.
- Read‑only routing for reporting workloads.
- Granular failover at the database level, not the instance level.
- Support for up to eight secondary replicas (SQL Server 2019+).
For DBAs accustomed to Oracle Data Guard or PostgreSQL streaming replication, AGs feel familiar yet bring unique Windows‑centric capabilities such as Windows Server Failover Clustering (WSFC) and integrated listener support.
Prerequisites Overview
1. SQL Server Edition & Version
Enterprise edition is required for full AG functionality (basic AGs are available in Standard edition from SQL Server 2016 SP1 onward). Ensure all nodes run the same major version and build.
2. Windows Server Failover Clustering (WSFC)
AGs rely on a WSFC for quorum, node health detection, and resource management. Install the Failover-Clustering
feature on each node and configure a cluster with a dedicated IP address.
3. Network & Storage Requirements
- Separate subnets for data replication (private) and client access (public) are recommended.
- Use high‑throughput, low‑latency links (10 GbE or better).
- Do not place database files on shared storage; each replica must have local storage (e.g., SAN LUNs or direct‑attached NVMe).
4. Domain & Permissions
All nodes must be members of the same Active Directory domain. The SQL service account requires Log on as a service
and Log on as a batch job
rights, plus permission to create and manage the AG listener DNS record.
Step‑by‑Step Configuration
2.1 Create the WSFC Cluster
# PowerShell – create a new cluster
Import-Module FailoverClusters
New-Cluster -Name SQLAGCluster -Node Node01,Node02 -StaticAddress 10.0.0.100 -NoStorage
2.2 Enable Always On Feature on Each Instance
Open SQL Server Configuration Manager → SQL Server Services → Right‑click the instance → Properties → “Always On High Availability” → Check “Enable Always On”. Restart the service.
2.3 Prepare Databases for Replication
- Set the recovery model to
FULL
. - Take a full backup and a subsequent transaction‑log backup.
- Restore the backup on each secondary with
NORECOVERY
.
-- Primary
ALTER DATABASE MyDB SET RECOVERY FULL;
BACKUP DATABASE MyDB TO DISK='\\backup\MyDB_Full.bak';
BACKUP LOG MyDB TO DISK='\\backup\MyDB_Log.trn';
— SecondaryRESTORE DATABASE MyDB FROM DISK=’\\backup\MyDB_Full.bak’ WITH NORECOVERY;
RESTORE LOG MyDB FROM DISK=’\\backup\MyDB_Log.trn’ WITH NORECOVERY;
2.4 Create the Availability Group
Use SQL Server Management Studio (SSMS) or T‑SQL. The example below creates an AG named AG_Sales
with one synchronous‑commit secondary.
CREATE AVAILABILITY GROUP AG_Sales
FOR DATABASE MyDB
REPLICA ON
N'Node01' WITH (
ENDPOINT_URL = N'tcp://Node01.domain.com:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = AUTOMATIC,
BACKUP_PRIORITY = 1,
PRIMARY_ROLE (ALLOW_CONNECTIONS = ALL),
SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
),
N'Node02' WITH (
ENDPOINT_URL = N'tcp://Node02.domain.com:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = AUTOMATIC,
BACKUP_PRIORITY = 2,
PRIMARY_ROLE (ALLOW_CONNECTIONS = ALL),
SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
);
2.5 Add the Listener
The listener is a virtual network name that clients use to connect. It abstracts the underlying replica IP.
ALTER AVAILABILITY GROUP AG_Sales
ADD LISTENER N'AG_Sales_Listener' (
WITH IP (('10.0.0.150', 1433))
);
2.6 Join Secondary Replicas
After the primary AG is created, join each secondary:
ALTER AVAILABILITY GROUP AG_Sales JOIN;
2.7 Verify the Configuration
SELECT ag.name, ar.replica_server_name, ar.role_desc, ar.availability_mode_desc,
ar.failover_mode_desc, dr.synchronization_state_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_database_replica_states dr ON ar.replica_id = dr.replica_id;
Best Practices for Production Deployments
- Quorum Configuration: Use a Node and Disk Majority model, or add a File Share Witness on a separate server to avoid split‑brain scenarios.
- Endpoint Security: Secure the AG endpoint with a certificate and enforce encryption (TLS).
- Backup Strategy: Perform backups on the primary replica by default; however, you can offload backups to a secondary to reduce I/O pressure. This mirrors the concept of RMAN backup automation in Oracle.
- Performance Tuning: Monitor
log_send_rate
,redo_queue_size
, andreplica_latency
via DMVs such assys.dm_hadr_database_replica_states
. Adjustmax_transmission_size
and network QoS if needed. - Read‑Only Routing: Define routing lists to balance reporting queries across secondary replicas.
- Maintenance Mode: Use
ALTER AVAILABILITY GROUP … SET (AUTOMATIC_SEEDING = OFF)
before applying patches, then re‑seed as needed.
Failover Testing
Regularly test both planned and unplanned failovers to ensure SLA compliance.
Planned Manual Failover
ALTER AVAILABILITY GROUP AG_Sales FAILOVER;
Observe the role transition in the SSMS dashboard or via the DMV query shown earlier.
Simulating Unplanned Failover
Stop the SQL service on the primary node or disconnect its network adapter. The WSFC will detect the failure and promote the secondary automatically (if configured for automatic failover).
Monitoring & Alerting
Integrate AG health checks with your existing monitoring stack (e.g., System Center Operations Manager, Prometheus exporters, or third‑party tools). Key metrics include:
- Replica health (online/offline)
- Data synchronization state
- Listener connection counts
- Redo latency (ms)
Comparative Note: Oracle DBA & PostgreSQL
While Oracle Data Guard uses redo transport and apply, AGs rely on log send/redo apply via the endpoint. PostgreSQL’s streaming replication is analogous but lacks automatic listener routing and integrated read‑only load balancing out‑of‑the‑box. Understanding these parallels helps multi‑platform DBAs design consistent HADR policies across heterogeneous environments.
Common Pitfalls and How to Avoid Them
- Mismatched SQL Server builds: Even minor patch differences prevent AG creation. Use
SELECT @@VERSION
to verify. - Incorrect quorum: A two‑node cluster without a witness can lose quorum. Always add an odd number of voting members.
- Endpoint port conflicts: Default is 5022; ensure the port is open and not used by other services.
- Backup on secondary without proper permissions: The SQL service account on the secondary must have write access to the backup location.
- Network latency > 2 seconds: Synchronous commit may degrade performance; consider asynchronous mode for geo‑distributed replicas.
Automation Opportunities
Leverage PowerShell DSC, Azure Automation, or Terraform to provision clusters, endpoints, and AG objects. Below is a minimal PowerShell snippet to add a new database to an existing AG:
$agName = 'AG_Sales'
$dbName = 'Orders'
Add-SqlAvailabilityDatabase -Path 'SQLSERVER:\SQL\Node01\DEFAULT' `
-AvailabilityGroup $agName -Database $dbName
Conclusion
SQL Server Always On Availability Groups provide a robust, flexible platform for high availability and disaster recovery. By following the prerequisites, configuring WSFC correctly, and adhering to best‑practice guidelines, DBAs can achieve near‑zero RPO/RTO, offload reporting workloads, and maintain a consistent backup strategy comparable to Oracle’s RMAN automation or PostgreSQL’s streaming replication.
Stay ahead of the curve—subscribe to our newsletter for the latest DBA insights, and follow Person‑IT on LinkedIn for weekly tips on performance tuning, backup automation, and multi‑platform database management.