Update 1: Added Exchange & Virtualization + NFS support statement. Split the DAG failover / 20 sec timeout tips.
Update 2: On April 1, Blogger completely destroyed my article as an April Fools' joke, so I had to rearrange the text again. I added some small updates to the text based on your feedback. Thanks for sending me your ideas for changes.
Update 3: 11 April 2014: Added some planning information about CPU/RAM/disk to sections 2 and 3.
Update 4: 24 April 2014: Added Exchange update link with Exchange 2013 CU3 example.
Here are my updated general recommendations for Exchange/Exchange DAG on VMware together with Veeam Backup & Replication.
1. Check the Microsoft Exchange 2013 virtualization topic:
- Exchange 2013 DAGs are supported on virtualization platforms.
- Microsoft says: Snapshots are not supported, because they are not application aware and can have unintended and unexpected consequences.
Because Microsoft does not support snapshots of Exchange itself, you need to contact your backup or virtualization vendor for any snapshot-related questions or support requests. So why use a snapshot-based backup instead of an agent-based backup method? You can restore servers in minimal time (Instant VM Recovery can bring back the server in a minute + boot time). The backup time window is easier to meet with virtualization backup, so you can run Exchange backups at full level more often. Veeam supports any new patch or service release by design, without needing a patch of its own. And with Veeam's "Veeam Explorer for Exchange" you can restore Exchange objects quickly and without complications. Compared to many other backup products that restore Exchange objects, Veeam is in many cases very affordable.
2. NFS Datastores are not supported
Please check out this article; it will help you understand why NFS datastores are not a good idea for Exchange loads. (Teasers: Exchange Event ID 1018; NFS can abort transactions only on a best-effort basis, which can cause corruption in the database.)
However, Microsoft added the necessary functionality to SMB 3.0, so if you run Exchange on Hyper-V with Windows Server 2012/2012 R2 you can use SMB 3.0 shared storage on a Windows File Server 2012/2012 R2. You then have to place the Exchange data in VHD/VHDX files.
3. VMware+Exchange Design + Background information:
Please check this link and the whitepapers linked at the bottom. The whitepapers contain outstanding background information for your Exchange platform design.
There are also 2 interesting webinars from VMware.
For VMware and Hyper-V:
- Do not go higher than a 2-to-1 ratio of virtual CPUs to physical cores. Microsoft strongly recommends a ratio of 1-to-1 on the host where the Exchange server runs.
- Use thick provisioning for the disks for better performance but use thin provisioning for backup optimization.
- Give the OS disk enough spare space and place it on a fast storage system as well (do not place it on a datastore with hundreds of other VM boot volumes).
- Do not use Dynamic Memory
- Reserve 2 physical Cores for the Host OS
- If you perform on-host backups, reserve resources for them as well (RAM/CPU)
- Use VHDX whenever possible (max 64 TB + improved sector alignment + more resistant to corruption on power failures)
- Don't forget to plan the space for the *.bin file (equal to memory size)
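The sizing rules in the list above can be checked with a few lines of arithmetic. The following is only a minimal Python sketch: the function name, the dictionary keys and the example host (16 cores, three VMs) are made up for illustration, and counting the ratio against the cores left after the 2-core host reservation is my assumption about how the two rules combine. The *.bin space only matters on Hyper-V.

```python
def check_host_sizing(physical_cores, vm_vcpus, vm_memory_gb,
                      host_reserved_cores=2, max_ratio=2.0):
    """Summarize vCPU-to-core ratio and Hyper-V *.bin space for one host.

    vm_vcpus and vm_memory_gb are lists with one entry per VM on the host.
    Assumption: the ratio is counted against the cores that remain after
    reserving host_reserved_cores for the host OS.
    """
    usable_cores = physical_cores - host_reserved_cores
    if usable_cores <= 0:
        raise ValueError("no physical cores left after the host reservation")
    ratio = sum(vm_vcpus) / usable_cores
    return {
        "usable_cores": usable_cores,
        "vcpu_to_core_ratio": ratio,
        "within_2_to_1": ratio <= max_ratio,   # upper limit from the list
        "meets_1_to_1": ratio <= 1.0,          # Microsoft's strong recommendation
        "bin_file_space_gb": sum(vm_memory_gb) # *.bin file equals VM memory size
    }


# Hypothetical host: 16 physical cores, three VMs.
example = check_host_sizing(physical_cores=16,
                            vm_vcpus=[8, 4, 4],
                            vm_memory_gb=[64, 16, 16])
print(example)
```

In this made-up example the host stays below 2-to-1 but misses the 1-to-1 recommendation, and you would have to plan 96 GB for the *.bin files.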
Check if you have the latest Exchange updates installed. For example:
Cumulative Update 3 for Exchange Server 2013
to address random backup problems with Event IDs 2112 and 2180
4. More Veeam specific Exchange Tips and Tricks:
In general, if you use virtualization based backups for Exchange, there is a chance to hit one of 2 problems:
- Exchange DAG cluster failovers
- Exchange VSS timeouts (Exchange has a hard-coded 20 second timeout between the start of consistency processing and release).
If you are designing your Exchange environment, the following tips can also help to prevent these problems.
Exchange DAG uses a network-based heartbeat between the DAG members. When VMware snapshots are committed, the delta changes are written back to the original storage space in small chunks. This is done by holding the VM (for a time the application cannot detect by default; it appears as storage latency), writing a chunk of the snapshot data back to the original storage space, and releasing the VM. This is repeated until all data is written. An aggressive DAG cluster heartbeat can interpret one or more of these VM holds as a network outage and may trigger a DAG cluster failover. This is transparent to the users in most cases. However, multiple cluster failovers/failbacks put extra load on the Exchange background services (e.g. the indexer), so you should prevent that situation.
To address this, check out the following tips and tricks:
Tips for preventing Exchange DAG cluster failovers:
In general, you need to prevent a big snapshot file by reducing disk block changes during the backup time window and by keeping the snapshot lifetime as short as possible, because less time means fewer changes in the snapshot file and ultimately fewer problems at snapshot commit.
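To make the relation between snapshot lifetime and snapshot size concrete, here is a rough Python sketch. It assumes a constant change rate while the snapshot is open, which a real Exchange server will not have, and all numbers are invented for illustration.

```python
def estimated_delta_mb(change_rate_mb_per_s, snapshot_lifetime_s):
    """Rough size of the snapshot delta that must be committed afterwards.

    Assumption: blocks change at a constant rate while the snapshot is open.
    """
    return change_rate_mb_per_s * snapshot_lifetime_s


# Illustrative values only, not measurements.
long_job = estimated_delta_mb(5, 3600)      # 1 hour snapshot, busy server
short_job = estimated_delta_mb(5, 600)      # 10 minute snapshot, busy server
quiet_window = estimated_delta_mb(1, 600)   # 10 minute snapshot, quiet window

print(long_job, short_job, quiet_window)
```

Both levers from the paragraph above show up here: a shorter snapshot lifetime and a quieter backup window each shrink the amount of data that has to be written back at commit.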
Increase the DAG heartbeat time to avoid cluster failover (no reboot or service restart needed, the settings are active as soon as you press Enter).
On a command line (with admin rights):
cluster /prop SameSubnetDelay=2000:DWORD
cluster /prop CrossSubnetDelay=4000:DWORD
cluster /prop CrossSubnetThreshold=10:DWORD
cluster /prop SameSubnetThreshold=10:DWORD
You can check the settings with:
cluster /prop
After setting these cluster properties, take a manual snapshot in the VMware client and remove it after 3 seconds.
If a DAG failover is performed, please check the tips below and work with VMware Support until this works without a failover. After that you can work on the Veeam side by optimizing the backup infrastructure.
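What the four heartbeat properties set above buy you can be worked out as delay times threshold: a node is only considered down after "threshold" heartbeats, sent every "delay" milliseconds, have been missed. A small Python sketch of that arithmetic follows. The function and dictionary names are made up, and the default values (1000 ms delay, threshold 5) are the commonly documented ones for Windows Server 2008 R2 to 2012 R2, taken here as an assumption, so check your own cluster.

```python
def tolerated_outage_ms(delay_ms, threshold):
    """Time a node may miss heartbeats before the cluster treats it as down."""
    return delay_ms * threshold


# (delay in ms, threshold) per network type.
assumed_defaults = {"same_subnet": (1000, 5), "cross_subnet": (1000, 5)}
tuned = {"same_subnet": (2000, 10), "cross_subnet": (4000, 10)}

for name in tuned:
    before = tolerated_outage_ms(*assumed_defaults[name]) / 1000
    after = tolerated_outage_ms(*tuned[name]) / 1000
    print(f"{name}: {before:.0f} s -> {after:.0f} s")
```

With the values from this article a VM hold during snapshot commit has to last 20 seconds (same subnet) or 40 seconds (cross subnet) before it can trigger a failover.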
Use the new Veeam storage snapshot feature (StoreVirtual/3PAR/VSA) if you can (after v7 release) => Reduces snapshot lifetime to a few seconds => No load and no problems at commit because there is less data. (This option can be counterproductive if you experience the 20 sec VSS timeout.)
To reduce snapshot commit time (and to reduce data in the snapshot), try to avoid any changes during the backup time window (users, background processes, antivirus, ...). Also try to avoid them on all LUNs of the storage system itself (faster writes at snapshot commit).
If you cannot avoid many block-level changes during your backup window, use Forward Incremental or, if you need space, Forward Incremental with daily transform into rollbacks. Reverse Incremental takes a bit longer than the other backup methods => longer snapshot lifetime => more changes in the snapshot to commit.
To reduce snapshot lifetime and the amount of data to commit, use the new Veeam parallel processing with enough resources to back up all of your disks at the same time (after v7 release).
To reduce the backup time window and snapshot lifetime, use Direct SAN mode with only the minimum needed disks connected to the selected proxy. If that is not possible, use NBD mode with 10GbE. (Do not run the proxy in auto-select mode.) Disable VDDK logging for Direct SAN mode if your backups themselves run stable (ask support for the registry key and consequences).
Use current VMware versions (newest VADP/VDDK kits with a lot of updates in them) and current Veeam versions (newer VDDK integration). And install the current ESXi/vCenter patches!
Use at minimum VMware vSphere 5.0 because of changes in where snapshots are placed and in the background handling.
Still have problems? Use faster disks for all of the VM disks (do not forget to place the OS disks on fast storage!)
Fewer VMware disks per VM can help to reduce snapshot commit time. Increasing the maximum parallel snapshot commits setting can help to reduce the needed snapshot commit time (parallel vs. sequential). Check the registry settings and the storage performance needed for this with Veeam support.
In a worst-case scenario, when no other tips help, you can check the following VM setting. This is an undocumented VM setting and you have to check the support statement with VMware. This was a tip from one of my customers with a 13TB+ Exchange environment, who had a long run with VMware Support.
snapshot.maxConsolidateTime = "1" (in seconds) (again do this only together with VMware support).
If you have problems with cluster failover during backup, one option is to back up DAG member(s) that hold only inactive databases (no cluster failover because there are no active databases). Exchange replicates the log file truncation across the whole DAG. This also gives you the option to restart the server or services, after which Exchange processes VSS consistency faster. If you restart the services, make sure you wait long enough for the Exchange VSS writers to come up again before you back up.
If you add an additional DAG member server for this, check the situation with your Exchange architect, because you change the member count used for quorum failover selection. (E.g. you have 2 Exchange DAG members in different datacentres and a witness disk in datacentre 3, and you add an additional Exchange server on one side: the failover is affected because of the different server count in that datacentre.)
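The quorum example in brackets can be played through with a simplified majority model. The Python sketch below makes two assumptions that you should verify with your Exchange architect: the witness only gets a vote when the DAG has an even number of members, and dynamic quorum features of newer Windows versions are ignored. The site names and function names are invented.

```python
def has_quorum(surviving_votes, total_votes):
    """Majority rule: more than half of all votes must survive."""
    return surviving_votes > total_votes / 2


def survives_site_loss(members_per_site, witness_site, lost_site):
    """Check whether the DAG keeps quorum when one datacentre is lost.

    Assumption: the witness votes only with an even member count.
    """
    total_members = sum(members_per_site.values())
    witness_votes = total_members % 2 == 0
    total_votes = total_members + (1 if witness_votes else 0)
    surviving = sum(count for site, count in members_per_site.items()
                    if site != lost_site)
    if witness_votes and witness_site != lost_site:
        surviving += 1
    return has_quorum(surviving, total_votes)


before = {"DC1": 1, "DC2": 1}   # 2 members, witness in DC3
after = {"DC1": 2, "DC2": 1}    # backup-only member added in DC1

print(survives_site_loss(before, "DC3", "DC1"))
print(survives_site_loss(after, "DC3", "DC1"))
print(survives_site_loss(after, "DC3", "DC2"))
```

In this model the original design survives the loss of either datacentre, while the extended design no longer survives the loss of the datacentre that holds two members.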
Tips for preventing VSS (timeout) problems:
If you perform VSS-based consistency processing on an Exchange server, a hard-coded 20 second timeout releases the Exchange VSS writer state automatically if the consistent state is held longer than these 20 seconds. The result is that you cannot perform consistent backups.
In detail, you have to complete the Exchange VSS writer consistency processing, the VM snapshot, and the Exchange VSS writer release within these 20 seconds hard-coded by Microsoft.
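The 20 second rule is a simple time budget, which the following Python sketch spells out. The function name and the three timings are hypothetical; you would have to measure the real values in your own environment.

```python
VSS_TIMEOUT_S = 20.0  # hard-coded Exchange VSS writer timeout from the text


def fits_vss_window(consistency_s, snapshot_create_s, release_s=0.0,
                    timeout_s=VSS_TIMEOUT_S):
    """Return (fits, headroom in seconds) for one backup run.

    All three steps must finish inside the timeout, so their sum is
    compared against it. The inputs are measured or estimated durations.
    """
    total = consistency_s + snapshot_create_s + release_s
    return total < timeout_s, timeout_s - total


# Hypothetical timings in seconds.
print(fits_vss_window(6, 9, 1))
print(fits_vss_window(8, 14, 1))
```

The first run fits with 4 seconds to spare, the second one exceeds the window by 3 seconds, so either the consistency processing or the snapshot creation has to become faster.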
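If you see the Exchange VSS timeout event 1296 in the event log => change the logging setting (disable circular logging) =>
Set-StorageGroup -Identity "<yourstoragegroup>" -CircularLoggingEnabled $false
(Set-StorageGroup applies to Exchange 2007. Exchange 2010 and 2013 no longer have storage groups; there the same switch is set per database with Set-MailboxDatabase -Identity "<yourdatabase>" -CircularLoggingEnabled $false.)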
In many cases Exchange can perform consistency processing faster if you add more CPU/memory to the VM. Based on customer feedback, this solved many of the VSS timeout problems.
Use faster disks for all of the VM disks (do not forget to place the OS disks on fast storage as well !!!)
The worst thing you can do is to place the OS disk on a datastore with hundreds of other boot vmdk volumes on a Raid5/Raid6 storage.
Use at minimum VMware vSphere 5.0 because of changes in the snapshot creation area.
Use current VMware versions (newest VADP/VDDK kits with a lot of updates in them) and current Veeam versions (newer VDDK integration). And install the current ESXi/vCenter patches! => Snapshots are performed faster
Important one on the VMware side: Fewer VM disks will reduce snapshot creation time. Check how long it takes to create a snapshot of the VM in the VMware vSphere (web) client. Keep in mind that you need to perform Exchange VSS writer consistency + VM snapshot in 20 seconds.
To optimize snapshot creation time:
Check your vCenter load and optimize it (or use direct ESX(i) connections for Veeam VM selection, so that snapshot creation takes less time).
Check the health and configuration of Exchange itself. I saw some installations where different problems ended up in high CPU utilization by the indexing service. This prevented VSS from working correctly. Also check all other mail transport cache settings. Sometimes the Transport Service cache replicates shadow copies of the mails over and over again and nobody commits them (if you have multiple transport services with firewalls between them).
Veeam specific: Veeam performs VSS processing over the network. Check against the TCP port matrix in the Veeam User Guide that the B&R server can perform Veeam guest processing over the network (open firewall ports).
If this is not possible, Veeam falls back (after a timeout) to networkless in-guest processing over the VMware Tools VIX communication channel (Veeam own). If you use networkless in-guest processing, change the Veeam registry key so that VIX-based processing is performed before network in-guest processing. => No wasted time because of waiting for the timeout.
However, network-based in-guest processing is faster and I recommend it.
Use the Veeam Forums http://forums.veeam.com and search for specific Exchange topics; there you can find additional tips and feedback. Keep in mind that the Veeam Forum is not an official support forum. If you need urgent help, please open a support ticket at http://www.veeam.com/support . Testing and proof-of-concept environments get support with lower priority.
Do you have feedback?
Was one of the tips helpful?
Please send me feedback.
All the best to you and success... Andy