High Availability and Disaster Recovery Options for Dynamics 365
High Availability – Dynamics 365
For most companies, having high availability for their Dynamics 365 deployment is a must and the setup for this is similar to that of other web-based applications. Keep in mind that through this section we are strictly referring to Dynamcis 365 servers with the “Front End” or “Full” role – in other words, the servers that host the website in IIS. Having multiple “Back End”-only servers (those with the Async and Sandbox services) is also a good idea but those will natively load balance themselves if they are in the same deployment.
Network Load Balancing – Probably the most obvious method for high availability is running multiple Dynamics 365 servers in a load-balancing configuration. This will not only help with carrying the day-to-day load of traffic but also serve as the foundation for a good high availability option. If a server was to fail, just pop it out of the NLB pool. Load balancing comes in a few different flavors because of all the applications out there but mainly it comes down to two options – hardware and software.
Hardware Load Balancing involves the use of a third-party appliance such as F5 or NetScaler that sits in front of the Dynamics 365 servers to handle which server receives the user requests. While this is the preferred option as it does not create any additional processes for the servers to handle and usually provides a wider array of configuration options (e.g. monitoring, security, filtering, etc…), it can be more difficult and much more costly to implement.
Software Load Balancing on the other hand is relatively easy to setup and in the case of Windows NLB, is free, as you have already paid for your Windows Server licenses. Critics of software load balancing will contend that it is not “true” high-availability due to the lack of many key features found in hardware load balancing. The fact stands that Windows NLB is just Round Robin DNS so if a server were to go down, requests would still be sent to the faulted server until it is told otherwise.
As one can probably infer, this is simply a cost vs. ROI decision – which is the case with most IT situations. If the hardware option is being considered, make sure to reach out to your hardwarevendor to ensure there are no known issues with it and Dynamics 365. It is also a good idea to inquire about documentation for setup. For more information about load balancing Dynamics 365 in general and the required configurations, check out this TechNet: https://technet.microsoft.com/en-us/library/jj822357.aspx
Backup Server(s) – While this method may be a little more rigid, it is still effective and again, has a couple options. The best way of going this route is having a server to fall back to in the event that your primary Dynamics 365 server(s) fails and NLB has not been implemented.
VM Snapshots – If you are reading this, you most likely understand the concept of VM snapshots so I will not go on too heavily into this. The idea is to have automatic snapshots taken automatically on a set interval that the Dynamics 365 server can be reverted to or new server built from in the event of failure. Reverting the existing server is the easier option but also the more risky of the two. Spinning up a new server is safer but more involved – and requires the shutdown of the existing server as you will need to use the same host name and IP address.
Laying in wait – I have had some colleagues opt for this route as the return to uptime is a little quicker and/or VMs were not being used. Dynamics 365 deployments can obviously be comprised of multiple servers – but not all need to be active or enabled. The concept here is to build a secondary Dynamics 365 server as part of the deployment but leave it disabled until it is needed. If the primary server were to fail, all that would be required is enabling of the server via the Deployment Manager and changes to DNS (and firewall rules, if applicable).
High Availability – SQL Server
Now that we have covered high availability from an application perspective, what about the database? There is Good News! Microsoft has that covered this aswell!
SQL Server Failover Cluster – There is nothing new about this concept. Two SQL Servers are setup to use a shared disk, one server goes down and the other picks up the load. When installing Dynamics 365 just make sure to point connection strings to the cluster name and not a single node and you should not have any issues but be sure to test failover prior to going live. Just to note - although you can install Dynamics 365 to a SQL Server cluster configured for either active-active or active-passive clustering, the cluster will function in an active-passive manner. While clustering is tried and true, Microsoft decided to go a step further by introducing Always On.
SQL Server Always On – New to SQL Server 2012 (Enterprise Edition), Always On introduced the concept of Availability Groups. An Availability Group is simply a container for a set of databases that fail over together. For purposes of Dynamics 365 databases, this is important considering there are always at least two databases per deployment. With Always On, the databases in the Availability Groups are synchronized amongst all the configured replicas. I won’t get into the details of setting up Always On (there are lots of articles out there already that go into detail on this) but be sure to reference this TechNet for how to configure Dynamics 365 with SQL Always On: https://technet.microsoft.com/en-us/library/jj822357.aspx
Disaster Recovery – Dynamics 365 and SQL Server
While having a single server fail is far more likely than seeing your entire data center swallowed into a pit only to be burnt up in the Earth’s core, it is a situation that should also be planned for! Unlike the sections regarding high availability, we need to think of a DR scenario in terms of the entire infrastructure – both Dynamics 365 and SQL Server – because if there is a disaster, chances are it is taking both so these plans go hand-in-hand.
Over the years, we have implemented CRM/Dynamics 365 disaster recovery plans for numerous customers and as a result, have learned (what we consider) the best option – which is what l will be covering. This is not to say there are no other viable methods/solutions (such as VM snapshot migrations or SQL Log Shipping), but I will keep focus on my preferred option. In a nutshell, I am going to have a separate, self-contained Dynamics 365 environment in the DR data center that has data synced from the production SQL servers in the primary site.
Let me start from a database perspective since it is a little more complex and setting up Dynamics 365 for DR will require this information. As previously touched on, SQL Always On will come in handy yet again and recall the concept of Availability Groups as this will play heavy into this strategy.
For DR scenarios with Dynamics 365, you will need two Availability Groups in SQL AlwaysOn – one for the MSCRM_CONFIG database to be synchronized between the two primary data center nodes and another to synchronize the Organization_MSCRM database between not only the two primary data center SQL Servers but also a third SQL Server in your DR data center. The reason for this will become clear as you keep reading.
Going into the Dynamics 365 server installation(s), the third SQL node should already be built and ready with SQL Always On. Unlike the installation of the Dynamics 365 servers in the primary data center, you do NOT want to set the SQL connection to use the Availability Group but rather just the SQL Server located in the DR data center. The reasoning/thinking behind this is the MSCRM_CONFIG database – this database is specific per deployment of Dynamics 365 and two cannot exist in the same SQL server/instance (nor can the name be changed). Remember that this Dynamics 365 DR deployment is to be isolated from the primary.
At this point, you may be asking yourself how this all comes together. When the primary data center goes down (mainly both primary SQL nodes), the organization database will be failed over to the third node in DR. Once that happens, launch the deployment manager on the Dynamics 365 DR Server and import the organization database into the deployment. That should take about 5 minutes and then Dynamics 365 will be back up and running.
Awesome! You now have the servers all setup for DR and you are ready for that fire to ignite at the data center! Now whats next? RUN THROUGH A TEST DISASTER RECOVERY SCENARIO! I cannot be stress enough – you do not want to wait for a real disaster only to find some small configuration issue messed up the entire plan.