It is true that a server is a server and, as I always say, all manufacturers work within the same boundaries dictated by the chipsets. HP and Cisco, for example, leverage the same underlying technologies, led by processors and RAM.
So how do you choose the right hardware?
There is far more to consider than just the socket and the RAM. In a software-defined datacenter especially, how the various technologies align drives the decision far more than the socket alone. Today’s workloads are generally virtualized, so your datacenter already hosts a mix of workloads, each of which requires its own attention.
VMware has been significantly enhancing its hypervisor, ESXi, to support the most demanding workloads, including databases and business applications such as SAP. In a world of mixed workloads (hoping you’re there already, and if you’re not, this should help you), choosing the right technologies to deploy in your datacenter becomes an art, where we all need to be artists and creators of predictable outcomes for our lines of business.
Here are a few things, among many others, that you should consider.
First, consider NUMA node size (ref http://blogs.vmware.com/vsphere/2012/02/vspherenuma-loadbalancing.html). Because memory access is slower than CPU cycles, keeping a VM’s memory local to the NUMA node its vCPUs run on is a huge advantage, for complex and simple workloads alike. Sizing VMs for NUMA matters a great deal when dealing with sensitive workloads; the performance hit from wrong sizing is an expensive price to pay. %RDY (CPU ready time) is a metric you should keep an eye on, and a high value is a warning sign.
For example, take a 48-core system (4 sockets, 12 cores per physical CPU) that has 6 physical cores per NUMA node, which gives 8 NUMA nodes. An 8-vCPU virtual machine is split into two 4-vCPU NUMA clients that are scheduled on two different nodes. The problem is that each 4-vCPU client is treated as a single unit for NUMA management. When four 8-vCPU virtual machines are powered on, all 8 NUMA nodes have 4 vCPUs each. When the fifth virtual machine is powered on, its 2 clients get scheduled on 2 NUMA nodes that already have a 4-vCPU client running on them. Those 2 NUMA nodes therefore carry 8 vCPUs each (on 6 cores), which overloads them and results in high ready time.
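To make that arithmetic concrete, here is a minimal Python sketch of the scenario. It is only an illustration: the greedy "least loaded node first" placement and the function names are my own assumptions, not how the ESXi NUMA scheduler is actually implemented.

```python
def place_vms(num_vms, vcpus_per_vm=8, client_size=4, num_nodes=8):
    """Return the vCPU count per NUMA node after powering on num_vms VMs.

    Assumption: each NUMA client lands on the currently least-loaded node.
    This is a simplification for illustration, not the real ESXi algorithm.
    """
    load = [0] * num_nodes
    clients_per_vm = vcpus_per_vm // client_size
    for _ in range(num_vms):
        for _ in range(clients_per_vm):
            target = load.index(min(load))  # first least-loaded node
            load[target] += client_size     # a client is placed as one unit
    return load


def overloaded_nodes(load, cores_per_node=6):
    """Indexes of nodes that carry more vCPUs than they have physical cores."""
    return [node for node, vcpus in enumerate(load) if vcpus > cores_per_node]


if __name__ == "__main__":
    for vm_count in (4, 5):
        load = place_vms(vm_count)
        print(vm_count, "VMs:", load, "overloaded nodes:", overloaded_nodes(load))
```

With four VMs, every node carries 4 vCPUs on 6 cores. The fifth VM pushes two nodes to 8 vCPUs on 6 cores, and that is where the ready time comes from.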
You also need to look at CPU scheduling for larger VMs. The CPU scheduler is a crucial component of vSphere. In a traditional architecture, all workloads running on a physical machine are scheduled by its operating system, say Windows.
In a virtual environment, let’s not forget that the virtual machine, say a Windows server, is itself seen as a “workload”, and that this “workload” has its own “workload”: the Windows services, for example, which run continuously to provide Windows with its core components and features. In a physical environment, the “workload” associated with the Windows services is handled directly by the operating system on the box. In a virtual environment, the hypervisor deals with ALL workloads: those coming from “within” the virtual machine (i.e. the Windows services) along with the virtual machine itself.
Virtual machines must be scheduled for execution, and the CPU scheduler handles this task with policies that maintain fairness, throughput, responsiveness, and scalability of CPU resources.
What I have retained from my VCAP training is that one of the major design goals of the CPU scheduler is fairness. That’s an easy way to remember CPU scheduling.
Make sure as well that there is a balance between CPU & RAM to avoid underutilization of resources.
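A quick way to check that balance is to see which resource runs out first for a typical VM profile. The sketch below is a rough estimate only; the host size, the VM profile, and the 4:1 vCPU-to-core ratio are hypothetical values I picked for illustration, not VMware recommendations.

```python
def vm_capacity(host_cores, host_ram_gb, vm_vcpus, vm_ram_gb, vcpu_per_core=4.0):
    """Estimate how many identical VMs fit on a host and what limits the count.

    vcpu_per_core is an assumed CPU overcommit ratio; RAM is not overcommitted
    in this simplified model.
    """
    vcpu_budget = host_cores * vcpu_per_core
    by_cpu = int(vcpu_budget // vm_vcpus)   # VMs the CPU budget allows
    by_ram = int(host_ram_gb // vm_ram_gb)  # VMs the RAM allows
    fit = min(by_cpu, by_ram)
    if by_cpu < by_ram:
        limiter = "cpu"
    elif by_ram < by_cpu:
        limiter = "ram"
    else:
        limiter = "balanced"
    return {
        "fit": fit,
        "limiter": limiter,
        "idle_vcpus": vcpu_budget - fit * vm_vcpus,  # CPU budget left unused
        "idle_ram_gb": host_ram_gb - fit * vm_ram_gb,  # RAM left unused
    }


if __name__ == "__main__":
    # Hypothetical host: 48 cores, 256 GB RAM; hypothetical VM: 2 vCPU, 8 GB.
    print(vm_capacity(48, 256, 2, 8))
```

In this made-up case RAM runs out after 32 VMs while two thirds of the CPU budget sits idle, which is exactly the kind of underutilization to look for before signing the purchase order.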
I keep hammering this point, but I do believe that virtual machine density is a tactical aspect of day-to-day operations that is too often forgotten. It is crucial to keep a balanced cluster in a VMware farm and a high density per host. An important part of operations is therefore to ensure that the investments made in hardware are efficiently leveraged (i.e. getting more VM workload on a single host increases the ROI on that physical server; simple math).
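That simple math fits in a few lines of Python. All figures here (300 VMs, a host price of 30,000, densities of 20 and 30 VMs per host) are made-up numbers for illustration only.

```python
import math


def hosts_needed(total_vms, vms_per_host):
    """Number of hosts required to run total_vms at a given density."""
    return math.ceil(total_vms / vms_per_host)


def cost_per_vm(host_cost, vms_per_host):
    """Hardware cost carried by each VM on a fully loaded host."""
    return host_cost / vms_per_host


if __name__ == "__main__":
    host_cost = 30_000  # hypothetical price of one server
    total_vms = 300     # hypothetical size of the farm
    for density in (20, 30):
        hosts = hosts_needed(total_vms, density)
        print(
            f"{density} VMs/host: {hosts} hosts, "
            f"{cost_per_vm(host_cost, density):.0f} per VM, "
            f"{hosts * host_cost} total"
        )
```

Going from 20 to 30 VMs per host in this example drops the farm from 15 hosts to 10, before even counting licenses, power, and ports.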
It’s difficult to say how many VMs you should have on a single host; the main reason is simple: not all workloads are created equal. Investing in a virtualization-centric monitoring tool is a smart move.
vSOM does a good job at that. I have spent many years of my career as a monitoring expert, implementing HPOV, MOM, OpsMgr and SCOM, SolarWinds and others, to help organizations be proactive about downtime and forecast performance challenges, so as to establish a stable environment where all workloads run in an optimized manner.
When you try to forecast the future, as tactical monitoring is supposed to do, you need to rely on tons of metrics, analyze them in a way that represents your business, and interpret them so the right actions can be either planned or performed immediately.
Finally, ensure sufficient I/O and network throughput is available; that’s a no-brainer, I think. If you do a good job at increasing the workload on a single host, the ripple effect is higher throughput from that host at the network and storage level.
VMware’s best practices provide a good overview of what is needed in terms of port requirements for the various workloads. Management, vMotion, Storage vMotion, Ethernet-based traffic and storage traffic are specific to your environment, but there are fundamental metrics to be respected. I strongly suggest you do not deviate too far from the vendor’s expectations. For example, don’t mix “management” traffic and “VM Ethernet” traffic. It seems obvious, but I’ve seen it.
With the number of ports needed and the number of cables required, think outside the box and look at 10 Gbps Ethernet cards. More precisely, look at technologies that allow you to create virtual ports, vPorts. QLogic adapters and Cisco VICs are a good start.
On the storage side, the same applies. The number of ports versus the number of cables required to carry the traffic is something you’ll need to consider. Creating an Ethernet-based environment that allows a seamless flow of data traffic is crucial to a well-performing environment. After all the work I have just described above, don’t let the storage traffic kill your efforts.
The storage traffic is as important as the Ethernet traffic. Having 30-40 VMs on a single host will generate high traffic on your iSCSI pipes; keep an eye on VMware’s %RDY and %WAIT metrics to evaluate how your VMs are performing (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008205). Understanding storage performance from a VM point of view can be challenging, granted, so use https://communities.vmware.com/docs/DOC-9279 to help you keep an eye on what matters the most.
Choosing the right hardware is not an easy task, especially when you consider the environment it has to support. Far more than a simple box, the platform is a crucial element that brings together a series of components to form an optimized farm.
Don’t underestimate the workload and the density; they are the key to ROI from a business standpoint, and they also drive the optimized OPEX that many CFOs are sensitive to.
As always, feel free to send your feedback. I always enjoy reading your points of view and enhancing my own knowledge with everyone’s experience.
Wishing you a good weekend my friends.