What to get ready in your vSphere Cluster design – PART 1

The hardware abstraction tsunami we have witnessed in the last 5 years is surely far from quieting down. The growth of infrastructures has far surpassed expectations and the best is yet to come; yes, you might be aware of it… you know… the cloud thing…

It is in the plans, believe me on that one, but it heavily relies on what was done in the past, and ensuring the infrastructure is rock solid should not be an optional exercise. In my last post (https://florenttastet.wordpress.com/2014/06/29/choosing-hardware-for-use-as-an-esxi-host/) we talked about host design and a few things we should be considering.

Today we’ll review cluster design and a few tricks around it, and hopefully it will be a good revision of what was done. It all comes from personal experience along with vExpert input and a few folks living the dream day after day.

This is a series of 3 blogs of 1,000 words or less each. Your feedback is always welcome, as it serves our common knowledge and enhances our common expertise.

PART 1 of this post is focused on Resource Pools and their hierarchy, DRS and EVC.

PART 2 will cover SRM over vMSC, AutoDeploy, Restart Priority and Management Network.

PART 3 will review Permanent Device Loss Prevention, HA HeartBeat, DPM & WoL and vMotion bandwidth. 

Resource Pools and Resource Reservations

In plain words, always use resource pools to define reservations.

What are resource pools?

Resource pools allow you to delegate control over resources of a host (or a cluster), but the benefits are evident when you use resource pools to compartmentalize all resources in a cluster. 

All VM workloads should not be treated equally; a file server does not require the same level of attention from the hypervisor as a Tier 1 app such as SQL or Exchange, for example. When you create a resource pool you define the level of service expected and better manage resource availability, both overall in the cluster and specifically for a set of VMs.

In a small or large environment it is a best practice that should be part of any design. As bursts and business growth are difficult to forecast, putting this tactical design in place will pay off in the long run.
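To make this concrete, here is a minimal sketch of what "resource pools with reservations" can look like when scripted. It assumes a pyVmomi connection to vCenter and a cluster object already in hand; the pool names and reservation values are purely illustrative, not prescriptions.

```python
# Minimal pyVmomi sketch (assumes an existing vCenter connection and a
# vim.ClusterComputeResource object; names and numbers are illustrative only).
from pyVmomi import vim

def create_pool(cluster, name, cpu_mhz, mem_mb, share_level):
    """Create a child resource pool under the cluster root with reservations."""
    spec = vim.ResourceConfigSpec()
    spec.cpuAllocation = vim.ResourceAllocationInfo(
        reservation=cpu_mhz,                 # MHz guaranteed to this pool
        expandableReservation=True,
        limit=-1,                            # no upper limit
        shares=vim.SharesInfo(level=share_level))
    spec.memoryAllocation = vim.ResourceAllocationInfo(
        reservation=mem_mb,                  # MB guaranteed to this pool
        expandableReservation=True,
        limit=-1,
        shares=vim.SharesInfo(level=share_level))
    return cluster.resourcePool.CreateResourcePool(name, spec)

# Example: a Tier 1 pool guaranteed more than a standard pool.
# create_pool(cluster, "Tier1-Apps", cpu_mhz=20000, mem_mb=65536, share_level="high")
# create_pool(cluster, "Standard",   cpu_mhz=5000,  mem_mb=16384, share_level="normal")
```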

DRS rules to avoid single points of failure

What is DRS?

I personally love DRS. Distributed Resource Scheduler (DRS) is a feature included in the vSphere Enterprise and Enterprise Plus editions. DRS improves service levels by guaranteeing appropriate resources to virtual machines, lets you deploy new capacity to a cluster without service disruption, and automatically migrates virtual machines during maintenance without service disruption.

Should you ever have a front end and a back end of the same application on the same host? I say no! When deploying business-critical, highly available applications, create DRS rules to ensure the two highly available virtual machines do not run on the same host.

DRS itself brings an extraordinary benefit to day-to-day infrastructure management, but leveraging all the features within it delivers priceless peace of mind. Knowing that, in the event of a disaster within a cluster, a back end is not on the same host as a front end minimises downtime and protects SLAs. To be strongly considered, in my opinion.

DRS affinity rules.

DRS is definitely a sweet function. Take the time to look at http://www.vmware.com/files/pdf/drs_performance_best_practices_wp.pdf to get a better idea of the strength of the tool.

In blade environments, when spanning a vSphere cluster across multiple blade chassis and using application clustering, consider using DRS affinity rules. Why? They prevent all nodes of an application cluster from landing on the same chassis, protecting you against a hardware failure at the chassis level.

Sounds silly, but you need to think about that in blade environments. You surely wouldn’t want all your VMs within the same chassis (again, if you are in an environment that spans more than one chassis).

I agree it would require a large farm with hundreds of VMs. But keep it in the back of your head. You may also consider pinning your vCenter to a small number of hosts, especially in large clusters.
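For those who like to script it, here is a hedged pyVmomi sketch of such a "keep these two apart" rule. It assumes you already hold the cluster and the two VM objects; the rule name is made up for the example.

```python
# Hedged pyVmomi sketch (assumptions: vm_front and vm_back are vim.VirtualMachine
# objects and cluster is a vim.ClusterComputeResource).
from pyVmomi import vim

def keep_apart(cluster, vm_front, vm_back, rule_name="separate-app-nodes"):
    """Add a DRS anti-affinity rule so the two VMs never share a host."""
    rule = vim.cluster.AntiAffinityRuleSpec(
        name=rule_name, enabled=True, mandatory=True,
        vm=[vm_front, vm_back])
    spec = vim.cluster.ConfigSpecEx(
        rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])
    # Returns a Task; wait on it with your preferred task helper.
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```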

VMs and Resource Pools: Hierarchical levels

Once you’ve taken the Resource Pool route (the right route I should add) you need to remain aligned with this strategy. Refer to 

http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.resourcemanagement.doc_41/managing_resource_pools/c_managing_resource_pools.html 

I have often seen architectures where a VM sits at the same hierarchical level as a Resource Pool. Please don’t!

It is not recommended to deploy virtual machines at the same hierarchical level as resource pools. In this scenario a single virtual machine could receive as many resources as a complete pool of virtual machines in times of contention.

We don’t want that. Resource Pools do not consume resources per se; they manage resource assignments. Group VMs in Resource Pools, and as a default you may want to create 3 resource pools: High, Medium and Low. As a start it is a good practice; group VMs in pools and make sure they do not sit directly under the host in the cluster.
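A quick back-of-the-envelope illustration of why this matters (illustrative share values, not exact vSphere defaults): siblings split contended CPU in proportion to their shares, so a lone VM sitting beside a pool competes with the whole pool.

```python
# Illustrative arithmetic only: siblings at the same hierarchical level split
# contended capacity proportionally to their share values.
def split_by_shares(siblings, total_mhz):
    total_shares = sum(shares for _, shares in siblings)
    return {name: total_mhz * shares / total_shares for name, shares in siblings}

# A lone VM sitting next to a pool that holds 20 VMs, both with comparable shares:
siblings = [("Pool-with-20-VMs", 4000), ("Lone-VM", 4000)]
print(split_by_shares(siblings, total_mhz=40000))
# -> each sibling gets ~20000 MHz: the single VM receives as much as the entire
#    pool, whose 20 VMs then have to share their half among themselves.
```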

EVC 

What is EVC? 

Enhanced vMotion Compatibility (EVC) simplifies vMotion compatibility issues across CPU generations. EVC automatically configures server CPUs with Intel FlexMigration or AMD-V Extended Migration technologies to be compatible with older servers.

From the book, this KB is handy 

http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.vcenterhost.doc%2FGUID-9F444D9B-44A0-4967-8C07-693C6B40278A.html 

Chances are you will not purchase all hosts needed at the same time. You may have acquired a few when you first started, but now comes the time to expand and add a server.

Besides the fact that we need to remain within the same CPU vendor family, functions and features of modern sockets might expose instructions (SSE2 or SSE3 calls, for example) that are unknown to older socket generations.

Enabling EVC by default on a vSphere cluster allows newer processor architectures (of the same vendor family, of course) to be added to the cluster seamlessly.

Let’s say it avoids future, unnecessary troubleshooting steps.
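If you want to check where your clusters stand today, a small pyVmomi sketch like the one below (assuming an existing vCenter connection) lists the current EVC baseline of each cluster; a value of None simply means EVC is not enabled.

```python
# Small pyVmomi sketch (assumes an existing service instance connection; it
# only reads cluster summaries, nothing is changed).
from pyVmomi import vim

def report_evc(content):
    """Print the current EVC baseline of each cluster (None means EVC is off)."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        print(cluster.name, "->", cluster.summary.currentEVCModeKey)
    view.DestroyView()
```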

Conclusion

In this PART 1 we covered a few tricks from the field. They are not major items, but they will help from time to time, will surely set the foundation for a growing business, and will leave you with a feeling of achievement.

Don’t forget virtualization is here to help; adding a few things here and there will drastically enhance your confidence in your infrastructure and provide a very dynamic environment that can adapt and is ready for bursts or unwanted (but still very real) failures.

PART 2 will cover SRM over vMSC, AutoDeploy, Restart Priority and Management Network.

PART 3 will review Permanent Device Loss Prevention, HA HeartBeat, DPM & WoL and vMotion bandwidth.


Choosing hardware for use as an ESXi host

With today’s workloads’ higher demands and expectations from our business, choosing the appropriate hardware is not what it used to be.

True that a server is a server, and as I always say, all manufacturers are working within the same boundaries dictated by the chipsets. HP and Cisco for example are surely leveraging the same technologies, led by processors and RAM.

 

So how do you choose the right hardware?

There is far more to consider than just the socket and the RAM. Especially in a software-defined datacenter, the alignment of multiple technologies drives the decision far more than the socket alone. It is a no-brainer that today’s workloads are generally virtualized; therefore, there exists in your datacenter today a mix of workloads that require personalized attention.

VMware has been significantly enhancing its technology, ESXi, to support the most demanding workloads, including databases and applications such as SAP. In a world of mixed workloads (hoping you’re there already, and if you’re not this would surely help you), choosing the right technologies to deploy in your datacenter becomes an art, where we all need to be artists and creators of predictable outcomes for our lines of business.

 

Here are a few things, among many others, that you should consider.

 

First, consider NUMA node size (ref http://blogs.vmware.com/vsphere/2012/02/vspherenuma-loadbalancing.html). As memory today runs slower than CPU cycles, providing access to local (or not) memory for faster processing of data is a huge advantage for complex (or not) workloads. The importance of VM sizing for NUMA is huge when dealing with sensitive workloads; the performance hit in the event of wrong sizing is an expensive outcome to pay. High %RDY is a metric you should monitor or at least keep an eye on.

For example, if the hardware configuration is a 48-core system (4 sockets, 12-core physical CPUs) that has 6 physical cores per NUMA node, an 8-vCPU virtual machine is split into two 4-vCPU NUMA clients that are scheduled on two different nodes. The problem is that each 4-vCPU client is treated as the granular unit for NUMA management. When four such 8-vCPU virtual machines are powered on, all 8 NUMA nodes host 4 vCPUs each. When the fifth virtual machine is powered on, its 2 clients get scheduled on 2 NUMA nodes which already have a 4-vCPU client running on them. These 2 NUMA nodes therefore host 8 vCPUs each (on 6 cores), causing those nodes to be overloaded and high ready time to be experienced.
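The arithmetic of that example is easy to reproduce; the short sketch below is simply the same math in Python, useful as a sanity check when you size large VMs against your own cores-per-NUMA-node count.

```python
# Re-running the example above as a sanity check (pure arithmetic, no vSphere
# API involved): 48 cores, 6 cores per NUMA node, 8-vCPU VMs each split into
# two 4-vCPU NUMA clients.
cores_per_node = 6
numa_nodes = 48 // cores_per_node          # 8 NUMA nodes
clients_per_vm = 2                         # 8 vCPU / 4 vCPU per client
vcpus_per_client = 4

def vcpus_on_busiest_node(vm_count):
    clients = vm_count * clients_per_vm
    clients_on_busiest_node = -(-clients // numa_nodes)   # ceiling division
    return clients_on_busiest_node * vcpus_per_client

for vms in (4, 5):
    print(vms, "VMs ->", vcpus_on_busiest_node(vms),
          "vCPUs on the busiest node (node has", cores_per_node, "cores)")
# 4 VMs -> 4 vCPUs per node (fits); 5 VMs -> 8 vCPUs on two nodes (overloaded).
```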

You also need to look at CPU scheduling for larger VMs. The CPU scheduler is a crucial component of vSphere. In a traditional architecture, all workloads running on a physical machine are scheduled by the operating system itself, say Windows.

In a virtual environment, let’s not forget that the virtual machine, say a Windows server, is seen as a “workload”, and that “workload” has its own “workload”: the Windows services, for example, continuously running to provide Windows with its core components and features. While in a physical environment the “workload” associated with the Windows services is directly handled by the hosting operating system, in a virtual environment the hypervisor deals with ALL workloads, coming from “within” the virtual machine (i.e. the Windows services), along with the virtual machine itself.

Virtual machines must be scheduled for execution, and the CPU scheduler handles this task with policies that maintain fairness, throughput, responsiveness, and scalability of CPU resources.

I have retained from my VCAP training that one of the major design goals of the CPU scheduler is fairness. That’s an easy way to remember CPU Scheduling.

Make sure as well that there is a balance between CPU & RAM to avoid underutilization of resources.

I keep hammering this point, but I do believe that virtual machine density is a tactical aspect of day-to-day operations that is too often forgotten. It is crucial to keep a balanced cluster in a VMware farm along with a high density per host; for that, an important part of operational activity is ensuring that the investments made in hardware acquisitions are efficiently leveraged (i.e. getting more VM workload on a single host increases the ROI on that physical server. Simple math).

It’s difficult to say how many VMs you should have on a single host; the main reason is simple: not all workloads are created equal. Investing in a virtualization-centric monitoring tool is a smart investment.

vSOM does a good job at that. I have spent many years of my career as a monitoring expert, implementing HPOV, MOM, OpsMgr and SCOM, SolarWinds, etc. to help organizations be proactive about downtime and forecast performance challenges, establishing a stable environment where all workloads evolve in an optimized manner.

When you try to forecast the future, like tactical monitoring is supposed to do, you need to rely on tons of metrics and you need to analyze them in a way that represents your business, interpret them so the right actions can either be planned, or immediately performed.

Finally, ensure sufficient I/O and network throughput is available; that’s a no-brainer, I think. If you do a good job at increasing the workload on a single host, the ripple effect is higher throughput from that host at the network and storage level.

VMware’s best practices provide a good overview of what is needed in terms of port requirements for the various workloads. Management, vMotion, Storage vMotion, Ethernet-based traffic and storage traffic are personalized to your environment, but there are fundamental requirements to be respected. I strongly suggest not deviating too far from the vendor’s expectations. For example, don’t mix “management traffic” and “VM Ethernet” traffic. Seems obvious, but I’ve seen it.

With the number of ports needed, and the number of cables required, think outside the box and look at 10 Gbps Ethernet cards. More precisely, look at technologies that allow you to create virtual ports, vPorts. The QLogic adapters and Cisco VICs are a good start.

On the storage side, the same applies. The number of ports vs. the number of cables required to carry the traffic is something you’ll need to consider. Creating an Ethernet-based environment that allows a seamless flow of data traffic is a crucial element of a performant environment. When you do all the work I have just described above, don’t let the storage traffic kill your efforts.

The storage traffic is as important as the Ethernet traffic. Having 30-40 VMs on a single host will generate high traffic on your iSCSI pipes; keep an eye on VMware’s %RDY and %WAIT metrics to evaluate how your VMs are performing (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008205). Understanding storage performance from a VM point of view can be challenging, granted, so use https://communities.vmware.com/docs/DOC-9279 to help you keep an eye on what matters the most.
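One practical note on %RDY: vCenter exposes CPU Ready as a summation in milliseconds per sampling interval, while esxtop shows a percentage. The sketch below applies the commonly used conversion (real-time charts sample every 20 seconds); treat any threshold you compare against as your own judgment call.

```python
# Quick helper for the %RDY discussion above: convert a vCenter "CPU Ready"
# summation value (milliseconds per interval) into a per-vCPU percentage.
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Rule-of-thumb conversion; real-time charts use a 20-second interval."""
    return (ready_ms / (interval_s * 1000.0)) / vcpus * 100.0

print(cpu_ready_percent(2000))            # 2000 ms over 20 s -> 10.0, worth a look
print(cpu_ready_percent(4000, vcpus=4))   # spread across 4 vCPUs -> 5.0 per vCPU
```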

 

Conclusion

Choosing the right hardware is not an easy task, especially when you consider the environment it is required to compute in. Far more than a simple box, the platform is a crucial element that aggregates a series of components, resulting in an optimized farm.

Don’t underestimate workload and density; they are the key to ROI from a business standpoint, and they drive an optimized OPEX that many CFOs are sensitive to.

As always, feel free to send your feedback. I always enjoy reading your points of view and enhancing my personal knowledge with the experience of all.

Wishing you a good weekend my friends.

 


Audit your Data: Take a step back, think and analyze.

Taking a step back

Now that some of the largest storage conferences have seen their largest audience ever in Las Vegas, I think it’s time to take a step back and brainstorm a little about why so many of us were interested in those conferences and what we were truly looking for.

We are living through some very exciting moments, don’t you think? When we take the time to step out of our IT every day, the world of data management is taking shape in front of us in ways we have never considered nor imagined before. While End User Computing is taking some time to take off, the data management model gets all the attention from the IT sphere… and maybe it’s not a bad thing after all.

Whether you are passionate about technology itself or about how technologies are enabling IT to drive the largest change ever in every organization, let’s admit that raw data is at the heart of many of the questions we are faced with and are trying to find solutions for.

Sure, you can buy a box that does miracles with your data; the reality is that, too often, those miracles are a little shy in your everyday and quantifying the true benefits is a full-time job on its own. The reality is that we are focused on the wrong problem.

Looking at technology solutions to alleviate business problems is not a model that will sustain itself very long, and I encourage you to extract yourself from this mindset and consider approaching the questions you have in another way, to truly add the value your leader is looking for from you.

Sit back and look holistically at what you are trying to achieve; taking a business eye and abstracting yourself from the feeds and speeds will often show you the path to follow, which you can THEN apply technology to.

A great article, http://blog.infotech.com/analysts-angle/seeing-the-world-through-the-ceos-eyes/, from which “Not taking leadership over business processes nor translating them into a technical strategy that will produce business results” and “true innovation can only occur when IT has a firm understanding of business challenges” resonated in my head.

 

Think: The reality of data management

Managing data has never been so complex; “managing data” means *understanding* the requirements of servicing SLA in an application centric model: it requires a deeper knowledge than just reading a “specs sheet”.

Our datacenters are getting overwhelmed by various sources of data (http://shar.es/V0Irp) and you’re tasked with “boxing” it into a predefined architecture; frankly, I’ve seen magicians in my career. Between tiering data (http://searchcloudstorage.techtarget.com/definition/automated-data-tiering), extending the cache of controllers or moving hot blocks outside the array, the agility of technology has allowed many of us to be highly successful in addressing all expectations, at an expensive price: OPEX.

Managing these requirements in an operational model is another story on its own.

Analyze: Understand, Model and Execute

We all need to take a step back and rethink how we approach our projects. Analyzing is an ongoing activity; it allows an ongoing review of the data model and how business expectations have changed. Is it mobility? Is it archiving? Backup? Virtualization? VDI? Whatever the need, remember that data follows business, and business moves forward; therefore data changes and so does technology: how you truly understand that data is crucial.

First, we need to *understand*, Second, we need to *model* and Third we need to *execute*.

Understanding requires careful actions. Metrics are endless, and gathering them, while a fundamental element, cannot be done without an expert touch. Tons of utilities will allow you to get these metrics. Thousands of technologists will gladly help gather metrics. How many can do something valid for your unique business? Understanding requires an #expertise and an #experience that are only found in the few #integrators that are able to #simplifyITcomplexity.

Modelling metrics is an art on its own, period! Being able to have the credibility to look at data and its metrics in a three-dimensional way does not mean matching a solution to a result. It allows a fresher look at what drives the #Innovation the business is trying to achieve through the data model, matching the current state, based on the past, to shape the growth.

Executing requires “savoir-faire”. ONLY THEN can we really understand the value of the technology. ONLY THEN can we really quantify the derived value of IT in an organization. Not being biased by the information we are bombarded with day in, day out truly is what differentiates one IT #integrator from another. Why one technology more than another? The understanding and modelling activities will lay the most #optimal path to long-term, predictable IT service. Predictable service opens the door to #Innovation and refreshes business offerings through new opportunities.

In a nutshell, starting by looking at the nuts and bolts will eventually lead to missed expectations.

Conclusion

We can extrapolate this post to other areas of data management, or data movement. Data is stored, optimized, and travels from the datacenter on the network to the consumers.

Getting the right audit and auditor for your data is the next level where your talent will shine. Getting the right talent in front of your needs and delivering on it is key at walking the path many talk about, but few execute on.

*Understanding* the true data value of an organization is a talent that few are able to articulate and demonstrate through a deep understanding of the business. It requires far more than just talking about a hardware technology or trying to “fit” requirements into “boxes”; it needs to be deeply analyzed, reviewed, mapped to your unique needs and integrated seamlessly into your existing ecosystem.

Many can collect metrics. Few understand their value. Less can truly *integrate* them.

 

Have a good weekend friends

Florent Tastet

http://about.me/florenttastet


Are we failing the true business needs? Part 5

The EUC Pillar

When I started to write this series of blogs, I had one idea in mind, and I’ve tried to outline it through the past posts. The series was not intended to be very technical, but more to show a high-level view of the datacenter pillars and how, in fact, too often, the end user community, primarily responsible for most of the changes happening in the datacenter, is unfortunately not entirely kept in mind.

https://florenttastet.wordpress.com/2014/03/30/are-we-failing-the-true-business-needs-part1/

https://florenttastet.wordpress.com/2014/04/06/are-we-failing-the-true-business-needs-part2/

https://florenttastet.wordpress.com/2014/04/13/are-we-failing-the-true-business-needs-part3/

https://florenttastet.wordpress.com/2014/04/19/are-we-failing-the-true-business-needs-part-4/

Way too often I meet with Directors and C-Levels and realize how little the end users are discussed during a storage conversation with the customer’s trusted advisor. Too often it feels that most presales conversations are focused on one precise angle of a requirement in the datacenter, and you might be lucky to find someone capable of articulating a larger conversation; if that’s your case, push it to the next level.

However, what I also see too often is that an entire aspect of the largest complexity is left aside, or the conversation deviates from its true purpose.

As the model of IT is shifting to a user centric model, our minds have to think differently, and really investigate the true need of the end users, as they are the most complex aspect of our every day.

Sizing and architecting became a much easier and accessible task to perform and aligning the right technology is far less obscure than, let’s say, 7 years ago.

However, adequately aligning with end user expectations requires far more understanding of the organization’s DNA to be able to make the right choice, align the right solution and measure the outcome, simply because we are humans.

Today we will be talking about EUC, or “End User Computing”, one of my favourite topics, as it touches not only the datacenter, but the desktop and the applications, and aligning those three is a true art.

The Desktop model

Welcome to the most massive shift ever! XP lasted 10 years. Now what?

Windows 7 and Windows 8 have started to gain momentum but haven’t truly pierced through nor led a change… why? XP represented the most stable Microsoft operating system since Windows 98 and a natural evolution of end user computing as we knew it, so organizations didn’t see any challenges in implementing XP to replace Windows 98 in the Enterprise.

All right that was easy!

However, 10 years is long, and the new reality has caused many headaches; Microsoft understood it by pushing the XP deadline over and over. But now we are faced with a decision to make and need to ensure that we do the right thing, based on a few requirements that can be summed up in a single phrase:

IT Consumerization (http://searchconsumerization.techtarget.com/definition/IT-consumerization-information-technology-consumerization)

The Consumerization of IT has led to a phenomenon called bring your own technology (BYOT, a.k.a. BYOD, BYO, etc.), in which companies allow employees to access corporate systems using their own personal devices, leading to mobility and security concerns while driving data growth at a rhythm unseen before.

So where are we in the desktop model as we know it? As applications are getting packaged to be mobile and follow users on the BYOT journey, the model is harder and harder to evaluate. While the past showed that IT was able to service 100% of an end user community with a stable and standardized desktop model, this model was transported to the datacenter in the form of virtual desktops, and those same virtual desktops are accessed by a variety of devices that IT had, or has, no idea about.

Microsoft has felt it; VMware and Citrix have built the path: seamlessly integrated with the user’s preferred device, the desktop needs to be virtual. Windows 7 or 8 no longer matters, and this is where the largest change in the desktop model has happened.

Choose one OS, the one you feel the most comfortable with, and it will be delivered to your preferred device, seamlessly integrated.

The Application model

Aside from the desktop conversation, the application model is one of my favourite conversations. Why? Because it has completely changed and it truly drives and supports the profitability of any organization, yet I have WAY too often heard nightmare stories about it.

Let’s go back a few years. A typical application was installed locally on a device and connected (or not) to a backend database. The model was lean.

While often affected by changes in the operating system, it happened from time to time that this application no longer worked, and unless you were one of its programmers, we could say: “and only God knows why”.

As frustrations piled up and profitability went down, manufacturers realized that if they wanted the desktop model to evolve to a portable one, they needed to address the application side of end user computing; the main reason was that end users were no longer concerned with the device they used, as long as the application they accessed was available from the device they needed.

It was pretty quick, I would say; much like the desktop, we promptly saw applications follow the desktop’s model and become very portable.

Now let’s set the bar: any application requiring a backend database will still not be able to function if there is no backend network access. We know that right?

But interestingly enough, the applications became so important that they stopped the evolution of the desktop model to a virtual one. And I like that! Because we have finally reached the point where we realize how important the applications, and the users consuming them, are; and as the IT industry tried to shift the paradigm, the community raised the flag and helped point to the direction where they needed help: application portability.

However, this is causing far more damage to the desktop than I think was expected.

I don’t believe that the traditional Enterprise-level organization will leverage application portability 100% to eliminate the traditional desktop model. Yes, the Enterprise-level organization MIGHT leverage a form of BYOD, but I don’t think we are at the stage where anyone will bring any device they like to work on corporate applications.

Why? hum… because we are human, and need to absorb and process the change, that’s all

But I do believe that ANY organization of reasonable size can easily open the doors to BYOD, assuming the employees wish to follow, of course.

What I’m saying is the technology has addressed two big challenges: how do we change the desktop model as we know it, and how do we change the application model as we know it?

The virtual desktop has provided alternatives and integrated the solutions seamlessly; I know for a fact a client that didn’t know he was on a virtual desktop at the office for months!

The application model has evolved to a model where it does no longer require to be aware of the underlying operating system. It can move from one OS to another, from one device to another, seamlessly.

Conclusion

Have we realized the tremendous changes driven in end user computing? I believe the goal of a highly portable end user model has arrived, and while this allows a new angle of profitability, it truly unlocks a large amount of potential in both operational and capital expenditure.

True, the backend datacenter needs to keep up. True, part of the investment is required in the datacenter to allow this highly portable model to happen, but if we’re able to articulate the opportunity, no longer will we have to worry about the end user computing side of the business.

Give me an OS, give me the device I want, and I will be able to work, any time, anywhere.

Anytime, anywhere, from any device is a brand that needs to shine among our organizations, as we need to focus on the future and the growth of profitability, no longer can we sustain the challenges experienced in the end users field.

I love what we’re going through. I hope to see it reach its true potential.

 


Are we failing the true business needs? Part 4

The Platform Pillar

This is the fourth blog on “are we failing the true business needs”; what we have seen so far is that many conversations are not precisely focused on the end user computing side of IT while in fact the end users are directing more than ever the changes we are seeing in the datacenter through the consumerization of IT.

While the past showed that IT was directing how the end users community needed to work, those same users’ hard work has forced IT to rethink and reshape the way they have been delivering IT, helping IT in the process to shift from a cost centre to a strategic enabler of the Lines of Businesses. We should be thankful to them.

And the work is still in progress; as the datacenter gets redefined through technologies, today’s small to enterprise-size organizations can have access to the same benefits from the Information Technology world. The ultimate goal is to provide access to applications faster, enabling users to become innovative and creative faster, ultimately making them contributors to the organization in a shorter period of time.

We have seen that the datacenter operating system, a.k.a virtualization, led by VMware and closely followed by Microsoft, is unlocking the agility of application provisioning and consumption, ensuring in the process, that Business Continuity and Disaster Recovery are no longer a concern. https://florenttastet.wordpress.com/2014/04/06/are-we-failing-the-true-business-needs-part2/

While the datacenter operating system, makes applications more reliable, the underneath storage foundation is also getting aligned with the virtualization layer, through a set of data management capabilities, resulting ultimately, in a faster and more secure way to access the data consumed and created by the end users. https://florenttastet.wordpress.com/2014/04/13/are-we-failing-the-true-business-needs-part3/

What remains in the equation is the compute power, or better said, the platforms, a.k.a. the servers.

The compute power

Let’s set the table. None of the manufacturers have access to a technology out of this planet.

What I mean by that is that all processors and memory chipsets are coming from the same sources, and while each platform vendor may have different ways to leverage the technologies, fundamentally, a processor is a processor, and a memory stick is a memory stick… oops… I said it.

In the server world we live in, and as far as the x86 business is concerned, two major players have always been around: Intel and AMD. Luckily for us, the low level of engineering behind it hasn’t changed. The wheel is still round, and still turns in circle.

What has truly changed is how the wheel looks and how it enables data to be computed. It might sound simplistic, but this is what we are dealing with, which has probably supported our feeling of compute power being a commodity more than anything else.

And this is exactly where we are all mistaken.

Agreed, a server is a box with a processor, RAM, buses and a set of peripherals connecting it to the rest of the world. Granted! But platforms are far more than that in my eyes. They are the key element that makes the data we consume consumable. Through a complex interpretation of 1’s and 0’s they translate into human language the data we are creating or consuming. For that, I don’t consider platforms to be a commodity.

Often I think about us, humans, and how often we forget how important our brain is. Concretely, we are able to quantify the outcomes of our hands, legs, feet, eyes, ears, etc., but we so often forget that without the central processing unit, our brain, none of these would be real, none of them would make sense, and none of them would be orchestrated and driven to a successful outcome: interaction.

The processors are often the unsung heroes of the modern era; each year we benefit from faster and more efficient performance, which improves not just computing but also numerous industries, and it keeps unlocking the ultimate feeling we have about technology and its responsiveness.

The differentiators came with cores and how processors were able to have, within the same chipset, various cycles, or better said, were able to segregate compute cycles into smaller waves of calculation. With today’s 4, 6, 8, 10 and 12 cores, what we need to see is the continuous need for parallel processing to address the most demanding requirements from the most demanding applications used by the most demanding customers: the end users. (http://en.wikipedia.org/wiki/Multi-core_processor)

In a very simplistic way, and please don’t hold it against me, Cores are one “CPU”. So a dual core processor has the power of two processors in one. A quad core has the power of 4 processors running in one processor. And so on with 6 and 8 cores. Just think of a quad core having the power of 4 computers in one little chip. The same with dual cores.

Businesses are relying on faster calculations, faster access to stored data and ultimately faster computation of that data for a faster outcome: profitability.

Fast and faster are two terms we are familiar with in the platform industry, and they are what differentiate one offering from another. So many flavours exist out there that we are fortunate to pick and choose what makes the most sense for our needs, or for the needs of our end users’ expectations and our organization’s growth. I like to say that while the datacenter operating system or storage are key elements of a balanced IT conversation, none of them would play such a strategic supporting role if the compute power were not able to follow.

Commodity? I disagree. Valuable resource? Sounds much better.

What differentiates one versus another

What makes the difference between one vendor and another is the ease of management, and the integration into our datacenter’s strategy. We are seeing a huge momentum in the datacenter through a complete Software Defined DataCenter, and we are privileged to witness the platform outperforming the most demanding requirements. That is not something to forget!

Now the true challenge is the integration alignment and how it will make our life easier.

Why? Well, simply because we want to be focused on more beneficial results for our organizations through the delivery of new projects, and unlock invaluable time and financial capabilities that we can inject into the growth of our organization through new projects, hence new profitability opportunities.

Often we do not think about quantifying the time and dollars attached to our day-to-day, but our beloved upper management, a.k.a. the C-Level, does, and being able to invest in growth instead of “fixing” makes it more appealing to any leader and investor; we surely want to be a key element, through tactical activities, of those savings.

The rackmount format is the most available. At a few “U’s” high each, it allows a cost-effective form of compute power. In its form, and at a very simple level, the rackmount is a single player. What aggregates it with others is the hosted operating system through clustering technologies. It may play a crucial role in a farm architecture, but it still causes challenges when the time comes to manage it, and you can easily extend that thought to the network requirements through ports and cables.

In the event of a virtual farm, the “101” rule applies: make sure you have all the same components inside: same processor family, ideally identical RAM quantity. Rackmounts are the simplest, yet most cost-effective, form of compute power enablement, and they are typically seen far more as a foot in the door for a starting project than as a larger strategic alignment in the datacenter.

Where this format hurts is when you reach a high number of servers; while there are tons of solutions and alternatives to effectively manage such deployments, it remains that fundamentally they are not typically integrated with the architecture deployed and are too often a glued-on solution to help alleviate the larger challenges of management and efficiency.

Truth is that rack-mounted servers will always be very compelling in the short run. The costs align well with tight budgets, but it comes at a discreet price: OPEX.

The long term might impact the time it takes for you to deliver compute power that is “as agile” as what you have been deploying in your datacenter around the datacenter operating system or the storage pillar. Business growth and forecasts will help align what best fits the needs, but ultimately keep in the back of your mind that managing a large number of individual servers is far more challenging in the long term than unlocking a financial capability in the short term by injecting a form of aggregated compute power solution (read: blades).

Fundamentally, I like to remain aligned with the rest of the datacenter’s architecture. Balancing investments and remaining true to my needs is far more compelling than saving dollars in the short term. Chances are I will remain locked into this decision in the future, and knowing (or not) what’s coming might alter the way I see server investments. But I am not a CFO, nor a CEO.

The blade format is far more integrated and far more efficiently manageable, if I can put it this way. The challenge was to manage this amount of compute power, delivered through an aggregation of platforms in a very small form factor. The platform leaders have deployed a tremendous amount of energy to make it simple and acceptable in any organization. When you think about it, in a 19 to 40 U format you’re able to fit between 8 and 16 servers. Compelling!

Now, the management of blade infrastructures has been incredibly improved over the years. Simply because we didn’t have time to individually manage each server, the truth is the complexity of such an architecture required the right tools to enable efficient management of an aggregated compute power architecture and the true agility to move workloads from one server to another, transparently to the application and the end user, without relying on the hosted operating system.

So enabling the ease of management through a centralized management console helped define a much more agile service model for IT, efficiently addressing the growing demands while maximizing the time and effort spent on it.

The commodity comes with the ease of management, and true to a certain point, we don’t want to be disrupted by the management overhead that servers require.

If that’s how we position “commodity”, I’m OK with that.

On the flip side, the end users are continuously demanding more processing power, and having a very light and agile architecture definitely helps meet those demands quickly. Having a very agile compute power strategy that aligns with the underlying investments (read: storage and virtualization) should be directing our thoughts when taking a direction that will eventually result in a very balanced architecture that everyone can benefit from.

The blade management should allow a central, unique point of management. Having to individually manage each chassis might just transfer the challenges of rackmount management into the blade architecture.

Conclusion

We are seeing the end users forcing a shift on the compute power side of the datacenter’s equation.

While inside the box they all offer the same flavours, the true differentiator between one and another is the management attached to it. We have worked so hard to bring agility to the datacenter, and staying focused on that aspect is a crucial element of any decision.

Whether or not you are making a decision, or considering a change or an improvement, some very simple yet key elements need answers:

  • Fact is that end users will not stop creating demand.
  • Fact is data will require to be faster computed and faster accessed.
  • Fact is data compute should not be a bottleneck and should flow flawlessly with the rest of your architecture.

Addressing these key questions will eventually lead towards the right solution that best fit your requirements.

Whether you consider rackmounts or blades, you should not be afraid of either the technology or the cost of it, but focus on the long term in compute requirements; and if you’re overwhelmed by the digital content, ensure that your compute companion will be an enabler.


Are we failing the true business needs? PART3

The Storage Pillar

The datacenter operating system discussed last week (https://florenttastet.wordpress.com/2014/04/06/are-we-failing-the-true-business-needs-part2/) is a strong pillar of business requirements, now more than ever, as datacenters are almost entirely virtualized from the servers to the end points and, while at it, the applications.

The efforts deployed to solidify the foundation of tomorrow’s IT reality are paying off. As the true business needs are around the end user community, we are able to witness a strong focus on their needs through the technologies made available and are starting to feel the results: users are more mobile than ever, their access to applications is far more agile, and those same applications are far more responsive than, let’s say, 5 years ago.

On the storage side.

Call it vCenter, SDDC, Hyper-V, SCVMM, Azure, vCHS, Amazon, private or public cloud, or any other form of aggregated compute-storage-application a.k.a. “IT as a Service”: solutions are unlocking complete flexibility for business needs. We all agree that without virtualization at its core, our datacenters and businesses wouldn’t be as strong as they are and we wouldn’t be achieving the high level of application portability we do today, resulting in a much happier end user community in demand of performance and accessibility.

I also mean “strong” in two ways: 1) it’s one of the most important elements of a successful business strategy in need of redefining the way IT plays its role in the organization; 2) we should not forget how much “invisible” visibility it has to the end users.

As it stands today (and we all know “tomorrow” can be different), the datacenter operating system relies on different layers to enhance the various derived values of the impact it has caused. Today we will be reviewing the storage pillar of the datacenter and how it handles the various types of data it is asked to manage, handle, store, retrieve, optimize and align.

Structured or unstructured

Fundamentally, centralized storage arrays are storing data in blocks and these blocks are occupying sectors on our disks. I don’t want to go too deep on this subject but we all understand what I mean; this wiki can help http://en.wikipedia.org/wiki/Disk_sector.

Aggregated, these blocks of data form datagrams, and once “retrieved” by the disk heads and reassembled by the controllers, these datagrams present the information we consume, by means of an enterprise application or a simple Word document. Regardless of the nature of the data, it needs to be stored, and accessible when needed, and its nature should not be a concern.

Data accessibility is at the core of today’s most demanding applications or end users, or both. Essentially two types of data exist: structured and unstructured.

Take for example the various data we are creating almost every day now: Word documents, plain text files, templates, PowerPoint, Excel, etc. In short, with all of the end users’ data we are creating sets of unstructured data that will be growing exponentially in the next 10 years (in fact much research says that it will explode to about 40ZB by 2020… that’s in 6 years…. see the Ashworth College blog http://community.ashworthcollege.edu/community/fields/technology/blog/2012/03/13/dealing-with-the-explosion-of-data)

So this growth is around us and needs to be managed, evaluated, organized, accessible and reliable, as it remains the lifeblood of any organization.

On the other end, there is structured data. Primarily formed by anything related to databases and proprietary file systems, such as VMFS, NTFS, NFS, etc., this aspect of the data, although not growing at the same pace, forms what today’s arrays need to deal with, and what tomorrow will look like for data management.

The structured data

As we are talking about the storage datacenter pillar, structured data is our concern for this blog. Not that unstructured data is not important, but it would not follow the spirit of the blog. I will come back to it later in another blog.

So structured data is what we’re dealing with where the datacenter operating system is concerned. It forms the largest concern when virtualization (or any form of database) is discussed, and while the end user community is unaware of this, they contribute daily to its growth by interacting with it from different angles and pushing the limits of the tools made available to them.

Whether we, as IT professionals, provide direct access to the underlying disks through block-based protocols, disk passthrough, RDMs or any other form of direct disk access, or leverage file-based protocols, the underlying array controllers need to manage the requests sent to them to retrieve the structured data in a very timely fashion.

We, as end users, are not really concerned about where or “how” the data is stored; retrieving the information it contains is what we care about, and moreover how quickly that data is made available to us.

Say, for example, you are with a customer on the phone and are required to enter data in a CRM system; you surely don’t want to be telling your client: “that won’t be long, my system is slow this morning”. Although understandable, it is often an uncomfortable message to deliver, and we all feel underperforming when this situation occurs… or worse, irritated.

The same applies when you’re trying to retrieve data from that same database; the last thing you want is to be waiting for the system to provide you with the data you’re looking for, because the array is not able to aggregate the blocks of data fast enough for your expectations, or the controllers are saturated, busy dealing with other requests from other systems.

You can now imagine how this would feel when, instead of a database system, you’re dealing with a virtual machine… but it could be even worse… picture that same database you’re working with, hosted on a virtual machine.

Structured data is one of the most common types of data end users help grow every day. Take a peek at http://www.webopedia.com/TERM/S/structured_data.html for a high-level view of structured data.

Virtual machines are part of the structured data growth and are heavily contributing to the challenges we’re all facing day in, day out. We have sold to our management how efficient virtualization would be for the business; it is now time to deliver on those words. However, virtualization is only as good as the underlying storage and how that same storage responds to our requests.

VMDKs or VHDs, to name only the two most popular, in fact reside on a file volume (either VMFS or NTFS), and these volumes need to be stored onto precisely architected array RAID groups to perform at the level they are expected to. Ever heard about partition alignment? Not an issue with Windows Server 2012, but it has long been a source of concern.

I believe that structured data is the most challenging aspect of architecting a reliable storage solution, for the sole reason that it is crucial to understand HOW this structured data will be accessed, when it will be accessed and definitely how often and by whom (read computerized system).

What arrays are dealing with

Sequential data access, random data access, unstructured data, structured data, performance, cache, cache extension, IOPS, RAIDs …etc are only a few of the most challenging aspects of data management arrays are requested to work with.

As the end users are consuming the data, creating data, modifying data, various forms of data access are daily managed and orchestrated by storage arrays. It requires a precise understanding of accessibility to everyone (read computerized systems), ensuring that every request is treated in order, respecting a tight SLA.

SEQUENTIAL data access is the simplest way of projecting how data will be accessed (http://en.wikipedia.org/wiki/Sequential_access). Defragmentation is surely a word you need to master to remain on top of sequential access. Ensuring that the blocks accessed are in sequential order minimizes the time controllers wait for the heads to pull the data off the disks, resulting in faster datagram aggregation and presentation to the requestor (read: computerized system).

On the opposite side, RANDOM data access is the toughest (http://en.wikipedia.org/wiki/Random_access). When you don’t know which data or block will be accessed, it becomes a challenge to project the performance it requires, and you typically find this challenge with structured data, at the data level itself.

PERFORMANCE is a vast discussion but can generally be covered by IOPS, cache, controller memory and SSD. Often we see “IOPS” as being the primary driver for performance. Right or wrong, it drives a large part of the outcome for applications. A single SAS drive generates about 175 IOPS. Once aggregated with others using RAID (Redundant Array of Inexpensive Disks) technology (http://en.wikipedia.org/wiki/RAID) you will find yourself, in the case of a “RAID5+1” with, say, 5 disks, with an outcome of 175×5 = 875 IOPS (take out the RAID parity for the performance). Not bad. Now multiply this by 5 or 10 RAID groups, and the total outcome is now 4,375 to 8,750 IOPS.
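The sketch below reproduces that quick math and adds the write penalty that RAID parity introduces; the penalty of 4 is the usual RAID 5 rule of thumb, and the 30% write ratio is an assumption you should replace with your own workload profile (your array’s data sheet remains the authority).

```python
# Back-of-the-envelope IOPS math: raw spindle IOPS per RAID group, then a
# rough "effective" figure once the RAID write penalty is factored in.
def raid_group_iops(disks, iops_per_disk=175, write_ratio=0.3, write_penalty=4):
    """Rough effective IOPS of one RAID group (penalty=4 is the usual RAID 5 figure)."""
    raw = disks * iops_per_disk
    return raw / (1 - write_ratio + write_ratio * write_penalty)

raw = 5 * 175                               # 875 raw IOPS, as in the post
print(raw, round(raid_group_iops(5)))       # effective IOPS once writes are weighed in
print(round(10 * raid_group_iops(5)))       # ten similar RAID groups
```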

Cache, controller memory and SSD have worked tightly together lately to deliver the performance expected by the most demanding Tier 1 applications. Very successfully, this has allowed IT departments to rely on the various benefits centralized arrays have to offer, and we often see it as the second wave of application virtualization.

Just thinking about a single flash disk driving around 3,000 IOPS (http://www.storagesearch.com/ssd-slc-mlc-notes.html) and the performance picture is brighter all of a sudden.

See http://www.tomshardware.com/charts/hard-drives-and-ssds,3.html and http://www.storagesearch.com/ssd-jargon.html for more details on the drives and their performances.

While the hardware performance is “easy to forecast”, the overall performance is far more challenging, as it requires taking into consideration a multitude of angles: yes, IOPS, but also controller memory, data retention and the “type” of data (hot, or highly and frequently accessed; warm, or averagely accessed; and cold, little accessed and almost ready to be archived).

In the case of virtualization, the data accessed is typically almost always “hot”, and here comes the challenge, as the controllers and many arrays were NOT built for such a requirement.

To address this challenge you’ll have to look at the data at the block level:

which blocks composing a structured file are the most accessed, and where these blocks should be hosted (stored) to maximize performance and capacity; but moreover, where the LEAST ACCESSED blocks composing that same file should be stored to minimize the capacity and performance impact they may have, while still servicing the rest of the datagram properly so as not to impact the retrieval SLA required by the application requesting them.

I know it’s a long phrase… but it says it all.

This is where the interesting part of a storage array supporting a virtual farm is. All arrays are dealing with data management in the same “manner”; they store the data, and access the data. Now when this data has a unique requirement, a unique need, a unique purpose, this is where arrays are getting differentiated.

In conclusion

The end users are again, in the storage pillar of the datacenter, creating a constant opportunity in data management. We, as end users, are growing our needs and are not ready to compromise, because why should we?

When storage is architected, we often focus on the hosted application, but rarely look at the users’ expectations and systems requirements. We all ensure, from all possible angles, that the application hosted on the array will service the end user adequately.

Wrong!

We might take the right approach by looking at the application metrics and placing the data (blocks) where it will be best serviced, but as systems and data are accessed, this will change, and our agility at providing what the data needs, when it needs it, is what will set us apart when the big requirements hit our architectures.

It became almost impossible to project the future of how the data will be accessed, or how the data will be performing. The reasons for that? Us, the end users, and how strong we are at changing or forming new opportunities.

We, as a species, have demonstrated over and over how we are strong at pushing the limits, and something stable today, can become unstable tomorrow simply because we asked for more and are doing more.

We need to adjust automatically and stop looking at “how” we should do it, but “where” it can land, and agnostically speaking, this is the real challenge when we’re architecting a solution for a business need.

The future is bright, let’s make sure our architectures are forecasting something that can’t be forecasted!

Last question, but not least: how do you backup that data?.. LOL that’s another topic, but don’t forget about it.

Have a good weekend, friends.


Are we failing the true business needs? PART2

The DataCenter Operating System Pillar

In the first post (https://florenttastet.wordpress.com/2014/03/30/are-we-failing-the-true-business-needs-part1/) I spoke at high level about where I believe some if not most of the conversations we’re having are in fact missing the true business need: the End User Community.

I see way too often SMEs, SAs and TAs focusing a presales conversation on 1 or 2 pillars without circling back to the fundamentals of the project: the end user.

In this post, I will cover how the DataCenter Operating System has fallen short for some time at meeting those same objectives, but has deployed tremendous efforts to regain the lost miles, and where there is still a need to improve.

 

Where we were to Where we are

We can’t hide the fact that the datacenter has changed. From a typical, archaic physical deployment to a well-aligned and managed software-defined datacenter, it has enabled organizations to achieve a higher level of operation through an aggressive reduction of compute power needs. At some point, 8-10 years ago, the momentum started and we all saw the outcome shaping up quickly.

From hundreds of physical servers each hosting a single application, we saw that footprint reduced by a factor of 5 (sometimes more), as we became able to host 20 virtual machines (or more) on a single server. This momentum enabled the highest expectations of High Availability and the movement of virtual servers without application disruption, a.k.a. “vMotion”, “XenMotion” or “Live Migration”.

These technologies have allowed workloads to be highly available within a cluster and permit business continuity, while helping us move to the next level: BCDR.

Great!

BCDR (Business Continuity and Disaster Recovery) is where we fall short, and more on the “BC” than the “DR” part. In a typical cluster, you have to follow very strict rules of compute aggregation (the same configuration across all platforms) and storage alignment (don’t rely too much on SATA) to meet the highest demands. Regardless of where you stand as an organization, you will always need a very agile compute alignment, and technologies such as Cisco UCS Fabric Interconnects (http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-6200-series-fabric-interconnects/index.html) NEED to be considered, mainly because there is still a need for physical “commodity” compute platform interchangeability and we can’t fall short on that aspect. When problems hit the fan, you need all the right technologies in place to return the service to where it was before the crisis.

All clusters need to be balanced, and all clusters need a precise acceptance level; this means you always need to leave some room in your compute architecture for HA. You need to calculate (HA admission control helps with that) how much headroom is required to accept 1 or 2 host failures AT ANY GIVEN TIME. That will limit your virtual workload per host and increase your physical ratio. So you will need to consider a technology that enables compute to be added and integrated into an existing cluster without disrupting the virtual workload.
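As a back-of-the-envelope illustration of that headroom calculation, here is a small Python sketch assuming a uniform cluster (identical hosts) and a simple host-failures-to-tolerate reservation. The host counts and capacities are made-up examples; real sizing should come from vSphere HA admission control itself.

```python
# Back-of-the-envelope HA headroom estimate for a uniform cluster.
# Hypothetical numbers: real sizing belongs to vSphere HA admission control.

def ha_headroom(num_hosts: int, ghz_per_host: float, gb_ram_per_host: float,
                host_failures_to_tolerate: int) -> dict:
    """Return the usable capacity left once N host failures are reserved."""
    if host_failures_to_tolerate >= num_hosts:
        raise ValueError("Cannot tolerate as many failures as there are hosts")

    surviving = num_hosts - host_failures_to_tolerate
    reserved_pct = host_failures_to_tolerate / num_hosts * 100

    return {
        "usable_ghz": surviving * ghz_per_host,
        "usable_gb_ram": surviving * gb_ram_per_host,
        "reserved_capacity_pct": round(reserved_pct, 1),
    }

# Example: 8 identical hosts, 52 GHz and 256 GB RAM each,
# designed to survive 1 host failure at any given time.
print(ha_headroom(num_hosts=8, ghz_per_host=52.0, gb_ram_per_host=256.0,
                  host_failures_to_tolerate=1))
```

With 8 identical hosts and one tolerated failure, roughly 12.5% of the cluster capacity stays reserved, which is exactly the limit on per-host virtual workload mentioned above.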

We have also seen the architecture enhanced by the vSphere Distributed Switch (vDS), allowing a seamless networking definition across all hosts in the datacenter, all of it to ensure a peaceful virtual server migration.

While I believe vDS is driving a strong change, we have not been leveraging its full potential for the applications being consumed. Standard vSwitches did, and still do, a wonderful job in smaller deployments, but you need to be extremely diligent when you create your cluster and manually replicate your vSwitch designs. We wouldn’t want a migrated virtual machine unable to find its virtual network, right? … or its NIC…
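As a hedged illustration of that diligence, the sketch below compares each host’s standard-vSwitch port groups (name and VLAN) against a reference layout and flags anything missing or mismatched. The host names, port groups and VLANs are invented; in practice the inventory would come from your vCenter (PowerCLI, pyVmomi or similar) rather than hard-coded dictionaries.

```python
# Hypothetical consistency check: every host should carry the same standard-vSwitch
# port groups (name + VLAN) before we trust vMotion to land VMs anywhere.

reference = {("VM Network", 0), ("Prod-Web", 110), ("Prod-DB", 120), ("vMotion", 200)}

hosts = {
    "esx01": {("VM Network", 0), ("Prod-Web", 110), ("Prod-DB", 120), ("vMotion", 200)},
    "esx02": {("VM Network", 0), ("Prod-Web", 110), ("vMotion", 200)},                    # missing Prod-DB
    "esx03": {("VM Network", 0), ("Prod-Web", 111), ("Prod-DB", 120), ("vMotion", 200)},  # wrong VLAN
}

for host, portgroups in hosts.items():
    missing = reference - portgroups
    extra = portgroups - reference
    if missing or extra:
        print(f"{host}: missing {sorted(missing)} / unexpected {sorted(extra)}")
    else:
        print(f"{host}: consistent with reference")
```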

So if you are concerned with the outcome of all this and with the “Business Continuity” aspect of your mission, consider two important points: 1) the platform needs to be agile, in the sense that you must be able to do anything at any time (adding, removing, redeploying); 2) the virtual network needs to follow a strict, identical deployment across all hosts, unless you are privileged enough to have access to vDS.

You may also consider Host Profiles, if you can. I like them. Much like “Ghost” did miracles in the past, Host Profiles ensure that you have the same configuration across all hosts. They are practical when you need to look after more than 5 hosts, and they certainly unlock the standardization of the most demanding configurations.

We are in a privileged place where applications hosted on virtual servers can be seamlessly moved from one host to another, and from one datacenter to another, without a long service interruption, resulting in higher datacenter uptime (yes, the five 9’s). Consider, however, the fundamentals of your design and how they will impact the users and the responsiveness of the applications being consumed.

The focus of some and the gap of others

What some have forgotten is that these changes, while improving the datacenter experience and business continuity, were in fact driven by the end user community and application accessibility requirements.

True, applications were always made highly available when needed (think about clustering technologies, for example), but this came at a cost and was tightly analyzed to calculate the true benefits; some applications fell short of support and were placed aside, as if that business requirement were not essential or business critical.

Big mistake!

And why is that? While the dollars always talk, today’s requirements demand a high level of application accessibility regardless of the impact it may or may not have on the business. In other words, when your co-worker is still working toward his deadline, why is yours stopped? For many years excuses were flying high, but as a newer generation came on board, this situation was no longer acceptable; frankly, regardless of the business outcome, none of the C-levels wished to have their workers sitting and waiting. Kind of unproductive, you would admit…

So high availability came along and, through virtualization adoption, helped organizations enhance productivity from the least business-critical application to the most demanding one. An interesting fact and shift, don’t you think?

Ease of management also came along to add some weight to the balance. No matter how good you are, there is only so much you can ask the technologies to do for you, not because you can’t make it happen, but because the dollar talks. So imagine how we all felt when “vMotion” (to name only one) came in. Big LOL in the systems engineering community. F.i.n.a.l.l.y we were able to offer a unified level of service to ALL users and the apps they needed to feel productive.


So where are we? Where’s the gap, then?

Well, I would say we have started an interesting journey all together. While the datacenter now offers a seamless application accessibility experience through HA and vMotion, what remains is the portability of applications, a.k.a. SDN, in the datacenter operating system space.

Have we just reached the end of the road? I don’t think so.

We have seen many clusters being built and many inter-cloud strategies being deployed (primarily to address processing bursts and application accessibility), but I still feel that we are not addressing the true conversation in the datacenter.

We can move applications from one host to another; we can, somehow, migrate an application from one datacenter to another using replication; but we still fall short on device accessibility to all these applications, and application portability is only at the beginning of an exciting journey.

I face this very frustrating situation every day, where I can’t access the application I need, natively, from all my devices. Yes, we can always fall back on Citrix to bring the user a roaming experience, but it still requires an unneeded layer and, frankly, all of these applications were designed to be accessed with a mouse. When you’re like me, a little outside the typical IT guy size, trying to click Outlook’s “send” button with a finger the size of a toe quickly becomes irritating, trust me. Now imagine trying to save a document…

OK, I hear you in the back chatting about Trinity, and I agree with you: Trinity is there and we should be far more concerned with it, or less concerned with the “toe” situation. But are we, as a community, there yet? No. I still can’t have it even if I wanted to: I don’t have the infrastructure in my basement.

So, over the last 10 years, we have addressed application migration and BCDR and provided the platforms with what they needed, but we’re still missing the next big thing: can I access an application, natively, from ALL the available devices? The answer is too often: no!

Conclusion 

Luckily, not all organizations allow every device yet. Thank God! But the day is coming and we need to be ready for it!

Will the cloud help with that? Many seem to believe so, and the best example is SalesForce.com. I have been using it for over 2 years now and I have to admit I am impressed. I have freely used it from an iPhone, an iPad, a laptop and an Android device, and the application is always responsive, with the same level of agility and performance: good job!

How many other applications are out there that organizations heavily rely on but that still can’t be accessed this way? What are the true alternatives? And more importantly, WHEN did you last have that discussion?

The world of application portability is being defined, and Horizon, XenApp, XenDesktop and App-V are leading the charge. Let’s not forget that the outcome of all these investments is for the end users we are trying to keep in our organizations because we have the coolest ways of working!

Have a wonderful weekend, friends.
