What Does It Take to Run VCF 9?

I have had so many conversations around this topic that I wanted to capture my thought process here: how to avoid injecting operational or performance risk as we take our first steps with VCF.

So, let’s dive into the current offerings from Broadcom. We are only going to discuss their primary offering: VMware Cloud Foundation (VCF).

VCF is what Broadcom leads with. VVF (VMware vSphere Foundation) also exists, but it is not the topic of this post.

We are not talking about every single SKU or offering they have, as there are quite a few additional “add-on” licenses that exist for VCF. None of these add-on licenses are being discussed in this post, in any way.
BUT, you SHOULD discuss “add-ons” as you may want/need some.

And don’t forget: vSphere 8 End of Support is October 2027. That’s only 19 months away….start planning your approach now! The only way to vSphere 9 is VCF (and potentially VVF).

For the purposes of this post, we are going to sit in the role of an enterprise architect who has to size new hardware for a new application stack that drives the business, same as if we were having this discussion about rolling out SAP, Oracle E-Business Suite, or PeopleSoft. We wouldn’t want to introduce unnecessary risk for those applications, right?

What is VCF?

So let’s start with what VCF is, and what it takes to deploy and run VCF. VCF is a private-cloud platform. You do not get to pick and choose the individual components.

Just like when you go to buy a car….if you want a moonroof, and it is only available in the “touring edition” package of the car, you get the touring edition. Don’t want the heated steering wheel or seats? Well, they came with the touring edition, so you can either use them or not….but they came with the touring edition.

You don’t have to use EVERYTHING that came with the touring edition, but I’ll bet you appreciate those heated seats on nights when it’s 7°F.

What Does VCF (The Private Cloud Platform) Give Me?

Essentially, all the same capabilities you get (and expect) of a public cloud provider (AWS, Azure, GCP…any of the hyperscalers). It is a platform to run VMs, containers, K8s workloads, VPC networking constructs, monitoring, troubleshooting tools, and automation/self-service you can build. Also included are logging capabilities, insights into your network traffic flows, workload mobility, SSO, etc.

What Makes Up The VCF “Application”?

Let’s list this out (you’ll see VCF in front of a bunch of the products that you might remember as vRealize, which got rebranded to Aria, which is now prefixed with VCF). Here are the 14 “components” that comprise VCF:

  • VCF Operations
  • VCF Operations Collector
  • VCF Operations Fleet Management
  • VCF Operations for Logs (used to be Log Insight)
  • VCF Operations for Networks (used to be Network Insight, or vRNI)
  • VCF Operations HCX
  • VCF Operations Orchestrator
  • VCF Automation
  • VCF Identity Broker (provides SSO capability)
  • vSphere Replication
  • VMware ESX
  • VMware NSX
  • VMware vCenter
  • VMware vSAN

Here is a link to where you can see all of these components if you try to download VCF 9.0 from the Broadcom Support Portal. I know I’m linking to 9.0.0.0, the GA release, but let’s see the forest for the trees for this discussion (login required to get to this page!).

This sounds like a lot, and it is, when we (like many) compare it to what we have known for years as vSphere (which is just ESXi and vCenter).

How Do You Get This Deployed?

That might be another post, or better yet, take a 1-day workshop with us here at WEI, and we can show you HOW it gets deployed. About a week (or two, depending on the size of the committee) of planning. About a week of deployment & configuration (to do it right). A few days (to five) to polish up the rest of your new “on-prem private cloud”. So, for the time being, we will just say it gets deployed….

Management Domain

This initial deployment for VCF is what is called the “Management Domain”. The Management Domain runs all those products we listed out above and will then be the location where the management VMs for “Workload Domains” are expected to run…more on that later.

Does it seem like you need a lot of resources to run this full VCF stack in the Management Domain? Well, that depends on what you consider as “a lot of resources”…

  • Total vCPUs allocated: 234 vCPU
  • Total RAM allocated: 825-GB RAM
  • Total Storage allocated: 15.5-TB
  • Total Storage consumed: 4-TB

…and this is with the smallest deployable VM sizing available via the VCF Installer process. Ask us for the RVTools export of a newly deployed VCF environment.

What else might you run in the “Management Domain”? Setting aside that running Windows Server and/or Red Hat VMs requires its own licensing…

  • Domain Controllers
  • IdP connectors
  • Backup Servers
  • Security workloads

…and other backend functions…but don’t overdo it. This Management Domain will have other things to run.

This Management Domain is running 25 new VMs to start. You see the resources (listed above) those VMs will require. You see all the different components listed earlier that are integrated together…and we want to do it right the first time, because if you can’t do it right the first time, when will you find the time to fix it later? My advice:

  • Start with 4 x new ESXi servers running vSAN ESA (requires NVMe drives).
  • Brand new, or (very modern) repurposed vSAN ESA Ready Nodes, but they WILL be wiped as part of this process.
  • We will deploy VCF together on those new servers and create the Management Domain.
  • Could you use FC (not FCoE) or NFS? Sure, but given the small cost of a few NVMe drives to run vSAN ESA, we can isolate this “VCF Application” and guarantee the resources required to run our enterprise application, VCF. Plus, it is recommended by the vendor, VMware, to use vSAN for the Management Domain. We will repurpose your external storage when we get to the Workload Domains.

After the Management Domain is configured, we can then import your existing vCenter Servers and the clusters that they manage (and more importantly, the VMs that they run). More on that in a bit.

Taking a step back, we realize that to run VCF in a risk-averse implementation, we need a new VMware Cluster of 4 x ESXi hosts running vSAN ESA to get everything deployed.

Sizing the Management Domain

As quite a few of the VCF components are deployed as 3 x VM clusters, and the expectation is HA (High Availability) for the VMs running, you need a minimum of 4 hosts. To be redundant myself, that is a 3+1 cluster (the +1 is for the HA event, or more practically, to do maintenance without affecting production workloads).

OK fine, we can agree with 4 nodes configured as a 3+1 cluster. What about the CPU, RAM, storage & network connectivity needed?

CPU: For CPU, let’s focus on the number of vCPUs required. Do you want to oversubscribe the management cluster? You can, but remember, this is what manages your VCF stack, so heavy oversubscription is not the answer.

Should you do a 1:1 VM CPU, for each physical CPU core? I would love to see that happen, but our pocketbooks are not infinite.

OK, so do we go 2:1, or 5:1, or 10:1? For this Management Domain, I’m happy to agree to a 2:1 CPU oversubscription.

  • Let’s work with sizing based on a CPU, with 32-cores per socket.
  • Put 2 x CPUs in each ESXi host (64 cores).
  • Go with the 4-node cluster (technically 3+1 cluster) just discussed.
  • That gives me 256 total cores for the raw total…Technically, that’s 192 cores (3 nodes + 1 for HA) usable.
  • The total vCPUs allocated to the VMs for VCF to get started is 234 vCPUs…
  • We are already at 1.22:1 CPU oversubscription (234 / 192), and we haven’t added any other workloads or VCF functions yet.
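The CPU math above can be sketched as a quick back-of-the-napkin calculation. This is just a sketch using the figures from this post; swap in your own core counts and VM allocations:

```python
# Management Domain CPU sizing sketch (figures from this post; adjust for your hardware).
cores_per_socket = 32
sockets_per_host = 2
hosts = 4
ha_reserve_hosts = 1  # the "+1" in the 3+1 cluster

raw_cores = cores_per_socket * sockets_per_host * hosts                           # 256 raw
usable_cores = cores_per_socket * sockets_per_host * (hosts - ha_reserve_hosts)   # 192 usable

vcpus_allocated = 234  # VCF management VMs at the smallest deployable sizing
oversubscription = vcpus_allocated / usable_cores

print(f"Raw cores: {raw_cores}, usable cores: {usable_cores}")
print(f"CPU oversubscription: {oversubscription:.2f}:1")  # ~1.22:1 before any other workloads
```

Rerun the same math with your intended 2:1 ceiling to see how much vCPU headroom remains for the other workloads you plan to place in the Management Domain.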

RAM: Let’s start with 512-GB per node (I’d really prefer 1-TB per node, but let’s start here, just for the math). That gives you 2-TB of RAM for the raw total. But technically it’s 1.5-TB of RAM (3 nodes + 1 for HA again). And we are using 0.8-TB just to get started, and we haven’t added any other workloads or VCF functions yet.
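The same N+1 approach applies to RAM; a minimal sketch with the figures from this post:

```python
# Management Domain RAM sizing sketch (figures from this post).
ram_per_host_gb = 512
hosts = 4
ha_reserve_hosts = 1  # the "+1" in the 3+1 cluster

raw_ram_gb = ram_per_host_gb * hosts                          # 2048 GB (2 TB) raw
usable_ram_gb = ram_per_host_gb * (hosts - ha_reserve_hosts)  # 1536 GB (1.5 TB) usable

vcf_ram_gb = 825  # allocated to the VCF management VMs at the smallest sizing
headroom_gb = usable_ram_gb - vcf_ram_gb

print(f"Usable RAM: {usable_ram_gb} GB; VCF uses {vcf_ram_gb} GB; headroom: {headroom_gb} GB")
```

That headroom is what's left for domain controllers, backup servers, and the other backend VMs mentioned above, which is why 1-TB per host is the more comfortable starting point.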

What about memory oversubscription? I’m not a fan of that (most of us can agree that swapping RAM is a bad idea), but there is another way to get more useable RAM, and that is with NVMe Memory Tiering (add an NVMe drive to increase your “RAM” installed in the host). Add in NVMe Memory Tiering, and 512-GB per ESXi host isn’t a terrible starting point.

I would recommend 1-TB per host to get started.

vSAN ESA Storage: It’s ~16-TB allocated (thank goodness for thin provisioning in vSAN!). That’s before any growth, data ingestion, logs, snapshots, data retention, or even VM templates are considered…so let’s add 50% of that to start: 24-TB. That’s 24-TB of USEABLE storage, not RAW capacity. 24-TB at RAID-1 is 48-TB RAW.

But vSAN ESA has some great storage efficiency (writes via RAID 1, and depending on the number of ESXi hosts in the cluster….cold data at RAID 5 or 6) and global deduplication is coming soon as well.

So, 48-TB of raw capacity can get you a minimum of 24-TB useable capacity. That means each ESXi host needs to contribute 12-TB of RAW disk capacity. That’s 3 x 4-TB drives.
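The usable-to-raw conversion above, sketched out (assuming straight RAID-1 mirroring, before any of the cold-data RAID-5/6 efficiency mentioned earlier):

```python
# vSAN ESA capacity sketch: usable -> raw at RAID-1 (figures from this post).
usable_tb = 24           # target usable capacity
raid1_multiplier = 2     # RAID-1 stores two full copies of every write
hosts = 4
drive_size_tb = 4

raw_tb = usable_tb * raid1_multiplier               # 48 TB raw cluster-wide
raw_per_host_tb = raw_tb / hosts                    # 12 TB per host
drives_per_host = raw_per_host_tb / drive_size_tb   # 3 x 4-TB drives per host

print(f"{raw_tb} TB raw -> {raw_per_host_tb:.0f} TB/host -> "
      f"{drives_per_host:.0f} x {drive_size_tb}-TB NVMe drives per host")
```

If cold data lands on RAID-5/6, the effective usable number improves, so treat this RAID-1 math as the conservative floor.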

Yes, you can add more storage to each node in the future (be sure to select hardware ready to do that).
…and don’t forget to add another NVMe drive for Memory Tiering…(typically a different part number than the ones used for vSAN).

Networking (physical NICs): Pretty easy for most of us. We want redundant networking that meets the minimum requirements set forth by our application vendor. 2 x 25-GbE NICs.

25-GbE has been around since 2016, and affordable as a ToR (Top of Rack) solution since 2019. Nearly every server today ships with 10/25-GbE NICs onboard. Plus, it is recommended by our VCF “Application” vendor, so we follow their recommendations, given that the absolute minimum is 10-GbE. Latency must also be < 1ms. Link to the Broadcom Documentation is here.

Can you use more than 2 NICs per host? Yes, and you might do that to separate storage or NSX network traffic. We can discuss it, of course, though for most folks my default for the Management Domain is a pair of 25-GbE NICs.

Summary of Management Domain Sizing

You need 4 x ESXi servers ready for vSAN ESA, each configured with:

  • 2 x 32-core CPUs
  • 1-TB RAM
  • 3 x 4-TB NVMe drives (for vSAN ESA)
  • OS boot Drive (Another NVMe, only needs 128-GB minimum)
  • 2 x 25-GbE NICs

Optional, but highly recommended: 1 x 4-TB NVMe for Memory Tiering. This is what is needed to run the VCF “application”, while minimizing risk, delivering an acceptable SLA for performance & recovery, and providing the ability to scale out or up.

But Aren’t There Minimal Deployments?

Yes, there are. I suggest you access the Broadcom Documentation for Basic Management Design. Quoted right from the documentation linked above…

“This Design Blueprint can be used as a full end-to-end design for a VMware Cloud Foundation platform or as a starting point and adjusted to suit your specific objectives by substituting any of the design selections listed below with alternative models.”

This is a great starting point to build a lab or demo environment in getting yourself familiar with VCF capacities and features. However, it is not a recommended way to implement something that is delivering mission critical capabilities for the business.

And you still need about 45% of the resources we discussed earlier for the Management Domain. Yet you are not deploying everything that you have purchased to help you run a private cloud.

Let’s say we do this minimum deployment…we are adding risk, with high-impact scenarios that can play out in production. Well, what if we add the availability after the fact? I’ll bring up that quote again: “…if you don’t have time to do it right, when will you have time to fix it?”

This design has the application VMs (VCF Automation, VCF Operations, and NSX) that are typically spread out as 3 x VMs, now running as a single VM each. While they do function, they are not truly available and add many single points of failure to the applications they serve, which essentially adds risk to your VCF-created private cloud. Yes, they benefit from vSphere HA (which we have had since 2006 with Virtual Infrastructure 3), but that is not the way these applications were designed to run.

This minimal deployment design uses a cluster that is shared for Management Domain functions as well as any VM workloads that you see fit to mix with the Management Domain. We will call it a Consolidated Domain model (the language used in VCF releases prior to 9.0). This will work, yes, but it is not what we expect from any of our applications that drive the business. Minimizing risk is one of the things I have focused on in my 30+ years of working in IT.

…But the design docs you just linked to say it can be used that way! That is true, but it does not explain that you now need to take outages, do additional work, and accept limited options when you do updates, patches, or upgrades in the future….all things that are required in the lifecycle of any IT infrastructure component or solution.

Imagine us having this discussion if we were rolling out SAP, Oracle E-Business Suite, or PeopleSoft. We wouldn’t want to introduce unnecessary risk for those applications, right?

Reuse Existing vSphere Environment

ABSOLUTELY!…Just not for the Management Domain. We still need to run the VMs that are running on our existing vSphere environments, right? That environment isn’t going away anytime soon. We will end up running each of your existing vCenter Servers as a “Workload Domain” (explanation coming soon, I promise).

So long as the server hardware is supported to run ESXi 8.x or 9.x (vSphere 7 support ended in October 2025).

Do I have to use vSAN? No, but you can use vSAN if you would like (or need) to. You can use your existing NFS, FC, FCoE, or iSCSI SANs without issue. If you are using vVols, be aware that in vSphere 9, support is deprecated and vVols will be going away soon, so I would prefer to help you migrate off vVols at this time, rather than later.

What about my vCenter Server(s)? While possible to use vCenter 8, we would recommend upgrading that to vCenter 9. Yes, if vCenter is at version 9, you can still manage ESXi 8.x hosts. We will bring in those existing environments and make them part of your new VCF application.

Then we can take advantage of all the capabilities that VCF brings, most importantly, rightsizing your environment (as licensing CPU cores for no reason can be expensive). That means sizing your VMs as well as your physical servers running ESXi, so that we can optimize your resources so that they better align with the business outcomes defined and needed by your organization.

Workload Domains

While there is only going to be (in nearly all cases) a single Management Domain that is focused on providing VCF functions, management, and capabilities, Workload Domains are very different, but instantly familiar to us.

Essentially, a Workload Domain is very similar to what we are used to, if we think about any of our vSphere environments (any that are version 8 or earlier). It is a vCenter, and an NSX implementation, that runs the VMs that power the applications that our business needs.

Any Workload Domain is going to run the VMs that are currently running. THIS is where we can repurpose existing ESXi hosts and existing storage you have.

That’s it! Workload Domains are very flexible in how we create or import them. We can use storage other than vSAN (though you can still use vSAN here if you’d like).

What’s the difference between deploying a new Workload Domain, or importing an existing vCenter into VCF as a Workload Domain? The process to deploy versus import. That’s it.

So why the separation of duties like this? That’s just how Broadcom created VCF to work, so I just play by the rules provided me. Now, I like the separation of Management from Workload. Matter of fact, I’ve been doing that in my designs since 2009, and many of those designs are in their 4th or 5th generation now, all well before the Broadcom acquisition and what is now VCF 9.

Since the “Management” of your Workload Domain is vCenter and the 3 x NSX Control VMs…guess where they run? The Management Domain! Yes, even if we import the existing vCenter that is running on your existing cluster, that’s where we should migrate it to.

Are there other VMs needed to support the Workload Domain? Yes, there are, but they are all already running in the Management Domain.

So, creating (or importing) a Workload Domain requires additional resources in the Management Domain:

  • Total vCPUs allocated: 44 vCPU
  • Total RAM allocated: 174-GB RAM
  • Total Storage allocated: 2-TB
  • Total Storage consumed: 0.5-TB
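Because each Workload Domain adds its management VMs (vCenter plus the NSX Managers) to the Management Domain, it's worth keeping a running total. A rough sketch using the figures above; the per-domain numbers are this post's starting figures and will vary with your NSX and vCenter sizing:

```python
# Running total of Management Domain load as Workload Domains are added
# (base and per-domain figures from this post; yours will vary).
base_vcpu, base_ram_gb = 234, 825        # initial VCF management stack
per_wld_vcpu, per_wld_ram_gb = 44, 174   # each Workload Domain's management VMs

for wlds in range(4):  # 0 through 3 Workload Domains, purely illustrative
    vcpu = base_vcpu + wlds * per_wld_vcpu
    ram_gb = base_ram_gb + wlds * per_wld_ram_gb
    print(f"{wlds} Workload Domain(s): {vcpu} vCPU, {ram_gb} GB RAM allocated")
```

Run this against your planned domain count and compare it with the usable cores and RAM from the Management Domain sizing; it shows how quickly the "other things to run" budget gets consumed.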

Sizing the Workload Domain: Well, what about the sizing? There is an expectation of HA (High Availability) for the VMs running…you need a minimum of 3 hosts. To be redundant myself (again…punny!), that is a 2+1 cluster (the +1 is for the HA event, or more practically, to do maintenance without affecting production workloads).

What about sizing the CPU, RAM, and Storage (3-tiered or vSAN ESA)? That will vary with each Workload Domain’s Cluster. That’s right, every Workload Domain can have up to 400 x VMware Clusters, each up to 64 ESXi hosts. That’s a lot of resources being managed by just 1 vCenter Server.

Sizing a VMware Cluster

We have all been sizing VMware vSphere Clusters since 2006. The sizing exercise we went through earlier for the Management Domain happens in almost every environment, but quite often I see the following situation play out.

Time to refresh the VMware Infrastructure, so let’s size it to run the current workload plus 25% growth for the next 3 years. Five years later, we realize we are running 400% of the planned workload and wonder why performance of our most critical app is suffering. Good thing we will have the tools available to help us with that moving forward…

How do you break out each VMware Cluster, or better said, size each VMware Cluster? I would take the same approach I took above for sizing the Management Domain.

What Design Qualities are most important for THAT SPECIFIC workload? Availability, Manageability, Performance, Recoverability, Scalability, or Security? How do we prioritize those Design Qualities for THAT VMware Cluster?

…and we will do that for each of the components that make up your VMware Cluster:
Compute, Storage, Networking, Management, Workloads, Analytics, Chargeback, Reporting, and of course, Compliance.

Just like building a VMware Cluster dedicated to MS SQL or Oracle, you plan your workload requirements, size accordingly, and run it. Extra capacity? Let’s put other VMs on that VMware Cluster for Oracle…NOPE! That was designed a specific way for a specific purpose. That extra capacity is there for a reason, not to be consumed on a whim by something that is not running Oracle.

Questions? Reach out to me on LinkedIn or fill out the Contact Us form here at wei.com.
