I have come to the conclusion that SAN booting VMWare ESX is just a bad idea in production. By production, I mean an environment where you have at least two redundant paths to your storage and the machines are intended for 24/7 uptime.
I admit I am talking more about iSCSI than FC because I haven’t investigated FC as much. What do you think? I normally would post a bunch of links to support my theory and I will be happy to do so if there is interest (In other words, I can’t find them right now but I wanted to post this).
Why would you not want to SAN Boot ESX? Isn’t that the rage? Yes, yes it is.
If you SAN booted it, wouldn’t you be able to replicate the boot LUNs to another site and have all-in-one DR? Wouldn’t you get deduplication of the ESX LUN, snapshots, and all the other great things that make SANs the end-all, be-all of storage? Yes, yes you would.
So, if you are gaining all these great advantages, why not do it? One simple reason: local failover. From what I have seen and heard, ESX just really doesn’t have the multipath code in place today to handle having the connection to the boot LUN ripped out from under it. I would think FC is more robust, but I haven’t tested it. If you are doing this and have tested it, please let me know!
2 Comments »
In my day to day activities I depend a lot on the hardware vendor forums for the products I support. I wanted to take a second to share all the links I have acquired. Some will be obvious to everyone, but a few of them are pretty obscure. Enjoy!
2 Comments »
I’ve been following all the press regarding IBM’s new Multi-Node 3950 M2. The machine looks great and the ability to scale from one 4 socket box to multiple boxes (2 at this time) is simply awesome. You simply can’t match that kind of raw horsepower.
But then I got to thinking… At what point is it too much? Let’s just talk about ESX for a second. I know the box has other applications, but I want to focus on ESX. A single 3950 M2 offers a maximum of 16 cores (4 sockets x 4 cores) plus up to 128 GB of memory (256 GB when the 8GB DIMMs are released). Double that in a two node configuration and you get 32 cores and up to 512 GB (with the 8GB DIMMs).
ESX 3.5 currently supports 32 cores (64 cores is experimental) and 256 GB max. I could see an extreme situation where 32 cores with 256 GB (using the cheaper 4GB DIMMs) might be feasible if your workload is CPU bound. Yes, I know if you compare the underlying chipsets the IBM X4 chipset screams compared to anything else. It will blow the doors off most blades (no matter who makes them) for pumping raw data through the pipes. The problem at the end of the day is money. It is getting increasingly difficult to justify the high end servers on a price vs performance comparison. The 3950 M2 performs better, but at what cost? Also, with the VMWare HA and DRS features, scale “out” (using more, smaller boxes) has become more appealing than scale “up” (using fewer, larger boxes) for distributing workloads across machines while keeping enough headroom to absorb a machine failure.
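To put numbers on that, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted in this post (4 sockets x 4 cores per node, 128 GB per node with 4GB DIMMs or 256 GB with 8GB DIMMs, and the 32 core / 256 GB ESX 3.5 limits). The function and constant names are made up for illustration, and the limits are the ones stated here, not values pulled from an official configuration maximums document.

```python
# Back-of-the-envelope sizing using only the figures quoted in this post.
# All names here are illustrative, not part of any VMware or IBM tool.

ESX35_MAX_CORES = 32        # supported core limit quoted in this post
ESX35_MAX_MEMORY_GB = 256   # supported memory limit quoted in this post


def node_config(nodes, memory_gb_per_node, sockets_per_node=4, cores_per_socket=4):
    """Return (total_cores, total_memory_gb) for a multi-node 3950 M2."""
    cores = nodes * sockets_per_node * cores_per_socket
    memory_gb = nodes * memory_gb_per_node
    return cores, memory_gb


def usable_by_esx(cores, memory_gb):
    """Clamp a hardware configuration to the ESX 3.5 limits quoted in this post."""
    return min(cores, ESX35_MAX_CORES), min(memory_gb, ESX35_MAX_MEMORY_GB)


configs = {
    "1 node, 4GB DIMMs": node_config(1, 128),
    "2 nodes, 4GB DIMMs": node_config(2, 128),
    "2 nodes, 8GB DIMMs": node_config(2, 256),
}

for name, (cores, memory_gb) in configs.items():
    use_cores, use_memory_gb = usable_by_esx(cores, memory_gb)
    stranded_gb = memory_gb - use_memory_gb
    print(f"{name}: {cores} cores / {memory_gb} GB installed, "
          f"{use_cores} cores / {use_memory_gb} GB usable, "
          f"{stranded_gb} GB stranded")
```

Going by those quoted limits, the two node box with 4GB DIMMs lands exactly on the ESX 3.5 ceiling, while the 8GB DIMM version would carry memory that ESX 3.5 could not put to work.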
Increasingly in Information Technology, it is becoming a “Wal-mart” world. Oftentimes, good enough will do. What do you think?
(Thanks to Scott Lowe and Matt Portnoy for keeping me honest on the max values for ESX!)
10 Comments »
This article over at the Server Virtualization Blog got me thinking… Are Blades the next “Pizza Box” servers for ESX? By that I mean, are Blades reaching the mainstream to the point that they are becoming a commodity? In my role as a pre-sales Engineer, I speak to many customers and you can predict what many of them want to talk about before we walk in the door: Blades and ESX.
Yes, we still move pizza boxes and many customers (usually the price sensitive ones) still love them. Now that the blade market has matured, we are seeing less and less of the pizza box attitude.
Let’s take that a step further: what are they buying? Some decide to go with the smaller form factor blades (IBM HS21 and HP BL460c) but a surprising number are going for the larger HP BL680c. The BL680c Blade is a four socket Intel Blade with a maximum of 128 GB of memory and plenty of expansion ports, especially using the quad port Ethernet expansion (remember to use the right model quad port card for ESX!!). For me, the small blade vs large blade decision always ends in the “it depends” answer.
You’ll notice I didn’t throw out an IBM 4 socket model. Not to throw IBM too far under the bus on this one, but the LS41 product just isn’t appealing to most customers right now. That is a discussion for another day.
What are you seeing? What are your thoughts?
2 Comments »
vinternals has a great article on the current subtle differences between ESX 3i and full blown ESX. By “subtle”, I mean they will bite you in the butt as an Engineer if you believe the marketing people. This is the main reason I started the site. I want to get information out to the technical masses about the things that often lead to trouble if you haven’t run into them before (or until somebody does and tells you!).
The Infiniband isn’t that big of a deal to me but the HA and Networking caveats are! I hope it helps!
No Comments »
This is an update to my previous articles here and here on this issue.
I now have confirmation from HP that the supported OSes for iSCSI Boot on the Blades are RHEL (versions 4 & 5) and SUSE (versions 9 & 10). There is no Microsoft or VMWare support at this time.
5 Comments »
This was just posted today and it is a very interesting read. Xtravirt.com has released a commentary regarding VMWare Workstation’s inability to run ESX 3.5 in a VM.
I am hoping that the next version of Workstation will address this issue (that’s the rumor according to the VMWare boards) because I know the company won’t be upgrading the old laptop any time soon!
No Comments »
I was fortunate enough this week to attend the Symantec Storage Conference in San Jose, CA. The presentations were great and the knowledge presented was first rate. I attended two presentations/demonstrations in particular that I wanted to share my impressions of:
Symantec/Veritas Clustering Services (VCS) for VMWare
- This product is an agent that is loaded into the Service Console on ESX and also optionally as an agent in the ESX guest machines.
- VCS takes the place of VMWare High Availability (it must be disabled if VCS is in place) but is complementary to VMWare DRS.
- The product protects you against many more types of failures, including guest OS failures, than VMWare HA and looked really solid.
- You can extend the cluster remotely for disaster recovery purposes on supported storage platforms.
- I spoke to many partners that are already selling/installing this product and it had a good reputation at the conference.
There are two big things that keep me from recommending the product right now. ESX 3.5 isn’t supported, and NetApp storage isn’t supported until Update 2 of the product later this year. That update is expected in 3Q, but until then I really don’t have a lot of use for the product in my accounts. But, if you are using a supported storage platform, your farm is ESX 3.0.x, and you are having pains with VMWare HA, it is worth a look!
Symantec NetBackup for VMWare
I can see why this product won best in show at VMWorld this year. NetBackup 6.5.1 looked incredible on ESX.
- No configuration of VCB needed. Just load VCB on the proxy server and NetBackup will hook into it. Anybody who has configured VCB by hand-editing text files knows this is great.
- More than one “instance” of VCB open at a time. This allows for multiple streams and faster backups/restores.
- The ability to perform both VMDK and file level backups in the same pass. The competitors all support this as well, but with two passes and twice the storage, one for the vmdk, one for the files.
- Restores are a piece of cake because the NetBackup database is proxy aware and knows which VM the files came from
- No software loaded on ESX hosts or agents
- Traffic flows over the SAN, not the LAN
- Minimal overhead on the ESX server to perform a backup (take a snapshot and monitor the redo log)
- Many other features that I can’t think of right now…. As you can see I was impressed by the product!
No Comments »
This afternoon I attempted to load VMWare ESX 3.5 in a VMWare Workstation 6.0.2 Virtual Machine. I downloaded the latest version of the white paper from xtravirt.com and followed the directions. The ESX VM booted fine and everything looked great. I then created a VM inside the ESX server and tried to power it on, and….. BOOM! Hard crash on the ESX VM.
I asked one of my co-workers, Tim, who has this running with ESX 3.0.2 in VMs. After some digging, it turns out that 3.5 in a VM doesn’t seem to work right now. A number of people are having the same issue. Here and Here are links to some threads about it.
For now, I’m going to punt and go back to 3.0.2 to set up my VMotion environment and play with 3.5 another time. To Be Continued…
Thanks to Tim for your help!
UPDATE: It looks like in the few hours since I checked this out VMWare has stated that this will be fixed in Workstation 6.5!
No Comments »
It appears we have found a possible bug in the Deploy from Template command in ESX 3.5. When you create a Windows Server based template and then try to deploy directly into an Active Directory domain with customization, the new system will report that a service failed to start when the machine is launched. This is because the VMWare BootRun service is not removing itself properly after deployment. This does not happen with deployments into a workgroup.
If you aren’t familiar with the BootRun service, it makes all of the customizations after the sysprep work is complete during the deployment. You usually don’t see it because it runs on the first boot, makes the changes, and then removes itself from the machine.
In this case, the files are removed but the service entry is still there, hence the error that it can’t start up. VMWare has confirmed this to be a problem and they are investigating.
3 Comments »