vSAN ESA – Log Structured File System

I briefly covered vSAN-ESA’s Log Structured File System (LFS) in my previous blog; it is advised to go over that post once before you continue reading here.

This blog gives an overview of LFS and its architecture, and explains how erasure coding benefits from the introduction of LFS into the vSAN ESA architecture.

How does LFS work?

LFS in vSAN-ESA is responsible for handling all incoming I/Os (reads and writes) and acknowledging them to the guest as fast as possible.

When a write I/O arrives from the guest, LFS ingests the write onto the “Performance Leg” and immediately sends the acknowledgment back to the guest.

If the incoming I/O request is a read, how it is served depends on where the block resides. If the block is in the read cache or the performance leg, the read is served directly from there. If it is in neither, the block is fetched from the “Capacity Leg” along with some additional blocks as part of prefetch; these blocks are cached, and the read is then completed to the guest.

When a sufficient amount of data has accumulated in the “Performance Leg”, the data starts to be moved to the “Capacity Leg”. To understand what a “Performance Leg” and a “Capacity Leg” are, please continue reading.
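The write and read paths described above can be sketched as a small toy model. This is only an illustration, not vSAN code: the class name `ToyLFS`, the `flush_threshold` and `prefetch` parameters, and the dictionaries standing in for the legs and the read cache are all assumptions made for the example. In the real system the move to the capacity leg happens in the background rather than inside the write call.

```python
class ToyLFS:
    """Toy model of the LFS write/read path (illustrative only, not vSAN code)."""

    def __init__(self, flush_threshold=4, prefetch=2):
        # Hypothetical knobs: how many blocks the performance leg holds
        # before data moves on, and how many extra blocks a miss prefetches.
        self.flush_threshold = flush_threshold
        self.prefetch = prefetch
        self.performance_leg = {}  # block address -> data (recent writes)
        self.capacity_leg = {}     # block address -> data (long-term home)
        self.read_cache = {}       # block address -> data (cached reads)

    def write(self, addr, data):
        # Ingest the write into the performance leg and acknowledge it.
        self.performance_leg[addr] = data
        self.read_cache.pop(addr, None)  # drop any stale cached copy
        # Simplification: the move to capacity is done inline here.
        if len(self.performance_leg) >= self.flush_threshold:
            self._move_to_capacity()
        return "ack"

    def _move_to_capacity(self):
        # Move everything accumulated so far to the capacity leg.
        self.capacity_leg.update(self.performance_leg)
        self.performance_leg.clear()

    def read(self, addr):
        # Serve from the read cache or the performance leg when possible.
        if addr in self.read_cache:
            return self.read_cache[addr], "read-cache"
        if addr in self.performance_leg:
            return self.performance_leg[addr], "performance-leg"
        # Miss: fetch from capacity plus a few following blocks (prefetch),
        # cache them, then complete the read.
        for a in range(addr, addr + 1 + self.prefetch):
            if a in self.capacity_leg:
                self.read_cache[a] = self.capacity_leg[a]
        return self.read_cache.get(addr), "capacity-leg"
```

With `ToyLFS(flush_threshold=3, prefetch=1)`, the first two writes stay in the performance leg, the third triggers the move to capacity, and a later read of block 0 also pulls block 1 into the read cache.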

What does an Object look like in vSAN-ESA?

In vSAN-ESA, each object has a Concatenated Object (forest) at the top level, with a minimum of two trees underneath the CONCAT tree, which account for the Performance and Capacity Legs.

The number of capacity legs under the Concat can vary depending on the size of the object. I will cover how component placement works for vSAN-ESA in a different blog.

Performance Leg?

The performance leg is a very small partition of any given object in vSAN ESA. It comprises a Durable Log section and a Metadata Log section.

The performance leg cannot be more than 255GB in size per component, and the number of mirrors (components) depends on the storage policy assigned to the object.

In the above example, the storage policy assigned is RAID-6 (*see the Capacity Leg section), so the performance leg also needs to sustain two host failures or two component mirror failures; hence a three-way mirrored layout is assigned to the performance leg.
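The reasoning behind the three-way mirror is simple arithmetic: a mirror that must survive n failures needs n + 1 copies. A minimal sketch follows; the function name and the lookup table are hypothetical, and the sketch ignores any witness or quorum components.

```python
# Failures each erasure-coding policy tolerates (RAID-5: one, RAID-6: two).
FAILURES_TOLERATED = {"RAID-5": 1, "RAID-6": 2}


def performance_leg_mirrors(failures_to_tolerate):
    """Copies a mirror needs so that one copy survives the given failures."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return failures_to_tolerate + 1


for policy, failures in FAILURES_TOLERATED.items():
    print(policy, "->", performance_leg_mirrors(failures), "mirror copies")
```

For RAID-6 this gives three copies, matching the three-way mirrored layout in the example.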

As noted, the performance leg comprises two sections: a Durable Log section and a Metadata Log section.

Write I/Os coming from the guest VMs are intercepted by LFS, coalesced in host memory for a very short duration (a few microseconds), and then written to the Durable Log.

The I/Os written to the durable log consist of both metadata writes and data writes. As soon as an I/O completes on the durable log, it is acknowledged to the guest. When the durable log has enough data to perform one large-block full stripe write, it moves this data to the Capacity Leg. (A full stripe write avoids the additional write amplification of read-modify-writes, typically 4x amplification for RAID-5 and 6x amplification for RAID-6.)

Full stripe writes also reduce the amount of data moving across the network, because they avoid the known overheads of partial read-modify-writes.
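The 4x and 6x figures can be worked out by counting I/Os. A partial-stripe update has to read the old data block and every old parity block, then write the new data block and every new parity block. A full stripe write computes parity from the new data alone, so it needs no reads. The sketch below does this counting; the function names are hypothetical, and the 4+1 and 4+2 stripe widths are assumptions chosen for illustration.

```python
def rmw_ios_per_block(parity_blocks):
    """I/Os to update ONE data block with a read-modify-write.

    Reads: old data + each old parity block.
    Writes: new data + each new parity block.
    """
    return 2 * (1 + parity_blocks)


def full_stripe_ios_per_block(data_blocks, parity_blocks):
    """I/Os per data block when a whole stripe is written at once (no reads)."""
    return (data_blocks + parity_blocks) / data_blocks


# RAID-5 has one parity block per stripe, RAID-6 has two.
print("RAID-5 read-modify-write:", rmw_ios_per_block(1), "I/Os per block")
print("RAID-6 read-modify-write:", rmw_ios_per_block(2), "I/Os per block")

# Assumed stripe widths for illustration: 4 data + 1 parity, 4 data + 2 parity.
print("RAID-5 full stripe:", full_stripe_ios_per_block(4, 1), "I/Os per block")
print("RAID-6 full stripe:", full_stripe_ios_per_block(4, 2), "I/Os per block")
```

Under these assumptions a full stripe write costs 1.25 I/Os per data block for RAID-5 and 1.5 for RAID-6, compared with 4 and 6 for a read-modify-write.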

The durable log also moves the metadata writes to the Metadata Log, and when the metadata section in the performance leg is full, its contents are eventually moved to the capacity leg as well.

B+ Tree?

A B+ tree is an efficient indexing data structure. vSAN-ESA uses it to keep track of all the mappings (lookup tables for logical-to-physical block address translation) for the data and metadata. The tree pages are cached, and these cached pages are also eventually persisted onto the Capacity Leg.
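To make the idea of a logical-to-physical lookup table concrete, here is a minimal sketch. It is not a B+ tree and not how vSAN-ESA implements it: a sorted list searched with `bisect` stands in for the ordered key lookup and range scans that a B+ tree provides, and the class name `BlockMap` and its methods are hypothetical.

```python
import bisect


class BlockMap:
    """Ordered logical -> physical block map (sorted-list stand-in for a B+ tree)."""

    def __init__(self):
        self._logical = []   # sorted logical block addresses
        self._physical = []  # physical addresses, parallel to _logical

    def put(self, logical, physical):
        # Insert a new mapping, or remap an existing logical block
        # when its data has been written to a new physical location.
        i = bisect.bisect_left(self._logical, logical)
        if i < len(self._logical) and self._logical[i] == logical:
            self._physical[i] = physical
        else:
            self._logical.insert(i, logical)
            self._physical.insert(i, physical)

    def get(self, logical):
        # Translate one logical address; None if it was never written.
        i = bisect.bisect_left(self._logical, logical)
        if i < len(self._logical) and self._logical[i] == logical:
            return self._physical[i]
        return None

    def scan(self, start, end):
        # Ordered range lookup over [start, end), as a B+ tree leaf scan would do.
        lo = bisect.bisect_left(self._logical, start)
        hi = bisect.bisect_left(self._logical, end)
        return list(zip(self._logical[lo:hi], self._physical[lo:hi]))
```

For example, after `put(7, 300)` followed by `put(7, 900)`, `get(7)` returns the newer physical address, which mirrors how an overwrite in a log-structured layout updates the mapping instead of rewriting the block in place.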

Capacity Leg?

All object data is eventually persisted on the Capacity Leg. Depending on the size of the object, there will be one or more capacity legs below the top-level Concat tree. As already explained in the durable log section, the capacity leg receives full stripe writes from the performance leg, and its components are placed across different fault domains depending on the storage policy defined for the object.

How Erasure Coding benefits from LFS?

As you read through the above sections, you might have already got the idea that the introduction of LFS and the performance leg concept eliminates the additional write-penalty overheads of using either RAID-5 or RAID-6, because all writes are acknowledged to the guest directly at the performance leg and only later moved to the capacity legs.

This means we get very similar performance when using RAID-1, RAID-5, or RAID-6, because I/Os are always written to the performance leg mirrors first. We get the best of both worlds: the space efficiency of RAID-EC (Erasure Coding) with performance like RAID-1. This was not possible with vSAN-OSA, where we had to make a decision up front: either choose RAID-1 with the required number of failures to tolerate for high performance, or choose RAID-EC for space efficiency.
Please read through the official VMware blog on vSAN-ESA here.
