vSAN 7 Update 1 FastBoot – Feature Walkthrough

vSAN hosts reboots when compared to regular ESXi hosts generally take longer due to its log structured system to capture and track all the modifications occurring at the write buffer / Cache Disks, vSAN uses an in-memory table to track changes to the blocks, LBA tables, bitmaps etc. This in memory table is always lost when a vSAN host is rebooted. At the time of boot vSAN takes a long time to recover these entries from each of the disk groups to reconstruct the in-memory table, the time spent to perform this recovery generally depends on the size of LLOG/PLOG that needs to be recovered from each of the disk groups. vSAN 7.0U1 / vSAN 7 Update 1 performs a “save and restore” ​workflow to rebuild vSAN in-memory metadata allowing “Faster vSAN Host Restarts” or Fastboot.

When a host is placed in maintenance mode, the workflow will be initialized to quiesce/ serialize all the PLOG data and perform a check on the SSD Cache disk to see if enough free space is available to save this data, once enough space is identified it will write the PLOG data to this location on the SSD cache disk.

When host is rebooted, log recovery process will de-serialize PLOG data log from the same location and loads it into the in-memory table a lot faster by avoiding manual log recovery procedure which is generally seen in the previous releases. Hence we see ~5x faster reboot times as it does not spend a lot of time to rebuild the PLOG in memory table at time time of boot. This fastboot feature does not work if the host had crashed with a PSOD, disk failure or if new drives were added to the DG when the host was placed in maintenance mode.

vSAN 7.0 U1- Fastboot workflow

Successful FASTBOOT Log Entries

zcat /var/log/boot.gz | egrep -i "Fast | fastboot" | less

2020-09-20T07:34:13.255Z cpu1:263673)LSOMCommon: LSOMFbOpen:355: disk: 5269015b-c76b-638a-1a14-906e569ef428 LSOM Fastboot Context 0x45006a7b1a40, start offset: 1048576, end offset: 214695936
2020-09-20T07:34:13.255Z cpu1:263673)LSOMCommon: LSOMFbInitCtx:161: disk: 5269015b-c76b-638a-1a14-906e569ef428 Initialized fastboot de-serialization context 0x45006a7aa960, cprGenNr 179
2020-09-20T07:34:13.298Z cpu3:263673)PLOG: PLOGFbFastRecoverDevice:1384: disk: 5269015b-c76b-638a-1a14-906e569ef428 Fastboot completed successfully
2020-09-20T07:34:13.496Z cpu3:263675)LSOMCommon: LSOMFbOpen:355: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 LSOM Fastboot Context 0x4500543b6ae0, start offset: 1048576, end offset: 214695936
2020-09-20T07:34:13.496Z cpu3:263675)LSOMCommon: LSOMFbInitCtx:161: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Initialized fastboot de-serialization context 0x4500543c6d00, cprGenNr 2066
2020-09-20T07:34:13.562Z cpu3:263675)PLOG: PLOGFbFastRecoverDevice:1384: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Fastboot completed successfully
2020-09-20T07:34:13.562Z cpu3:263675)LSOMCommon: LSOMFbFreeCtx:229: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Freeing fastboot context: 0x4500543c6d00
2020-09-20T07:34:13.562Z cpu3:263675)LSOMCommon: LSOMFbClose:407: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 LSOM Fastboot Context 0x4500543b6ae0 close
2020-09-20T07:34:13.562Z cpu3:263675)PLOG: PLOGRecoverDeviceLogsDispatch:5198: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Completed fast recovery successfully
2020-09-20T07:34:13.562Z cpu3:263675)LSOMCommon: LSOMFbOpen:355: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 LSOM Fastboot Context 0x4500543b6ae0, start offset: 1048576, end offset: 214695936
2020-09-20T07:34:13.564Z cpu3:263675)LSOMCommon: LSOMFbClose:407: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 LSOM Fastboot Context 0x4500543b6ae0 close

Fastboot failure after a PSOD

zcat /var/log/ | egrep -i "Fast | fastboot" | less

2020-09-20T07:07:56.751Z cpu3:263673)LSOMCommon: LSOMFbOpen:355: disk: 5269015b-c76b-638a-1a14-906e569ef428 LSOM Fastboot Context 0x45006a7a3880, start offset: 1048576, end offset: 214695936
2020-09-20T07:07:56.752Z cpu2:263673)LSOMCommon: LSOMFbInitCtx:125: disk: 5269015b-c76b-638a-1a14-906e569ef428 Invalid fastboot header for super block
2020-09-20T07:07:56.752Z cpu2:263673)LSOMCommon: LSOMFbClose:407: disk: 5269015b-c76b-638a-1a14-906e569ef428 LSOM Fastboot Context 0x45006a7a3880 close
2020-09-20T07:07:56.752Z cpu2:263673)PLOG: PLOGFbFastRecoverDevice:1326: disk: 5269015b-c76b-638a-1a14-906e569ef428 Failed to init the fastboot context
2020-09-20T07:07:56.752Z cpu2:263673)PLOG: PLOGRecoverDeviceLogsDispatch:5204: disk: 5269015b-c76b-638a-1a14-906e569ef428 Fast recovery failed : Failure
2020-09-20T07:07:56.752Z cpu2:263673)LSOMCommon: LSOMFbOpen:355: disk: 5269015b-c76b-638a-1a14-906e569ef428 LSOM Fastboot Context 0x45006a7a3880, start offset: 1048576, end offset: 214695936
2020-09-20T07:07:56.835Z cpu3:263675)PLOG: PLOGFbFastRecoverDevice:1326: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Failed to init the fastboot context
2020-09-20T07:07:56.835Z cpu3:263675)PLOG: PLOGRecoverDeviceLogsDispatch:5204: disk: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Fast recovery failed : Failure
When FASTBOOT falls back to  manual log recovery :

2020-09-20T07:07:57.212Z cpu0:263202)LSOMCommon: SSDLOG_SetBlkMap:283: device: 5269015b-c76b-638a-1a14-906e569ef428 BlkMap state changed from SSDLOG_BLKMAP_INITIALIZED to SSDLOG_BLKMAP_TRACKING, segNo:0, type:1
2020-09-20T07:07:57.268Z cpu3:263201)PLOG: PLOGAllocDevState:3771: 528cade0-9b4c-a8cc-ee5c-3d2bbec524f1: PLOG state num WB buckets: 4, bucket size 2629886976
2020-09-20T07:07:57.269Z cpu3:263201)PLOG: PLOGHandleLogEntry:375: Recovering SSD state for MD 52d0a4e6-4b5d-1d6c-7b8a-7ccf2b62f85d
2020-09-20T07:07:57.269Z cpu3:263201)PLOG: PLOGAllocDevState:3771: 52d0a4e6-4b5d-1d6c-7b8a-7ccf2b62f85d: PLOG state num WB buckets: 4, bucket size 2629886976
2020-09-20T07:07:57.775Z cpu2:263201)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Recovery progress: 1500 of ~73785 (2%) log blocks. 0s so far, ~24s left.
.
.
2020-09-20T07:07:59.575Z cpu2:263201)vSAN Recovery: 5278ef79-0866-7f0a-42a8-bb7a2477f843 4500/~73785, 6%, 2s/~36s.
2020-09-20T07:07:59.627Z cpu1:263202)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5269015b-c76b-638a-1a14-906e569ef428 Recovery progress: 4500 of ~7229 (62%) log blocks. 2s so far, ~2s left.
2020-09-20T07:07:59.627Z cpu1:263202)vSAN Recovery: 5269015b-c76b-638a-1a14-906e569ef428 4500/~7229, 62%, 2s/~2s.
2020-09-20T07:08:00.629Z cpu2:263201)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Recovery progress: 6000 of ~73785 (8%) log blocks. 3s so far, ~40s left.
2020-09-20T07:08:00.629Z cpu2:263201)vSAN Recovery: 5278ef79-0866-7f0a-42a8-bb7a2477f843 6000/~73785, 8%, 3s/~40s.
2020-09-20T07:08:00.691Z cpu3:263202)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5269015b-c76b-638a-1a14-906e569ef428 Recovery progress: 6000 of ~7229 (82%) log blocks. 3s so far, ~3s left.
2020-09-20T07:08:01.540Z cpu2:263673)PLOG: PLOGRecDisp:999: PLOG recovery complete 5269015b-c76b-638a-1a14-906e569ef428:Processed 110218 entries (unmap 0), Took 4348 ms
2020-09-20T07:08:01.542Z cpu2:263673)Destroyed VSAN Slab PLOGRecovSlab_0x450040015860 (maxCount=0 failCount=0)
2020-09-20T07:08:01.542Z cpu2:263673)PLOG: PLOGRecDisp:1010: Recovery for 5269015b-c76b-638a-1a14-906e569ef428 completed with Success
.
.
2020-09-20T07:08:34.733Z cpu2:263201)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Recovery progress: 67500 of ~73785 (91%) log blocks. 37s so far, ~37s left.
2020-09-20T07:08:34.733Z cpu2:263201)vSAN Recovery: 5278ef79-0866-7f0a-42a8-bb7a2477f843 67500/~73785, 91%, 37s/~37s.
2020-09-20T07:08:35.504Z cpu2:263201)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Recovery progress: 69000 of ~73785 (93%) log blocks. 38s so far, ~38s left.
2020-09-20T07:08:35.504Z cpu2:263201)vSAN Recovery: 5278ef79-0866-7f0a-42a8-bb7a2477f843 69000/~73785, 93%, 38s/~38s.
2020-09-20T07:08:36.302Z cpu2:263201)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Recovery progress: 70500 of ~73785 (95%) log blocks. 39s so far, ~39s left.
2020-09-20T07:08:36.302Z cpu2:263201)vSAN Recovery: 5278ef79-0866-7f0a-42a8-bb7a2477f843 70500/~73785, 95%, 39s/~39s.
2020-09-20T07:08:37.085Z cpu2:263201)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Recovery progress: 72000 of ~73785 (97%) log blocks. 39s so far, ~39s left.
2020-09-20T07:08:37.085Z cpu2:263201)vSAN Recovery: 5278ef79-0866-7f0a-42a8-bb7a2477f843 72000/~73785, 97%, 39s/~39s.
2020-09-20T07:08:37.892Z cpu2:263201)LSOMCommon: SSDLOGLogEnumProgress:2200: device: 5278ef79-0866-7f0a-42a8-bb7a2477f843 Recovery progress: 73500 of ~73785 (99%) log blocks. 40s so far, ~0s left.
2020-09-20T07:08:37.892Z cpu2:263201)vSAN Recovery: 5278ef79-0866-7f0a-42a8-bb7a2477f843 73500/~73785, 99%, 40s/~0s.
2020-09-20T07:08:38.096Z cpu2:263201)LSOMCommon: SSDLOGEnumLog:2385: PLOG: Total Time: 40828818 us, Read Time: 40422590 us, Process Time: 4455184 us, numReads: 73170

2020-09-20T07:08:38.142Z cpu3:263675)PLOG: PLOGRecDisp:999: PLOG recovery complete 5278ef79-0866-7f0a-42a8-bb7a2477f843:Processed 881024 entries (unmap 21), Took 40874 ms
2020-09-20T07:08:38.144Z cpu3:263675)Destroyed VSAN Slab PLOGRecovSlab_0x45004001a880 (maxCount=0 failCount=0)
2020-09-20T07:08:38.144Z cpu3:263675)PLOG: PLOGRecDisp:1010: Recovery for 5278ef79-0866-7f0a-42a8-bb7a2477f843 completed with Success

Read about other features with vSAN 7.0 U1 / vSAN 7 Update1 here or on vmware-official blogs here

Related Posts