
File Systems & The Era of Flash-based SSDs

Recap: Abstractions in operating systems
Process — the abstraction of a von Neumann machine


Thread — the abstraction of a processor
Virtual memory — the abstraction of memory
File system — the abstraction of space/location on a storage device, the storage device itself, as well as other peripherals

Recap: How your application reaches the H.D.D.
Applications
  fread/fwrite — input.bin/output.bin (I/O libraries buffer data)
File system
  read/write — 0, 512, 4096, … (block addresses)
Device independent I/O interface (e.g. ioctl)
Device driver (kernel data structures)
  read/write — block addresses
Device controller → devices

Recap: What does BSD FFS propose?
Cylinder groups — improve data locality by avoiding spread-out data locations
Larger block sizes — improve bandwidth and maximum file sizes
Fragments — address low space utilization due to large blocks
Allocators — address device-oblivious allocation
New features
• long file names • file locking
• symbolic links • renaming

Recap: Performance of FFS
Writes in FFS are slower than reads — not the case for the old FS
CPU load is fine, given that the old UFS is way too slow anyway!

Recap: Why LFS?
Writes will dominate the traffic between main memory and disks — Unix FFS is designed under the assumption that a majority of traffic is large files
Who is wrong? UFS is published in 1984
• As system memory grows, frequently read data can be cached efficiently
• Every modern OS aggressively caches — use "free" in Linux to check
• Gaps between sequential access and random access
• Conventional file systems are not RAID aware

Recap: What does LFS propose?
Buffering changes in the system main memory and committing those changes sequentially to the disk with the fewest write operations
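The buffered, sequential commit described above can be sketched in a few lines of Python. This is a toy model, not the LFS paper's actual data structures: writes accumulate in a memory buffer, a commit appends them to the log in one sequential pass, and a newer record invalidates the older copy of the same data.

```python
# Toy LFS: buffer writes in memory, commit them sequentially to a log.
# All names here are illustrative.

class LogFS:
    def __init__(self, buffer_limit=4):
        self.log = []            # on-"disk" log: list of (key, value) records
        self.buffer = []         # in-memory write buffer
        self.buffer_limit = buffer_limit
        self.latest = {}         # key -> index of the newest record in the log

    def write(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self.commit()

    def commit(self):
        # One large sequential append instead of many small in-place writes.
        for key, value in self.buffer:
            self.latest[key] = len(self.log)   # older records become garbage
            self.log.append((key, value))
        self.buffer = []

    def read(self, key):
        for k, v in reversed(self.buffer):     # uncommitted data is newest
            if k == key:
                return v
        if key in self.latest:
            return self.log[self.latest[key]][1]
        return None
```

Note that reads check the uncommitted buffer first, since it holds the newest data — exactly why LFS needs the system's main memory to be trustworthy until the commit lands.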

Recap: LFS in motion
The write buffer collects data chunk #1, data chunk #2, and later an updated data chunk #1. LFS appends them, together with the inode, to the disk space (log); appending the updated chunk #1 invalidates the old copy already in the log.

LFS v.s. crash
If a crash occurs while committing the updated data chunk #1 from the write buffer, you still have a copy of the data at some point: data chunk #1, data chunk #2, and the inode already sit safely in the disk space (log).

LFS v.s. write failed
If writing the updated data chunk #1 from the write buffer fails, you can try again! You still have a copy of the data at some point — the old data chunk #1, data chunk #2, and the inode remain valid in the disk space (log).

Segment cleaning/Garbage collection
Reclaim invalidated segments in the log once the latest updates are checkpointed
Rearrange the data allocation to make continuous segments
Must reserve enough space on the disk
• Otherwise, every write will trigger garbage collection
• Sinking the write performance
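A minimal sketch of the cleaning step (the log layout and function name are assumptions, not the paper's): copy only the live records into a fresh, contiguous log and drop everything that has been invalidated.

```python
# Segment cleaning sketch: compact live records into a fresh log,
# reclaiming the space held by invalidated ones.

def clean(log, latest):
    """log: list of (key, value) records, oldest first.
    latest: key -> index of that key's live record in the log.
    Returns a compacted log and an updated index."""
    new_log, new_latest = [], {}
    for idx, (key, value) in enumerate(log):
        if latest.get(key) == idx:          # live record: keep it
            new_latest[key] = len(new_log)
            new_log.append((key, value))
    return new_log, new_latest              # invalidated records are gone
```

The cost of this pass is why the slide insists on reserved space: if the disk is nearly full, every write forces a cleaning pass first.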

Lessons learned
Performance is closely related to the underlying architecture
• Old UFS performs poorly as it ignores the nature of hard disk drives
• FFS allocates data to minimize the latencies of disk accesses
FFS optimizes for reads
LFS optimizes for writes — because we have larger memory now
As architectural/hardware changes reshape the workload, the design philosophy of the software changes with it

Modern file systems: Outline
• Flash-based SSDs and eNVy: A non-volatile, main memory storage system
• Don't stack your log on my log

Modern file system design — Extent File Systems

How do we allocate disk space?
• Contiguous: the file resides in continuous addresses — external fragmentation as in segmentation
• Non-contiguous: the file can be anywhere

Conventional Unix inode
File metadata: file types (directory, file), file size, permission, attributes
Types of pointers:
• Direct: access a single data block
• Single indirect: access n data blocks
• Double indirect: access n² data blocks
• Triple indirect: access n³ data blocks
File size is limited by the total number of pointers
An inode has 15 pointers: 12 direct, 1 each for single-, double-, and triple-indirect
If the data block size is 512B and n = 256: max file size = (12 + 256 + 256² + 256³) × 512 ≈ 8GB
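The slide's arithmetic is easy to check; the function below just restates it (the function name is mine):

```python
# Reproduce the slide's max-file-size arithmetic for a classic Unix inode:
# 12 direct pointers plus one single-, double-, and triple-indirect pointer,
# where each indirect block holds n pointers.

def max_file_size(block_size, n, direct=12):
    blocks = direct + n + n ** 2 + n ** 3   # total reachable data blocks
    return blocks * block_size              # bytes

size = max_file_size(block_size=512, n=256)
# 8,623,626,240 bytes — roughly the 8GB the slide quotes
```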

How do we allocate disk space?
• Contiguous: the file resides in continuous addresses
• Non-contiguous: the file can be anywhere
• Extents: the file resides in several groups of smaller continuous addresses

Using extents in inodes
• A run of contiguous blocks only needs a pair to represent
• Improves random seek performance
• Saves inode space
• Encourages the file system to use contiguous space allocation
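To sketch why one pair per run is enough, the snippet below collapses a file's block numbers into extents. The (start, length) encoding is an assumption about what the pair holds; real extent records (e.g. ext4's) carry more fields.

```python
# Collapse a file's sorted block numbers into (start, length) extents:
# each run of contiguous blocks costs one pair instead of one pointer
# per block.

def blocks_to_extents(blocks):
    extents = []
    for b in blocks:
        last = extents[-1] if extents else None
        if last and last[0] + last[1] == b:       # extends the current run
            extents[-1] = (last[0], last[1] + 1)
        else:                                     # starts a new extent
            extents.append((b, 1))
    return extents

blocks_to_extents([4, 5, 6, 7, 20, 21, 40])
# -> [(4, 4), (20, 2), (40, 1)]: 7 blocks, but only 3 pairs
```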

Extent file systems — ext2, ext3, ext4
Basically, optimizations over FFS + extents + journaling (write-ahead logs)


How ExtFS uses disk blocks
The 64 disk blocks in this example are divided into block groups of 8 blocks each (0–7, 8–15, …, 56–63). Some groups begin with file system metadata (superblock copies), others with file metadata; the remaining blocks in each group hold data.

Write-ahead log
Basically, an idea borrowed from LFS to facilitate writes and crash recovery
• Write to the log first; apply the change after the log transaction completes
• Update the real data block after the log writes are done
• Invalidate the log entry once the data is present in the target location
• Replay the log when a crash occurs
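The steps above can be sketched as follows. This is a toy model, not ext3/ext4's actual journal format; the class and its fields are illustrative.

```python
# Minimal write-ahead-log sketch: log the change, apply it, mark the
# entry applied; on recovery, replay logged-but-unapplied entries.

class WAL:
    def __init__(self):
        self.log = []      # entries: [key, value, applied?]
        self.store = {}    # the "real" data blocks

    def write(self, key, value):
        entry = [key, value, False]
        self.log.append(entry)       # 1. write to the log first
        self.store[key] = value      # 2. update the real location
        entry[2] = True              # 3. invalidate the log entry

    def recover(self):
        # Replay any logged change that never reached its target location.
        for key, value, applied in self.log:
            if not applied:
                self.store[key] = value
        self.log.clear()
```

A crash between steps 1 and 2 leaves an unapplied entry in the log, which `recover()` replays; a crash before step 1 simply loses the write, which is the contract a write-ahead log offers.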

Flash-based SSDs
eNVy: A non-volatile, main memory storage system
and Rice University

The characteristics of flash memory
Regarding the flash memory technology described in eNVy, how many of the following is/are true?
① The write speed of flash memory can be 100x slower than reading flash
② The granularities of writing and erasing are different
③ Flash memory cannot be written again without being erased
④ The flash memory chip has a limited number of erase cycles
A. 0 B. 1 C. 2 D. 3 E. 4
https://www.pollev.com/hungweitseng

The characteristics of flash memory
Regarding the flash memory technology described in eNVy, all four statements are true:
① The write speed of flash memory can be 100x slower than reading flash — writes are slow
② The granularities of writing and erasing are different — you can only program/write in the unit of a page (e.g. 4K), but erases must be performed by blocks (e.g. 128 pages)
③ Flash memory cannot be written again without being erased — an in-place update needs to erase the block first
④ The flash memory chip has a limited number of erase cycles — you cannot erase too often

Flash memory: eNVy and now

                            eNVy                     Modern SSDs
Read granularity            Supports byte accesses   Pages (4K — 16K)
Write/program granularity   Supports byte accesses   Pages (4K — 16K)
Write once?                 Yes                      Yes
Erase                       In blocks                In blocks (64 ~ 384 pages)
Program-erase cycles        ~100,000                 3,000 – 10,000

Basic flash operations
(Diagram: a flash chip organized into blocks #0 … #n-1, each containing a group of pages — page #0–7 in this example. Individual pages can be in the "programmed" state, but erase operations cover a whole block.)

Types of Flash Chips
• Single-Level Cell (SLC): 2 voltage levels, 1 bit per cell
• Multi-Level Cell (MLC): 4 voltage levels, 2 bits per cell
• Triple-Level Cell (TLC): 8 voltage levels, 3 bits per cell
• Quad-Level Cell (QLC): 16 voltage levels, 4 bits per cell

Programming in MLC
3.1400000000000001243449787580
= 0x40091EB851EB851F
= 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111
A Multi-Level Cell (MLC) stores 2 bits per cell using 4 voltage levels (11, 10, 01, 00).
Reaching an arbitrary 2-bit level can take 3 cycles/phases (phase #1, #2, #3) to finish programming.

Programming the 1st page in MLC
3.1400000000000001243449787580
= 0x40091EB851EB851F
= 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111
With 4 voltage levels (11, 10, 01, 00), storing only the first bit of each cell needs just 1 phase to finish programming the first page!

Programming the 2nd page in MLC
3.1400000000000001243449787580
= 0x40091EB851EB851F
= 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111
Programming the second bit moves cells between adjacent voltage levels (phase #1, phase #2): 2 phases to finish programming the second page!
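The constant on these slides is simply 3.14 stored as an IEEE 754 double. A quick sanity check, plus the split into 2-bit MLC cell values (the slicing is illustrative — real chips interleave bits across pages):

```python
# 3.14 as an IEEE 754 double has the bit pattern 0x40091EB851EB851F;
# an MLC chip would store those 64 bits as 32 two-bit cell values.
import struct

bits = struct.pack(">d", 3.14).hex()           # big-endian 64-bit float
binary = bin(int(bits, 16))[2:].zfill(64)      # the 64 raw bits
cells = [binary[i:i + 2] for i in range(0, 64, 2)]  # 2 bits per MLC cell
# bits  == "40091eb851eb851f"
# cells starts with "01", "00", "00", "00", ... (from 0x40 = 01000000)
```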

Flash performance
(Figure: measured read, program, and erase times per chip — A-SLC2/4/8, B-SLC2 50nm, B-SLC4 72nm, E-SLC8, B-MLC8 72nm, B-MLC32 50nm, C-MLC64 43nm, D-MLC32, E-MLC8.)
• Reads: less than 150us
• Program/write: less than 2ms
• Erase: less than 3.6ms
• Similar relative performance for reads, writes, and erases across chips
Grupp, Caulfield, …, Siegel, and Wolf. Characterizing flash memory: anomalies, observations, and applications. In MICRO 2009.

Program-erase cycles: SLC v.s. MLC v.s. TLC v.s. QLC

Recap: How your application reaches the H.D.D.
Applications
  fread/fwrite — input.bin/output.bin (I/O libraries buffer data)
File system
  read/write — 0, 512, 4096, … (block addresses)
Device independent I/O interface (e.g. ioctl)
Device driver (kernel data structures)
  read/write — block addresses
Device controller → devices

What happens on a write if we use the same abstractions as H.D.D.?
Can we write to page #0 directly? No.
We have to copy the live pages in block #0 (e.g. page #1, page #2) to somewhere else (e.g. a RAM buffer) and then erase the block
Then write the new page #0 and the old pages back to block #0 again!
Reads: 6 × 30us + Writes: 3 × 2ms + Erase: 3ms ≈ 9ms
Not much faster than the H.D.D. — and it also hurts the lifetime
(Diagram: the SSD controller and its DRAM buffer sit in front of flash blocks #1–#4.)
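The slide's cost estimate, spelled out. The per-operation latencies are the slide's round figures, not measurements:

```python
# Cost of a naive in-place update on flash without an FTL:
# copy out the live pages, erase the whole block, write everything back.

READ_US = 30        # per-page read, as quoted on the slide
PROGRAM_US = 2000   # per-page program/write
ERASE_US = 3000     # per-block erase

pages_read, pages_written = 6, 3
total_us = pages_read * READ_US + pages_written * PROGRAM_US + ERASE_US
# 180 + 6000 + 3000 = 9180 us, i.e. roughly 9 ms — hard-disk territory
```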

The characteristics of flash memory
To recap: writes are problematic in flash. Writes are slow; you can only program/write in units of a page while erases must be performed by blocks; an in-place update needs to erase the block first; and you cannot erase too often given the limited erase cycles.

All problems in computer science can be solved by another level of indirection

How your application reaches the S.S.D.
Applications
  fread/fwrite — input.bin/output.bin (I/O libraries buffer data)
File system
  read/write — 0, 512, 4096, … (block addresses)
Device independent I/O interface (e.g. ioctl)
Device driver (kernel data structures)
  read/write — block addresses
Device controller — now with an FTL (flash translation layer) inside
Devices

How should we deal with writes?
How many of the following optimizations would help improve the write performance of flash SSDs?
① Write asynchronously
② Out-of-place update
③ Preallocate locations for writing data
④ Aggregate writes to the same bank/chip
A. 0 B. 1 C. 2 D. 3 E. 4
https://www.pollev.com/hungweitseng

How should we deal with writes?
① Write asynchronously — you need RAM buffers
② Out-of-place update — avoids the time-consuming read-erase-write
③ Preallocate locations for writing data — you need to maintain a free list and garbage-collect when the free list runs low
④ Aggregate writes to the same bank/chip — probably not: you can write in parallel instead
Sounds familiar… the log-structured file system!

Flash Translation Layer (FTL)
We are always lazy to modify our applications
• FTL maintains an abstraction of LBAs (logical block addresses) used between hard disk drives and software applications
• FTL dynamically maps your logical block addresses to physical addresses on the flash memory chip
• It needs your SSD to have a processor in it now
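A toy FTL along these lines can be sketched as below. The class layout, free-list policy, and page geometry are all assumptions for illustration, not any real controller's design:

```python
# Hypothetical FTL sketch: LBA -> (block, page) mapping with
# out-of-place updates, as the slides describe.

class FTL:
    def __init__(self, num_blocks, pages_per_block):
        self.mapping = {}                        # LBA -> (block, page)
        self.free = [(b, p) for b in range(num_blocks)
                     for p in range(pages_per_block)]
        self.invalid = set()                     # pages awaiting garbage collection

    def write(self, lba):
        if lba in self.mapping:                  # out-of-place update:
            self.invalid.add(self.mapping[lba])  # invalidate the old page, no erase yet
        self.mapping[lba] = self.free.pop(0)     # redirect the LBA to a free page

    def read(self, lba):
        return self.mapping.get(lba)             # physical location to read
```

Updating an LBA never touches the old page in place — it just consumes a free page and leaves garbage behind, which is why the FTL also needs garbage collection.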

What happens on a read with FTL
Read 0x3241: the SSD controller looks the LBA up in the mapping table (e.g. LBA 0x3241 → flash block 0, flash page 0; LBA 0x3242 → flash block 0, flash page 63) and reads the mapped flash page.

What happens on a write with FTL
Write 0x3241: the controller marks the flash page currently mapped to 0x3241 as an invalid page and redirects the LBA to a free page — an out-of-place update. The mapping table tracks every LBA's flash block and flash page (e.g. 0x3242 → block 0, page 63; 0x3243 → block 1, page 3; 0x3244 → block 2, page 4; 0x3245 → block 3, page 6; 0x3246 → block 2, page 7), and the controller distinguishes valid, invalid, and free pages.

Garbage Collection in FTL
When free pages run low, the controller relocates the remaining valid pages out of a victim block, erases it, and updates the mapping table (e.g. 0x3241 now maps to flash block 3, page 7) so the invalid pages become free again.
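Victim selection can be sketched like this. The greedy policy and data layout are assumptions for illustration, not a specific SSD's algorithm:

```python
# FTL garbage-collection sketch: pick the block with the most invalid
# pages, hand back its live pages for relocation, and reset the block
# (standing in for the erase).

def collect(blocks):
    """blocks: {block_id: {"live": [pages...], "invalid": count}}.
    Returns (victim_id, live_pages_to_relocate)."""
    victim = max(blocks, key=lambda b: blocks[b]["invalid"])
    live = blocks[victim]["live"]
    # Live pages must be copied elsewhere before the block is erased.
    blocks[victim] = {"live": [], "invalid": 0}
    return victim, live
```

Picking the block with the most invalid pages maximizes reclaimed space per erase, which matters because erases are both slow and limited in number.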

Flash Translation Layer (FTL)
We are always lazy to modify our applications
• FTL maintains an abstraction of LBAs (logical block addresses) used between hard disk drives and software applications
• FTL dynamically maps your logical block addresses to physical addresses on the flash memory chip
• FTL performs copy-on-write when there is an update
• FTL reclaims invalid data regions and data blocks to allow future writes
• FTL executes wear-leveling to maximize the lifetime
• It needs your SSD to have a processor in it now
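One common wear-leveling idea is to steer new writes toward the least-erased blocks; a minimal sketch (the policy shown is a simplification of what real controllers do):

```python
# Wear-leveling sketch: allocate the next write in the block that has
# been erased the fewest times, so erase wear spreads evenly.

def pick_block(erase_counts):
    """erase_counts: {block_id: erases so far}. Returns the block to use."""
    return min(erase_counts, key=erase_counts.get)
```

Without such a policy, hot data would keep recycling the same few blocks and exhaust their limited program-erase cycles first.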

Why eNVy?
Flash memories have different characteristics than conventional storage and memory technologies
We want to minimize the modifications in our software

What eNVy proposed
A file system inside flash that performs:
• Transparent in-place update
• Page remapping
• Caching/Buffering
• Garbage collection
Exactly like LFS!

Utilization and performance
Performance degrades as you store more data: measured throughput (TPS) drops as flash array utilization climbs toward 100%, for transaction request rates of 10,000–40,000 TPS.
Modern SSDs provision extra storage space to address this issue.
(Benchmark setup: 15.5 million account records, 5 index levels.)

The impact of eNVy
Your SSD is structured exactly like this!
• DRAM stores the mapping table
• The controller + registers perform the FTL algorithms
• Flash chips (e.g. NAND) store the data

File system features revisited
How many of the following file system optimizations that we learned so far would still help improve performance if the underlying device becomes an SSD?
① Cylinder group
② Larger block size
③ Fragments
https://www.pollev.com/hungweitseng

File system features revisited
How many of the following file system optimizations that we learned so far would still help improve performance if the underlying device becomes an SSD?
① Cylinder group — there is no cylinder structure on flash; you probably want random accesses to exploit parallelism
② Larger block size — maybe… as long as the block size is larger than the page size
③ Fragments — remember: flash can only write in units of pages
Let's discuss this with the next paper!
A. 0 B. 1 C. 2 D. 3 E. 4

Don’t stack your log on my log
Jingpei Yang, , , , and
SanDisk Corporation

Advantages of logs
A log is an append-only data structure that records all changes
• Better performance — always sequential accesses
• Faster writes — you just need to append without sanitizing existing data first
• Ease of recovery — you can find previous changes within the log
Disadvantage of logs — you need to explicitly perform garbage collection to reclaim space
(Diagram: as in LFS, the updated data chunk #1 is appended to the disk space (log) and the old copy is invalidated.)

Why should we care about this paper?
Logs are everywhere:
• Application: database
• File system: write-ahead log
• Flash-based SSDs: the flash translation layer (also log-structured)
Applications speak in files and offsets; the file system translates these to logical block addresses across the I/O interface; the flash translation layer maps logical block addresses to physical addresses.
These stacked logs can interfere with each other — an issue with software engineering nowadays.

For example, garbage collection
(Diagram: the file system's log issues writes over logical block addresses through the I/O interface, and the FTL mapping table redirects them to physical flash pages.)

Now, the SSD wants to reclaim a block
(Diagram: the FTL relocates the still-valid flash pages before erasing the block — while the file system's own log, which sees only logical block addresses, is unaware of the reshuffling underneath.)
