Operating Systems
Lecture 11
File System, Secondary Storage, and Protection
Directory Storage
Protection
COMP 2432 2021/2022
File Concept
A file is a named collection of related information that is stored on secondary storage, such as a disk.
A file is the smallest unit of allocation to the secondary storage that can be seen by a user.
A user can only access data stored on the secondary storage via the file.
A file system is a collection of files in an organized way, with proper storage and directory structure.
Information in a file is defined by its creator.
Information about files is often maintained in the directory structure.
File Concept
A disk may be divided into many parts. A part may hold an individual file system.
This is called a partition.
Sometimes, several disks are combined together to hold one large file system.
This collection of disks is also called a partition.
File Attributes
A file has many properties, described by its attributes.
Name: textual file name, the only information that a human can read directly.
Identifier: unique tag or number that identifies a file within the file system.
Type: needed for systems that support different types of files, e.g. text versus binary file.
Location: pointer to the file location on storage device.
Size: current file size and/or maximum size (in bytes or blocks).
User or owner identification: who creates or owns the file.
Protection: decide on who can do reading, writing, executing.
Time and date information: when a file was first created, last modified or last accessed.
Values of the attributes for a file are stored in the directory structure or directory entry.
File Operations
A file is an abstract data type (ADT), that provides a basic set of operations.
Create: allocate storage for a file and create an entry to store the file attributes.
Read: return data from the file, normally start reading data from the current read pointer.
Write: store data into the file, often start from the current write pointer.
Reposition within file: move the read and write pointers.
Delete: remove the file entry and release the allocated storage.
Truncate: delete all contents of the file to make it an empty file.
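The operations above map closely onto ordinary file calls. The following is a minimal sketch in Python, assuming a temporary directory and a hypothetical file name demo.bin; it is one possible illustration, not how any particular file system implements these operations.

```python
import os
import tempfile

def demo_file_operations():
    """Walk through create, write, reposition, read, truncate and delete."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.bin")   # hypothetical file name

    # Create: opening with "wb+" creates the file and its directory entry.
    with open(path, "wb+") as f:
        f.write(b"hello world")   # Write: advances the file pointer
        f.seek(6)                 # Reposition within file
        tail = f.read()           # Read: starts at the current pointer
        f.truncate(0)             # Truncate: the file becomes empty
    size_after_truncate = os.path.getsize(path)

    os.remove(path)               # Delete: remove the entry, release storage
    exists_after_delete = os.path.exists(path)
    os.rmdir(workdir)
    return tail, size_after_truncate, exists_after_delete
```

Calling demo_file_operations() returns the bytes read after repositioning, the size after truncation, and whether the file still exists after deletion.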
File Operations
Besides the basic operations of reading data from and writing data to a file, two more common operations are often provided in the implementation of most file systems.
Open: make a file ready for reading/writing.
Actions involve searching for the file entry containing file attributes in the directory structure and moving the content of the entry into memory.
Close: mark the completion of operation on file.
Actions involve moving the content of the file entry in memory back to directory structure on disk.
File Operations
To manage an opened file, some pieces of data have to be maintained by OS.
File pointer: a pointer to the last read/write location.
There is one pointer for each process that has opened the file.
File-open count: a counter for the number of times that the file is opened.
A file entry in memory (in the open-file table) can be deleted when the last process using the file closes it. This counter is also called the reference count (the same idea as reference counting in garbage collection).
It is similar to the way a pipe is handled in Unix/Linux. A process can only read the end-of-file marker when all processes have closed the pipe on the other end.
Disk location of the file: where the file resides on the disk. A copy of this information is often stored in memory.
Access rights: access mode that the file is opened by a process.
There is one set of rights for each process on each file.
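The bookkeeping above can be sketched as a small data structure. This is a simplified model under stated assumptions: the class name OpenFileTable, the dictionary layout and the handle format are all hypothetical, and a real OS keeps this data in kernel structures.

```python
class OpenFileTable:
    """System-wide table: one entry per open file, with an open count."""

    def __init__(self):
        self.entries = {}   # file name -> {"disk_location", "open_count"}

    def open(self, name, disk_location, mode):
        entry = self.entries.setdefault(
            name, {"disk_location": disk_location, "open_count": 0})
        entry["open_count"] += 1
        # Per-process state: its own file pointer and its own access rights.
        return {"name": name, "pointer": 0, "mode": mode}

    def close(self, handle):
        entry = self.entries[handle["name"]]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:
            del self.entries[handle["name"]]   # last user closed the file
```

Two processes opening the same file share one table entry (disk location and open count) but each holds its own handle, so moving one file pointer does not affect the other.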
File Types
Two major types of files:
Program files: source code, object code, executable program.
Data files: character (ASCII) file or binary file; free-formatted text file or formatted (structured) file.
There are different subtypes of files.
They are often indicated by the file extension.
File Structure
A file may be structured or unstructured.
Unstructured
The file is just a sequence of words or bytes.
Simple record structure
Each record is stored in a line or a fixed number of lines.
Fixed length records.
Variable length records.
Complex structure
Formatted document.
Relocatable load file.
Unix only supports a simple unstructured file of consecutive bytes.
Application programs must interpret the file content by themselves.
The magic number of a file helps with this step.
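As an illustration of how a magic number helps an application interpret an unstructured byte file, here is a minimal sketch that checks the first bytes of the content against a few well-known signatures. The table MAGIC_NUMBERS and the function guess_type are hypothetical names, and the table is deliberately tiny.

```python
# A few well-known signatures found at the start of a file.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF": "PDF document",
    b"\x7fELF": "ELF executable",
}

def guess_type(data: bytes) -> str:
    """Return a type guess based on the leading bytes of the file content."""
    for magic, kind in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return kind
    return "unknown"
```

For example, guess_type(b"%PDF-1.7 ...") reports a PDF document, while content with no known signature is reported as unknown.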
Access Methods
There are several methods to access data stored in a file.
Some systems only support some of them.
Sequential access
Data are accessed in order, from beginning to the end. Most systems support sequential access.
Direct access
Also called relative access.
File is composed of fixed-length records.
Data can be accessed directly, based on record number.
Indexed access
A separate index file contains pointers to the data blocks of a data file (direct file or relative file).
Direct access to required data blocks can be achieved via searching the index.
Access Methods
Sequential access operations:
Read next: return next data item and advance file pointer.
Write next: update next data item and advance file pointer.
Reset: put the file pointer back to the beginning.
It is like the rewind operation.
Skip forward/backward: move the file pointer forward without reading or move it backward.
Backward skipping is only supported in some systems.
Access Methods
Direct access operations:
Read n: return the nth data item or block.
Write n: update the nth data item or block.
A system that provides sequential access operations Read next and Write next can support the two direct access operations above through a new operation Position n.
Position n: move the file pointer to the nth data item or block.
Read n and Write n could be implemented using Position n and then Read next and Write next.
Sequential access can also be provided easily by a system that only supports direct access.
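The construction of Read n and Write n from Position n can be sketched as follows. The sketch assumes fixed-length records of RECORD_SIZE bytes and numbers records from 0; both are assumptions made for illustration, and an in-memory io.BytesIO stands in for the file.

```python
import io

RECORD_SIZE = 8  # assumed fixed record length in bytes

def position(f, n):
    """Position n: move the file pointer to record n (counting from 0)."""
    f.seek(n * RECORD_SIZE)

def read_next(f):
    """Read next: return one record and advance the file pointer."""
    return f.read(RECORD_SIZE)

def write_next(f, record):
    """Write next: store one record and advance the file pointer."""
    if len(record) != RECORD_SIZE:
        raise ValueError("record must be exactly RECORD_SIZE bytes")
    f.write(record)

def read_n(f, n):
    """Read n = Position n followed by Read next."""
    position(f, n)
    return read_next(f)

def write_n(f, n, record):
    """Write n = Position n followed by Write next."""
    position(f, n)
    write_next(f, record)
```

With f = io.BytesIO(b"AAAAAAAABBBBBBBBCCCCCCCC"), read_n(f, 1) returns the second record, and a following read_next(f) continues sequentially with the third.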
Access Methods
Indexed access
Use of index allows direct access to data.
Data is stored in a direct or relative file.
Multiple indexes could be maintained to allow direct access to different parts of data based on the index.
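A minimal sketch of indexed access, assuming fixed-length records in a relative file and an index held as a dictionary from key to record number. The record layout, the keys such as "s001" and all function names are hypothetical.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes

def build_data_file(payloads):
    """Store each payload as one fixed-length record in a relative file."""
    data = io.BytesIO()
    for payload in payloads:
        data.write(payload.encode("ascii").ljust(RECORD_SIZE))
    return data

def build_index(keys):
    """Index: key -> record number in the data file."""
    return {key: number for number, key in enumerate(keys)}

def lookup(index, data, key):
    """Search the index, then read the required record directly."""
    data.seek(index[key] * RECORD_SIZE)
    return data.read(RECORD_SIZE).rstrip().decode("ascii")
```

A second index over the same data file (for example keyed by name instead of by ID) can be built in the same way without touching the records themselves.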
Directory Structure
A directory is a collection of nodes or entries containing information about all files.
Both directory structure and files reside on the disk.
In Unix/Linux, the directory structure itself is implemented as a file.
Directory Structure
Operations on a directory:
Search for a file: find whether a particular file exists and get its entry.
Create a file: add a new entry for a new file.
Delete a file: remove an entry for a file.
List a directory: get the files in a directory and their file entries.
Rename a file: change the name of the file inside the entry.
Traverse the file system: access every directory and all files stored under the file system.
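Several of the directory operations can be tried from a program. The sketch below uses Python's standard os module for listing, searching and traversing; the function names are hypothetical, and creating, deleting and renaming correspond to open(), os.remove() and os.rename().

```python
import os

def list_directory(path):
    """List a directory: names of the entries it contains, sorted."""
    return sorted(os.listdir(path))

def search(root, name):
    """Search for a file: return the path of the first match, or None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            return os.path.join(dirpath, name)
    return None

def count_files(root):
    """Traverse everything below root and count non-directory entries."""
    return sum(len(filenames) for _d, _s, filenames in os.walk(root))
```

The traversal visits every directory below root once; os.walk does not follow symbolic links to directories by default.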
Single-Level Directory
A single directory for all users.
All files are stored in a single directory.
Adopted in old Macintosh File System.
Advantage: simple.
Naming problem: two users may try to use the same name for different files.
Grouping problem: files cannot be grouped, e.g. to distinguish the files belonging to one user from those of another, or to isolate only files of a particular type (e.g. batch files).
Two-Level Directory
Separate directory for each user.
Each user now owns a separate one-level directory.
A Master File Directory (MFD) contains individual User File Directories (UFD).
Each user may create a file with same name as other users.
Faster searching for a file and easier access control.
More difficult for a user to access files of another user when they cooperate: access rights need to be controlled and users need to name another user’s file.
Tree-Structured Directories
Extend two-level directory into multiple levels like a tree.
Tree-Structured Directories
More efficient searching, as each directory holds even fewer files.
Grouping capability of files belonging to a particular group, e.g. files for a certain subject or with a certain nature.
A current working directory is maintained to define the default searching place for files.
Path names should be given to locate a file.
Can use absolute or relative path names (from the current working directory).
Creating a new subdirectory is done in the current directory.
Deleting a subdirectory could lead to the deletion of all files under it.
Adopted in MS-DOS.
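How a relative path name is interpreted against the current working directory can be sketched with Python's posixpath module, which applies Unix-style path rules on any platform. The function name resolve and the example paths are hypothetical.

```python
import posixpath

def resolve(cwd, path):
    """Turn an absolute or relative path name into a normalized absolute one."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)          # absolute: cwd is ignored
    return posixpath.normpath(posixpath.join(cwd, path))
```

With a current working directory of /home/alice/os, the relative name notes/l11.txt resolves to /home/alice/os/notes/l11.txt, while ../db/a.txt resolves to /home/alice/db/a.txt.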
Acyclic-Graph Directories
Allow for shared subdirectories and files in tree-structured directories so as to form a graph.
Acyclic-Graph Directories
Allow for files or directories to be shared.
File may be referenced via two different names.
It is the same physical file (a single copy) accessible under two different names.
This is called aliasing.
Changing the content of the shared file via one name will cause changes to the physical file and these changes will be reflected via another name.
Try this out in Unix/Linux.
Create file a.c using pico/nano.
ln a.c b.c
This command ln will create a link called b.c to point to a.c so they effectively refer to the same file.
Use ls and you will see both a.c and b.c.
Use cat to check that the contents of a.c and b.c are the same.
Change some parts of b.c using pico/nano.
Now check the content of a.c to verify that it has changed.
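The same experiment can be run from a program. This is a small sketch using os.link, which creates a hard link like the ln command; it assumes a file system that supports hard links, and the function name alias_demo is hypothetical.

```python
import os
import tempfile

def alias_demo():
    """Create a.c, link b.c to it, change b.c, then read a.c."""
    workdir = tempfile.mkdtemp()
    a = os.path.join(workdir, "a.c")
    b = os.path.join(workdir, "b.c")

    with open(a, "w") as f:
        f.write("int x;\n")
    os.link(a, b)                      # same effect as: ln a.c b.c

    with open(b, "a") as f:            # modify the file via the second name
        f.write("int y;\n")
    with open(a) as f:                 # the change is visible via the first
        content_of_a = f.read()
    same_file = os.path.samefile(a, b)

    os.remove(a)
    os.remove(b)
    os.rmdir(workdir)
    return content_of_a, same_file
```

The returned content of a.c includes the line appended through b.c, and os.path.samefile confirms that both names refer to one physical file.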
Acyclic-Graph Directories
Aliasing of a file name could create problems.
If we count the total number of files, the same file under two names would be double-counted.
If the file under a name (e.g. dict/w/list) is deleted, the physical file referred to by that name (dict/w/list) is deleted, but the path under the other name (spell/words/list) still exists and a user may try to access it through this other name.
This is the dangling pointer problem.
In C, you may copy a pointer to another and both pointers refer to the same memory block allocated via malloc().
Deletion of the block through one pointer via free() will cause problems when it is accessed via the other pointer, especially if the freed memory has been reallocated.
Solution.
Do not delete a file until there is no more access path (or link) to it.
We need to keep all the access paths or links to each file.
Instead of keeping all the links to a file, we can keep a count of the number of links. When the number of links goes to zero, the file can be deleted. This number is called the reference count.
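On Unix/Linux the link count of a file can be observed directly. The sketch below, assuming a file system with hard-link support, reads st_nlink from os.stat before and after removing one of two names; the function name link_count_demo and the file names are hypothetical.

```python
import os
import tempfile

def link_count_demo():
    """Show that removing one name only decrements the link count."""
    workdir = tempfile.mkdtemp()
    a = os.path.join(workdir, "list")
    b = os.path.join(workdir, "alias")

    with open(a, "w") as f:
        f.write("data")
    os.link(a, b)
    count_with_two_names = os.stat(a).st_nlink

    os.remove(a)                          # delete via the first name only
    count_with_one_name = os.stat(b).st_nlink
    with open(b) as f:                    # the file is still reachable
        content = f.read()

    os.remove(b)                          # count reaches zero: storage freed
    os.rmdir(workdir)
    return count_with_two_names, count_with_one_name, content
```

The physical file survives the first removal because its count is still above zero, which is how the dangling pointer problem is avoided.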
General Graph Directories
Allow for cycles in directories to exist to form a general graph.
General Graph Directories
The general graph with a cycle could cause some problems.
How to count the total number of files?
A poorly designed algorithm will fall into an infinite loop.
If we need a recursive backup of all the files down the directories, how can that be done?
How can we delete a file using reference count?
Without cycle, a deleted file has a reference count of zero.
With cycle, deleted files may have reference count greater than zero.
Garbage collection techniques are needed, but garbage collection algorithms are expensive to run.
We could prevent the problems from happening.
Ensure that there is no cycle in the graph.
Allow only links to file, not to subdirectories.
Every time a new link is added, use a cycle detection algorithm to determine whether there is a cycle created.
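One way to perform the check is to test, before adding a link from a directory to a target, whether the directory is already reachable from the target. The sketch below models the directory graph as a dictionary from a directory name to the list of directories it links to; this representation and the function names are assumptions for illustration.

```python
def reachable(graph, start, target):
    """Depth-first search: can target be reached from start?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

def add_link(graph, parent, child):
    """Add the link parent -> child unless it would create a cycle."""
    if reachable(graph, child, parent):
        return False                      # link rejected: cycle would form
    graph.setdefault(parent, []).append(child)
    return True
```

A link from b back to root is rejected when root already leads to b, while a second link from root to b is accepted because it only creates sharing, not a cycle.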
File Sharing
File sharing on multi-user systems is desirable.
Sharing may be done through a protection scheme.
User IDs identify users, allowing permissions and protections to be set on a per-user basis.
Group IDs allow users to be in groups, permitting group access rights.
With distributed systems where multiple machines are connected together via a network, files may be shared across the network.
Remote file systems use networking to support access to file systems located on other machines.
Manually via programs like FTP or WinSCP.
Automatically using distributed file systems, e.g. transparent use of J: drive on both PC and Unix/Linux.
Semi-automatically via the web.
File Sharing
Client-server model allows clients to mount remote file systems from servers.
A server can serve multiple clients.
Standard operating system file calls are translated into remote calls to be executed at the remote machine.
Network File System (NFS) developed by Sun (now Oracle) is a common distributed file-sharing method.
It is the standard Unix client-server file sharing approach.
Storage-Area Network (SAN) provides large storage capacities to a large user population.
It uses a high-speed network dedicated to the task of transporting data for storage and retrieval.
Cloud storage is the newest technology to host data on a collection of nodes in the cloud.
Service is usually provided by a third party, typically a data center.
Common storage services: iCloud, Dropbox, Google Drive.
File Storage Allocation
Linked allocation
Simple.
Directory points to first data block.
A pointer at the end of the first data block points to the second data block, and so on.
Efficient only for sequential access; bad for random access.
File Storage Allocation
Indexed allocation
Combine all indexes for accessing each file into a common block.
Can store them in a File Allocation Table (FAT).
More efficient for random access.
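The difference between linked and indexed allocation for random access can be sketched by counting how many blocks must be visited to find the nth block of a file. The block numbers, the END marker and the function names below are made up for illustration.

```python
END = -1  # assumed marker for "no next block"

def linked_lookup(next_block, first_block, n):
    """Linked allocation: follow n pointers starting from the first block.

    Returns (block number of the nth block, number of blocks visited).
    """
    block, visited = first_block, 1
    for _ in range(n):
        block = next_block[block]
        if block == END:
            raise IndexError("file has fewer blocks than requested")
        visited += 1
    return block, visited

def indexed_lookup(index_block, n):
    """Indexed allocation: one lookup in the index gives the block number."""
    return index_block[n]

# A file stored in blocks 9 -> 16 -> 1 -> 10 -> 25 (hypothetical numbers).
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: END}
index_block = [9, 16, 1, 10, 25]
```

Finding block 3 of the file (counting from 0) visits four blocks with linked allocation, but needs a single index lookup with indexed allocation.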
Secondary Storage
Recall that a file is a named collection of related information that is stored on secondary storage, such as a disk.
Secondary storage refers to storage outside memory.
Magnetic drum
Magnetic tape
Hard disk
Solid State Drive (SSD)
Networked disk / storage
Secondary storage can also be configured to store data in specialized format, e.g. information in a
Disk Storage
A computer system can access its local storage (local disk / local drive), or storage over the network (remote disk / network drive).
Local storage is easy to use, but it cannot be shared easily. You may install Dropbox or Foxy to allow external access, but you may lose control over who can access what.
Remote storage is flexible, but it relies on the availability of the network. How do you like a Netbook?
Three major types of disk storage organization.
Host-attached storage
Network-attached storage
Storage-area network
Host-Attached Storage
This is the most standard arrangement. Storage is directly attached to the computer and is accessed via an I/O port.
IDE or ATA hard disk (at most 2 per I/O bus).
SCSI hard disk (at most 16 per I/O bus).
To improve fault-tolerance, i.e. to make hard disk storage survive a disk failure, one could use a mirror hard disk.
Two hard disks write the same data simultaneously. The primary hard disk provides read/write and the secondary (mirror) hard disk serves as data backup.
If the primary fails, the secondary can provide data for reading.
Simple, but expensive.
Mirror hard disk arrangement can be generalized into RAID (Redundant Array of Inexpensive Disks).
There are 7 levels of RAID: RAID 0 to RAID 6.
Improved parallelism for data access, and fault-tolerance over a single hard disk for RAID 1 to RAID 6.
Host-Attached Storage
RAID 0 means no redundancy (data is only striped across disks).
No overhead
RAID 1 means mirror.
100% overhead
RAID 5 is the most commonly adopted.
1/n overhead for n data disks.
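The 1/n overhead comes from storing one parity block for every n data blocks. A simplified sketch of the idea, assuming XOR parity over equal-length blocks: if any single block is lost, XOR of the survivors and the parity restores it. Real RAID 5 also rotates the parity position across disks, which this sketch leaves out; the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def storage_overhead(n_data_disks):
    """One disk's worth of parity for every n data disks."""
    return 1 / n_data_disks

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # blocks on 3 data disks
parity = xor_blocks(data)                         # stored as redundancy

# Disk 1 fails: XOR of the surviving blocks and the parity restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Here rebuilt equals the lost block data[1], and storage_overhead(4) gives 0.25, compared with an overhead of 1.0 (100%) for mirroring.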
Network-Attached Storage
Network-attached storage (NAS) is a special purpose storage system accessible over the network.
The same network is used for normal networking operation and accessing remote data.
Unix NFS (Network File System) is often NAS-based.
Remote procedure call can be used to access file content.
NFS makes remote file access transparent to a user.
Departmental storage had also been NFS-based.
In Unix/Linux, compare access to local files under /tmp and remote files under /home/12345678d.
You may not see any difference, but when the network is slow, you will notice a difference.
Storage-Area Network
Storage Area Network (SAN) is a current technology to provide large storage to a large user population.
Instead of sharing the same network with other services, it has a high-speed private data network dedicated to the task of transporting data for storage and retrieval.
Operate on special storage protocols, not network protocols.
Connect together computers and storage devices to allow
sharing of the pool of storage devices.
Adopted by large institutions, e.g. J: drive attached to SAN.
Tertiary Storage
Primary storage refers to memory: directly accessible by CPU.
Secondary storage refers to hard disk or SSD: needs to access via I/O bus and DMA to bring data into memory for usage.
Tertiary storage refers to CD-ROM and DVD: accessible via I/O bus only if they are placed inside the disk drive.
This forms the hierarchical storage structure: moving from primary to tertiary storage, capacity increases, access time increases, and cost per unit of storage decreases.
To complete this hierarchy, cache memory sits between primary memory and CPU.
Virtual memory frame sits between secondary disk and memory.
Temporarily cached file sits between tertiary storage and local disk.
Tertiary storage is used to store gigantic amount of data, e.g. video recorded via surveillance cameras, traffic monitoring images, medical images, SETI signals, data gathered by CERN LHC (Large Hadron Collider).
Tertiary Storage
You may think that the storage of a DVD is only 4.7GB to 17GB, much less than that of a hard disk. The fact is you could use thousands of DVDs (or Blu-ray discs) at the same time.
Robotic jukebox allows fast insertion of a selected DVD from a collection of cartridges of DVDs to the disk drive to support tertiary storage.
A cartridge can hold 30 to 50 disks, with 20 to 40 cartridges in a jukebox.
A dual-layer Blu-ray disc can store 50GB, with future technology storing 200GB.
This is commonly used in many companies that need to store large amounts of information, e.g. banks.
Access time is about 10 seconds from jukebox to load.
You can now see that this really represents tertiary storage in the storage hierarchy.
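A quick calculation with the figures above shows the scale involved. This sketch simply multiplies the number of disks per cartridge, the number of cartridges and the 50GB capacity of a dual-layer Blu-ray disc; the function name is hypothetical.

```python
def jukebox_capacity_gb(disks_per_cartridge, cartridges, gb_per_disk=50):
    """Total capacity when every slot holds one disc of gb_per_disk GB."""
    return disks_per_cartridge * cartridges * gb_per_disk

smallest = jukebox_capacity_gb(30, 20)   # 30 * 20 * 50 GB
largest = jukebox_capacity_gb(50, 40)    # 50 * 40 * 50 GB
```

The results are 30,000GB and 100,000GB, i.e. roughly 30TB to 100TB per jukebox with current dual-layer discs.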
Protection and Security
Protection
Mechanisms and policy to keep programs and users from accessing or changing things they should not.
Issues internal to Operating System.
Security
Authentication of user, validation of messages, malicious or accidental introduction of flaws, etc.
Issues external to Operating System.
Protection
OS consists of a collection of objects (hardware or software resources), each has a unique name and can be accessed through a we