File system

File System in OS
How can a large amount of data be stored in a computer?
What happens to a process's data when the process terminates or is killed?
How can the same data be shared by multiple processes?
The solution to all these problems is to store information on disks or other external media in units called files.

What is a file?
A file is a named collection of related information that normally resides on a secondary storage device such as a disk or tape. Commonly, files represent programs (both source and object forms) and data; data files may be numeric, alphanumeric, or binary. Information stored in files must be persistent, i.e., not affected by power failures and system reboots. Files are managed by the OS; the part of the OS that is responsible for managing files is known as the file system.

File System Issues
How are files created?
How are they named?
How are they structured?
What operations are allowed on files?
How are they protected?
How are they accessed or used?
How is the file system implemented?

File Naming
When a process creates a file, it gives the file a name; when the process terminates, the file continues to exist and can be accessed by other processes.
A file is named for the convenience of its human users, and it is referred to by its name. A name is a string of characters, which may include digits and some special characters (e.g., 2, !, %). Some systems distinguish between upper- and lowercase letters (e.g., Unix), whereas others consider them equivalent (e.g., MS-DOS). In older systems such as MS-DOS, legal file names are limited to 8 characters (plus an optional extension), but many recent systems support names as long as 255 characters (e.g., Windows 2000).
Many operating systems support two-part file names separated by a period; the part following the period is called the file extension and usually indicates something about the file (e.g., file.c is a C source file). Some systems allow two or more extensions, as in the Unix name proc.c.Z, a C source file compressed with the Ziv-Lempel algorithm. In some systems (e.g., Unix), file extensions are just conventions; in others they are enforced by programs (e.g., a C compiler may insist that source files end in .c).
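Since extensions are usually just text after a period, a program can pick them out itself. Below is a minimal C sketch, assuming the extension is whatever follows the last period; the function name `file_extension` is hypothetical and real systems may apply other rules.

```c
#include <stddef.h>
#include <string.h>

/* Return the extension of a file name: the part after the LAST period,
   or NULL if there is none. For "proc.c.Z" this yields "Z", because a
   Unix name may carry several dot-separated suffixes. A leading period
   (e.g. ".profile") is treated as part of the name, not an extension. */
const char *file_extension(const char *name)
{
    const char *dot = strrchr(name, '.');
    if (dot == NULL || dot == name)
        return NULL;
    return dot + 1;
}
```

For example, `file_extension("file.c")` gives `"c"` and `file_extension("README")` gives `NULL`.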

File Structure
A file's internal structure may or may not be understood by the OS.
Files can be structured in several ways. The most common structures are:
Unstructured
Record Structured
Tree Structured

Unstructured:
·         Consists of an unstructured sequence of bytes or words.
·         The OS does not know or care what is in the file.
·         Any meaning must be imposed by user-level programs.
·         Provides maximum flexibility; users can put anything they want in their files and name them any way that is convenient.
·         Both Unix and Windows use this approach.
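To see that the OS treats an unstructured file as plain bytes, here is a small C sketch: it writes arbitrary bytes (including zero and 0xFF) to a scratch file made with the standard `tmpfile()` and reads them back unchanged. The function name `roundtrip_bytes` and the 256-byte limit are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Write an arbitrary byte sequence to a scratch file and read it back.
   The OS stores the bytes without interpreting them; any meaning (text,
   image, record layout) is imposed by the program. Returns 1 if the
   bytes come back unchanged, 0 otherwise. */
int roundtrip_bytes(const unsigned char *data, size_t len)
{
    unsigned char buf[256];
    if (len > sizeof buf)
        return 0;
    FILE *f = tmpfile();               /* anonymous file, deleted on close */
    if (f == NULL)
        return 0;
    size_t written = fwrite(data, 1, len, f);
    rewind(f);                         /* back to byte 0 before reading */
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return written == len && got == len && memcmp(buf, data, len) == 0;
}
```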

Record Structured:
·         A file is a sequence of fixed-length records, each with some internal structure.
·         Each read operation returns one record, and each write operation overwrites or appends one record.
·         Many old mainframe systems use this structure.
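On a byte-stream OS, a program can still impose fixed-length records itself: because every record has the same size, record i starts at byte i × record size. The C sketch below assumes a hypothetical `struct record` layout; it is an emulation on top of an unstructured file, not how a mainframe record-oriented file system is implemented.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-length record: every record has the same size,
   so record i starts at byte offset i * sizeof(struct record). */
struct record {
    int  id;
    char name[12];
};

/* Overwrite record i, or append it if i equals the current record count. */
int write_record(FILE *f, long i, const struct record *r)
{
    if (fseek(f, i * (long)sizeof *r, SEEK_SET) != 0)
        return -1;
    return fwrite(r, sizeof *r, 1, f) == 1 ? 0 : -1;
}

/* Read record i; one call returns exactly one whole record. */
int read_record(FILE *f, long i, struct record *r)
{
    if (fseek(f, i * (long)sizeof *r, SEEK_SET) != 0)
        return -1;
    return fread(r, sizeof *r, 1, f) == 1 ? 0 : -1;
}
```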

Tree Structured:
·         The file consists of a tree of records, not necessarily all the same length.
·         Each record contains a key field in a fixed position, and the tree is sorted on the key to allow rapid searching for a particular key.
·         The basic operation is to get the record with a specific key.
·         Used on some large mainframe computers for commercial data processing.
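The "get the record with a specific key" operation can be sketched in C with an in-memory binary search tree. This is a simplification: real systems keep such trees on disk (often as balanced B-trees), and the names `rec_node`, `tree_insert`, and `tree_get` are hypothetical.

```c
#include <stdlib.h>

/* A simplified in-memory sketch of a key-sorted tree of records
   (unbalanced binary search tree). */
struct rec_node {
    int key;                 /* key field at a fixed position */
    const char *data;        /* rest of the record (variable length) */
    struct rec_node *left, *right;
};

struct rec_node *tree_insert(struct rec_node *root, int key, const char *data)
{
    if (root == NULL) {
        struct rec_node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->key = key;
        n->data = data;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = tree_insert(root->left, key, data);
    else if (key > root->key)
        root->right = tree_insert(root->right, key, data);
    else
        root->data = data;   /* same key: replace the record */
    return root;
}

/* The basic operation: get the record with a specific key, or NULL. */
const char *tree_get(const struct rec_node *root, int key)
{
    while (root != NULL && root->key != key)
        root = key < root->key ? root->left : root->right;
    return root ? root->data : NULL;
}
```

Each comparison discards one subtree, which is what makes keyed lookup fast compared with scanning every record.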

File Types
Many operating systems support several types of files:
Regular files: contain user information; they are generally either ASCII or binary.
Directories: system files for maintaining the structure of the file system.
Character special files: related to I/O and used to model serial I/O devices such as terminals, printers, and networks.
Block special files: used to model disks.
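On POSIX systems (an assumption; the post does not name an API), a program can ask which of these types a path is with `stat()` and the `S_ISREG`, `S_ISDIR`, `S_ISCHR`, and `S_ISBLK` macros. The enum and the function name `classify` below are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

enum file_kind { KIND_UNKNOWN, KIND_REGULAR, KIND_DIRECTORY,
                 KIND_CHAR_SPECIAL, KIND_BLOCK_SPECIAL, KIND_OTHER };

/* Classify a path by the file type recorded by the OS. stat() follows
   symbolic links, so a link reports the type of its target. */
enum file_kind classify(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return KIND_UNKNOWN;
    if (S_ISREG(st.st_mode)) return KIND_REGULAR;
    if (S_ISDIR(st.st_mode)) return KIND_DIRECTORY;
    if (S_ISCHR(st.st_mode)) return KIND_CHAR_SPECIAL;   /* e.g. /dev/null */
    if (S_ISBLK(st.st_mode)) return KIND_BLOCK_SPECIAL;  /* e.g. a disk */
    return KIND_OTHER;                                   /* e.g. FIFO, socket */
}
```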

ASCII files:
·         Consist of lines of text.
·         Each line is terminated by a carriage return character, a line feed character, or both.
·         They can be displayed and printed as is, and can be edited with an ordinary text editor.
Binary files:
·         Consist of a sequence of bytes only; they are not organized as lines of text.
·         They have some internal structure known to the programs that use them (e.g., executable or archive files).
·         Many OSes use the file extension to identify the file type, whereas UNIX-like OSes rely on a magic number stored at the start of certain files (such as executables) to identify their type.
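As an example of a magic number, ELF executables (the format used by Linux and many other UNIX-like systems) begin with the four bytes 0x7F 'E' 'L' 'F'. The C sketch below checks for them without looking at the file name at all; the function name `has_elf_magic` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the stream starts with the ELF magic number
   0x7F 'E' 'L' 'F', 0 otherwise. The extension is never consulted. */
int has_elf_magic(FILE *f)
{
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };
    unsigned char head[4];
    rewind(f);                                   /* magic is at offset 0 */
    if (fread(head, 1, sizeof head, f) != sizeof head)
        return 0;                                /* too short to be ELF */
    return memcmp(head, elf_magic, sizeof head) == 0;
}
```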



Access Methods
Files store information. When it is used, this information must be accessed and read into computer memory. The information in a file can be accessed in several ways. Some systems provide only one access method for files; others, such as those of IBM, support many access methods, and choosing the right one for a particular application is a major design problem. The two basic access methods are sequential access and direct access.
Sequential Access:
This is the simplest access method: information in the file is processed in order, one record after the other, starting at the beginning; the process cannot skip around and read records out of order. The file can, however, be rewound and read again. This method is convenient when the storage medium is magnetic tape, and it was used in many early systems.
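A small C sketch of sequential access, assuming a file of `int` "records": they are read strictly in order from the beginning, and reading again requires a rewind. The function name `sum_sequential` is hypothetical.

```c
#include <stdio.h>

/* Sequential access: read int records in order from the start and sum
   them. There is no skipping; each call rewinds first so the file can
   be processed again from the beginning. */
long sum_sequential(FILE *f)
{
    long sum = 0;
    int value;
    rewind(f);                                   /* start at the beginning */
    while (fread(&value, sizeof value, 1, f) == 1)
        sum += value;                            /* next record, in order */
    return sum;
}
```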

Direct Access:
Direct access is used for files whose bytes or records can be read in any order. It is based on a disk model of a file, since disks allow random access to any block. It is used for immediate access to large amounts of information: when a query concerning a particular subject arrives, we compute which block contains the answer and then read that block directly to provide the desired information.
Operations: read n and write n (where n is the block number), or seek to set the current position and then read or write. From the current position, the file can also be accessed sequentially.
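The "read n" and "write n" operations can be sketched in C by computing the byte offset of block n and seeking straight to it. The 512-byte `BLOCK_SIZE` and the function names are assumptions for illustration.

```c
#include <stdio.h>

#define BLOCK_SIZE 512   /* assumed block size, chosen for illustration */

/* "read n": jump straight to logical block n by computing its byte
   offset, instead of reading blocks 0..n-1 first. Returns 0 on success. */
int read_block(FILE *f, long n, unsigned char buf[BLOCK_SIZE])
{
    if (fseek(f, n * BLOCK_SIZE, SEEK_SET) != 0)   /* the "seek" step */
        return -1;
    return fread(buf, 1, BLOCK_SIZE, f) == BLOCK_SIZE ? 0 : -1;
}

/* "write n": overwrite logical block n in place. */
int write_block(FILE *f, long n, const unsigned char buf[BLOCK_SIZE])
{
    if (fseek(f, n * BLOCK_SIZE, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, BLOCK_SIZE, f) == BLOCK_SIZE ? 0 : -1;
}
```

After a `read_block`, further `fread` calls continue sequentially from the following block, matching the last sentence above.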

File Attributes
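In addition to its name and data, all other information about a file is termed its file attributes.
File attributes vary from system to system. Common examples include the file's owner, its protection (who may read, write, or execute it), its current size, and the times of its creation, last access, and last modification.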
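On a POSIX system (assumed here), several of these attributes are returned by `stat()`. The `struct attrs` wrapper and the function name `get_attrs` below are hypothetical; the `st_*` fields are standard.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* A few attributes the OS keeps besides name and data. */
struct attrs {
    long long size;      /* current size in bytes */
    unsigned  perms;     /* protection bits, e.g. 0644 */
    long      owner_uid; /* numeric owner id */
    time_t    modified;  /* time of last modification */
};

/* Fill *a from the file's metadata. Returns 0 on success, -1 on error. */
int get_attrs(const char *path, struct attrs *a)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    a->size      = (long long)st.st_size;
    a->perms     = (unsigned)(st.st_mode & 07777);
    a->owner_uid = (long)st.st_uid;
    a->modified  = st.st_mtime;
    return 0;
}
```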

File Operations
The OS provides system calls to perform operations on files. Some common calls are:
Create: If disk space is available, creates a new file with no data.
Delete: Deletes a file to free up disk space.
Open: Before using a file, a process must open it.
Close: When all accesses are finished, the file should be closed to free up internal table space.
Read: Reads data from the file, usually from the current position.
Write: Writes data to the file, usually at the current position.
Append: Adds data at the end of the file.
Seek: Repositions the file pointer to a specific place in the file.
Get attributes: Returns the file's attributes for processing.
Set attributes: Sets the user-settable attributes, e.g., after the file has changed.
Rename: Renames the file.
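Here is a rough C sketch mapping most of these operations onto POSIX calls: `open` with `O_CREAT` (Create and Open), `write`, `lseek` to the end followed by `write` (one way to Append; `O_APPEND` is another), `lseek` (Seek), `read`, `close`, `rename`, and `unlink` (Delete). The scratch paths under /tmp and the function name are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>     /* rename() */
#include <string.h>
#include <unistd.h>

/* Exercise the common calls on a scratch file. Returns 0 on success. */
int file_ops_demo(void)
{
    char buf[16] = {0};
    int fd = open("/tmp/fs_demo.txt", O_CREAT | O_RDWR | O_TRUNC, 0644); /* Create + Open */
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) goto fail;                  /* Write */
    if (lseek(fd, 0, SEEK_END) < 0) goto fail;                  /* Append: move to end... */
    if (write(fd, "!", 1) != 1) goto fail;                      /* ...then write */
    if (lseek(fd, 0, SEEK_SET) != 0) goto fail;                 /* Seek to start */
    if (read(fd, buf, sizeof buf - 1) != 6) goto fail;          /* Read */
    if (strcmp(buf, "hello!") != 0) goto fail;
    close(fd);                                                  /* Close */
    if (rename("/tmp/fs_demo.txt", "/tmp/fs_demo2.txt") != 0)   /* Rename */
        return -1;
    return unlink("/tmp/fs_demo2.txt");                         /* Delete */
fail:
    close(fd);
    unlink("/tmp/fs_demo.txt");
    return -1;
}
```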

Directory Structure
A directory is a node containing information about files; it is also called a folder.

Directories can have different structures.
1 Single-Level Directory:
·         The simplest form of directory system has one directory containing all the files.
·         It is sometimes called the root directory, but since it is the only one, the name does not matter much.
·         All files are contained in the same directory.
·         It is easy to support and understand, but it is difficult to manage a large number of files and to handle different users. An example of a system with one directory is given below.

·         The problem with having only one directory in a system with multiple users is that different users may accidentally use the same names for their files.
·         For example, if user A creates a file called Ram and later user B also creates a file called Ram, B's file will overwrite A's file. Consequently, this scheme is not used on multiuser systems.
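The name clash can be reproduced with a tiny C model of a single-level directory: one flat table shared by all users. The table layout, sizes, and function names are hypothetical; only the overwrite behavior mirrors the example above.

```c
#include <string.h>

#define MAX_FILES 64
#define NAME_LEN  16

/* Hypothetical single-level directory: one flat table for everybody. */
struct dir_entry { char name[NAME_LEN]; char owner; int first_block; };

static struct dir_entry root_dir[MAX_FILES];
static int n_entries;

/* Create a file. With only one namespace, an existing name is simply
   overwritten, whoever owned it before. */
void create_file(const char *name, char owner, int first_block)
{
    int i;
    for (i = 0; i < n_entries; i++)
        if (strcmp(root_dir[i].name, name) == 0)
            break;                           /* same name: reuse the slot */
    if (i == n_entries) {
        if (n_entries == MAX_FILES)
            return;                          /* directory full */
        n_entries++;
    }
    strncpy(root_dir[i].name, name, NAME_LEN - 1);
    root_dir[i].name[NAME_LEN - 1] = '\0';
    root_dir[i].owner = owner;
    root_dir[i].first_block = first_block;
}

const struct dir_entry *lookup(const char *name)
{
    for (int i = 0; i < n_entries; i++)
        if (strcmp(root_dir[i].name, name) == 0)
            return &root_dir[i];
    return NULL;
}
```

After user A and then user B both create "Ram", a single entry remains and it belongs to B.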
2 Two-Level Directory:
·         To avoid conflicts caused by different users choosing the same name for their files, the next step up is to give each user a private directory.
·         There is a separate directory for each user.
·         Used on multiuser computers and on simple networks of personal computers.
·         It causes problems when users want to cooperate on a task and access one another's files. It also causes problems when a single user has a large number of files.
An example of this system is shown below.
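A C sketch of the two-level idea, assuming a hypothetical master directory holding one private directory per user: a file is identified by the pair (user, name), so two users can both own a file called Ram. All names and sizes here are illustrative.

```c
#include <string.h>

#define MAX_USERS 8
#define MAX_FILES 32
#define NAME_LEN  16

/* Hypothetical two-level directory: a master directory with one
   private user directory per user. */
struct user_dir {
    char user[NAME_LEN];
    char files[MAX_FILES][NAME_LEN];
    int  n_files;
};

static struct user_dir master[MAX_USERS];   /* zero-initialized */
static int n_users;

/* Find the user's directory, creating it on first use. */
static struct user_dir *user_directory(const char *user)
{
    for (int i = 0; i < n_users; i++)
        if (strcmp(master[i].user, user) == 0)
            return &master[i];
    if (n_users == MAX_USERS)
        return NULL;
    strncpy(master[n_users].user, user, NAME_LEN - 1);
    return &master[n_users++];
}

/* Returns 0 on success, -1 if the name already exists in THIS user's
   directory or there is no room. Other users' names never collide. */
int create_file(const char *user, const char *name)
{
    struct user_dir *d = user_directory(user);
    if (d == NULL || d->n_files == MAX_FILES)
        return -1;
    for (int i = 0; i < d->n_files; i++)
        if (strcmp(d->files[i], name) == 0)
            return -1;
    strncpy(d->files[d->n_files], name, NAME_LEN - 1);
    d->n_files++;
    return 0;
}
```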
3 Hierarchical Directory:
·         A generalization of the two-level structure to a tree of arbitrary height.
·         This allows users to create their own subdirectories and organize their files accordingly.
·         To allow a directory to be shared among different users, an acyclic-graph structure is used.
·         Nearly all modern file systems are organized in this manner.
·         This approach is shown below.
·         Here, the directories A, B, and C contained in the root directory each belong to a different user, two of whom have created subdirectories for projects they are working on.
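A tree of directories can be modeled in C with nodes that point to their first child and next sibling. This in-memory sketch (the names `node`, `make_node`, and `find_child` are hypothetical) shows that two users can each have a subdirectory with the same name, because names only need to be unique within one directory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory directory tree: each node is a file or a
   directory; a directory keeps a linked list of its entries. */
struct node {
    char name[32];
    int is_dir;
    struct node *child;     /* first entry (if directory) */
    struct node *sibling;   /* next entry in the same directory */
};

struct node *make_node(struct node *parent, const char *name, int is_dir)
{
    struct node *n = calloc(1, sizeof *n);   /* zeroed, so name is terminated */
    if (n == NULL)
        return NULL;
    strncpy(n->name, name, sizeof n->name - 1);
    n->is_dir = is_dir;
    if (parent != NULL) {                    /* link into parent's entry list */
        n->sibling = parent->child;
        parent->child = n;
    }
    return n;
}

/* Find the entry called `name` directly inside directory `dir`. */
struct node *find_child(const struct node *dir, const char *name)
{
    for (struct node *c = dir->child; c != NULL; c = c->sibling)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}
```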
Path Names
When the file system is organized as a directory tree, some way is needed for specifying file names. Two different methods are commonly used: absolute and relative path names.
1 Absolute Path Name:
The path from the root directory to the file, e.g., in Unix: /usr/user1/bin/lab2.
Components are separated by / in Unix and by \ in Windows.
2 Relative Path Name:
Relies on the concept of the working directory (also called the current directory).
A user can designate one directory as the current working directory, in which case all path names not beginning at the root directory are taken relative to the working directory.
E.g., bin/lab2 is enough to locate the same file if the current working directory is /usr/user1.
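The rule above can be sketched in C: a path beginning with '/' is used as is, and anything else is prefixed with the working directory. This is a simplified, hypothetical helper (`resolve_path`) for Unix-style paths; it does not collapse "." or "..".

```c
#include <stdio.h>
#include <string.h>

/* Turn a path into an absolute one. A path beginning with '/' is
   already absolute; anything else is taken relative to the current
   working directory `cwd`. Returns 0 on success, -1 if the result
   does not fit in `out`. */
int resolve_path(const char *cwd, const char *path, char *out, size_t outsz)
{
    int n;
    if (path[0] == '/') {
        n = snprintf(out, outsz, "%s", path);
    } else {
        size_t len = strlen(cwd);
        /* avoid "//" when cwd already ends in '/', e.g. the root */
        const char *sep = (len > 0 && cwd[len - 1] == '/') ? "" : "/";
        n = snprintf(out, outsz, "%s%s%s", cwd, sep, path);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

With `cwd` set to /usr/user1, both "bin/lab2" and "/usr/user1/bin/lab2" resolve to the same file.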

Directory Operation
The following are the common directory operations.
1.      Create: A directory is created. It is empty except for dot and dotdot, which are put there automatically by the system.
2.      Delete: A directory is deleted. Only an empty directory can be deleted.
3.      Opendir: Opens a directory so that it can be read. For example, to list all files in a directory, a listing program opens the directory to read out the names of all the files it contains.
4.      Closedir: When a directory has been read, it should be closed to free up internal table space.
5.      Readdir: Returns the next entry in an open directory. Formerly, it was possible to read directories using the usual read system call, but that approach has the disadvantage of forcing the programmer to know and deal with the internal structure of directories.
6.      Rename: In many respects, directories are just like files and can be renamed the same way files can be.
7.      Link: Linking is a technique that allows a file to appear in more than one directory.
8.      Unlink: A directory entry is removed. If the file being unlinked is present in only one directory, it is removed from the file system.
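The Opendir, Readdir, and Closedir operations correspond to the POSIX functions `opendir()`, `readdir()`, and `closedir()` (assuming a POSIX system). This C sketch counts a directory's entries, skipping dot and dotdot; the function name `count_entries` is hypothetical.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in a directory, skipping "." and "..".
   Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *d = opendir(path);                 /* Opendir */
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)        /* Readdir: next entry */
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            count++;
    closedir(d);                            /* Closedir: free table space */
    return count;
}
```

Note that the program never parses the on-disk directory format; `readdir()` hides it, which is exactly the advantage described above.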

File-System Implementation
How are files and directories stored?
How is disk space managed?
How can everything be made to work efficiently and reliably?

Allocation Methods
The most popular methods for allocating disk blocks to files are:
1 Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk.
Disk addresses define a linear ordering on the disk.
A file is defined by the disk address of its first block and its length in block units.
With 2-KB blocks, a 50-KB file would be allocated 25 consecutive blocks.
Both sequential and direct access can be supported by contiguous allocation.

Advantages:

Simple to implement: accessing a file that has been allocated contiguously is easy, since only the starting block and the length need to be remembered.
High performance: the entire file can be read from the disk in a single operation, with only one seek.
Problems:
·         Fragmentation: as files are allocated and deleted, the free disk space is broken into holes.
·         Dynamic storage-allocation problem: searching for a suitable hole.
·         The file size must be known in advance.
Due to its good performance, it is used in many systems; it is widely used in CD-ROM file systems.
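Because a contiguous file is just (start, length), locating any byte is simple arithmetic. This C sketch (hypothetical names) computes the number of blocks a file needs, reproducing the 50-KB / 2-KB = 25 example, and maps a byte offset to its disk block.

```c
/* Contiguous allocation: a file is described by (start, length), both
   in blocks. Finding the block that holds any byte offset is one
   division and one addition, which is why direct access is cheap. */
struct contig_file {
    long start;     /* first disk block */
    long length;    /* number of blocks */
};

long blocks_needed(long file_bytes, long block_size)
{
    return (file_bytes + block_size - 1) / block_size;   /* round up */
}

/* Disk block holding byte `offset`, or -1 if outside the file. */
long contig_block_for(const struct contig_file *f, long offset, long block_size)
{
    long logical = offset / block_size;
    return (offset < 0 || logical >= f->length) ? -1 : f->start + logical;
}
```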

2 Linked Allocation
Each file is a linked list of disk blocks; the blocks may be scattered anywhere on the disk.
Each block contains a pointer to the next block of the same file.
To create a new file, we simply create a new entry in the directory; with linked allocation, each directory entry has a pointer to the first disk block of the file.
Problems:
·         It solves the fragmentation and file-size problems of contiguous allocation, but it can be used effectively only for sequential-access files; random access is extremely slow.
·         Each block access requires a disk seek.
·         Space is needed in each block for the pointer.
Solution:
Use a File Allocation Table (FAT).
The table has one entry for each disk block, containing the number of the next block in the file. It resides at the beginning of each disk partition.
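The cost of random access with plain linked allocation can be seen in a small C simulation: the "next" pointer lives inside each block, so reaching logical block k means reading the k blocks before it. The disk array, block sizes, and read counter are all hypothetical.

```c
#define NBLOCKS     16
#define DATA_SIZE   508   /* assumed: 512-byte block minus a 4-byte pointer */
#define END_OF_FILE (-1)

/* Simulated disk: each block stores data plus the number of the next
   block of the same file, so part of every block is lost to the pointer. */
struct disk_block {
    int  next;
    char data[DATA_SIZE];
};

static struct disk_block disk[NBLOCKS];
static int disk_reads;                /* counts simulated block reads */

static const struct disk_block *read_disk(int b)
{
    disk_reads++;                     /* each read would cost a disk seek */
    return &disk[b];
}

/* Find the disk block holding logical block k of a file whose first
   block is `first`. Every earlier block must be read to learn where
   the next one is, so just locating block k costs k reads. */
int linked_block_for(int first, int k)
{
    int b = first;
    while (b != END_OF_FILE && k > 0) {
        b = read_disk(b)->next;
        k--;
    }
    return b;
}
```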

3 Linked Allocation using FAT

The FAT is used as a linked list. The directory entry contains the block number of the first block of the file. The FAT is consulted to find the next block until a special end-of-file value is reached.
Advantages:
The entire block is available for data.
The number of disk seeks is significantly reduced, because the chain is followed in the in-memory table rather than on disk; random access time is therefore improved.
Problems:
The entire table must be in memory all the time to make it work.
With a 20-GB disk and a 1-KB block size, the table needs 20 million entries. If each entry requires 3 bytes, the table takes up 60 MB of main memory all the time.
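The same chain-following can be sketched in C with a FAT held in memory: `fat[b]` gives the block after b. The helper also reproduces the table-size arithmetic above (20 × 2^30 / 2^10 = 20 × 2^20 entries, times 3 bytes = 60 MB). Function names and the `FAT_EOF` value are hypothetical.

```c
#include <stdint.h>

#define FAT_EOF (-1)

/* With a FAT, the "next block" pointers live in a table held in memory
   rather than inside the data blocks. fat[b] is the block after b, or
   FAT_EOF. Following the chain touches no disk blocks. */
int fat_block_for(const int *fat, int first, int k)
{
    int b = first;
    while (b != FAT_EOF && k-- > 0)
        b = fat[b];
    return b;
}

/* Memory needed for the whole table: one entry per disk block. */
uint64_t fat_table_bytes(uint64_t disk_bytes, uint64_t block_bytes,
                         uint64_t entry_bytes)
{
    return disk_bytes / block_bytes * entry_bytes;
}
```

For example, `fat_table_bytes(20ULL << 30, 1ULL << 10, 3)` equals `60ULL << 20`, i.e., 60 MB.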

4 Indexed Allocation
To keep track of which blocks belong to which file, each file has a data structure called an i-node (index node) that lists the attributes and the disk addresses of the file's blocks.
Each i-node is stored in a disk block; if one block is not enough to hold all the addresses, the i-node can be extended to multiple levels (indirect blocks).
The memory requirement is independent of the disk size: an i-node needs to be in memory only while its file is open.
If each i-node occupies n bytes and at most k files are open at once, the total memory occupied by i-nodes is only kn bytes.

