Writing A Disk-Based File-System For The Linux Kernel V2.4 - A Loop With 8 Iterations

Why write a file-system? Why not? To get intimate knowledge of the VFS, so we can contribute to a real FS (ext2, reiserfs, nfs...). Because we think we have an idea for an even better FS. To expose something that's not an FS as if it were one (e.g. ftpfs). To be able to prepare an unusual lecture for our local LUG. To leverage two months of unemployment for a future career.

General Parts Of A Disk-Based File-System's Code

First, there is the on-disk file-system layout design, and the code to manipulate it, including a user-mode utility to create a new, empty FS, and a user-space utility to check and fix a broken FS. Then there is code to handle the VFS's super-block, and code to handle inodes of directories and inodes of files.

Then there is code to handle files' data and, finally, code to handle directories' data (mostly for readdir).

Before you write a new FS, read the code of an existing one. ext2 is a good choice, since it is the simplest native file-system. Even though FAT is much simpler, it's not native, and thus is too confusing to serve as a reference (it has no inodes, so everything is emulated). Next time, you'll have the code of STAMFS as a base reference, too ;)

Note: STAMFS is a silly, useless, unstable, crash-prone and very inefficient file-system that assumes each disk sector contains 512 bytes, and uses them in logical blocks of 1024 bytes each. But it's simple enough to understand in 10 minutes, and to implement in 2 months of spare time by someone unfamiliar with the VFS.
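Because STAMFS hard-codes both sizes, locating a logical block on the disk is plain arithmetic: logical block b starts at sector 2b, at byte offset 1024 * b of the device (or of the loopback image file). The user-space sketch below only illustrates that arithmetic; the helper names are hypothetical and do not come from the STAMFS sources.

```c
#include <assert.h>
#include <stdint.h>

/* STAMFS geometry as stated in the text: 512-byte sectors grouped into
 * 1024-byte logical blocks. Helper names are hypothetical. */
#define SECTOR_SIZE        512u
#define STAMFS_BLOCK_SIZE  1024u
#define SECTORS_PER_BLOCK  (STAMFS_BLOCK_SIZE / SECTOR_SIZE)   /* = 2 */

/* First disk sector holding logical block `block`. */
static uint64_t block_to_sector(uint32_t block)
{
    return (uint64_t)block * SECTORS_PER_BLOCK;
}

/* Byte offset of logical block `block` on the device or in the image file. */
static uint64_t block_to_offset(uint32_t block)
{
    return (uint64_t)block * STAMFS_BLOCK_SIZE;
}
```

For example, block 3 covers sectors 6 and 7, i.e. bytes 3072-4095 of the image.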

The STAMFS on-disk layout consists of a super-block, an inode index block, a free blocks list block, and the inodes themselves, each with its data block index and its data blocks.

The inode index contains a mapping from inode numbers to the logical disk blocks that contain their info. Each entry on the disk occupies 4 bytes. Entry 0 contains the block number of inode 1, so inode numbers range from 1 (the root inode) to 1024/4 (=256). Then why 4 bytes per entry, and not less? For simplicity, and for future extension.

The inode contains info such as the last update/change/access times, the data size, the index of the logical disk block holding the data block index, etc. The data block index contains a mapping between logical block numbers in the inode and logical blocks on disk. Entry 0 contains the number of the disk block whose contents is block 0 of the file (bytes 0-1023). Each entry in this index is 4 bytes long. Thus, an inode (i.e. a file) on a STAMFS system is limited to 1024/4 (=256) blocks, or 256KB, of data.

The free blocks list contains indexes of disk blocks that were once allocated and then freed.

There is no need to hold all free blocks, since the super-block stores the number of the highest allocated block. Each entry is 4 bytes long, and thus we have 1024/4 (=256) entries for free blocks. When allocating a data block, we first scan the free list. Only if it is empty do we allocate the next highest non-allocated block. If we free 257 blocks without allocating any... BOOM!

A directory is an inode of type directory, whose data contains the list of files found in the directory.

Each entry in the data contains an inode number, a file type (dir or file), and a name (limited to 16 chars). We don't keep entries for . and .. on disk; the file system's code adds these entries on the fly at run-time.

Writing STAMFS - A Loop With 8 Iterations

When writing a file-system, we need to implement several layers of objects. To make life simpler, let's develop it using user-mode-linux, and using a loopback mount (a file-system in a file). Let's see how to do that in 8 iterations.

Important Note: the VFS is a very delicate creature. If you load half-written file-system code, it might try to access not-implemented functions/structs and crash. The code in some of the steps below might crash the system; do not panic.

Use register_filesystem() to register a new file-system; its parameter is a struct file_system_type. Use unregister_filesystem() during cleanup.

Here is the non-mountable STAMFS code.

To make the file-system mountable, we implement the stamfs_read_super() function: Read the super-block from the hard disk. Verify the STAMFS signature is there. Initialize the struct super_block we got. Initialize a struct stamfs_super_meta and attach it as the super-block's FS private data. Load the root inode (see next slide), and put it in the dcache.
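The "verify the signature, then find the root inode" part of mounting can be modeled in user space. This is a sketch under stated assumptions: the signature value STAMFS_MAGIC, the super-block field offsets, and the use of 0 as an "invalid block" marker are all hypothetical, byte order is ignored, and the kernel version would read the blocks through buffer_heads rather than take plain byte arrays. The inode index lookup follows the text: entry 0 holds the block of inode 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STAMFS_BLOCK_SIZE 1024u
#define STAMFS_ENTRIES    (STAMFS_BLOCK_SIZE / 4u)   /* 256 four-byte entries */
#define STAMFS_MAGIC      0x5354414Du                /* hypothetical signature */
#define STAMFS_ROOT_INO   1u

/* Hypothetical parsed super-block; the offsets are assumptions. */
struct stamfs_sb_model {
    uint32_t magic;              /* bytes 0-3: must equal STAMFS_MAGIC */
    uint32_t highest_allocated;  /* bytes 4-7: highest allocated disk block */
};

/* Returns 0 if the signature matches, -1 if the mount should fail.
 * Uses native byte order (memcpy) for simplicity. */
static int stamfs_parse_super(const unsigned char *block, struct stamfs_sb_model *out)
{
    memcpy(&out->magic, block, 4);
    if (out->magic != STAMFS_MAGIC)
        return -1;
    memcpy(&out->highest_allocated, block + 4, 4);
    return 0;
}

/* Inode n is described by entry n - 1 of the inode index block.
 * Returns 0 (assumed never to be an inode block) for out-of-range numbers. */
static uint32_t stamfs_inode_block(const unsigned char *inode_index, uint32_t ino)
{
    uint32_t block;
    if (ino < 1 || ino > STAMFS_ENTRIES)
        return 0;
    memcpy(&block, inode_index + 4u * (ino - 1), 4);
    return block;
}
```

A read_super built along these lines would give up when stamfs_parse_super() returns -1, and otherwise use stamfs_inode_block(index, STAMFS_ROOT_INO) to locate the root inode before calling iget().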

The iget() function we called to load the root inode will eventually invoke the read_inode super-block operation, stamfs_read_inode(): Find the disk block containing the inode, based on the inode number. Read the inode's block. Initialize the VFS's inode struct with the loaded data. Attach STAMFS's inode meta-data to the inode. Set the various VFS operations (inode-operations, file-operations, address-space operations) based on the inode's type (dir, file...).

When a file-system is unmounted, its put_super super-block operation is invoked. stamfs_put_super() does: Release the buffer_heads we used for storing super-block related data (the super-block, the inode index block and the free list block). Free the memory used for storing the STAMFS super-block's meta-data.

When we read the inode, we attached a meta-information structure to the inode object. When the system frees this object, we need to free this memory too. This is done by implementing the clear_inode super-block operation: Get the pointer to the meta-data structure. Free it.

Set the original pointer to NULL, since the inode object might be re-used later on.

Here is the just-mountable STAMFS code.

Now we should allow searching for inodes inside directories based on their names; a simple ls won't work without stamfs_iop_lookup(): Make sure the parameters are sane (e.g. the file name is not too long). Scan the directory for an entry with the desired name. If we found it, load the inode from disk using the inode number (see next slide). Add the dentry as a child of the parent directory, in the dcache.
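The scan at the heart of stamfs_iop_lookup() compares a name against fixed-size directory entries. Below is a user-space sketch; the field widths of the hypothetical struct stamfs_dir_entry, the numeric type codes, and the use of inode 0 for an unused slot are assumptions, not the real STAMFS layout. In the kernel, a name that is too long would typically be rejected with -ENAMETOOLONG.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STAMFS_NAME_LEN 16

/* Hypothetical on-disk directory entry: inode number, type, and a name of
 * up to 16 chars (not NUL-terminated when all 16 are used). */
struct stamfs_dir_entry {
    uint32_t ino;                    /* 0 marks an unused slot (assumption) */
    uint32_t type;                   /* e.g. 1 = file, 2 = dir (assumption) */
    char     name[STAMFS_NAME_LEN];
};

/* Returns the inode number of the entry called `name` (length `len`),
 * 0 if no such entry exists, or -1 if the name can never exist on disk. */
static long stamfs_find_entry(const struct stamfs_dir_entry *entries, size_t count,
                              const char *name, size_t len)
{
    if (len == 0 || len > STAMFS_NAME_LEN)
        return -1;                               /* parameter sanity check */
    for (size_t i = 0; i < count; i++) {
        const struct stamfs_dir_entry *e = &entries[i];
        if (e->ino == 0)
            continue;                            /* skip unused slots */
        /* match if the first len bytes agree and the stored name ends there */
        if (memcmp(e->name, name, len) == 0 &&
            (len == STAMFS_NAME_LEN || e->name[len] == '\0'))
            return (long)e->ino;
    }
    return 0;
}
```

The explicit end-of-name check is what keeps a lookup for "doc" from matching an entry named "docs".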

Reading The Contents Of A Directory

If we just stop here, we'll get a system crash on the first ls of the root directory: we want to be able to read the contents of the root directory, and this requires a file-operation named readdir, implemented in stamfs_readdir(): Make sure the file's read-head position is not located past the last entry in the directory (for a large dir, we only return part of it, and then the user will perform another readdir to read the next chunk, etc.). If we're at the first position, insert info about directory '.' (we don't keep it on disk). If we're at the second position, insert info about directory '..' (we don't keep it on disk either). Read the directory's data block from disk. As long as there's space in the user's buffer, scan the list of files and add them to the user's buffer. Update the last access-time of the directory.
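The position bookkeeping is the subtle part of stamfs_readdir(): positions 0 and 1 are the synthesized '.' and '..', and when the user's buffer fills up, the position must stay where the next readdir call should resume. The user-space sketch below models only that; the fill callback is a hypothetical stand-in for the VFS filldir function (here it returns non-zero when the buffer is full), and the access-time update is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for filldir: returns non-zero when the user's buffer is full. */
typedef int (*fill_fn)(void *buf, const char *name, uint32_t ino);

struct model_entry { uint32_t ino; const char *name; };

/* Position 0 is ".", 1 is "..", and 2 + i is the i-th on-disk entry. */
static void model_readdir(uint64_t *pos, uint32_t self_ino, uint32_t parent_ino,
                          const struct model_entry *entries, uint64_t count,
                          fill_fn fill, void *buf)
{
    if (*pos >= count + 2)
        return;                                    /* already past the last entry */
    if (*pos == 0) {
        if (fill(buf, ".", self_ino)) return;
        (*pos)++;
    }
    if (*pos == 1) {
        if (fill(buf, "..", parent_ino)) return;
        (*pos)++;
    }
    while (*pos < count + 2) {
        const struct model_entry *e = &entries[*pos - 2];
        if (fill(buf, e->name, e->ino)) return;    /* full: resume here next call */
        (*pos)++;
    }
}

/* A tiny "user buffer" that accepts at most `cap` names. */
struct name_buf { const char *names[8]; int used, cap; };

static int fill_names(void *p, const char *name, uint32_t ino)
{
    struct name_buf *b = p;
    (void)ino;
    if (b->used == b->cap) return 1;
    b->names[b->used++] = name;
    return 0;
}
```

With three on-disk entries and room for three names, the first call returns ".", ".." and the first entry and leaves the position at 3; the next call continues from the second entry.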

Now, here is the searchable STAMFS code. Note: expect a kernel panic when bdflush tries to sync this directory; we'll get back to this later.

When we instantiated the root inode, we attached an inode-operations object, struct stamfs_dir_iops. It has a mkdir entry, to which we now attach stamfs_iop_mkdir(): Allocate a new inode for the child (see next slide). Turn this inode into an empty-directory inode.

Increase the parent's link-count (the child's .. points to the parent). Increase the child's link-count (the child's . points back at the child). Add the child to the list of children of the parent. Instantiate the child's dentry.

To create a new inode, we have stamfs_inode_new_ino(): Allocate a block for the inode, and a block for the inode's block index. Allocate a free inode number; it must be 1 or larger, since zero will cause various user-space applications to choke. Allocate a VFS inode struct. Set the inode's link-count to 1 (it will be in a directory, which will point to it). Set the rest of the inode struct's fields. Attach STAMFS's meta-data to the inode's private data.

Set the various VFS operations (inode-operations, file-operations, address-space operations) based on the inode's type (dir, file...). Add the inode to the parent directory.

Important - Writing The Super-Block

If we want to avoid a busy-loop, we must implement write_super. Adding a new inode will dirty the super-block, so the VFS (via the update daemon) will try to write it to disk, in a loop that updates all dirty super-blocks. The write_super function needs to mark the super-block as non-dirty, or else the update daemon will keep trying to write it, causing the system to get stuck.

So, here is the mkdir-able STAMFS code.
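Why a write_super that forgets to clear the dirty flag gets the system stuck can be seen in a toy model of the update loop. In the 2.4 kernel the flag is the super-block's s_dirt field; everything else here (struct model_sb, the round limit) is hypothetical, and the limit exists only so the model itself terminates, unlike the real update daemon.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy super-block: a dirty flag plus the file-system's write_super hook. */
struct model_sb {
    bool s_dirt;                                  /* mirrors the VFS dirty flag */
    void (*write_super)(struct model_sb *sb);
    int writes;                                   /* times write_super ran */
};

static void good_write_super(struct model_sb *sb)
{
    sb->writes++;        /* ... write the super-block buffers to disk ... */
    sb->s_dirt = false;  /* the crucial step: tell the VFS we're clean */
}

static void bad_write_super(struct model_sb *sb)
{
    sb->writes++;        /* forgets to clear s_dirt */
}

/* Keep writing dirty super-blocks until none is dirty. Returns the number
 * of rounds needed, or -1 if still dirty after max_rounds. */
static int model_sync_supers(struct model_sb *sbs, int n, int max_rounds)
{
    for (int round = 1; round <= max_rounds; round++) {
        bool any_dirty = false;
        for (int i = 0; i < n; i++) {
            if (sbs[i].s_dirt) {
                sbs[i].write_super(&sbs[i]);
                any_dirty = any_dirty || sbs[i].s_dirt;
            }
        }
        if (!any_dirty)
            return round;
    }
    return -1;
}
```

With good_write_super the loop finishes after one round; with bad_write_super it spins until the artificial limit, which is the busy-loop described above.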

Now that we can create directories, we can try to delete them too, with stamfs_iop_rmdir(): Make sure the directory is empty (it's up to us, not to the VFS; hence, we could have changed these semantics if we wanted to). Unlink the directory (see next slide). Decrease the parent's link count, since the child's .. no longer points to it. Decrease the child's link count, since the child's . no longer points at itself. Do not actually delete the directory's inode; the VFS will do it once it sees the directory's link-count dropped to 0.

Support Directory Deletion - The Unlink Operation

Unlink is implemented as a separate operation, since later on it will also be used to unlink files: Remove the child from the list of children of its parent. Decrease the child's link count, since the parent no longer points to it.
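The link-count rules for mkdir, rmdir and unlink are easy to get wrong, so here is a toy model that tracks only the counts. The struct and function names are hypothetical. A new directory ends up with 2 links (its parent's entry and its own .), its parent gains one (the child's ..), and after rmdir the child's count is back to 0, which is the VFS's cue to delete the inode.

```c
#include <assert.h>
#include <stdint.h>

/* Only the counts are modeled; names are hypothetical. */
struct model_dir {
    uint32_t nlink;
    uint32_t children;   /* entries stored on disk (. and .. are not stored) */
};

static void model_mkdir(struct model_dir *parent, struct model_dir *child)
{
    child->nlink = 1;      /* new inode: the parent's entry will point at it */
    child->children = 0;
    parent->nlink++;       /* the child's .. points to the parent */
    child->nlink++;        /* the child's . points back at the child */
    parent->children++;    /* add the child to the parent's list */
}

/* The unlink step, shared later with file deletion. */
static void model_unlink(struct model_dir *parent, uint32_t *child_nlink)
{
    parent->children--;    /* remove the child from the parent's list */
    (*child_nlink)--;      /* the parent no longer points at the child */
}

/* Returns -1 (think -ENOTEMPTY) if the directory still has entries. */
static int model_rmdir(struct model_dir *parent, struct model_dir *child)
{
    if (child->children != 0)
        return -1;
    model_unlink(parent, &child->nlink);
    parent->nlink--;       /* the child's .. no longer points to it */
    child->nlink--;        /* the child's . no longer points at itself */
    return 0;              /* child->nlink is now 0: the VFS deletes the inode */
}
```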

Now, here is the rmdir-able STAMFS code.

Until now, when we unmounted our file-system, we lost all entries we created (well, not all: when we wrote into buffer-heads, they got written to disk, which is why we saw some funny entries in directories after unmounting and re-mounting a file system on which we had created directories). We need the following: Write the super-block to disk (already done). Write new or updated inodes to disk. Delete inodes from disk when their files (actually directories) are erased. Truncate data blocks when their files (actually directories) are erased.

Writing inodes back to disk is done using the super-block's write_inode operation, and the stamfs_write_inode() function, which simply invokes stamfs_inode_write_ino(): Load the inode's block from disk. Update it with data from the VFS inode struct. Mark the inode's buffer_head as dirty; it'll be written by bdflush or during unmount. Unless we were asked to do this synchronously, in which case we force an immediate buffer_head write.

Deleting inodes from disk is done using the super-block's delete_inode operation, and the stamfs_delete_inode() function: Sanity check: make sure this is not a bad inode (i.e. an invalid inode). Set the inode's size field to zero, and if it contains any data blocks, invoke stamfs_inode_truncate() (see below). Finally, free the inode, using stamfs_inode_free_ino() (which merely frees the inode number, the inode's block and its block index block).

stamfs_inode_truncate() is used both when deleting the inode and when a user invokes a truncate operation (with a given offset) on a file, so it uses the inode's size field in order to know where to start truncating from: Read the inode's block index from disk. Calculate the entry of this index from which we need to fully truncate blocks.

Scan the index from that point until its end, freeing each block (possibly adding it to the free list), and mark the entry as empty. Update accounting data (e.g. the number of blocks used by the inode). Note: when we support files with data, we will need to get back to this function.

Now, here is the persistent STAMFS code.

First, we need to add new operation structs for files: In stamfs_iops, we add an empty stamfs_file_iops. In stamfs_fops, we add an empty stamfs_file_fops.

Modify stamfs_inode_new_ino to use the new structs when creating file inodes. Modify stamfs_inode_read_ino in the same manner.

Creating a normal file is done using the create inode-operation, and in STAMFS, with stamfs_iop_create(): Create a new inode for the file (using the previously encountered stamfs_inode_new_ino() function). Add it to the children list of the given directory, and instantiate the file's dentry. If the above failed, reduce the new inode's link count and iput() it; seeing that it has 0 links, the VFS will delete it (it will invoke the super-block's delete_inode operation).

As an added bonus, let us also support deleting files.

This is real easy now: simply add an 'unlink' operation to the stamfs_dir_iops struct, pointing to stamfs_iop_unlink. We already implemented that when adding support for rmdir.

Here is the file-create-able STAMFS code.

Supporting files with data is simpler than it looks, because we leave most of the work to the page cache and the VFS's generic page-handling functions. We simply need to map all address-space operations to functions that invoke the VFS generic page-handling functions, and implement a single function, stamfs_get_block(): Translate the block offset parameter to the matching disk block number. If we succeeded, just return this block number. Otherwise, if we were told NOT to allocate a new block, return an error. Otherwise (this is an allocate-block operation, coming from a file write operation), allocate a free block, add a mapping to the inode's block index, and return it.

Truncating Pages In The Page Cache

In addition, we need to handle truncating the page cache parts related to our file, when the matching part of the file is truncated.
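The decision logic of stamfs_get_block(), combined with the allocation policy from the free blocks list description (reuse a freed block first, otherwise take the next highest one), can be sketched in user space. The names are hypothetical, 0 doubles as "unmapped" and "error", the device size is not checked, and in the 2.4 kernel the get_block callback reports the mapping through a buffer_head rather than a return value.

```c
#include <assert.h>
#include <stdint.h>

#define STAMFS_ENTRIES 256u   /* 1024-byte index block / 4-byte entries */

/* Hypothetical allocator state: the free-list block plus the
 * super-block's "highest allocated block" counter. */
struct model_alloc {
    uint32_t highest_allocated;
    uint32_t free_count;
    uint32_t free_list[STAMFS_ENTRIES];
};

/* Reuse a freed block if any, otherwise take the next highest one.
 * The list is kept packed here, so "scanning" is taking the last entry. */
static uint32_t model_alloc_block(struct model_alloc *a)
{
    if (a->free_count > 0)
        return a->free_list[--a->free_count];
    return ++a->highest_allocated;
}

/* Returns -1 once the single free-list block is full (the 257th free). */
static int model_free_block(struct model_alloc *a, uint32_t block)
{
    if (a->free_count == STAMFS_ENTRIES)
        return -1;
    a->free_list[a->free_count++] = block;
    return 0;
}

/* Logical block -> disk block, or 0 on error. index[] is the inode's
 * block index; 0 marks an unmapped entry. */
static uint32_t model_get_block(uint32_t index[STAMFS_ENTRIES], uint32_t iblock,
                                int create, struct model_alloc *a)
{
    if (iblock >= STAMFS_ENTRIES)
        return 0;                         /* past the 256-block file limit */
    if (index[iblock] != 0)
        return index[iblock];             /* already mapped: return it */
    if (!create)
        return 0;                         /* told NOT to allocate */
    index[iblock] = model_alloc_block(a); /* write path: allocate and map */
    return index[iblock];
}
```

A read of an unmapped block (create == 0) fails, while a write (create != 0) maps it on demand; a block freed earlier is handed out again before the highest-block counter grows.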

In stamfs_inode_do_truncate, before the block-freeing loop, we need to tell the page cache to delete the relevant pages, using the VFS's block_truncate_page function.

Finally, we need to implement the fsync file operation, for both files and directories. The function is stamfs_sync_file: Take the inode from the given dentry parameter. Sync the buffers related to the inode (in our case, the inode's buffer-head and the block index buffer-head). Sync the data buffers of the file. If the inode itself is dirty, write the inode to disk synchronously.

Here is the files-with-data STAMFS code.
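The truncation steps described earlier (compute the first block index entry that no longer holds file data, then free every block from there to the end of the index) can be sketched in user space as follows. The callback stands in for the allocator's free path, the names are hypothetical, and the page-cache step via block_truncate_page is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define STAMFS_BLOCK_SIZE 1024u
#define STAMFS_ENTRIES    256u

/* Stand-in for freeing a block (possibly onto the free list). */
typedef void (*free_fn)(uint32_t block, void *ctx);

/* Frees every block after the ones still needed for new_size bytes and
 * returns how many were freed, for the caller's block accounting.
 * 0 marks an empty index entry. */
static uint32_t model_truncate(uint32_t index[STAMFS_ENTRIES], uint32_t new_size,
                               free_fn free_block, void *ctx)
{
    /* first entry holding no byte below new_size: ceil(new_size / 1024) */
    uint32_t first = (new_size + STAMFS_BLOCK_SIZE - 1) / STAMFS_BLOCK_SIZE;
    uint32_t freed = 0;

    for (uint32_t i = first; i < STAMFS_ENTRIES; i++) {
        if (index[i] != 0) {
            free_block(index[i], ctx);
            index[i] = 0;              /* mark the entry as empty */
            freed++;
        }
    }
    return freed;
}

static void count_free(uint32_t block, void *ctx)
{
    (void)block;
    (*(uint32_t *)ctx)++;
}
```

Truncating a 3-block file to 1025 bytes keeps entries 0 and 1 (byte 1024 still lives in block 1) and frees only entry 2; deleting the inode sets the size to 0 first, so every entry is freed.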

Implementing A Disk-Based File-System - Summary

Use the generic VFS functions whenever possible. Implement things in small steps. When you don't support some functions yet, supply an empty operations struct, not a NULL pointer. When updating a buffer_head or an inode, mark it as dirty! Whenever you change an inode, update the relevant times; no one will do that for you. Updating VFS objects is decoupled from updating disk objects.

Things left to do: Fix the bug in the allocation of directory entries, which does not re-allocate a previously freed entry. Allow creating larger files/directories, by having the last entry in an inode's block index point to another block containing index data. Support a larger free blocks list in a similar manner (note: this is trickier; where do we store the blocks comprising the free blocks list?).

Support having more inodes, in a similar manner. Add support for 'rename'. Add support for variable-length file names. Re-implement directory handling to use the page cache, rather than using buffer-heads directly.