Distributed file system

2021-01-18


System programming concepts in distributed systems

File concept review

A file is a higher-level abstraction over raw storage. In UNIX-like systems, we can say that everything is a file.

Besides its data blocks, a file has a header containing:

Timestamps: creation, read, write, and header (metadata) modification.
File type
Ownership
Access control list: who can access the file, and in what mode.
Reference count: the number of directories containing this file. When it drops to 0, the file is deleted.

As mentioned, everything is a file, so a directory is just a special kind of file: it contains metadata about the files in that directory and pointers (on-disk locations) to those files.
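The minimal in-memory sketch below makes the directory-as-file idea and the reference count concrete. `ToyFS` and `Inode` are hypothetical names, not a real API. The model also ignores permissions and open file descriptors; a real UNIX kernel additionally keeps a file alive while it is still open.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Inode:
    kind: str                            # "file" or "dir"
    data: Union[bytes, Dict[str, int]]   # file bytes, or a directory's name -> inode table
    ref_count: int = 0                   # number of directory entries pointing at this inode


class ToyFS:
    """Hypothetical model: a directory is just an inode whose contents map names to inodes."""

    def __init__(self) -> None:
        self.inodes: Dict[int, Inode] = {0: Inode("dir", {}, ref_count=1)}  # inode 0 is "/"
        self.next_ino = 1

    def lookup(self, path: str) -> int:
        """Walk the path one directory file at a time and return the inode number."""
        ino = 0
        for part in filter(None, path.split("/")):
            node = self.inodes[ino]
            if node.kind != "dir":
                raise NotADirectoryError(path)
            ino = node.data[part]  # KeyError if the name is missing
        return ino

    def _new_inode(self, kind: str, data) -> int:
        ino = self.next_ino
        self.next_ino += 1
        self.inodes[ino] = Inode(kind, data)
        return ino

    def link(self, dir_path: str, name: str, ino: int) -> None:
        """Add a directory entry and bump the target's reference count."""
        self.inodes[self.lookup(dir_path)].data[name] = ino
        self.inodes[ino].ref_count += 1

    def create(self, dir_path: str, name: str, content: bytes) -> int:
        ino = self._new_inode("file", content)
        self.link(dir_path, name, ino)
        return ino

    def mkdir(self, dir_path: str, name: str) -> int:
        ino = self._new_inode("dir", {})
        self.link(dir_path, name, ino)
        return ino

    def unlink(self, dir_path: str, name: str) -> None:
        """Remove a directory entry; delete the inode once no entries point to it."""
        ino = self.inodes[self.lookup(dir_path)].data.pop(name)
        node = self.inodes[ino]
        node.ref_count -= 1
        if node.ref_count == 0:
            del self.inodes[ino]
```

For example, if a file is created in `/docs` and then linked again from `/`, its `ref_count` is 2. Unlinking one name leaves the data reachable through the other, and unlinking both removes the inode.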

We open and close a file through a file descriptor. Access is governed by mode bits (r read, w write, x execute).
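The snippet below illustrates file descriptors and mode bits using Python's standard `os` and `stat` modules on a POSIX system. The file name `notes.txt` is arbitrary. Note that the access mode requested at open time (read-only, write-only, ...) is separate from the file's stored r/w/x permission bits.

```python
import os
import stat
import tempfile


def fd_demo() -> dict:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")

        # os.open returns a small integer file descriptor; O_WRONLY requests write-only access.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
        os.write(fd, b"hello")
        os.close(fd)  # release the descriptor

        # Re-open read-only: reads succeed, writes through this descriptor are rejected.
        fd = os.open(path, os.O_RDONLY)
        data = os.read(fd, 100)
        try:
            os.write(fd, b"x")
            write_rejected = False
        except OSError:
            write_rejected = True
        finally:
            os.close(fd)

        # Set the permission bits explicitly (chmod is not affected by the umask).
        os.chmod(path, 0o755)
        st = os.stat(path)
        return {
            "data": data,
            "write_rejected": write_rejected,
            "permissions": stat.filemode(st.st_mode),  # "-rwxr-xr-x" after the chmod above
            "link_count": st.st_nlink,                 # the header's reference count
        }
```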

Distributed file system

Like other distributed systems, a distributed file system should ensure:

Transparency: clients access remote files as if they were local files.
Support for concurrent clients: multiple clients can read/write the same file concurrently.
Replication: for fault tolerance, so that a file is not lost when one server is down. The system also has to ensure one-copy semantics: even though a file is replicated, clients cannot tell the difference from a file that has exactly one replica.

It is also important to ensure that everyone has only the proper rights to access files, which involves authentication. There are two ways to represent these rights:

Access control list (ACL): a per-file list of allowed users and their modes.
Capability list: a per-user list of allowed files and their modes.
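Both representations encode the same user × file × mode matrix, sliced differently. The small sketch below converts an ACL into capability lists and checks that both give the same answers. The users, files, and helper names are made up for illustration.

```python
from typing import Dict, Set

Rights = Dict[str, Dict[str, Set[str]]]

# Access control lists: stored per file, mapping user -> allowed modes (hypothetical data).
ACL: Rights = {
    "report.txt": {"alice": {"r", "w"}, "bob": {"r"}},
    "build.sh": {"alice": {"r", "w", "x"}},
}


def acl_to_capabilities(acl: Rights) -> Rights:
    """Re-index the same rights per user: user -> file -> allowed modes."""
    caps: Rights = {}
    for filename, entries in acl.items():
        for user, modes in entries.items():
            caps.setdefault(user, {})[filename] = set(modes)
    return caps


def acl_allows(acl: Rights, user: str, filename: str, mode: str) -> bool:
    # ACL check: find the file's list first, then the user's entry in it.
    return mode in acl.get(filename, {}).get(user, set())


def cap_allows(caps: Rights, user: str, filename: str, mode: str) -> bool:
    # Capability check: find the user's list first, then the file's entry in it.
    return mode in caps.get(user, {}).get(filename, set())
```

The trade-off is in where rights are stored. With ACLs, finding everyone who can access a file means reading one list. With capability lists, the same question requires scanning every user's list.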

Navigation of two popular distributed file systems: NFS & AFS

Network File System (NFS)

NFS is split into a client side and a server side:

Client:

If a file exists on the client's local system, the request goes to the local UNIX file system. If not, the NFS client system sends it to the server. The server navigates its own virtual file system and looks the file up on its own local disk.

The client uses a data structure called a v-node, which is similar to an i-node in UNIX. If the target file is local, the v-node points to the i-node on the local disk. If not, the v-node contains the address of the remote NFS server.
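The client-side dispatch can be sketched as follows. This is a hypothetical model, not the real kernel interface. `VNode`, `VFS`, and `FakeNFSServer` are invented names. The remote part of a v-node is assumed to be a (server address, file handle) pair, and the network call is a plain method call.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class VNode:
    """Hypothetical v-node: exactly one of local_inode / remote is set."""
    path: str
    local_inode: Optional[int] = None
    remote: Optional[Tuple[str, int]] = None  # (NFS server address, file handle)


class FakeNFSServer:
    """Stands in for the NFS server, which reads from its own local disk."""

    def __init__(self, files: Dict[int, bytes]) -> None:
        self.files = files

    def read(self, handle: int) -> bytes:
        return self.files[handle]


class VFS:
    def __init__(self, local_disk: Dict[int, bytes], servers: Dict[str, FakeNFSServer]) -> None:
        self.local_disk = local_disk  # i-node number -> contents on the local disk
        self.servers = servers        # server address -> (simulated) NFS server

    def read(self, vnode: VNode) -> bytes:
        if vnode.local_inode is not None:
            # Local file: the v-node points at a local i-node, so use the UNIX file system.
            return self.local_disk[vnode.local_inode]
        if vnode.remote is None:
            raise ValueError(f"v-node for {vnode.path} has no target")
        # Remote file: the NFS client contacts the server named in the v-node.
        address, handle = vnode.remote
        return self.servers[address].read(handle)
```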

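Client caching: the client stores some recently accessed blocks in memory. It has to keep each cached copy consistent with the server's copy, using the block's last-modified time and a freshness interval $t$. Let $T_c$ be the time the cache entry was last validated, and let $T_m$ be the time the block was last modified. A cache entry is valid at time $T$ if $T - T_c < t$ or $T_{m,client} = T_{m,server}$.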

Example: suppose the freshness interval is 4 s. A client finds that a cached data block has two timestamps: last validated = 12345 s and last modified = 12340 s. Meanwhile, the server's last-modified timestamp for the block is 12346 s. If the client's clock currently reads 12350 s, then $12350 - 12345 = 5 \geq 4$, so the block is no longer fresh and must be validated. Since 12340 $\neq$ 12346, the server has a newer copy, so the client fetches the block from the server.
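The validity check can be written as a small function, sketched below. `CachedBlock`, `get_server_tm`, and `fetch_from_server` are hypothetical stand-ins for the client's cache entry and the two requests it would send to the server.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CachedBlock:
    data: bytes
    tc: float         # last validated timestamp (client clock)
    tm_client: float  # last modified timestamp recorded with the cached copy


def read_block(block: CachedBlock, now: float, t: float,
               get_server_tm: Callable[[], float],
               fetch_from_server: Callable[[], bytes]) -> Tuple[bytes, str]:
    """Return the block's data and which path was taken."""
    if now - block.tc < t:
        return block.data, "fresh"        # still inside the freshness interval
    tm_server = get_server_tm()           # ask the server for its last-modified time
    if tm_server == block.tm_client:
        block.tc = now                    # unchanged on the server: just revalidate
        return block.data, "revalidated"
    block.data = fetch_from_server()      # server has a newer copy: fetch it
    block.tm_client = tm_server
    block.tc = now
    return block.data, "refetched"
```

Plugging in the numbers from the example takes the "refetched" branch. The inputs are tc=12345, tm_client=12340, now=12350, t=4, and a server timestamp of 12346. The branch is taken because 5 is not below 4 and 12340 differs from 12346.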

Server:

Server caching: the server also stores some recently accessed blocks in memory. Because most programs exhibit locality of access, this speeds up reads.

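Delayed write: writes go to memory and are flushed to disk periodically. This is fast but not consistent: until the flush, the on-disk copy is stale, and unflushed writes are lost if the server crashes.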
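The minimal sketch below combines both server-side ideas: an LRU block cache for reads, plus delayed write. Writes only mark blocks as dirty; a periodic `flush` (or an eviction) pushes them to disk. The dict standing in for the disk, the LRU policy, and the capacity are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Dict, Set


class ServerBlockCache:
    """Hypothetical server-side cache: LRU for reads, delayed write for writes."""

    def __init__(self, disk: Dict[int, bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.disk = disk                          # block id -> bytes; stands in for the disk
        self.capacity = capacity
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()  # least recently used first
        self.dirty: Set[int] = set()              # written in memory, not yet on disk
        self.disk_reads = 0

    def _evict_if_needed(self) -> None:
        while len(self.cache) > self.capacity:
            victim, data = self.cache.popitem(last=False)  # drop least recently used
            if victim in self.dirty:
                self.disk[victim] = data          # a dirty block must be written back first
                self.dirty.discard(victim)

    def read(self, block_id: int) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)      # hit: served from memory, no disk access
            return self.cache[block_id]
        self.disk_reads += 1
        data = self.disk[block_id]
        self.cache[block_id] = data
        self._evict_if_needed()
        return data

    def write(self, block_id: int, data: bytes) -> None:
        # Delayed write: update memory only; the disk copy is stale until flush().
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        self.dirty.add(block_id)
        self._evict_if_needed()

    def flush(self) -> None:
        """Meant to be called periodically: push all dirty blocks to disk."""
        for block_id in self.dirty:
            self.disk[block_id] = self.cache[block_id]
        self.dirty.clear()
```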

Andrew File System (AFS)

Design principles:

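Whole-file serving: the entire file is transferred, not individual blocks.
Whole-file caching: the client keeps the file in a permanent cache that survives reboots. This is a form of delayed write, since changes reach the server only when the file is closed.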

Motivation (empirically validated assumptions) behind these principles:

Most file accesses are by a single user.
Most files are small.
Files are read much more often than they are written, and reads are typically sequential.
Most of the time, the client cache is large enough to support whole-file caching.

The client-side system is called Venus, and the server-side system is called Vice.

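Because AFS operates on whole files, reads and writes are optimistic: they are performed on the local copy at the client (Venus), and writes are propagated to Vice when the file is closed. When Vice sends a file to Venus, it attaches a callback promise. This is a promise that if another client modifies and then closes the file, Vice will send a callback to Venus, cancelling Venus's cached copy. In this way, clients avoid working on stale copies. Concurrent writes are not prevented, however: if two clients modify the same file, the version from the last client to close it wins.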
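The callback mechanism can be sketched as below. `Vice` and `Venus` here are toy classes named after the real components, not their actual interfaces. Network messages are modeled as plain method calls, and a client writes its copy back on close only if it changed it.

```python
from typing import Dict, Set


class Vice:
    """Toy server: stores whole files and remembers which clients hold a callback promise."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.callbacks: Dict[str, Set["Venus"]] = {}

    def fetch(self, name: str, client: "Venus") -> str:
        self.callbacks.setdefault(name, set()).add(client)  # hand out a callback promise
        return self.files[name]                              # whole-file transfer

    def store(self, name: str, content: str, writer: "Venus") -> None:
        self.files[name] = content
        for client in self.callbacks.get(name, set()) - {writer}:
            client.break_callback(name)                      # Vice -> Venus callback
        self.callbacks[name] = {writer}


class Venus:
    """Toy client: caches whole files; callback state is binary (valid or cancelled)."""

    def __init__(self, server: Vice) -> None:
        self.server = server
        self.cache: Dict[str, str] = {}
        self.valid: Dict[str, bool] = {}
        self.modified: Set[str] = set()

    def open(self, name: str) -> str:
        if not self.valid.get(name, False):
            self.cache[name] = self.server.fetch(name, self)
            self.valid[name] = True
        return self.cache[name]  # valid callback: served locally, no server contact

    def write(self, name: str, content: str) -> None:
        self.cache[name] = content  # optimistic: only the local copy changes
        self.modified.add(name)

    def close(self, name: str) -> None:
        if name in self.modified:  # write back on close, and only if changed
            self.server.store(name, self.cache[name], self)
            self.modified.discard(name)

    def break_callback(self, name: str) -> None:
        self.valid[name] = False
```

Suppose two clients write the same file and close it one after the other. Both closes reach Vice, and the file ends up with the content from the later close. This matches the last-close-wins behavior described above.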

Author: Yu Wan
Post link: https://cyanh1ll.github.io/2021/01/18/distributed-file-system/
Copyright notice: CYANH1LL
