Global File System

GFS
Developer: Red Hat (originally Sistina Software)
Full name: Global File System
Introduced: 1996 (IRIX (1996), Linux (1997))
Structures
Directory contents: Hashed (small directories stuffed into inode)
File allocation: bitmap (resource groups)
Bad blocks: No
Limits
Max number of files: Variable
Max filename length: 255 bytes
Allowed characters in filenames: All except NULL
Features
Dates recorded: attribute modification (ctime), modification (mtime), access (atime)
Date resolution: 1s
Attributes: No-atime, journaled data (regular files only), inherit journaled data (directories only), synchronous-write, append-only, immutable, exhash (dirs only, read only)
File system permissions: Unix permissions
Transparent compression: No
Transparent encryption: No
Single Instance Storage: across nodes only
Supported operating systems: IRIX (now obsolete), FreeBSD (now obsolete), Linux

GFS2
Developer: Red Hat
Full name: Global File System 2
Introduced: 2005 (Linux 2.6.19)
Structures
Directory contents: Hashed (small directories stuffed into inode)
File allocation: bitmap (resource groups)
Bad blocks: No
Limits
Max number of files: Variable
Max filename length: 255 bytes
Allowed characters in filenames: All except NULL
Features
Dates recorded: attribute modification (ctime), modification (mtime), access (atime)
Date resolution: Nanosecond
Attributes: No-atime, journaled data (regular files only), inherit journaled data (directories only), synchronous-write, append-only, immutable, exhash (dirs only, read only)
File system permissions: Unix permissions, ACLs and arbitrary security attributes
Transparent compression: No
Transparent encryption: No
Single Instance Storage: across nodes only
Supported operating systems: Linux

In computing, the Global File System (GFS) is a shared disk file system for Linux computer clusters.

GFS and GFS2 differ from distributed file systems (such as AFS, Coda, or InterMezzo) because they allow all nodes to have direct concurrent access to the same shared block storage. In addition, GFS and GFS2 can be used as a local filesystem.

GFS has no disconnected operating mode, and no client or server roles. All nodes in a GFS cluster function as peers. Using GFS in a cluster requires hardware to allow access to the shared storage, and a lock manager to control access to the storage. The lock manager is a separate module, so GFS and GFS2 can use the Distributed Lock Manager (DLM) for cluster configurations and the "nolock" lock manager for local filesystems. Older versions of GFS also support GULM, a server-based lock manager which implements redundancy via failover.

GFS and GFS2 are free software, distributed under the terms of the GNU General Public License.[1][2]

History

GFS was originally developed as part of a thesis project at the University of Minnesota in 1997. It was originally written for SGI's IRIX operating system, but in 1998 it was ported to Linux, since the open source code provided a more convenient development platform. In late 1999 or early 2000 it made its way to Sistina Software, where it lived for a time as an open-source project. Sometime in 2001, Sistina chose to make GFS a commercial product that was no longer released under an open-source license.

Developers forked OpenGFS from the last public release of GFS and then further enhanced it to include updates allowing it to work with OpenDLM. OpenGFS and OpenDLM became defunct after Red Hat purchased Sistina in December 2003 and released GFS and many cluster infrastructure pieces under the GPL in late June 2004.

Red Hat subsequently financed further development geared towards bug-fixing and stabilization. A further development, GFS2[3], is derived from GFS and was included, along with its distributed lock manager (shared with GFS), in Linux 2.6.19. GFS2 was included in Red Hat Enterprise Linux 5.2 only as a kernel module for evaluation purposes. With the 5.3 update, GFS2 became part of the kernel package.

As of 2007, GFS forms part of the Fedora and CentOS Linux distributions. Users can purchase commercial support to run GFS fully supported on top of Red Hat Enterprise Linux. Since Red Hat Enterprise Linux version 5, GFS support has been included with Red Hat Enterprise Linux Advanced Platform at no additional cost.

The following list summarizes some version numbers and major features introduced:

Hardware

GFS and GFS2 are designed to work in SAN-like environments. Although it is possible to use them as a single-node filesystem, the full feature set requires a SAN. This can take the form of iSCSI, Fibre Channel, AoE, or any other device which can be presented under Linux as a block device shared by a number of nodes.

The DLM also requires an IP-based network over which to communicate. This is normally just Ethernet, but there are many other possible solutions. Depending upon the choice of SAN, it may be possible to combine the two networks, although it is more usual to have separate networks for the DLM and storage.

The final requirement is for fencing hardware of some kind. This is a requirement of the cluster infrastructure, rather than of GFS/GFS2 itself, but it is required for all multi-node clusters. The usual options include power switches and remote access controllers (e.g. DRAC, IPMI, or ILO). Fencing is used to ensure that a node which the cluster believes to have failed cannot suddenly start working again while another node is recovering the journal for the failed node. It can also optionally restart the failed node automatically once the recovery is complete.

Differences from a local filesystem

Although the design goal of GFS/GFS2 is to be as similar as possible to a local filesystem, there are a number of differences to be aware of. Some of these are due to the existing filesystem interfaces not allowing the passing of information relating to the cluster, and some are due to the difficulty of implementing those features efficiently in a clustered manner. Here are a few of those differences:

  • The flock() system call on GFS/GFS2 is not interruptible by signals.
  • The fcntl() F_GETLK system call returns a pid of any blocking lock. Since this is a cluster filesystem, that pid might refer to a process on any of the nodes which have the filesystem mounted. The purpose of this interface is to allow a signal to be sent to the blocking process, and this is no longer possible when that process may be running on another node.
  • Leases are not supported with the lock_dlm (cluster) lock module, but they are supported when used as a local filesystem.
  • FIEMAP is supported on GFS2 only.
  • dnotify will work on a "same node" basis, but its use with GFS/GFS2 is not recommended.
  • inotify will also work on a "same node" basis, and is also not recommended (but may be supported in the future).
  • splice is supported on GFS2 only.
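
Because a blocking flock() cannot be interrupted by a signal on GFS/GFS2, an application that wants to give up after a while cannot rely on an alarm to break out of the call. One possible workaround, sketched below in Python, is to make non-blocking attempts and retry until a deadline. The function name try_flock and its timeout and poll_interval parameters are illustrative assumptions, not part of GFS or of any GFS tooling, and on a cluster each attempt may involve lock traffic between nodes, so a short polling interval is not free.

```python
import fcntl
import time


def try_flock(fd, timeout=5.0, poll_interval=0.1):
    """Try to take an exclusive flock() on fd, giving up after `timeout` seconds.

    Hypothetical helper: uses LOCK_NB so the call never blocks, which avoids
    depending on a signal to interrupt a blocked flock().
    Returns True if the lock was acquired, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock() fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            # Another open file description holds a conflicting lock.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
```

A caller would open the file, call try_flock on its descriptor, and release the lock afterwards with fcntl.flock(fd, fcntl.LOCK_UN).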

The other main difference, and one that is shared by all similar cluster filesystems, is that the cache control mechanism, known as glocks (pronounced "Gee-locks") for GFS/GFS2, has an effect across the whole cluster. Each inode on the filesystem has two glocks associated with it. One (called the iopen glock) is only used for keeping track of which processes have the inode open. The other, the inode glock, controls the cache relating to that inode. A glock has four modes: UN (unlocked), SH (shared, a read lock), DF (deferred, a read lock incompatible with SH) and EX (exclusive). Each of the four modes maps directly to a DLM lock mode.

When in EX mode, an inode is allowed to cache data and metadata (which might be "dirty", i.e. waiting for write back to the filesystem). In SH mode, the inode is allowed to cache data and metadata, but it must not be dirty. In DF mode, the inode is allowed to cache metadata only, and again it must not be dirty. The DF mode is used only for direct I/O. In UN mode, the inode must not cache any metadata.
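
The caching rules above can be summarised as a small table. The following Python sketch is only a model of the description in this section, not code from GFS/GFS2: the names GlockMode, CACHE_RULES and compatible are made up for illustration. It also assumes that an unlocked (UN) glock caches no data as well as no metadata, and that two nodes may hold the same read mode (SH with SH, or DF with DF) at once, which the text implies but does not state outright.

```python
from enum import Enum


class GlockMode(Enum):
    UN = "unlocked"
    SH = "shared"
    DF = "deferred"
    EX = "exclusive"


# Per mode: (may cache data, may cache metadata, cached content may be dirty).
# The UN row assumes that no data is cached either (the text only mentions
# metadata for UN).
CACHE_RULES = {
    GlockMode.UN: (False, False, False),
    GlockMode.DF: (False, True, False),   # metadata only; used for direct I/O
    GlockMode.SH: (True, True, False),    # clean data and metadata
    GlockMode.EX: (True, True, True),     # may hold dirty data and metadata
}


def compatible(held, requested):
    """Return True if two nodes could hold these modes on one glock at once.

    Assumed rules: UN conflicts with nothing, EX conflicts with every other
    locked mode, and SH and DF are each compatible only with themselves.
    """
    if GlockMode.UN in (held, requested):
        return True
    if GlockMode.EX in (held, requested):
        return False
    return held == requested
```

Under this model, compatible(GlockMode.SH, GlockMode.DF) is False, matching the description of DF as a read lock incompatible with SH.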

In order that operations which change an inode's data or metadata do not interfere with each other, an EX lock is used. This means that certain operations, such as create/unlink of files from the same directory and writes to the same file should be, in general, restricted to one node in the cluster. Of course, doing these operations from multiple nodes will work as expected, but due to the requirement to flush caches frequently, it will not be very efficient.

The single most frequently asked question about GFS/GFS2 performance is why performance can be poor with email servers. It follows from the above that the solution is to break up the mail spool into separate directories and to keep, so far as is possible, each node reading and writing to a private set of directories.
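
One way to arrange this is to give every mailbox a single owning node, so that creates, unlinks and writes for that mailbox all come from one place. The Python sketch below illustrates the idea with a stable hash; the node names, the mount point /mnt/gfs2/spool and the function names are all hypothetical, and a real mail system would also need its delivery and routing layers to honour the same mapping.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical cluster members and spool location, for illustration only.
NODES = ["node1", "node2", "node3"]
SPOOL_ROOT = PurePosixPath("/mnt/gfs2/spool")


def owner_node(mailbox: str) -> str:
    """Pick the node that should handle all I/O for this mailbox.

    A cryptographic hash is used only because it is stable across
    processes and machines (unlike Python's built-in hash()).
    """
    digest = hashlib.sha256(mailbox.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def spool_dir(mailbox: str) -> PurePosixPath:
    """Return the mailbox's directory inside its owner's private subtree."""
    return SPOOL_ROOT / owner_node(mailbox) / mailbox
```

With this layout each node works mostly under its own subdirectory of the spool, so the directory and inode glocks it needs tend to stay cached locally instead of bouncing between nodes.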

Journaling

GFS and GFS2 are both journaled filesystems, and GFS2 supports a similar set of journaling modes to ext3. In data=writeback mode, only metadata is journaled. This is the only mode supported by GFS; however, journaling can be turned on for individual data files, but only when they are of zero size. Journaled files in GFS have a number of restrictions placed upon them, such as no support for the mmap or sendfile system calls, and they use a different on-disk format from regular files. There is also an "inherit-journal" attribute which, when set on a directory, causes all files (and sub-directories) created within that directory to have the journal (or inherit-journal, respectively) flag set. This can be used instead of the data=journal mount option, which ext3 supports and GFS/GFS2 do not.
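
The propagation rule for the inherit-journal attribute can be illustrated with a toy in-memory model. The Python sketch below is not GFS code: the Node class and its jdata and inherit_jdata fields are hypothetical names chosen to mirror the "journaled data" and "inherit journaled data" attributes described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a file or directory carrying the two journaling flags."""
    name: str
    is_dir: bool
    jdata: bool = False          # "journaled data" (regular files only)
    inherit_jdata: bool = False  # "inherit journaled data" (directories only)
    children: dict = field(default_factory=dict)

    def create(self, name, is_dir=False):
        """Create a child, applying the inheritance rule at creation time."""
        if not self.is_dir:
            raise NotADirectoryError(self.name)
        child = Node(name, is_dir)
        if self.inherit_jdata:
            if is_dir:
                # Sub-directories pass the rule on to their own children.
                child.inherit_jdata = True
            else:
                # Regular files are created with journaling enabled.
                child.jdata = True
        self.children[name] = child
        return child
```

In this model, a file created anywhere below a directory that has the inherit flag set ends up with journaled data enabled, while files that existed before the flag was set are unaffected.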

GFS2 also supports data=ordered mode which is similar to data=writeback except that dirty data is synced before each journal flush is completed. This ensures that blocks which have been added to an inode will have their content synced back to disk before the metadata is updated to record the new size and thus prevents uninitialised blocks appearing in a file under node failure conditions. The default journaling mode is data=ordered, to match ext3's default.

GFS2 does not support data=journal mode yet, but it does (unlike GFS) use the same on disk format for both regular and journaled files, and it also supports the same journaled and inherit-journal attributes. GFS2 also relaxes the restrictions on when a file may have its journaled attribute changed to any time that the file is not open (also the same as ext3).

For performance reasons, each node in GFS and GFS2 has its own journal. In GFS the journals are disk extents; in GFS2 they are just regular files. The number of nodes which may mount the filesystem at any one time is limited by the number of available journals.

Compatibility and the GFS2 meta filesystem

GFS2 was designed so that upgrading from GFS would be a simple procedure. To this end, most of the on disk structure has remained the same as GFS, including the big-endian byte ordering. There are a few differences though:

  • GFS2 has a "meta filesystem" through which system files are accessed
  • GFS2 uses the same on disk format for journaled files as for regular files
  • GFS2 uses regular (system) files for journals, whereas GFS uses special extents
  • GFS2 has some other "per_node" system files
  • The layout of the inode is (very slightly) different
  • The layout of indirect blocks is slightly different

The journaling systems of GFS and GFS2 are not compatible with each other. Upgrading is possible by means of a tool (gfs2_convert) which is run with the filesystem off-line to update the metadata. Some spare blocks in the GFS journals are used to create the (very small) per_node files required by GFS2 during the update process. Most of the data remains in place.

The GFS2 "meta filesystem" is not a filesystem in its own right, but an alternate root of the main filesystem. Although it behaves like a "normal" filesystem, its contents are the various system files used by GFS2 and normally users do not need to ever look at it. The GFS2 utilities mount and unmount the meta filesystem as required, behind the scenes.

References

  1. Teigland, David (29 June 2004), "Symmetric Cluster Architecture and Component Technical Specifications" (PDF), Red Hat Inc, http://people.redhat.com/~teigland/sca.pdf, retrieved on 2007-08-03.
  2. Soltis, Steven R; Erickson, Grant M; Preslan, Kenneth W (1997), "The Global File System: A File System for Shared Disk Storage" (PDF), IEEE Transactions on Parallel and Distributed Systems, http://www.diku.dk/undervisning/2003e/314/papers/soltis97global.pdf.
  3. Whitehouse, Steven (27-30 June 2007), "The GFS2 Filesystem" (PDF), Proceedings of the Linux Symposium 2007: 253-259.