ext4
From Wikipedia, the free encyclopedia
Developer | Mingming Cao, Andreas Dilger, Alex Tomas, Dave Kleikamp, Theodore Ts'o, Eric Sandeen, Sam Naghshineh, others |
---|---|
Full name | Fourth extended file system |
Introduced | Stable: October 21, 2008 Unstable: October 10, 2006 (Linux 2.6.19) |
Partition identifier | 0x83 (MBR) EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (GPT) |
Structures | |
Directory contents | Linked list, htree |
File allocation | Extents/Bitmap |
Bad blocks | Table |
Limits | |
Max file size | 16 TB (for 4k block filesystem) |
Max number of files | 4 billion (specified at filesystem creation time) |
Max filename length | 255 bytes |
Max volume size | 1 EB |
Allowed characters in filenames | All bytes except NUL ('\0') and '/' |
Features | |
Dates recorded | modification (mtime), attribute modification (ctime), access (atime), delete (dtime), create (crtime) |
Date range | December 14, 1901 - May 10, 2446 |
Date resolution | Nanosecond |
Forks | No |
Attributes | extents, noextents, mballoc, nomballoc, delalloc, nodelalloc, data=journal, data=ordered, data=writeback, commit=nrsec, orlov, oldalloc, user_xattr, nouser_xattr, acl, noacl, bsddf, minixdf, bh, nobh, journal_dev |
File system permissions | POSIX |
Transparent compression | No |
Transparent encryption | No |
Single Instance Storage | No |
Supported operating systems | Linux |
The ext4 or fourth extended filesystem is a journaling file system developed as the successor to ext3. It began as a series of backward-compatible extensions to ext3 that added 64-bit storage limits and performance improvements.[1] However, other Linux kernel developers opposed accepting extensions to ext3 for stability reasons[2] and proposed forking the ext3 source code, renaming it ext4, and doing all new development there without affecting existing ext3 users. This proposal was accepted, and on June 28, 2006, Theodore Ts'o, the ext3 maintainer, announced the new development plan for ext4.[3] A preliminary development snapshot of ext4 was included in version 2.6.19 of the Linux kernel. On October 11, 2008, the patches marking ext4 as stable code were merged into the Linux 2.6.28 source code repositories,[4] denoting the end of the development phase and recommending ext4 adoption. Kernel 2.6.28, containing the ext4 filesystem, was released on December 25, 2008.[5]
Features
Large file system
The ext4 filesystem can support volumes of up to 1 exabyte[6] and files of up to 16 terabytes (with a 4 KB block size).
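Both limits can be reproduced with a short calculation. The sketch below assumes 48-bit physical block numbers and 32-bit logical (per-file) block numbers; neither width is stated in this article, so both are assumptions of the example, and the constant names are purely illustrative.

```python
BLOCK_SIZE = 4 * 1024        # 4 KB blocks, as in the limits quoted above
PHYSICAL_BLOCK_BITS = 48     # assumption: width of an on-disk block number
LOGICAL_BLOCK_BITS = 32      # assumption: width of a block offset inside one file

# Largest volume: every addressable physical block, times the block size.
max_volume_bytes = (2 ** PHYSICAL_BLOCK_BITS) * BLOCK_SIZE

# Largest file: every addressable logical block, times the block size.
max_file_bytes = (2 ** LOGICAL_BLOCK_BITS) * BLOCK_SIZE

print(max_volume_bytes // 2 ** 60, "exabyte (binary)")
print(max_file_bytes // 2 ** 40, "terabytes (binary)")
```

With a different block size the file limit scales accordingly, which is why the 16 terabyte figure is tied to 4 KB blocks.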
Extents
Extents replace the traditional block mapping scheme used by the ext2 and ext3 filesystems. An extent is a range of contiguous physical blocks; describing a file as a few extents rather than many individual blocks improves large-file performance and reduces fragmentation. A single extent in ext4 can map up to 128 MB of contiguous space with a 4 KB block size.[1] Up to 4 extents can be stored directly in the inode. When a file has more than 4 extents, the remaining extents are indexed in a tree.
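The idea of an extent can be illustrated by collapsing a per-block list into (start, length) pairs. This is a minimal sketch, not the on-disk format: the function name is hypothetical, and the only figure taken from the text is the cap of 128 MB per extent, which is 32,768 blocks of 4 KB.

```python
MAX_EXTENT_BLOCKS = 32768  # 128 MB / 4 KB, the per-extent cap quoted above

def blocks_to_extents(blocks):
    """Collapse physical block numbers (in file order) into (start, length) extents."""
    extents = []
    for block in blocks:
        if extents:
            start, length = extents[-1]
            # Extend the current extent only if the block is contiguous
            # and the extent has not reached its maximum length.
            if block == start + length and length < MAX_EXTENT_BLOCKS:
                extents[-1] = (start, length + 1)
                continue
        extents.append((block, 1))
    return extents

print(blocks_to_extents([100, 101, 102, 103, 500, 501, 900]))
# prints [(100, 4), (500, 2), (900, 1)]
```

Seven block numbers shrink to three extents here; for a large, mostly contiguous file the saving is far greater, which is where the performance benefit comes from.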
Backward compatibility
The ext4 filesystem is backward compatible with ext3 and ext2, making it possible to mount ext3 and ext2 filesystems as ext4. Doing so already improves performance slightly, because certain new features of ext4, such as the new block allocation algorithm, can also be used with ext3 and ext2.
Forward compatibility
The ext4 file system is partially forward compatible with ext3: an ext4 file system can be mounted as ext3 (using “ext3” as the filesystem type when mounting). However, if the ext4 file system uses extents (a major new feature of ext4), it can no longer be mounted as ext3.
Persistent pre-allocation
The ext4 filesystem allows on-disk space to be pre-allocated for a file. On most file systems, the usual way to reserve space is to fill the file with zeros when it is created (although XFS has an ioctl that allows true pre-allocation as well). This is no longer required for ext4; instead, a new fallocate() system call was added to the Linux kernel for use by filesystems that have this capability, including ext4 and XFS. The space allocated this way is guaranteed and is likely to be contiguous. This has applications in media streaming and databases.
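A minimal sketch of pre-allocation from user space follows. It uses Python's os.posix_fallocate() wrapper rather than calling fallocate() directly; on file systems without native support the C library may fall back to writing zeros. The helper name preallocate and the file name are hypothetical.

```python
import os
import tempfile

def preallocate(path, size):
    """Reserve `size` bytes for the file at `path` and return its resulting size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)   # offset 0, length `size`
        return os.fstat(fd).st_size
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "stream.dat")   # hypothetical file name
    print(preallocate(target, 1024 * 1024))        # size in bytes after reserving 1 MB
```

A recorder or database would typically reserve space like this before it starts writing, so that later writes cannot fail for lack of disk space.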
Delayed allocation
Ext4 uses a filesystem performance technique called allocate-on-flush, also known as delayed allocation. Block allocation is delayed until the data is about to be written to the disk, unlike file systems that allocate the necessary blocks before that step. Because the allocator then knows the actual file size, it can make better allocation decisions, which improves performance and reduces fragmentation.
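The effect on fragmentation can be seen in a toy model. The sketch below is not how ext4 is implemented; it assumes a disk that hands out blocks sequentially and two programs appending to files "a" and "b" in alternation, and all function names are hypothetical.

```python
def allocate_immediately(writes):
    """Assign one disk block per write, in arrival order."""
    layout, next_block = {}, 0
    for name in writes:
        layout.setdefault(name, []).append(next_block)
        next_block += 1
    return layout

def allocate_on_flush(writes):
    """Buffer all writes first, then give each file one contiguous run."""
    pending = {}
    for name in writes:
        pending[name] = pending.get(name, 0) + 1
    layout, next_block = {}, 0
    for name, count in pending.items():
        layout[name] = list(range(next_block, next_block + count))
        next_block += count
    return layout

def fragments(blocks):
    """Count the contiguous runs in a list of block numbers."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)

writes = ["a", "b", "a", "b", "a", "b"]   # two files written in alternation
print({n: fragments(b) for n, b in allocate_immediately(writes).items()})
print({n: fragments(b) for n, b in allocate_on_flush(writes).items()})
```

In this model immediate allocation leaves each file in three pieces, while allocating at flush time leaves each file in one.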
Break 32,000 subdirectory limit
In ext3, the number of subdirectories that a directory can contain is limited to 32,000. This limit has been raised to 64,000 in ext4, and with the "dir_nlink" feature it can go beyond this (although the link count on the parent then stops increasing). To maintain performance with the much larger directories this makes possible, htree indexes (a specialized version of a B-tree) are turned on by default in ext4. This feature is implemented in Linux kernel 2.6.23. Htree is also available in ext3 when the dir_index feature is enabled.
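The remark about the parent's link count refers to the fact that each subdirectory's ".." entry is a hard link to its parent, so the parent's link count grows with every subdirectory created. The sketch below simply observes that counter; the function name is hypothetical. On ext2, ext3 and ext4 a directory typically reports 2 plus its number of subdirectories, but other file systems may report something else (some always report 1), so no particular output is shown.

```python
import os
import tempfile

def link_count_after_mkdirs(parent, count):
    """Create `count` subdirectories in `parent` and return the parent's link count."""
    for i in range(count):
        os.mkdir(os.path.join(parent, f"sub{i}"))
    return os.stat(parent).st_nlink

with tempfile.TemporaryDirectory() as parent:
    before = os.stat(parent).st_nlink
    after = link_count_after_mkdirs(parent, 3)
    print("link count before:", before, "after:", after)
```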
Journal checksumming
Ext4 uses checksums in the journal to improve reliability, since the journal is one of the most heavily used files on the disk. This feature has a side benefit: it can safely avoid a disk I/O wait during the journaling process, improving performance slightly. The technique of journal checksumming was inspired by a research paper from the University of Wisconsin titled IRON File Systems (specifically section 6, "transaction checksums").[7]
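The principle of a transaction checksum can be sketched in a few lines. This is a toy model and not the on-disk journal format: the record layout, the choice of CRC32, and the function names are assumptions made for illustration.

```python
import zlib

def commit_record(blocks):
    """Build a toy commit record holding a CRC32 over every block of the transaction."""
    checksum = 0
    for block in blocks:
        checksum = zlib.crc32(block, checksum)
    return {"block_count": len(blocks), "checksum": checksum}

def transaction_is_intact(blocks, record):
    """At replay time, accept the transaction only if it matches its commit record."""
    return commit_record(blocks) == record

transaction = [b"inode table update", b"block bitmap update"]
record = commit_record(transaction)

print(transaction_is_intact(transaction, record))       # complete transaction
print(transaction_is_intact(transaction[:1], record))   # one block never reached the disk
```

Because a mismatch reveals an incomplete transaction at replay time, the commit record does not have to wait for the other journal blocks to reach the disk before it is written, which is presumably the I/O wait the text refers to.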
Online defragmentation
Even with the various techniques used to avoid fragmentation, a long-lived file system tends to become fragmented over time. There are a number of proposals for an online defragmenter, but that support is not yet included in the mainline kernel. Ext4 will have a tool that can defragment individual files or entire file systems.
Faster file system checking
In ext4, unallocated block groups and sections of the inode table are marked as such. This enables e2fsck to skip them entirely during a check, greatly reducing the time it takes to check a file system of the size ext4 is built to support. This feature is implemented in version 2.6.24 of the Linux kernel.
Multiblock allocator
Ext4 allocates multiple blocks for a file in a single operation, which reduces fragmentation by attempting to choose contiguous blocks on the disk. The multiblock allocator is active when using O_DIRECT or when delayed allocation is on. This allows a file's many dirty blocks to be submitted for writing at the same time, unlike the existing kernel mechanism of submitting each block to the filesystem separately for allocation.
Improved timestamps
As computers become faster in general, and as Linux is used more for mission-critical applications, the granularity of second-based timestamps becomes insufficient. To solve this, ext4 provides timestamps measured in nanoseconds. In addition, 2 bits of the expanded timestamp field are used as additional most significant bits of the seconds field, deferring the year 2038 problem by roughly 400 years, to the year 2446.
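The way the two extra bits extend the range can be worked through in code. The sketch assumes a layout in which the low 2 bits of the 32-bit extra field extend the signed 32-bit seconds value and the upper 30 bits hold nanoseconds; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_timestamp(seconds_field, extra_field):
    """Combine the signed 32-bit seconds field with the 32-bit extra field."""
    epoch_bits = extra_field & 0b11     # 2 bits that extend the seconds value
    nanoseconds = extra_field >> 2      # remaining 30 bits
    seconds = seconds_field + (epoch_bits << 32)
    return seconds, nanoseconds

def to_datetime(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)

# Latest representable time: seconds field at its signed maximum, both extension bits set.
latest_seconds, _ = decode_timestamp(2 ** 31 - 1, 0b11)
print(to_datetime(latest_seconds).date())   # prints 2446-05-10
```

Without the extension bits the same seconds field ends in January 2038, which is the year 2038 problem.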
Support for date-created timestamps was added in ext4. However, as Theodore Ts'o points out, while adding an extra creation date field in the inode is easy (thus technically enabling support for date-created timestamps in ext4), modifying or adding the necessary system calls, like stat() (which would probably require a new version), and the various libraries that depend on them (like glibc) is not trivial and would require the coordination of many different projects. So even if ext4 developers implement initial support for creation-date timestamps, this feature will not be available to user programs for now.[8]
Caveats
Delayed allocation may expose API bugs
Delayed allocation poses some additional risk of data loss in cases where the system crashes before all data has been written to the disk.
The typical scenario is a program that replaces the contents of a file without forcing a write to the disk with fsync. If the system crashes shortly afterwards, the outcome is undefined. However, users have come to expect that the disk holds either the old version or the new version of the file, a behaviour that ext3 would usually yield. The ext4 code in kernel 2.6.28, by contrast, will often have truncated the file to zero length before the crash but not yet written the new version, so the contents of the file are lost.
Though this is really an application bug, many people still find such system behavior unacceptable. In response, Theodore Ts'o has written some patches for ext4 that cause it to limit its delayed allocation in these common cases. For a small cost in performance, this will significantly increase the chance that a version of the file survives the crash.
The new patches are expected to become part of the mainline kernel 2.6.30. Various distributions may choose to backport them to 2.6.28 or 2.6.29; for instance, Ubuntu intends to make them part of the 2.6.28 kernel in version 9.04 (Jaunty Jackalope).[9]
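On the application side, the scenario described above is usually avoided by writing the new contents to a temporary file, forcing them to disk with fsync, and only then renaming the temporary file over the original. The following is a minimal sketch of that pattern, not code from any of the programs discussed; the function name and file names are hypothetical.

```python
import os
import tempfile

def replace_file_safely(path, data):
    """Replace `path` with `data`, aiming to leave either the old or the new version after a crash."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())     # force the new contents to disk first
        os.replace(tmp_path, path)     # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)               # persist the rename itself
    finally:
        os.close(dir_fd)

with tempfile.TemporaryDirectory() as workdir:
    config = os.path.join(workdir, "settings.conf")   # hypothetical file name
    replace_file_safely(config, b"old settings\n")
    replace_file_safely(config, b"new settings\n")
    with open(config, "rb") as f:
        print(f.read())
```

The old file is never truncated in place, so there is no moment at which only an empty file exists. Note that mkstemp creates the temporary file with owner-only permissions, so a real program may need to copy the original file's mode before the rename.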
References
- ^ a b Mathur, Avantika; Cao, MingMing; Bhattacharya, Suparna; Dilger, Andreas; Tomas, Alex; Vivier, Laurent (2007). "The new ext4 filesystem: current status and future plans" (PDF). Proceedings of the Linux Symposium. Ottawa, ON, CA: Red Hat. https://ols2006.108.redhat.com/2007/Reprints/mathur-Reprint.pdf. Retrieved on 2008-01-15.
- ^ Torvalds, Linus (2006-06-09). "extents and 48bit ext3". LKML. http://lkml.org/lkml/2006/6/9/183.
- ^ Ts'o, Theodore (2006-06-28). "Proposal and plan for ext2/3 future development work". LKML. http://lkml.org/lkml/2006/6/28/454.
- ^ "ext4: Rename ext4dev to ext4". Linus' kernel tree. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=03010a3350301baac2154fa66de925ae2981b7e3. Retrieved on 2008-10-20.
- ^ Leemhuis, Thorsten (2008-12-23). "Higher and further: The innovations of Linux 2.6.28". Heise Online. http://www.heise-online.co.uk/open/Kernel-Log-Higher-and-Further-The-innovations-of-Linux-2-6-28--/features/112299.
- ^ "Migrating to Ext4". DeveloperWorks. IBM. http://www.ibm.com/developerworks/linux/library/l-ext4/. Retrieved on 2008-12-14.
- ^ Prabhakaran, Vijayan; et al. "IRON File Systems" (PDF). CS Dept, University of Wisconsin. http://www.cs.wisc.edu/wind/Publications/iron-sosp05.pdf.
- ^ "Theodore Ts'o answer on creation time stamps for ext4". http://osdir.com/ml/file-systems.ext3.user/2006-10/msg00015.html.
- ^ Ubuntu bug #317781 Long discussion between Ubuntu wizards and Theodore Ts'o on potential data loss
External links
- Kernel Log: Ext4 completes development phase as interim step to btrfs
- Theodore Ts'o's discussion on ext4
- Real World Benchmarks Of The EXT4 File-System
- Ext4 Development Wiki
- "Ext4 block and inode allocator improvements" (materials from Ottawa Linux Symposium 2008)
- "The new ext4 filesystem: current status and future plans" (materials from Ottawa Linux Symposium 2007)
- "ext4 online defragmentation" (materials from Ottawa Linux Symposium 2007)
- "Ext4: The Next Generation of Ext2/3 Filesystem"
- Kernelnewbies.org: Ext4, the Fourth Extended File System
- Gentoo Live InstallCD, with ext4 and ssh