Filename
From Wikipedia, the free encyclopedia
A filename is a special kind of string used to uniquely identify a file stored on the file system of a computer. Some operating systems also identify directories in the same way. Different operating systems impose different restrictions on length and allowed characters on filenames. A filename includes one or more of these components:
- protocol (or scheme) — access method (e.g., http, ftp, file etc.)
- host (or network-ID) — host name, IP address, domain name, or LAN network name (e.g., wikipedia.org, 207.142.131.206, \\MYCOMPUTER, SYS:, etc.)
- device (or node) — port, socket, drive, root mountpoint, disc, volume (e.g., C:, /, SYSLIB, etc.)
- directory (or path) — directory tree (e.g., /usr/bin, \TEMP, [USR.LIB.SRC], etc.)
- file — base name of the file
- type (format or extension) — indicates the content type of the file (e.g., .txt, .exe, .dir, etc.)
- version — revision number of the file
To refer to a file on a remote computer (also called a host or server), the remote computer must be identified. Its name or address might be part of the file name, or it might be specified when the file system is "mounted", in which case it is not necessarily part of the file name.
In some systems, if a filename does not contain a path part, the file is assumed to be in the current working directory.
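The component list above can be illustrated with a minimal sketch that uses Python's standard `urllib.parse` and `pathlib` modules. The helper names `url_components` and `windows_components` and the sample names are hypothetical, chosen only for illustration. The version component is not covered, because these modules have no notion of it.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL-style name into protocol, host, directory, file and type."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "directory": str(path.parent),
        "file": path.stem,    # base name without the extension
        "type": path.suffix,  # extension, including the leading period
    }


def windows_components(name):
    """Split a Windows-style name into device, directory, file and type."""
    path = PureWindowsPath(name)
    return {
        "device": path.drive,
        "directory": str(path.parent),  # "." when the name has no path part
        "file": path.stem,
        "type": path.suffix,
    }


print(url_components("http://wikipedia.org/wiki/docs/readme.txt"))
print(windows_components(r"C:\TEMP\report.txt"))
print(windows_components("report.txt"))
```

For a name without a path part, the directory is reported as ".", which matches the convention that such a file is looked up in the current working directory.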
Many operating systems, including MS-DOS, Microsoft Windows, and VMS, allow a filename extension consisting of one or more characters following the last period in the filename. This divides the filename into two parts: the basename (the primary filename) and the extension (usually indicating the file type associated with a certain file format). On these systems the extension is considered part of the filename. On systems which allow, for example, an eight-character basename followed by a three-character extension, a filename with an extension of "" or "   " (nothing, or three spaces) is still 11 characters long, since the "." is supplied by the OS but not considered part of the name. On Unix-like systems, files often have an extension (for example prog.c, denoting the C-language source code of a program called "prog"), but the extension is not considered a separate part of the filename. The "." serves only as a conventional separator or delimiter, so a Unix system which allows 14-character filenames could have a filename such as a.longxtension.
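The 11-character arithmetic for an 8.3 name can be sketched as follows. The function name `fat_directory_name` is hypothetical, and the upper-casing is an assumption based on the case behaviour of MS-DOS FAT. This is a simplified illustration of space padding, not a full implementation of any file system's rules.

```python
def fat_directory_name(filename):
    """Return the 11-character space-padded form of an 8.3 filename.

    The basename is padded to 8 characters and the extension to 3;
    the separating period is not part of the result.
    """
    base, dot, ext = filename.rpartition(".")
    if not dot:
        # No period at all: the whole name is the basename.
        base, ext = filename, ""
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        raise ValueError(f"{filename!r} does not fit the 8.3 convention")
    return base.upper().ljust(8) + ext.upper().ljust(3)


print(repr(fat_directory_name("prog.c")))
print(repr(fat_directory_name("README")))
```

Both calls return a string of exactly 11 characters, whether or not the name has an extension.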
Within a single directory, filenames must be unique. Because subdirectories are named in the same way, a file and a subdirectory in the same directory cannot share a name. However, two files in different directories may have the same name.
In some operating systems, such as MS-DOS, Microsoft Windows, and classic Mac OS, upper-case and lower-case letters in file names are considered the same, so a directory cannot contain both a file named "MyName" and another file named "myname". The file systems in those operating systems are called "case-insensitive". In most file systems in Unix-like systems, however, upper-case and lower-case are considered different, so MyName and myname are valid names for different files in the same directory. Those file systems are called "case-sensitive".
Not all file systems in Unix-like systems are case-sensitive: by default, HFS+ in Mac OS X is case-insensitive, and SMB servers usually provide case-insensitive behavior even when the underlying file system is case-sensitive (for example, Samba on most Unix-like systems), so SMB client file systems provide case-insensitive behavior. File system case sensitivity is a considerable challenge for software such as Samba and Wine, which must interoperate efficiently with both systems that treat upper-case and lower-case file names as different and systems that treat them as the same.[1]
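As a rough sketch of the interoperability problem, the following groups names that are distinct on a case-sensitive file system but would collide on a case-insensitive one. The function name `case_collisions` is hypothetical. Python's `str.casefold` is used here as an approximation; real file systems apply their own case-folding rules, which may differ for some characters.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups in which two or more names fold to the same key.
    return [group for group in groups.values() if len(group) > 1]


print(case_collisions(["MyName", "myname", "other.txt"]))
```

A tool that copies a directory from a case-sensitive file system to a case-insensitive one could run a check like this first and report the conflicting groups instead of silently overwriting files.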
Unix-like systems allow a file to have more than one name; in traditional Unix-style file systems, the names are hard links to the file's inode or equivalent. Hard links are different from Windows shortcuts, Mac OS aliases, or symbolic links.
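A minimal sketch of giving one file two names with a hard link, using `os.link` from Python's standard library. The function name `demo_hard_link` and the file names are hypothetical, and the sketch assumes a file system that supports hard links.

```python
import os
import tempfile


def demo_hard_link():
    """Create a file, add a second name for it, and compare the two names."""
    with tempfile.TemporaryDirectory() as directory:
        first = os.path.join(directory, "first_name")
        second = os.path.join(directory, "second_name")
        with open(first, "w") as handle:
            handle.write("same data")
        os.link(first, second)  # second is now another name for the same file
        first_stat, second_stat = os.stat(first), os.stat(second)
        with open(second) as handle:
            content = handle.read()
        same_inode = first_stat.st_ino == second_stat.st_ino
        return same_inode, first_stat.st_nlink, content


print(demo_hard_link())
```

Both names refer to the same inode, and the link count reported by `os.stat` reflects the number of names. A symbolic link, by contrast, is a separate file that stores a path to its target.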
Reserved characters and words
Many operating systems prohibit control characters from appearing in file names. DOS and early Windows systems additionally require file names to follow the 8.3 filename convention. Unix-like systems are an exception: the only control character forbidden in file names is the null character, because it is the end-of-string indicator in C. Unix also excludes the path separator / from filenames.
Some operating systems prohibit some particular characters from appearing in file names:
Character | Name | Reason |
---|---|---|
/ | slash | used as a path name component separator in Unix-like, Windows, and Amiga systems. (The MS-DOS command.com shell would consume it as a switch character, but Windows itself always accepts it as a separator[2]) |
\ | backslash | Also used as a path name component separator in MS-DOS, OS/2 and Windows (there is no difference between slash and backslash); allowed in Unix filenames, see Note 1 |
? | question mark | used as a wildcard in Unix, Windows and AmigaOS; marks a single character. Allowed in Unix filenames, see Note 1 |
% | percent sign | used as a wildcard in RT-11; marks a single character. |
* | asterisk | used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. Marks any sequence of characters (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or extension (thus "*.*" in early versions of MS-DOS means "all files"). Allowed in Unix filenames, see Note 1 |
: | colon | used to determine the mount point / drive on Windows; used to determine the virtual device or physical device such as a drive on AmigaOS, RT-11 and VMS; used as a pathname separator in classic Mac OS. Doubled after a name on VMS, indicates the DECnet nodename (equivalent to a NetBIOS (Windows networking) hostname preceded by "\\".) |
\| | vertical bar | designates a pipeline between commands in Unix and Windows shells; allowed in Unix filenames, see Note 1 |
" | quotation mark | used to mark beginning and end of filenames containing spaces in Windows, see Note 1 |
< | less than | used to redirect input, allowed in Unix filenames, see Note 1 |
> | greater than | used to redirect output, allowed in Unix filenames, see Note 1 |
. | period | allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows. In other OSes, usually considered as part of the filename, and more than one full stop may be allowed. |
Note 1: Most Unix shells require certain characters such as spaces, <, >, |, \, and sometimes :, (, ), &, ;, as well as wildcards such as ? and *, to be quoted or escaped:
five\ and\ six\<seven (example of escaping)
'five and six<seven' or "five and six<seven" (examples of quoting)
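When a program builds a shell command line, the quoting described in Note 1 can be delegated to `shlex.quote` from Python's standard library, which targets POSIX shells. A minimal sketch, using the sample name from the note:

```python
import shlex

name = "five and six<seven"

# shlex.quote wraps a name in single quotes when it contains characters
# that a POSIX shell would otherwise interpret.
quoted = shlex.quote(name)
print(quoted)

# shlex.split parses the way a POSIX shell would, so the quoted form
# comes back as a single argument holding the original name.
print(shlex.split("cat " + quoted))
```

Names made up only of characters the shell does not treat specially, such as prog.c, are returned unchanged.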
In Windows the space and the period are not allowed as the final character of a filename. The period is allowed as the first character, but certain Windows applications, such as Windows Explorer, forbid creating or renaming such files (even though Unix-like systems use this convention to mark hidden files and directories). Workarounds include using a different file manager or saving a file with the desired name from within an application.[3]
Some file systems on a given operating system (especially file systems originally implemented on other operating systems), and particular applications on that operating system, may apply further restrictions and interpretations. See comparison of file systems for more details on restrictions imposed by particular file systems.
In Unix-like systems, MS-DOS, and Windows, the file names "." and ".." have special meanings (current and parent directory respectively).
In addition, in Windows and DOS, some words are reserved and cannot be used as filenames.[3] Examples are the DOS device file names:
CON, PRN, AUX, CLOCK$, NUL, COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
Operating systems that have these restrictions cause incompatibilities with some other filesystems. For example, Windows will fail to handle, or raise error reports for, these legal UNIX filenames: aux.c, q"uote"s.txt, or NUL.txt.
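A portability check along these lines can be sketched as follows. The function name `windows_name_problems` is hypothetical. The reserved words are the device names listed above, and the forbidden characters are the set " * : < > ? \ / | that Windows rejects. The sketch assumes reserved names match case-insensitively and with any extension; it is not a complete statement of the rules Windows applies.

```python
# Device names listed above; assumed here to match case-insensitively.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "CLOCK$", "NUL"}
    | {f"COM{i}" for i in range(10)}
    | {f"LPT{i}" for i in range(10)}
)
FORBIDDEN_CHARS = set('"*:<>?\\/|')


def windows_name_problems(name):
    """Return a list of reasons why a single file name may fail on Windows."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if any(ord(ch) < 32 for ch in name):
        problems.append("control character")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    # A reserved word stays reserved when an extension follows it.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    if stem in RESERVED_NAMES:
        problems.append("reserved device name: " + stem)
    return problems


for candidate in ["prog.c", "aux.c", 'q"uote"s.txt', "NUL.txt"]:
    print(candidate, windows_name_problems(candidate))
```

An empty list means none of the checks in this sketch found a problem; it does not guarantee that a particular Windows version or application accepts the name.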
Comparison of file name limitations
System | Alphabetic Case Sensitivity | Allowed Character Set | Reserved Characters | Reserved Words | Maximum Length | Comments |
---|---|---|---|---|---|---|
MS-DOS FAT | case-insensitive, case-destruction | any | 0x00-0x1F SPACE DEL " * / : < > ? \ \| | Device names such as: AUX COM1 COM2 COM3 COM4 COM5 COM6 COM7 COM8 COM9 CON LPT1 LPT2 LPT3 LPT4 LPT5 LPT6 LPT7 LPT8 LPT9 NUL PRN | 12 | 8-character name plus 3-character extension (12 including the period); see 8.3 filename |
Commodore 64 | case-sensitive, case-preservation | any | : , = | $ | 16 | Actual limit depends on the drive used, but most drives limit the length to 16 characters. |
Win95 VFAT | case-insensitive | any | \| \ ? * < " : > + [ ] / and control characters | | 255 | |
NTFS | optional (case-preservation) | any (including Unicode characters) | / null (i.e., 0x00) | Only in Root Directory: $AttrDef $BadClus $Bitmap $Boot $LogFile $MFT $MFTMirr pagefile.sys $Secure $UpCase $Volume $Extend $Extend\$ObjId $Extend\$Quota $Extend\$Reparse ($Extend is a directory) | 255 | Microsoft Windows: Windows forbids the use of characters in range 1-31 (i.e., 0x01-0x1F) and characters " * : < > ? \ / \|. Although NTFS allows each path component (directory or filename) to be 255 characters long and paths up to about 32767 characters long, the Windows API by default only supports paths up to 259 characters long. Additionally, Windows forbids the use of the MS-DOS device names AUX, CLOCK$, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, CON, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, NUL and PRN, as well as these names with any extension (for example, AUX.txt), except when using Long UNC paths (e.g., \\.\C:\nul.txt or \\?\D:\aux\con). (In fact, CLOCK$ may be used if an extension is provided.) These restrictions only apply to Windows; Linux, for example, allows use of " * : < > ? \ \| even in NTFS. |
OS/2 HPFS | case-insensitive, case-preservation | any | \| \ ? * < " : > / | | 254 | |
Mac OS HFS | case-insensitive, case-preservation | any | : | | 255 | old versions of Finder are limited to 31 characters |
Mac OS HFS+ | optional (case-preservation) | any | : on disk, in classic Mac OS, and at the Carbon layer in Mac OS X; / at the Unix layer in Mac OS X | | 255 | Mac OS 8.1 - Mac OS X |
most UNIX file systems | case-sensitive, case-preservation | any | / null | | 255 | a leading . indicates that ls and file managers will not by default show the file |
early UNIX (AT&T) | case-sensitive, case-preservation | any | / | | 14 | a leading . indicates a "hidden" file |
POSIX "Fully portable filenames"[4] | case-sensitive, case-preservation | A–Z a–z 0–9 . _ - | / null | Filenames to avoid include: a.out, core, .profile, .history, .cshrc | 14 | hyphen must not be first character |
AmigaOS | case-insensitive, case-preservation | any | : / " | | 107 | dos.library |
Amiga OFS | case-insensitive, case-preservation | any | : / " | | 30 | Original File System 1985 |
Amiga FFS | case-insensitive, case-preservation | any | : / " | | 30 | Fast File System 1988 |
Amiga PFS | case-insensitive, case-preservation | any | : / " | | 255 | Professional File System 1993 |
Amiga SFS | case-insensitive, case-preservation | any | : / " | | 32,000 | Smart File System 1998 |
Amiga FFS2 | case-insensitive, case-preservation | any | : / " | | 107 | Fast File System 2 2002 |
BeOS BFS | case-sensitive | UTF-8 | / | | 255 | |
DEC PDP-11 RT-11 | case-insensitive | RADIX-50 | | | 6 + 3 | Flat filesystem with no subdirectories. A full "file specification" includes device, filename and extension (file type) in the format: dev:filnam.ext. |
DEC VAX VMS | case-insensitive | A–Z 0–9 _ | | | 32 per component; earlier 9 per component; latterly, 255 for a filename and 32 for an extension. | A full "file specification" includes nodename, diskname, directory/ies, filename, extension and version in the format: OURNODE::MYDISK:[THISDIR.THATDIR]FILENAME.EXTENSION;2 Directories can only go 8 levels deep. |
ISO 9660 | case-insensitive | A–Z 0–9 _ . | | | 255 | 8 directory levels max (for Level 1 conformance) |
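
The POSIX "fully portable filenames" row of the table can be turned into a simple check. This is a minimal sketch that encodes only what the table states (the allowed character set, the 14-character limit, and no leading hyphen); the function name `is_fully_portable` is hypothetical.

```python
import re

# Allowed characters and maximum length taken from the POSIX row above.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._-]{1,14}")


def is_fully_portable(name):
    """Check one file name against the POSIX portable-filename limits."""
    if name.startswith("-"):
        return False  # hyphen must not be the first character
    return PORTABLE_NAME.fullmatch(name) is not None


for candidate in ["prog.c", "a.longxtension", "-rf", "my file", "fifteen_chars_x"]:
    print(candidate, is_fully_portable(candidate))
```

The check does not flag the names the table lists as ones to avoid (a.out, core, and so on), since those are valid names that merely clash with conventional uses.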
See also
- File system
- Long filename
- Path (computing)
- Uniform Resource Identifier (URI)
- Uniform Resource Locator (URL)
References
- http://wiki.winehq.org/CaseInsensitiveFilenames
- http://www.thescripts.com/forum/thread23123.html
- Naming a file, msdn.microsoft.com (MSDN), filename restrictions on Windows
- Lewine, Donald. POSIX Programmer's Guide: Writing Portable UNIX Programs. O'Reilly & Associates, Inc., Sebastopol, CA, 1991, pp. 63-64
External links
- List of filename extensions: FILExt
- Large collection of extensions: DotWhat.net
- Large list of filename extensions: File-extensions.org
- Metasearch engine for file extensions: File Extension Seeker
- The File Extension Resource