What You Will Learn
After completing this lesson you will be able to:
·	Identify three primary advantages of NTFS over other file systems (FAT or HPFS).
·	Explain why the NTFS file system is said to be recoverable.
·	Understand the role of NTFS' Master File Table (MFT).

NTFS Volume Structure
When a drive is formatted with the NTFS file system, the partition is initialized to contain an NTFS volume. More accurately, each instance of a Master File Table (MFT) is a volume. Unlike a FAT or HPFS partition, all space which is allocated and in use on an NTFS volume is part of a file, including the bootstrap and system files which are used to implement the volume structure. The heart of the NTFS volume structure is the Master File Table (MFT), which contains at least one record for each file on the volume, including one for itself, with each record being 2K in size. This makes NTFS appear very much like a relational database.
On an NTFS volume all files are identified by a file number, which is created from the position of the file in the MFT and a sequence number (discussed in a later section of this module). Each file, and directory, on an NTFS volume is made up of a set of attributes.
Cluster Factor
For the NTFS file system, the cluster is the fundamental unit of allocation. The cluster factor is expressed as a number of bytes, and formatting a volume as NTFS guarantees that the cluster factor is a multiple of the sector size on the target device. Internally, NTFS addresses everything by a cluster number, and is totally unaware of the sector size or any other information about the underlying drive. References to physical offsets on the device are stored internally by their Logical cluster number (Lcn). References to offsets within large attributes are stored internally by their Virtual cluster number (Vcn). When accessing the physical disk via the disk driver, the desired Lcn is multiplied by the cluster factor to form the physical byte offset on the volume.
Logical cluster number (Lcn) x Cluster Factor = Physical offset
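For illustration, this mapping can be sketched in a few lines of C. The names and types below are hypothetical, not NTFS source code:

    /* Minimal sketch of the Lcn-to-physical-offset mapping described
       above; the cluster factor is the cluster size in bytes. */
    typedef unsigned long long ULONGLONG;

    ULONGLONG LcnToByteOffset(ULONGLONG Lcn, ULONGLONG ClusterFactor)
    {
        /* NTFS never needs to know the underlying sector size here. */
        return Lcn * ClusterFactor;
    }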
The new FORMAT.COM /A: switch allows an NTFS volume's allocation unit size (cluster size) to be specified. The syntax of the switch is: format d: /a:xxxx where xxxx may be 512, 1024, 2048, or 4096. Using a larger allocation unit size will speed up searches and reduce external fragmentation, at the cost of greater internal fragmentation. The default for BETA 2 will be the same as BETA 1, 512 bytes per allocation unit. In the final release, the default allocation unit size will be based on the disk size:
Disk size          Allocation size
< 512 MB           512 bytes
512 MB to 1 GB     1024 bytes
1 GB to 2 GB       2048 bytes
> 2 GB             4096 bytes

The NTFS Boot Sector
The NTFS Boot Sector is located at the beginning of the volume, with a duplicate located in the middle of the volume. The NTFS Boot Sector contains the standard BIOS Parameter Block (BPB), the number of sectors in the volume, and the starting Logical cluster numbers (Lcns) of the Master File Table (MFT) and the Master File Table Mirror (MFT2).
NTFS Files
Under NTFS, files are made up of at least the following attributes:
·	Header (H)
·	Standard Information (SI)
·	Security Descriptor (SD)
·	File Name (FN)
·	Data

Small Files
If a file is small, i.e. the file's data attribute is small, it may fit entirely into its MFT file record, which is 2K in size, and therefore be referred to as a small file. However, it is not possible to state with certainty that under NTFS, all files smaller than xxxx bytes will fit into their MFT file record. Whether or not a small file will fit into the MFT record for the file really depends upon the size of the other attributes in the file record. However, typically a file under about 1,500 bytes will be stored within its MFT entry and therefore take up no space beyond the 2K MFT entry.

Large Files
If the file is larger than can be fit into the file record, a non-resident form of the Data attribute is used. Here, the Data attribute contains the Virtual cluster number (Vcn) for the first cluster in each of the runs, and the number of contiguous clusters in each of the runs.
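To make the run mechanism concrete, here is a hedged C sketch of a run list and a Vcn-to-Lcn lookup. The structures are illustrative stand-ins; the real on-disk mapping pairs are stored in a much more compact form:

    /* Hypothetical, simplified run list for a non-resident Data
       attribute: each run maps a range of Virtual cluster numbers to
       contiguous Logical clusters on the volume. */
    typedef struct {
        unsigned long long StartVcn;      /* first Vcn of the run       */
        unsigned long long StartLcn;      /* first Lcn of the run       */
        unsigned long long ClusterCount;  /* contiguous clusters in run */
    } RUN;

    /* Resolve a Vcn to its Lcn by scanning the runs; returns 1 on
       success, 0 if the Vcn is not mapped. */
    int VcnToLcn(const RUN *Runs, int RunCount,
                 unsigned long long Vcn, unsigned long long *Lcn)
    {
        for (int i = 0; i < RunCount; i++) {
            if (Vcn >= Runs[i].StartVcn &&
                Vcn <  Runs[i].StartVcn + Runs[i].ClusterCount) {
                *Lcn = Runs[i].StartLcn + (Vcn - Runs[i].StartVcn);
                return 1;
            }
        }
        return 0;
    }

The Lcn returned by such a lookup is what gets multiplied by the cluster factor to reach the physical bytes, as described in the Cluster Factor section.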

Huge Files
If the file is so huge that even the Data attribute cannot be resident in the file record, it will become a non-resident attribute. The non-resident Data attribute then points to the Data runs.

In the event of an extremely huge file, the External Attribute can point to multiple non-resident Data attribute records, which in turn point to the Data runs. In addition, the External Attribute, like any other attribute, can be stored in the non-resident form; thus, there should never be an attribute so large that it cannot be handled by NTFS.

NTFS Directories
Simply stated, an NTFS directory is an entry in the MFT as well as a special form of index. A general index allows for sorting of files for rapid retrieval based upon a specific attribute. Traditionally, with FAT and HPFS, the filename attribute has been used for sorting and retrieval of files. The filename is also used under NTFS for sorting and retrieval. However with NTFS, it is possible for a third party utility to use any attribute for sorting and retrieval purposes, as long as the attribute used is stored in the resident form in the file.
Small Indexes
If the number of files in the directory is small enough, the index can be resident in the MFT record for the directory - a small index.
The NTFS small index entry contains the value of the attribute being used as the index (by default, the filename), as well as the file number (the record number in the MFT of the initial file record) of the file. There is no other information regarding the file stored in the index entry.
As a directory grows larger, the index structure can be made non-resident. However, the root of the index must always remain resident in the root MFT record of the directory.
A large index is handled by creating an Index Allocation (IA) attribute, which contains a number of Btree buffers, with the root of the Btree remaining in the Index Root. A large index contains a set of <attribute value, file number, Vcn> entries, where the attribute value by default is the filename and the file number is the file's file number. The Vcn is a Virtual cluster number that points to the Btree buffer, located in the IA, that contains attribute values (by default filenames) less than the attribute value stored in the root entry. The bottom of the directory tree has been reached when there are no more Vcns.
Based on this, in the below diagram the Vcn in the first Index entry (<D.BAT, FN(d.bat), Vcn>) is pointing to the Btree buffer in the IA that refers to the <A.BAT,FN(a.bat)> <B.BAT,FN(b.bat)> <####> run.

It is also possible for the lower level runs to point to other even lower level runs.
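The walk down such an index can be sketched as follows. This is a hedged illustration only: the entry and buffer types, the NO_VCN marker, and the in-memory Buffers array standing in for the Index Allocation attribute are all hypothetical, not NTFS's on-disk format:

    #include <string.h>

    #define NO_VCN ((unsigned long long)-1)

    typedef struct {
        const char        *Name;        /* attribute value (filename)         */
        unsigned long long FileNumber;  /* MFT record number of the file      */
        unsigned long long Vcn;         /* buffer of lower entries, or NO_VCN */
    } INDEX_ENTRY;

    typedef struct {
        INDEX_ENTRY *Entries;           /* sorted ascending by Name */
        int          Count;
    } INDEX_BUFFER;

    /* Walk the index from the root. Buffers[] stands in for the Btree
       buffers held in the IA attribute, indexed by Vcn. Returns the
       file number, or 0 if the name is not present. */
    unsigned long long LookupFile(INDEX_BUFFER *Buffers,
                                  unsigned long long RootVcn,
                                  const char *Name)
    {
        INDEX_BUFFER *buf = &Buffers[RootVcn];
        for (;;) {
            int i, cmp;
            for (i = 0; i < buf->Count; i++) {
                cmp = strcmp(Name, buf->Entries[i].Name);
                if (cmp == 0)
                    return buf->Entries[i].FileNumber;  /* found */
                if (cmp < 0)
                    break;  /* Name sorts below entry i: descend */
            }
            /* Descend into the buffer of entries lower than entry i;
               the bottom has been reached when there is no more Vcn.
               (A real index also has an end entry covering names that
               sort above every entry in the buffer.) */
            if (i < buf->Count && buf->Entries[i].Vcn != NO_VCN)
                buf = &Buffers[buf->Entries[i].Vcn];
            else
                return 0;   /* not found */
        }
    }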
NTFS Attributes
Each NTFS file consists of one or more attributes, with each attribute consisting of an attribute type, attribute length, attribute value, and optionally an attribute name.
With NTFS there is a set of system defined attributes which are defined by the NTFS volume structure. System defined attributes have fixed names and attribute type codes, and the format of their values is determined and enforced by NTFS. There are also user definable attributes under NTFS. The name and format of user defined attributes are determined solely by users, with the attribute type codes established uniquely for each NTFS volume.
Attributes are ordered by ascending attribute type, with some attribute types being allowed multiple times.
When accessing an NTFS file, it is not really a file that is being written to or read from, but rather a set of attributes.
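A hedged sketch of what an attribute record might look like, given the description above; this is illustrative only, and the real on-disk layout carries additional fields:

    /* Hypothetical, simplified attribute record: type, length,
       optional name, then the value. */
    typedef struct {
        unsigned int   AttributeType;  /* system- or user-defined code  */
        unsigned int   RecordLength;   /* total length of this record   */
        unsigned short NameLength;     /* 0 if the attribute is unnamed */
        unsigned short NameOffset;     /* offset of the name, if any    */
        unsigned short ValueOffset;    /* offset of the attribute value */
        /* the name (if present) and the value bytes follow */
    } ATTRIBUTE_RECORD;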
Attribute Storage
There are two ways in which attributes can be stored. These methods are commonly referred to as resident and non-resident attribute storage.
Attributes are, by default, stored in the MFT record for a file if there is room; this is resident attribute storage. If there is not enough room in a file's MFT record, space is allocated on the disk to hold the content of the attribute, and an MFT entry points to this information. This method of storage is called non-resident storage and is used for storing External Attributes (EAs, defined below).
Non-resident attributes are referenced by the first Virtual cluster number (Vcn) in each run and the number of clusters in the run.
Only resident attributes may be indexed, i.e. sorted for directory purposes.
System Defined Attributes
Attribute List
This attribute defines the valid attributes for this particular file.
File Name
This attribute contains the long filename for the file as well as the file number of the parent directory file. If there are multiple directory entries for a file, then there will be a corresponding number of filename attributes. This attribute must always be resident.
MS-DOS Name
This attribute contains the 8.3 filename as well as the case-insensitive filename.
Version
This attribute specifies the update version of the file.
Security Descriptor
This attribute contains the security information for the file (the file's Access Control List (ACL)), who can access the file, etc. In addition, this attribute also contains an audit field with the information used to determine which activities on this file will be audited.
Volume Version
This attribute is used only in volume system files.
Volume Name
This attribute contains the volume label.
Volume Information
This attribute is used only in volume system files and contains the major and minor version number of NTFS on the volume.
Data
This attribute contains the normal file data as well as the normal file sizes and sizes for all attributes. This attribute may be stored in either resident or non-resident form.
On NTFS, it is possible for any file or directory to contain optional "named" attributes for storing additional information. This is not currently exposed through the User Interface, but can be done through the command line and is thus an easy way to hide data.
echo Windows NT > trythis
echo NTFS > trythis:filesystems
more < trythis
Screen output: Windows NT
more < trythis:filesystems
Screen output: NTFS
md testdir
echo testing > testdir:data
more < testdir:data
Screen output: testing

Note This is one of the few ways that this capability is exposed.
The information that is typically considered to be file "data" is stored in the default (un-named) data attribute. For this reason, copying a file with multiple data attributes to a FAT or HPFS drive will lose the data attributes that are not the default data attribute.
Index Root
This attribute is used to construct the index of a particular attribute over a set of files for the purpose of indexing these files by their attribute value. Index Root is always stored in resident form.
Index Allocation
If the index found in the Index Root attribute becomes too large, a non-resident attribute, Index Allocation, is formed to store the rest of the index information.
MFT Bitmap
This attribute provides a map that represents which file record segments in the MFT are in use.
External Attributes (EAs)
When all of a file's attributes do not fit in the Master File Table record for that file, the additional attribute(s) will be moved to a new record, i.e. stored as non-resident attributes.
External Attribute (EA) Information
This attribute contains information about any external attributes.
User Defined Attributes
This attribute allows "user defined attributes" to be added to files or directories. These "user defined attributes" are actually added by applications; there is no User Interface in Windows NT to add "user defined attributes".
Standard Information (SI) Attribute
This attribute stores all standard file information that is not easily associated with any other attribute, such as file creation time, the last time the Data attribute was modified, the last time any attribute was modified, the last time the file was accessed, the file's attributes (read only, hidden, etc.), maximum file version, and version number. Standard Information is always stored as a resident attribute.
Headers
Headers are also a System Defined attribute; however, because of their importance, they are covered under a separate heading.
NTFS Headers consist of the following components:
Update Sequence array
This is used in the detection of incomplete multi-sector transfers.
Sequence Number
This number is incremented each time a file record is used.
Reference Count
This number is incremented for each reference to this file record from an Index attribute. In other words, this is the number of "directories" which contain this particular file.
Root File Record Segment
If this is NOT the first MFT record for this file, then this value essentially "points" to the first/root MFT record of this file. If this is the first MFT record for this file, then the Root File Record Segment has a value of 0 (zero).
First Attribute Offset
This is the offset, from the beginning of this file record, to the first attribute in the record.
NTFS System Files
The NTFS system files are created on the volume during the format of the volume and are created by the FORMAT command. The NTFS system files all have known file numbers in the MFT, including the MFT itself, and reside in the root directory of the NTFS volume. Although they reside in the root directory, there currently is no method to view the volume's system files, other than using CHKDSK to view the space in use by the system files. In the below screen shot, CHKDSK is showing 7946 kilobytes in use by the system on a 150 MB NTFS volume.

Note All of the NTFS system files can be located anywhere on the NTFS volume. This avoids the problem that both FAT and HPFS had with file system components having to be located in a specific place on the hard drive. With FAT or HPFS, if one of these locations became damaged, the entire drive could become inaccessible because of the failure of a single sector.
Master File Table (MFT) - $mft
The Master File Table contains, in its Data attribute, one record for every file on an NTFS volume, including one for the MFT itself. The first 16 records are reserved for NTFS system files, with only the first nine currently in use.
Master File Table2 - $mftmirr (or MFT2)
This is a mirror of the first three records of the MFT. Therefore, the MFT2 contains a copy of the MFT, MFT2, and the Log File. This file is used for recoverability purposes. Depending on the size of the volume, there may be multiple mirrors of the MFT to ensure that there is always a good mirror of the MFT.
Log File - $logfile
This file's Data attribute is used by NTFS and the Log File Service to make the file system recoverable. The Log File is a system file so that it can be found early in the boot process and used to recover the volume, if necessary.
Volume - $volume
This file contains the information for the volume, such as the volume name, version, etc.
Attribute Definitions Table - $attrdef
The Attribute Definitions table contains the definitions of all of the system defined attributes and any user defined attributes on the volume. The definition of each attribute includes the attribute name, attribute type code, flags, display rules, indexing rules, minimum length, maximum length, and security information.
Cluster Allocation Bitmap - $bitmap
This file's Data attribute contains the storage bitmap for the entire volume, showing which allocation units are in use. Since the allocation granularity is one cluster, each bit in the bitmap represents a cluster. Using a Cluster Allocation Bitmap increases performance on an NTFS volume when searching for free space, as compared with previous file systems.
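As a rough illustration of why the bitmap makes free-space searches cheap, a search for a free cluster is just a scan for a clear bit. This is hypothetical code, using one bit per cluster as described above:

    /* Return the number of the first free cluster, or -1 if the
       volume is full. Bit set = cluster in use. */
    long FindFreeCluster(const unsigned char *Bitmap, long ClusterCount)
    {
        for (long cluster = 0; cluster < ClusterCount; cluster++) {
            if ((Bitmap[cluster / 8] & (1 << (cluster % 8))) == 0)
                return cluster;
        }
        return -1;
    }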
Boot File - $boot
The Boot File contains the volume's bootstrap, if this is a bootable volume, in its Data attribute. An advantage of making this a file is that it may be located anywhere on the volume, and can still be made read only or protected by an Access Control List (ACL).
Bad Cluster File - $badclus
The Data attribute of this file contains all of the bad clusters on the volume. This means that the Data attribute of $badclus is always non-resident, and maps all of the volume's bad clusters to virtual clusters within $badclus. If new bad clusters are found while the system is running, they are added to the Bad Cluster file, i.e. hot fixing under NTFS.
As was discussed in the File Management section of this course, HPFS had a similar structure called the Spare Block. However, the HPFS Spare Block had a limitation of one sector, whereas the Bad Cluster File has no size limitation.
Upcase Table - $upcase
This file is a table used to convert filenames with upper and lower case characters to matching upper case Unicode characters. This supports applications from environments that only use upper case characters in their filenames, such as MS-DOS.
System File Sizes
The following numbers, in bytes, are approximations, but they do give a rough idea as to how much space will be used by the system files on an NTFS volume.
System File        Starting Size              Change in Size
$mft               38,912                     Dynamic, an additional 2,048+ per file on the drive.
$mftmirr (MFT2)    8,192                      Constant size.
$logfile           4,194,304 to 10,485,760    May grow after format time.
$attrdef           36,000                     Constant size, unless user defined attributes are added.
$bitmap            3,256                      Dynamic, 1 bit per allocation unit.
$boot              1,024                      Constant size.
$badclus           0                          Depends on the number of bad clusters on the disk.
$upcase            131,072                    Constant size.

Volumes that are around 30-40 MB or less will typically have a constant system overhead of approximately 4.3 MB at initial format time.
Converting to NTFS
It is possible to convert a FAT or HPFS partition to an NTFS volume using the CONVERT.EXE utility provided with Windows NT. However, the conversion is a one-way process; there is no way to convert an NTFS volume to FAT or HPFS.
The convert utility uses the following command line: convert <drive:> /fs:ntfs, where <drive:> is the letter of the drive to be converted to NTFS.
If CONVERT.EXE cannot get exclusive access to the drive to be converted, it will give an error message to that effect. CONVERT.EXE will then prompt with the option to schedule the drive for conversion when the system reboots. If the drive is scheduled for conversion on system boot, autocheck autoconv \DosDevices\x: /FS:NTFS is added to the
\HKEY_LOCAL_MACHINE
key in the Registry.
In addition to CONVERT.EXE, CUFAT.DLL (for FAT partitions) and CUHPFS.DLL (for HPFS partitions) are necessary to convert a drive to NTFS. By default, these two DLLs are installed in the \<winnt root>\System32 subdirectory.
Converting HPFS Security
When converting an HPFS partition to NTFS, HPFS security can be retained by converting the information to NTFS security.
The steps for performing this conversion are:
1.	Under OS/2, before installing Windows NT, run the OS/2 utility BACKACC.EXE. This utility stores the HPFS security information in a file on the drive.
2.	Next, Windows NT should be installed on the system. During the installation, the drive can be converted to NTFS, or after the installation is completed the CONVERT.EXE utility can be used to convert the drive to NTFS.
3.	After Windows NT is installed and the drive has been converted to NTFS, ACLCONV.EXE should be used. ACLCONV.EXE takes the information stored in step 1 and applies it to the files under NTFS.

Security on NTFS
Security is stored under NTFS in the same way in which all Windows NT security is stored: in Access Control Lists (ACLs). As we talked about in past security discussions, each ACL is made up of a series of entries known as Access Control Entries (ACEs). The ACEs are checked in order to determine if a user has permission to perform a certain action.
How ACEs are Ordered
ACEs are ordered by type: deny, and then grant. Windows NT checks ACEs with a deny access first, and then checks ACEs with a grant access. Deny access always overrides grant access; if a deny access ACE is reached, the ACE check ends and access is denied.
If any group a user belongs to denies access, that user will be denied access regardless of any access rights they are granted in their personal user account or other groups they may belong to. Therefore, if the No Access permission is given to the Everyone group, all users will be denied access, including the owner. However, No Access will not prevent the owner from changing permissions on the file and restoring their access.
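The deny-before-grant ordering can be sketched in C as follows. The types and names below are illustrative only; they are not the Win32 security API:

    /* Hedged sketch of the deny-before-grant evaluation described
       above. Deny ACEs are assumed to be ordered ahead of grants. */
    typedef enum { ACE_DENY, ACE_GRANT } ACE_TYPE;

    typedef struct {
        ACE_TYPE     Type;
        unsigned int Sid;          /* user or group this ACE applies to */
        unsigned int AccessMask;   /* rights denied or granted          */
    } ACE;

    /* Returns 1 if the Desired access is allowed. */
    int AccessCheck(const ACE *Acl, int AceCount,
                    const unsigned int *UserSids, int SidCount,
                    unsigned int Desired)
    {
        unsigned int granted = 0;
        for (int i = 0; i < AceCount; i++) {
            for (int s = 0; s < SidCount; s++) {
                if (Acl[i].Sid != UserSids[s])
                    continue;
                if (Acl[i].Type == ACE_DENY &&
                    (Acl[i].AccessMask & Desired))
                    return 0;                  /* deny always wins */
                if (Acl[i].Type == ACE_GRANT)
                    granted |= Acl[i].AccessMask;
            }
        }
        return (granted & Desired) == Desired; /* all rights granted? */
    }

Because the deny ACEs sort ahead of the grant ACEs, the loop can stop the moment a matching deny is found, which is exactly the behavior described above.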
File and Directory Access Control Inheritance
Before we discuss Access Control Inheritance, it is important to understand that there are two types of objects: container and non-container objects. A container object is an object which logically contains other objects, such as a directory since it logically contains files and other directories. Since a directory is a container object, files would be considered non-container objects since they do not logically contain other Windows NT objects. The distinction between container and non-container objects is used to establish the Access Control Inheritance rules.
When an object is created, one or more ACLs typically need to be assigned to the new object. The ACL inheritance design of Windows NT is intended to allow access control information on a container object to be thought of, and presented to the user, as three separate ACLs:
·	Effective ACL  The Effective ACL is the ACL pertaining to the container object.
·	Object Inherit ACL  The Object Inherit ACL is the ACL to be inherited by sub-non-container objects.
·	Container Inherit ACL  The Container Inherit ACL is the ACL to be inherited by sub-container objects, such as subdirectories.

Therefore, when a new non-container object is created, the parent container's Object Inherit ACL will be applied to the new object, as seen in the diagram below.
When a new sub-container object is created, the parent container's Container Inherit ACL will become both the Effective and Container Inherit ACL for the new sub-container. In addition, the Object Inherit ACL of the parent container will become the Object Inherit ACL of the new sub-container, as seen in the diagram below.
Even though we have been representing and thinking of the ACL information above as three separate ACLs, it is important to remember that this information is really stored as a single ACL. The ACL design allows each ACE in an ACL to be marked for no inheritance, inheritance by sub-containers, inheritance by non-container objects, or both.
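A hedged sketch of that single-ACL representation, with per-ACE inheritance marking (all names hypothetical):

    #define INHERIT_CONTAINER 0x1   /* inherited by sub-containers      */
    #define INHERIT_OBJECT    0x2   /* inherited by non-container objects */

    typedef struct {
        unsigned int InheritFlags;
        /* ... the rest of the ACE (SID, access mask, type) ... */
    } MARKED_ACE;

    /* Derive a new object's ACL from its parent container's single
       ACL: copy the ACEs whose marking matches the new object's kind. */
    int BuildInheritedAcl(const MARKED_ACE *Parent, int Count,
                          int IsContainer, MARKED_ACE *Out)
    {
        unsigned int want = IsContainer ? INHERIT_CONTAINER : INHERIT_OBJECT;
        int n = 0;
        for (int i = 0; i < Count; i++)
            if (Parent[i].InheritFlags & want)
                Out[n++] = Parent[i];
        return n;   /* ACE count in the new object's ACL */
    }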
Since security inheritance will be an important issue that may confuse many users of NTFS, let us take a look at ACL inheritance under NTFS.
New Files and Directories
·	New subdirectories always inherit the security properties of their parent directory.
·	New files inherit the security properties of the directory in which they are created.

Copying Files and Directories
·	If a directory is copied, it inherits the security properties of its new parent directory.
·	If a file is copied, it also inherits the security properties of its new parent directory.

Moving Files and Directories
·	If a directory is moved, it currently inherits the security properties of its new parent directory. This is not the documented behavior; it should retain its security properties.
·	If a file is moved, it retains its original security properties.

Before Going On
1.	What are the minimum attributes that a file must always have? Draw a picture of a Huge File, including the minimum attributes in the diagram.
2.	What is the size of each record in the MFT? Can it be determined whether or not a file smaller than xxxx bytes will fit into an MFT record, and then be referred to as a Small file? Is there an estimate of the file size that will typically fit in an MFT record?
3.	What are the two methods of attribute storage on an NTFS volume? Do any of the system defined NTFS attributes have to be stored one way or the other?
4.	List four of the NTFS System Files and give a brief description of each. Do the NTFS System Files you listed have a constant or dynamic size?
5.	Can an HPFS drive be converted to NTFS and retain all of the security that was set on the drive? If so, are there any special utilities that must be used?
6.	When accessing a file on an NTFS volume, how are ACEs ordered?

Recoverability
Traditional Disk Write Strategies
Careful write
The careful write strategy is designed around the idea that it is important to keep the volume structure consistent. As such, this strategy uses carefully ordered, serialized writes, such that crashes will only produce "expected" inconsistencies -- the volume will remain usable. An advantage of this strategy is that the volume stays clean without frequently needing any sort of volume repair, such as CHKDSK. The disadvantage of this strategy is that serialized writes have a high performance cost. Two examples of careful write file systems are FAT and Digital's ODS 2.
Lazy write
This strategy was designed to speed disk access by caching data and writing data in the background. Therefore, a user never has to wait for data to be written before continuing their work. The advantage of the lazy write strategy is that performance is increased through the elimination of serialization and the reduction of the number of write I/Os. The disadvantage of this strategy is that there is possible corruption on crash, necessitating repair that is expensive in terms of time required. Examples of lazy write file systems are HPFS and most UNIX file systems.
NTFS strategy - Recoverable File System
With NTFS, volume consistency is guaranteed across crashes through transaction logging and recovery techniques. The advantage of this method is that the performance benefits of the lazy write method are maintained, with fast crash recovery. The disadvantage of this method is that there is a small overhead for any modifying input/output operation.
NTFS Transactions
A transaction is a collection of smaller actions which must either all occur or all NOT occur. Each modifying I/O request is considered a transaction. A transaction is committed on success and aborted if there is an error.
NTFS logs the following transactions:
·	Creation and deletion of files.
·	Attribute creation, attribute deletion, and attribute byte-range updates.
·	Index creation and index deletion (directories).
·	Hot fix records, i.e. new Virtual cluster number to Logical cluster number mappings.
·	Periodic Checkpoint records.

Every I/O operation that modifies a file on an NTFS volume is viewed by NTFS as a transaction and is managed as an atomic unit.
When a file is updated, the Log File Service logs all redo and undo information for that transaction. The logged transaction is then passed to the Cache Manager, which then checks the Memory Manager for free memory resources. If the resources are available, the Cache Manager sends the transaction instructions to NTFS to make the requested changes to the file.

If the transaction completes successfully, the file update is committed. If the transaction is incomplete, NTFS will roll back the transaction by following the instructions in the undo information. If NTFS detects an error in the transaction, the transaction will be ended, and then rolled back using the undo information.
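The commit-or-roll-back pattern can be illustrated with a toy example. Here the "file" is a single integer and the log record is held in memory; a real log records byte-range and logical updates, but the shape is the same (all names hypothetical):

    #include <stdio.h>

    /* Hedged sketch of redo/undo logging for a single update, as
       described above; illustrative only. */
    typedef struct {
        int RedoValue;   /* value to (re)apply           */
        int UndoValue;   /* value to restore on rollback */
        int Committed;
    } LOG_RECORD;

    int main(void)
    {
        int file = 10;                       /* current contents     */
        LOG_RECORD rec = { 42, file, 0 };    /* log before writing   */

        file = rec.RedoValue;                /* apply the update     */

        int ok = 1;                          /* did the update work? */
        if (ok)
            rec.Committed = 1;               /* commit on success    */
        else
            file = rec.UndoValue;            /* roll back on error   */

        printf("file=%d committed=%d\n", file, rec.Committed);
        return 0;
    }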
Lazy Commit
Lazy commit is very similar to lazy write. Instead of using resources to mark a transaction as successfully completed as soon as it is performed, the commitment information is cached and written to the log as a background process. If the system should crash before the commit has been logged, upon restart NTFS rechecks the transaction to see whether or not it completed. If NTFS cannot determine that the transaction completed successfully, NTFS will undo the transaction. Incomplete modifications to the volume are not permitted with NTFS.
Periodic Log File Checkpoints
Every eight seconds, NTFS checks the cache to determine the status of the lazy writer and marks the status as a checkpoint in the log file (see the below section on the Log File Service for more information on checkpoints). If the system should crash following the checkpoint, the system knows to back up only to that checkpoint for recovery purposes. This method of checkpoints provides for faster recovery times by reducing the number of queries that are required during the recovery process.
The Log File Service (LFS)
To understand how the NTFS transaction logging occurs, we must first look at the Windows NT Log File Service (LFS). The Log File Service is the key component used for recoverability under NTFS.
The Log File Service is a component of the Windows NT Executive; it is only a helper in the recoverability process and does not know what it is being used for. The Log File Service is designed to support standard logging and recovery for multiple clients. Currently, only NTFS uses the Log File Service, but it is possible for other file systems to take advantage of it.
The Log File Service maintains two objects to perform its functions:
The Restart Area
This is a status area used to transfer information about a client's last checkpoint operation before a crash to the client's recovery procedure. The Log File Restart Area actually points to a Client Restart Area.
The "Infinite" Log File
The Log File is a circularly reused file, thus often referred to as "infinite". When a new record is added to the Log File, it is appended to the end of the file.
When the Log File has reached its capacity (10,485,760 bytes, see the above System File section), the Log File Service waits for writes to take place. As transactions are completed, space will be freed permitting new entries. The Log File is divided into a series of fixed-length pages, with each page containing one or more records. Each page has a small header that contains an Update Sequence Array which is used to detect incorrect transfers of log file pages. In addition, each record in the Log File is assigned a Log Sequence Number (LSN).
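A hedged sketch of such a circular log, reduced to a toy ring buffer of LSN-stamped records (sizes and names are illustrative, not the LFS implementation):

    #define LOG_SIZE 16           /* toy capacity, in records */

    typedef struct {
        unsigned long long Lsn[LOG_SIZE];
        unsigned long long NextLsn;   /* next LSN to assign          */
        int Head;                     /* oldest record still needed  */
        int Tail;                     /* next free slot              */
    } LOG_FILE;

    /* Append a record; returns 0 when the log is full and must wait
       for completed transactions to free space. */
    int AppendRecord(LOG_FILE *log)
    {
        int next = (log->Tail + 1) % LOG_SIZE;
        if (next == log->Head)
            return 0;                         /* log full: wait      */
        log->Lsn[log->Tail] = log->NextLsn++; /* stamp with its LSN  */
        log->Tail = next;
        return 1;
    }

    /* Called as old transactions complete: frees the oldest record. */
    void CompleteOldest(LOG_FILE *log)
    {
        if (log->Head != log->Tail)
            log->Head = (log->Head + 1) % LOG_SIZE;
    }

In this sketch, Head advances as old transactions complete, which is what frees space for new entries and makes the fixed-size file appear "infinite".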
Records stored in the Log File contain a standard header that is defined by the Log File Service. A client using the Log File Service can place any information it wants in a Log File record, but NTFS uses only two types of records:
·	Checkpoint records  Checkpoint records contain the oldest Log Sequence Number (LSN) that addresses a log record for an update that has not yet been written to disk. The client using the LFS will periodically write a Checkpoint record to keep track of how long it would take to recover from a crash if one happened at this moment. The client writes the address of one or more checkpoints into its Restart Areas.
·	Update records  There are two types of Update records: Undo and Redo. The information provided by an Undo record may be used to reverse an operation which has already been performed. The information provided by a Redo record may be used to apply, or reapply, a given operation. The LFS logs all Redo and Undo information for every transaction. The format of the information in Update records is determined by the client using the LFS. Update records may contain either physical or logical information. Physical information describes updates in terms of specific byte ranges which are to be modified. Logical information expresses updates in terms of a logical operation, such as "delete file README.TXT".

Note The Log File Service maintains two Restart Areas to guarantee that there will always be at least one good Restart Area.

Detecting errors
The Update Sequence Array, which was discussed earlier as part of the Header, is used to ensure multi-sector structures are correctly transferred. The Update Sequence Array is an array of Unsigned Short Integers whose size is equal to the number of 512-byte units in the structure that contains it, plus 1.
The first entry in the Update Sequence Array is always non-zero. This number is incremented each time the structure is written to disk. The remaining entries in the Update Sequence Array capture the last integer of each 512-byte unit, which, in turn, is replaced with the sequence number from the start of the array.
Note In the below examples, each block represents a 512 byte unit. Each number shown is the last number of each 512 byte unit.


Every time a multi-sector structure is read from the hard drive, the last integer of every 512-byte unit is compared with the sequence number at the start of the array. If these two values are not equal, then the previous write failed. If the values are equal, when the data is read, the value at the end of the 512-byte unit is replaced with the appropriate value from the Update Sequence Array. If a failure is detected, recovery depends on reading the data from a mirrored copy, or on whether the data is recoverable from the log file.

The above example shows a situation where a write failed. The fourth write failed, since the fourth 512-byte unit ends with a 02. If the fourth write had succeeded, the Update Sequence Array would begin with a 04 and each 512-byte unit would end with a 04.
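The protection and verification steps can be sketched as follows. This is an illustrative reading of the description above, not NTFS source code; the buffer layout and names are assumptions:

    #define UNIT 512

    typedef unsigned short USHORT;

    /* Buffer holds Units * 512 bytes; Usa[0] is the sequence number,
       Usa[1..Units] save the displaced last words of each unit. */
    void UsaProtect(unsigned char *Buffer, int Units, USHORT *Usa)
    {
        Usa[0]++;                                    /* new sequence no. */
        for (int u = 0; u < Units; u++) {
            USHORT *end = (USHORT *)(Buffer + (u + 1) * UNIT) - 1;
            Usa[u + 1] = *end;                       /* save last word   */
            *end = Usa[0];                           /* stamp sequence   */
        }
    }

    /* Returns 1 if the structure was fully written, restoring the
       original values; returns 0 if any unit shows an old sequence
       number, i.e. an incomplete multi-sector transfer. */
    int UsaVerifyAndFix(unsigned char *Buffer, int Units, const USHORT *Usa)
    {
        for (int u = 0; u < Units; u++) {
            USHORT *end = (USHORT *)(Buffer + (u + 1) * UNIT) - 1;
            if (*end != Usa[0])
                return 0;          /* unit not written: bad read */
            *end = Usa[u + 1];     /* put the real data back     */
        }
        return 1;
    }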
Log File Service Recovery
The Log File Service performs its recovery every time it is called upon to open a Log File that is not already open. LFS recovery involves the following steps:
1.	Both LFS restart areas are read to determine which is the most current. If one of the restart areas has an I/O error when read, it is assumed that this was caused by a crash, and the other is used as the most current. The one that is determined to be good is then copied onto the bad restart area and verified. If the verification fails, it is assumed that the failure was caused by a bad sector, and the restart area is remapped. If both restart areas are readable, the one with the highest Log Sequence Number (LSN) is used.
2.	The end of the circular Log File is then found by reading successive pages and locating the first page whose first Log Record has an LSN lower than that of the first Log Record in the previous page. If an I/O error or Update Sequence Array error is encountered during this, the previous page is assumed to be the end. However, in this case one more page is read to verify that its first LSN is lower; otherwise, there is a fatal error.

At the end of this process, the Log File Service is ready to be used and now knows the exact locations of all of its clients' Restart Areas.
NTFS Recovery
NTFS Recovery occurs every time the volume is mounted, at boot time. NTFS Recovery performs the following steps:
1.	As soon as the volume is recognized as NTFS, the MFT and the Log File are opened.
2.	Once NTFS has opened the Log File, NTFS calls the Log File Service, which causes the Log File Service Recovery (described in the previous section) to take place. The Log File Service expects an open Log File to be passed to it by the file system that is calling the Log File Service. This is why NTFS opens the Log File in step 1.
3.	NTFS again calls the Log File Service, first to read its restart area and then to read all of the data from its last checkpoint operation. This data is used to initialize the transaction table, dirty pages table, and open file table necessary for recovery.
4.	NTFS then performs an Analysis Pass on its last Checkpoint Record. At the end of this pass, the transaction table should contain only transactions which were active at the time of the crash. During the Analysis Pass, NTFS first updates the transaction table, dirty pages table, and open file table. NTFS then initializes a read context to start at the beginning of its last checkpoint record and to read all records written by NTFS. As it proceeds, NTFS updates its tables as follows:
·	If a create, delete, or update record is found for a Vcn/Lcn pair that is not already in the dirty pages table, then a new entry is made for this page.
·	The transaction table is updated to reflect any transaction state changes.
·	If attributes are truncated or deleted, then any corresponding pages in the dirty pages table are deleted.
·	If files are deleted, they are removed from the open file table.
·	If a Hot Fix record is encountered, then a lookup is done in the dirty pages table to see if the corresponding Vcn is found; if so, the Lcn in the dirty pages table is overwritten with the new one from the Hot Fix record.
·	At the end, the Redo LSN is determined. This will be the lowest LSN in the dirty pages table, and it is the LSN at which the Redo Pass begins.
5.	NTFS then performs a Redo Pass. At the end of this pass, the cache will reflect the state of the volume at about the time of the crash. The Redo Pass updates the state of the dirty pages in the cache to reflect the state at the time of the crash. To do this, NTFS starts by doing a "special" open of the files in the open file table. This open is considered "special" because the files are opened from the Vcn/Lcn of their first file record segment. Next, a read context is created for reading all NTFS log records starting at the Redo LSN. For each of the entries it is decided, based on LSN comparisons, whether or not the entry was successfully applied. Those that were not are reapplied.
6.	Finally, NTFS performs an Undo Pass. At the end of this pass, the volume is fully recovered. The Undo Pass is used to roll back any transactions that did not complete before the crash occurred. At the end of the Analysis Pass, the transaction table should contain only transactions which were active at the time of the crash. For each active transaction, a read context is created to read all of its log records backwards, as they are linked by the UndoNextLsn. As each individual update is undone, a Compensating Log Record is written to the Log, and the UndoNextLsn field of the transaction table is updated.

Compensating Log Record
As was seen above, during recovery as each update is undone, a Compensating Log Record is written to the log. This is done so that if the system were to crash during a rollback, the rollback would not have to be rolled back on the next restart.
A Compensating Log Record does not contain any undo information, since it does not have to be undone. The undo link of the Compensating Log Record is really the undo link of the record being rolled back. Therefore, if a Compensating Log Record is written to the log file, there is actually one less record to be rolled back.
In the example below, as soon as Action3 has been completed, Compensating Log Record 1 (CLR 1) is written to the Log File. If the system were to crash at this time, only Action1 and Action2 would need to be undone. Since there is a Compensating Log Record for Action3, the recovery process realizes that Action3 has already been undone, and therefore will only undo Action1 and Action2.

NTFS CHKDSK
On an NTFS partition, the Windows NT CHKDSK utility performs the following steps:
Determine the location of the data attribute of the Master File Table
·	Determine the location of the Base File Record Segment (BFRS) of the Master File Table (MFT), which is pointed to by the boot sector. There are two copies of the boot sector on the disk. If both copies are unreadable, CHKDSK cannot proceed. If the BFRS of the MFT is unreadable, then the mirror of the MFT is used. Again, if both of the BFRSs are unreadable, CHKDSK cannot proceed.
·	Once the Base File Record Segment (BFRS) has been found, it is checked. It must be marked in use and as a BFRS. In addition, the Attribute Records and Update Sequence Array must not overlap in any way.
·	All the non-resident attribute records in the BFRS are scanned to find the default data attribute. If there are any problems or there are multiple data attributes, CHKDSK cannot proceed.
·	The mapping pairs are checked. Once again, if there are any problems, CHKDSK cannot proceed.
·	Finally, the BFRS of the MFT is searched for an attribute_list attribute, and it is checked. If this is found and read without any problems, the attribute_list is searched for a data attribute with a lowest Virtual Cluster Number (Vcn) equal to one greater than the highest Vcn of the previously located data attribute.
If no such attribute is found, then the data attribute of the MFT has been established. Otherwise, the file record segment of the next portion of the Data attribute has been determined.
This step of CHKDSK continues until the complete data attribute of the MFT has been established.
Establish the Attribute Definition Table
The Attribute Definition Table is located in the default data attribute of the Attribute Definition Table file. The process of determining the location of the data attribute of the Attribute Definition Table is identical to the process used in step 1 to determine the data attribute of the MFT.
If the Attribute Definition Table file does not exist or is unreadable, a default Attribute Definition Table, as would be generated by formatting a drive with NTFS, is used.
Note If the default Attribute Definition Table is used, any user defined attributes will not be defined in the table.
Once the Attribute Definition Table is established, it is set aside for future use.
Validate all of the file record segments in the MFT
Every file on the volume is now validated. To accomplish this, an empty volume bitmap, MFT bitmap, and list of unreadable clusters are created. Then, all of the file record segments (FRS) are stepped through as follows:
1.	Read the next file record segment (FRS); if there are no more, jump to step 12.
2.	If the FRS is unreadable, it is added to the bad cluster list and the process is started over at step 1.
3.	If the FRS is not in use, start over at step 1.
4.	If the FRS is not a base file record segment, start over at step 1.
5.	The MFT bitmap for this FRS is marked.
6.	The Base File Record Segment (BFRS) is validated. In addition, the attribute records are validated against the Attribute Definition Table, and the volume bitmap is updated as attribute records are validated.
7.	The attribute_list is validated as a syntactic structure.
8.	All child record segments are validated in the same manner as step 6 and then marked in the MFT bitmap.
9.	The contents of the attribute_list are reconciled with the set of attribute records in the base and child file record segments.
10.	The multiple attribute record attributes are ensured to be consistent.
11.	Start over at step 1.
12.	Finally, the in use bit is cleared on any orphan file record segments.

Execute the list of outstanding actions
Any outstanding instructions in the log file are now executed.
Validate the volume's special attributes
The MFT's mft_bitmap attribute must be identical to the MFT bitmap generated during the validation of the FRSs.
The first three file record segments (FRS) in the data attribute of the MFT and the MFTmirror file must be identical and located on readable sectors.
The data attribute of the Attribute Definition Table file must contain a valid Attribute Definition Table.
The data attribute of the Bitmap file must be identical to the volume bitmap generated during the validation of the FRS step of CHKDSK.
The volume's two boot sectors must match if they are both readable.
The data attribute of the boot file must contain the volume's two boot sectors.
The data attribute of the bad cluster file must contain all of the unreadable clusters that were discovered during the CHKDSK algorithm.
Cache Manager
Under Windows NT, caching is an integral part of the operating system, called the Cache Manager. The Windows NT Cache Manager is a self-tuning cache, with no user configurable parameters, that can, and does, use all available free memory. The Cache Manager does write-behind caching on all three currently supported file systems (FAT, HPFS, NTFS), and can be used by all processes without any special considerations.
As was mentioned earlier, the Cache Manager is given all the memory in the system which is not being used by running processes. As an aside, this is why the Program Manager Help About dialog box under Windows NT shows the amount of physical RAM on a system. In the early stages of the development of Windows NT, Help About displayed the amount of available memory, as it does under Windows 3.x. This behavior caused many people to wonder what was using all of their system's memory, when it was the Cache Manager using the memory.
The Cache Manager has a minimum size of 512K (not adjustable) to which it will shrink. The cache is shrunk to this minimum if more memory is needed by processes. Processes yield memory to other processes, including the Cache Manager, when there is memory pressure. Therefore, memory will always be allocated to those processes using and/or needing memory.
Cache Manager and Recoverability
In order for the Cache Manager to support Recoverability, it must be able to do the following things:
·	Disable read ahead.
·	Disable write behind.
·	Enable Update Sequence Array protection.
·	"Remember" Log Sequence Numbers for the Log File Service's Log File pages.

Before Going On
1.	List three of the five transactions that are logged by NTFS. What two kinds of information are logged for each transaction under NTFS?
2.	Draw a picture, showing the Windows NT components involved when a transaction occurs on an NTFS volume.
3.	Where does the Log File Service reside? Based on this, is it possible for other clients to use the Log File Service?
4.	What is the entry that the Log File Service maintains in the Log File called? What are the two types of entries NTFS places in the Log File?
5.	Where in an NTFS file does the Update Sequence Array reside? Briefly explain how the Update Sequence Array is used to detect that a write was not completed.
6.	What are the three passes that are made during NTFS recovery called?
7.	What are the minimum and maximum sizes for the Windows NT Cache Manager? Are there any user configurable parameters for the cache?

To Learn More
See the Windows NT Resource Guide, Chapter 4 - New Technology File System.