Creating a Path Policy

A path policy defines the mapping between paths and clusters. After a path policy is created, the gateway cluster forwards all operation requests on a matching path to the specified cluster for execution.
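
The mapping can be pictured as a lookup from path patterns to clusters. The Python sketch below is illustrative only and does not reflect how the gateway cluster is implemented. The PathPolicy class, the resolve_cluster function, and the "first matching policy wins" rule are assumptions made for the example. Python's re module stands in for POSIX ERE matching, which behaves the same way for a simple pattern such as /home/.*.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathPolicy:
    # Hypothetical in-memory form of a path policy.
    path: str                 # absolute path whose last level is a regular expression
    cluster_names: List[str]  # clusters that the path is mapped to


def resolve_cluster(policies: List[PathPolicy], request_path: str) -> Optional[str]:
    """Return the first cluster of the first policy that matches the whole path."""
    for policy in policies:
        # fullmatch requires the pattern to cover the entire request path.
        if re.fullmatch(policy.path, request_path):
            return policy.cluster_names[0]
    return None  # no policy applies to this path


policies = [PathPolicy(path="/home/.*", cluster_names=["cluster1"])]
print(resolve_cluster(policies, "/home/user/data.txt"))  # cluster1
print(resolve_cluster(policies, "/tmp/data.txt"))        # None
```

In this sketch a request on /home/user/data.txt is forwarded to cluster1, while a request on /tmp/data.txt matches no policy.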

Procedure

  1. Choose Resources > Resources > Namespace.
  2. Select the desired account from the Account drop-down list in the upper left corner.
  3. Click the name of the desired namespace to go to its details page, and then choose SmartTakeover > Path Policies.
  4. Click Create.

    The Create Path Policy page is displayed on the right.

  5. Set the path policy. Table 1 describes related parameters.

    Table 1 Path policy parameters

    Parameter

    Description

    Path

    HDFS path to be forwarded. The value must be an absolute path, and its last-level directory must be a POSIX extended regular expression (ERE). For example, to write all subdirectories and files in the /home directory to a specified cluster, set this parameter to /home/.*.

    Cluster Name

    Target cluster to which requests on the path are forwarded, that is, the managed cluster mapped to the path.

    NOTE:

    This parameter is displayed only when the SmartTakeover mode is set to Gateway federation or Local federation.

    Balancing Policy

    Balancing policy for a path. When a path is mounted to multiple clusters, the balancing policy determines how one of those clusters is selected for creating files. For details, see Table 2.

    NOTE:

    This parameter is displayed only when the SmartTakeover mode is set to Gateway federation or Local federation.

    Balancing Factor

    Probability, expressed as a percentage, with which the gateway cluster attempts to write files to a cluster with larger remaining capacity.

    NOTE:
    • This parameter needs to be set only when the SmartTakeover mode is set to Gateway federation or Local federation and Balancing Policy is set to SPACE.
    • The value range is from 0 to 100. The default value is 60.

    Fault Tolerance

    Whether to switch services to the other normal clusters in the cluster list if one or more clusters in the list are faulty.

    NOTE:

    This parameter is displayed only when the SmartTakeover mode is set to Gateway federation or Local federation.

    Table 2 Balancing policy parameters

    Balancing Policy

    Description

    HASH

    When multiple clusters are configured in a path policy, the path name of a level-1 subdirectory of the policy path determines the cluster to which the files in that subdirectory are distributed, so all files in the same level-1 subdirectory are placed in the same cluster. For a file directly under the policy path, the path name of the file itself determines the cluster.

    This mode applies when files with the same parent directory need to be placed in the same cluster.

    Examples:

    When the path policy is as follows:

    path=/dir01/.*

    cluster_name_list=cluster1,cluster2,cluster3

    order=HASH

    According to the hash algorithm, all files in the /dir01/dir02/dir_test directory and files in its subdirectories are created in the storage cluster cluster1.

    According to the hash algorithm, all files in the /dir01/dir02/dir03 directory and files in its subdirectories are created in the storage cluster cluster3.

    The /dir01/testfile file is created in cluster2 based on the algorithm.

    HASH_ALL

    When multiple clusters are configured in a path policy, files are distributed to clusters by paths. Files with similar full paths are distributed to a fixed cluster.

    Note that when a large number of files with similar full paths are created in the same cluster, performance may deteriorate.

    This mode applies when files whose paths follow a regular pattern need to be stored together.

    Examples:

    When the path policy is as follows:

    path=/dir01/.*

    cluster_name_list=cluster1,cluster2,cluster3

    order=HASH_ALL

    According to the hash algorithm, most of the 10,000 files from /dir01/dir02/dir03/a1 to /dir01/dir02/dir03/a10000 have highly similar full paths and are therefore created in the storage cluster cluster3.

    Files with dissimilar full paths are evenly distributed to the three clusters.

    RANDOM

    When multiple clusters are configured in a path policy, the gateway cluster writes files to the clusters randomly and evenly.

    This mode applies when the number of files needs to be evenly distributed across the clusters.

    Examples:

    When the path policy is as follows:

    path=/dir01/.*

    cluster_name_list=cluster1,cluster2,cluster3

    order=RANDOM

    The 10,000 files from /dir01/dir02/dir03/a1 to /dir01/dir02/dir03/a10000 are evenly distributed to the three clusters.

    SPACE

    When multiple clusters are configured in a path policy, the gateway cluster attempts to write more files to the cluster with larger remaining capacity. The probability is set by the balancing factor, which is configurable, ranges from 0 to 100, and is expressed as a percentage.

    If the balancing factor is set to 100, files are written to the cluster with the largest remaining capacity.

    If the balancing factor is set to 0, files are written to the cluster with the smallest remaining capacity.

    If the balancing factor is between 0 and 100, it is used as the probability (as a percentage) that a file is written to the cluster with the largest remaining capacity.

    This mode applies to scenarios where cluster capacity needs to be balanced.

    Note that when a large balancing factor is set, the gateway cluster sends most services to the cluster with the largest remaining capacity. As a result, that cluster may be overloaded and services may be negatively affected.

    Examples:

    When the path policy is as follows:

    path=/dir01/.*

    cluster_name_list=cluster1,cluster2,cluster3

    order=SPACE

    balanced_factor=60

    About 60% of the 10,000 files from /dir01/dir02/dir03/a1 to /dir01/dir02/dir03/a10000 are created in cluster1, which has the largest remaining capacity. About 60% of the remaining files are created in cluster3, which has the second largest remaining capacity. The rest are created in cluster2.

    WRITE_FIRST

    When multiple clusters are configured in a path policy, the gateway cluster always writes files to the first cluster in the cluster list of the path policy.

    This mode applies when multi-read and single-write are required, that is, when files can be read from multiple clusters but are always written to one fixed cluster.

    Examples:

    When the path policy is as follows:

    path=/dir01/.*

    cluster_name_list=cluster1,cluster2,cluster3

    order=WRITE_FIRST

    All files in the /dir01 directory are created in cluster1, which ranks first in cluster_name_list.

  6. Click OK.
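
The SPACE example in Table 2 (about 60% of files to the cluster with the largest remaining capacity, 60% of the rest to the next one, and the remainder to the last) can be read as a cascade of probability checks. The Python sketch below models that reading only. The pick_cluster_space function and the capacity figures are hypothetical, and the algorithm actually used by the gateway cluster may differ.

```python
import random
from collections import Counter
from typing import Dict, Optional


def pick_cluster_space(remaining: Dict[str, int], balancing_factor: int,
                       rng: Optional[random.Random] = None) -> str:
    """Illustrative SPACE-style choice; not the product's actual algorithm."""
    if not remaining:
        raise ValueError("at least one cluster is required")
    if not 0 <= balancing_factor <= 100:
        raise ValueError("balancing factor must be between 0 and 100")
    if rng is None:
        rng = random.Random()
    # Rank clusters from largest to smallest remaining capacity.
    ranked = sorted(remaining, key=lambda name: remaining[name], reverse=True)
    for name in ranked[:-1]:
        # Accept this cluster with a probability of balancing_factor percent.
        if rng.random() * 100 < balancing_factor:
            return name
    # Every earlier cluster was skipped, so the smallest one takes the file.
    return ranked[-1]


# Hypothetical remaining capacities (arbitrary units), ranked as in the SPACE example.
remaining = {"cluster1": 900, "cluster2": 100, "cluster3": 500}
rng = random.Random(0)  # fixed seed so repeated runs give the same counts
counts = Counter(pick_cluster_space(remaining, 60, rng) for _ in range(10_000))
for name in ("cluster1", "cluster3", "cluster2"):
    print(name, counts[name])
```

Under this reading, a balancing factor of 60 gives expected shares of 60%, 24% (60% of the remaining 40%), and 16%. A factor of 100 always selects the cluster with the largest remaining capacity, and a factor of 0 always selects the one with the smallest, which matches the boundary cases described in Table 2.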