Availability and Scalability
Ericsson Service-Aware Policy Controller

Contents

1   Availability and Scalability Introduction
2   Availability and Scalability Function
2.1   Availability
2.2   Scalability
3   Availability and Scalability Operational Conditions
3.1   Availability and Scalability External Conditions
3.2   Availability and Scalability Function Administration
3.3   Availability and Scalability Security

Reference List

Abstract

This document provides a description of the Availability and Scalability function provided by the Ericsson Service-Aware Policy Controller (SAPC).


1   Availability and Scalability Introduction

This document provides a description of the Availability and Scalability function provided by the Ericsson Service-Aware Policy Controller (SAPC).

2   Availability and Scalability Function

2.1   Availability

The SAPC is built on a highly available architecture in which a single failure does not stop the operation of the cluster. It is built on a cluster of Virtual Machines of three types:

Figure 1   SAPC Cluster architecture.

The two SCs provide the OAM and provisioning services in active-standby mode: if one SC goes down, all services that consider it the active SC are taken over by the other SC. The remaining Virtual Machine types work in active-active mode. Incoming traffic is distributed by a maximum of six TPs (usually, the first six TPs) among all the available traffic processors in the cluster. These TPs also publish the traffic virtual IP address to the external network. If one of these TPs goes down, the publishing of the virtual IP address and the traffic distribution functions are moved to another available TP. The failed TP is also excluded from traffic distribution until it is up again.
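The distributor and receiver roles described above can be sketched as follows. This is an illustrative model only, not SAPC internals: the TP names, the role-assignment rule, and the `recompute_roles` helper are all assumptions made for the example.

```python
# Illustrative sketch: up to six TPs act as traffic distributors and publish
# the virtual IP; every available TP receives traffic. When a distributor TP
# fails, a spare TP takes over its role and the failed TP is excluded from
# distribution until it recovers. Names and data layout are hypothetical.

MAX_DISTRIBUTORS = 6

def recompute_roles(tps):
    """Given a dict {tp_name: is_up}, return (distributors, receivers)."""
    up = [name for name, is_up in sorted(tps.items()) if is_up]
    distributors = up[:MAX_DISTRIBUTORS]  # publish the VIP, distribute traffic
    receivers = up                        # all available TPs receive traffic
    return distributors, receivers

tps = {f"TP-{i}": True for i in range(1, 9)}   # 8 TPs, all up
dist, recv = recompute_roles(tps)

# TP-3 fails: a spare TP (TP-7) takes over the distributor role, and TP-3
# stops receiving traffic until it is up again.
tps["TP-3"] = False
dist2, recv2 = recompute_roles(tps)
```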

The following situations, in which multiple failures occur simultaneously, would affect SAPC service availability:

To increase the reliability and availability of the system, the SAPC includes several control mechanisms, such as restoration procedures, overload control, session cleanup procedures, and mechanisms to overcome connectivity loss.

2.1.1   Restart and Restore Procedures

The SAPC provides mechanisms to handle restart situations both for the SAPC itself and for the peer traffic plane nodes, as well as restore procedures.

2.1.1.1   SAPC Restart

Although the SAPC provides a high level of availability, if both SCs fail simultaneously for more than 15 minutes, the SAPC is restarted. Once the SAPC recovers from a restart, the latest database information is recovered from stored backups. The recovered information may not be fully up to date, and for this reason the SAPC performs some actions to consolidate it.

The following sections describe the actions performed by the SAPC, in the PCC deployment scenario, after a cluster restart.

The SAPC increments its own Origin-State-Id and includes the new value in every response message, alerting the peer nodes to the loss of previous session state.

Note:  
The Origin-State-Id is a monotonically increasing value that is advanced whenever a Diameter entity restarts with loss of previous state.
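The Origin-State-Id behavior described in the note can be sketched as below. The sketch follows the common RFC 6733 convention of using the node's boot timestamp as the value; the class and method names are assumptions for illustration, not SAPC code.

```python
# Minimal sketch of Origin-State-Id handling (RFC 6733). Using the boot
# timestamp makes the value monotonically increasing across restarts (as
# long as restarts are more than one second apart). The DiameterNode class
# and build_answer helper are hypothetical.
import time

class DiameterNode:
    def __init__(self):
        # Set once at (re)start; a later restart yields a larger value,
        # which signals loss of previous state to the peers.
        self.origin_state_id = int(time.time())

    def build_answer(self, avps=None):
        """Build an answer message carrying the current Origin-State-Id."""
        msg = dict(avps or {})
        msg["Origin-State-Id"] = self.origin_state_id
        return msg
```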

The sessions stored before the restart are not recovered, and therefore none of the dynamic data related to sessions is recovered either:

Gx, Sy, and Rx sessions are identified by the Diameter Session-Id. A session is considered unknown if the SAPC does not find a session with the same Session-Id in its internal database. After an SAPC restart, requests sent from PCEFs, AFs, or Online Charging Systems for an unknown session are answered by the SAPC with the DIAMETER_UNKNOWN_SESSION_ID error code.
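The unknown-session handling can be illustrated as follows. The result codes 2001 (DIAMETER_SUCCESS) and 5002 (DIAMETER_UNKNOWN_SESSION_ID) are defined in RFC 6733; the session store and handler function are hypothetical.

```python
# Illustrative handling of requests for known vs. unknown sessions after a
# restart. After a restart the session database holds no pre-restart
# sessions, so such requests are rejected with DIAMETER_UNKNOWN_SESSION_ID.
DIAMETER_SUCCESS = 2001              # RFC 6733
DIAMETER_UNKNOWN_SESSION_ID = 5002   # RFC 6733

def handle_request(session_db, session_id):
    """Answer a Gx/Rx/Sy request, looking the session up by Session-Id."""
    if session_id not in session_db:
        return {"Result-Code": DIAMETER_UNKNOWN_SESSION_ID}
    return {"Result-Code": DIAMETER_SUCCESS, "session": session_db[session_id]}
```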

All subscriber-related data is recovered from the stored backups.

2.1.1.2   Peer Restart

The SAPC is able to detect Diameter peer node restarts based on the standard mechanism described for Diameter nodes in RFC 6733 (refer to Diameter Base Protocol, IETF RFC 6733).
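The standard detection rule from RFC 6733 amounts to comparing the Origin-State-Id advertised by a peer against the last recorded value: a larger value means the peer has restarted with loss of state. A minimal sketch, with illustrative names:

```python
# Sketch of Diameter peer restart detection per RFC 6733: a restart is
# detected when a peer advertises a larger Origin-State-Id than the one
# previously recorded for it. The known_state dict is hypothetical storage.

def peer_restarted(known_state, peer, received_osi):
    """Record the peer's Origin-State-Id; return True if a restart is detected."""
    previous = known_state.get(peer)
    known_state[peer] = received_osi
    return previous is not None and received_osi > previous
```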

The SAPC provides the following mechanisms to handle restart situations for the peer traffic plane nodes.

2.1.1.3   SAPC Restore

The SAPC provides the System Data type of restore.

The System Data backup is used to perform a system data fallback, consistently recovering the whole system to a former version.
After restoring a System Data backup, the SAPC reestablishes the following information:

And the SAPC loses the following data:

2.1.2   Session Cleanup Mechanisms

The following mechanisms are implemented in the SAPC to remove obsolete information.

2.1.2.1   Basic Session Cleanup Mechanism

The following mechanism handles the removal of specific obsolete sessions:

2.1.2.2   Massive Cleanup Mechanism

Massive Gx Session Cleanup at PCEF Restart

This cleanup mechanism consists of deleting all the obsolete IP-CAN sessions existing in the SAPC for a restarted PCEF, considering also:

Massive Gx Session Cleanup at PCEF Peer Removal

When a diameterNode peer is removed from the configuration data, the SAPC removes all the IP-CAN sessions established by that peer, using the same mechanism as for a PCEF restart.

Massive Rx Session Cleanup at AF Restart

This cleanup mechanism consists of deleting all the obsolete AF sessions existing in the SAPC for a restarted AF, considering also:

Note:  
The SAPC provides a robust mechanism that allows cleaning obsolete sessions even in case of GeoRed switchover or scaling scenarios.

Both massive Gx and Rx cleanup processes continue scanning and removing sessions until all the obsolete IP-CAN or AF sessions of the restarted peer have been removed.
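The scan-until-empty behavior of the massive cleanup can be sketched as a batch loop. This is a simplified model under assumed names: the session layout, the batch size, and the use of Origin-State-Id to mark pre-restart sessions are illustrative choices, not SAPC internals.

```python
# Sketch of a massive cleanup pass: scan the session table in batches and
# remove sessions that belong to the restarted peer and predate its restart
# (older Origin-State-Id), repeating until none remain.

def massive_cleanup(sessions, peer, restart_osi, batch_size=2):
    """Remove all sessions of `peer` older than `restart_osi`; return count."""
    removed = 0
    while True:
        batch = [sid for sid, s in sessions.items()
                 if s["peer"] == peer and s["osi"] < restart_osi][:batch_size]
        if not batch:
            break   # all obsolete sessions of this peer are gone
        for sid in batch:
            del sessions[sid]
            removed += 1
    return removed
```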


2.1.2.3   Session Inactivity Cleanup Mechanism

This cleanup mechanism consists of deleting all the inactive Gx sessions (those for which no request has been received or sent within a configurable period of time) existing in the SAPC, considering also:

This mechanism runs daily and is enabled or disabled by configuration, together with other parameters, as explained in Configure Session Inactivity Cleanup Mechanism.

If a massive cleanup is running, or is detected while a session inactivity cleanup process is ongoing, the SAPC stops the session inactivity cleanup process.
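The precedence rule above can be sketched as follows. The timestamps, the idle threshold, and the `massive_cleanup_running` probe are illustrative assumptions; only the behavior (remove idle sessions, abort if a massive cleanup appears) reflects the text.

```python
# Sketch of the inactivity cleanup: remove Gx sessions idle longer than a
# configurable period, but stop the pass as soon as a massive cleanup is
# detected, since the massive cleanup takes precedence.

def inactivity_cleanup(sessions, now, max_idle, massive_cleanup_running):
    """Remove sessions idle longer than max_idle seconds; return removed ids."""
    removed = []
    for sid, last_activity in list(sessions.items()):
        if massive_cleanup_running():
            break   # a massive cleanup is running: abort this pass
        if now - last_activity > max_idle:
            del sessions[sid]
            removed.append(sid)
    return removed
```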

2.2   Scalability

The SAPC is built on a scalable architecture that provides the ability, at runtime, to increase the traffic processing capacity by adding processors (Scale-out) or to reduce the capacity by removing existing processors (Scale-in). The SAPC keeps its performance levels (with only a few seconds of impact on the ongoing traffic) while Scale-out or Scale-in functions are performed. Only the traffic processor (TP) VMs can be scaled.

The following figure shows an SAPC cluster initially installed with m TP VMs that has been scaled out to z TP VMs.

Figure 2   SAPC Cluster where new TP VMs have been added

When TPs are scaled, the traffic interface and traffic distribution functionalities are also scaled, up to six running instances on six different traffic processors. Beyond this number, the newly added TPs provide these functions as spares (standby instances that become active if any active instance goes down).
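The active-versus-spare split after scaling can be expressed compactly. The helper below is a hypothetical illustration of that rule; the TP numbering and the `classify_tps` name are assumptions.

```python
# Sketch of the scaling rule above: after a scale-out, only the first six
# TPs run active traffic interface/distribution instances; TPs added beyond
# that act as spares that become active on failure of an active instance.

def classify_tps(tp_count, active_limit=6):
    """Return (active_distributor_ids, spare_ids) for a cluster of tp_count TPs."""
    ids = list(range(1, tp_count + 1))
    return ids[:active_limit], ids[active_limit:]

active, spares = classify_tps(9)   # e.g. scaled out from m to z = 9 TPs
```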

2.2.1   Multi-site Support

The SAPC supports geographical distribution (multi-site) configurations when a single SAPC does not have enough capacity to handle all the subscriber traffic, in the following scenarios.

2.2.1.1   SAPC with Common Database

In this deployment, the operator has multiple SAPCs deployed in different sites and a common database to store the subscriber data. Hence, any SAPC can serve IP-CAN sessions from any subscriber. Fair Usage Accumulators must be stored in the common database, so that any of the SAPCs can access and modify the data at any time.

Figure 3   Multi-Site Deployment with Common Database

Subscriber static data and Fair Usage Accumulators are centralized in the external database. The rest of the static data, such as Subscriber Groups, Services, and so on, together with operator-defined policies, is provisioned in all the SAPCs.

2.2.1.2   Network Dependencies

The following considerations must be taken into account in deployments with multiple SAPCs.

3   Availability and Scalability Operational Conditions

3.1   Availability and Scalability External Conditions

Note that VIP Gateway routers are not part of the SAPC but are required in any SAPC deployment.

3.2   Availability and Scalability Function Administration

The following sections list the relevant Operation and Maintenance actions, alarms, logs, notifications, and statistics data related to the function.

3.2.1   Availability and Scalability Alarms

There are no specific SAPC alarms related to its availability, apart from the ones provided by the platform:

3.2.2   Availability and Scalability Logging

The following events are logged:

3.2.3   Availability and Scalability Notifications

There are no specific SAPC notifications related to service availability, apart from the ones provided by the platform.

3.3   Availability and Scalability Security

Not applicable.


Reference List

Ericsson Documents
[1] Subscription and Policy Management.
[2] Overload Control.
Standards
[3] Diameter Base Protocol, IETF RFC 6733.