User Guide 6/1553-AXM10104/1 Uen AB

vMRF Troubleshooting Guideline
Virtual Multimedia Resource Function


1 Introduction

This document describes how to perform troubleshooting procedures in the vMRF.

1.1 Prerequisites

Before starting the procedures described in this document, ensure that the following documents have been read:

Certain troubleshooting activities can affect node performance. For example, trace or log activation can disturb traffic and is not recommended without first consulting the next level of maintenance support.

2 Troubleshooting Procedures

Problems that cannot be solved by using this document must be reported to the next level of maintenance support. This results in a Customer Service Report (CSR).

The details of the trouble reporting process are outside the scope of this document.

The manual recovery flow in Perform Manual Recovery presents a generic workflow to identify and, if possible, solve problems, or to collect useful data.

Trouble Cases describes specific trouble cases for various scenarios. These cases often utilize the manual recovery flow as well.

Log files of access and authorization events in the system can be collected by using the journalctl command. For more information, see vMRF Security Management.

It is recommended to periodically export and store the configuration of the node outside the VNF as a backup for a possible re-deployment. For more information, see vMRF Backup and Restore Guideline.

2.1 Perform Manual Recovery

  1. Log in to the vMRF instance.
  2. Collect troubleshooting data regarding the incident in case next level support needs to be contacted.

    For more information on how to collect information, see Data Collection Guideline for vMRF.

  3. Check the vMRF status and, if needed, identify the faulty VM.

    See vMRF Status Check for the procedure.

  4. Consider performing a backup.

    For more information, see vMRF Backup and Restore Guideline.

  5. Lock the faulty VM. If the VM is not available or accessible, continue with Step 7.
    Note: In this case, the VM is locked immediately and all ongoing traffic from the VM is lost. For more information on locking a VM, see vMRF Configuration Management.
  6. Restart the VM from the cloud management tool.
    Note: If the MrfInstance MO that represents the VM is in LOCKED state after the restart, it must be unlocked by setting the administrativeState attribute of the MO to UNLOCKED.
  7. If the problem is not solved, remove the VM from the cluster using the cloud management tool and create a new VM.

2.2 Common Procedures

This section describes procedures used during troubleshooting.

2.2.1 vMRF Status Check

This procedure describes how to verify the vMRF deployment.

The status check shows the status of all VMs in the cluster. A VM can be identified by its UUID visible in the following locations:

2.2.1.1 Check vMRF Status on OpenStack

  1. Open an SSH connection to the O&M IP address of the vMRF VNF instance using the following command:
    ssh -A -i <private_key_.pem_file> <user_ID>@<O&M_IP_address>
  2. Run the following command:
    cluster run verify_vmrf_node_status.py
  3. Check that all components are in the OK state.

    The following example shows the printout of a successful status check:

    Running command: "verify_vmrf_node_status.py" on localhost
                    eth0: OK
                    eth1: OK
                    eth2: OK
                 SC role: ACTIVE
                  CoreMW: OK
                     COM: OK, RUNNING
             MrfDirector: OK, RUNNING
               CliDaemon: OK
              IpPipeline: OK
                  TC-MPD: OK
                MrfAgent: OK
               CloudInit: OK
                SEC-CERT: OK
      neighbourdetection: OK
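
When the status check is run on several VMs, scanning the printouts by eye is error-prone. The following is a minimal sketch, not part of the vMRF tooling, that parses one captured printout and lists the components that are not in a healthy state. The set of healthy values is an assumption based only on the successful example above; other valid values (for example for the SC role on a non-active SC) may exist.

```python
# Values treated as healthy are taken from the successful printout above.
# This set is an assumption; other healthy values may exist and are not covered.
HEALTHY = {"OK", "OK, RUNNING", "ACTIVE"}


def parse_status(printout):
    """Map each component name in the printout to its reported state."""
    states = {}
    for line in printout.splitlines():
        if line.lstrip().startswith("Running command"):
            continue  # header line, not a component
        name, sep, value = line.partition(":")
        if sep and value.strip():
            states[name.strip()] = value.strip()
    return states


def unhealthy(states):
    """Return only the components whose state is not in HEALTHY."""
    return {name: state for name, state in states.items() if state not in HEALTHY}
```

For the printout shown in No Running MRF Processes on SC VM, unhealthy(parse_status(text)) would report SC role, CoreMW, COM, MrfDirector, MrfAgent, and CloudInit.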
    

3 Trouble Cases

This section describes trouble cases for vMRF.

Follow the troubleshooting workflow as shown in Figure 1.

Figure 1   Troubleshooting Workflow (branches: Deployment Related Trouble Cases, Scaling Related Trouble Cases, Signaling Related Trouble Cases, Media Related Trouble Cases, Load Related Trouble Cases, Announcement Related Trouble Cases)

3.1 Deployment Related Trouble Cases

3.1.1 VNF Does not Start

3.1.1.1 Image Extraction Problem

There are problems during image extraction.

Cause

Image is corrupted.

Solution

  1. Verify the image using the MD5 checksum file that is provided by Ericsson.
  2. If needed, repeat the download and extraction of the vMRF package as described in the corresponding deployment guide.
  3. If the problem still exists, contact Ericsson support.
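
The checksum verification in step 1 can be scripted. The sketch below computes the MD5 digest of a downloaded image and compares it with the value in a checksum file. It assumes that the checksum file starts with the hexadecimal digest, as in the common md5sum layout; the actual layout of the delivered file is not described in this document, and the file names are supplied by the caller.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_checksum_file(image_path, checksum_path):
    """Compare the image digest with the first token of the checksum file.

    Assumption: the checksum file begins with the hex digest (md5sum-style).
    """
    with open(checksum_path, "r", encoding="ascii") as fh:
        expected = fh.read().split()[0].lower()
    return md5_of_file(image_path) == expected
```

A result of False indicates that the download or extraction is to be repeated, as described in step 2.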

3.1.1.2 Instantiation not Possible in OpenStack

Heat stack creation failed due to error: No valid host found. There are not enough hosts available.

Cause

  • Low vCPU resources

  • Low memory

  • Problem in network allocation. Compute nova log shows: Failed to allocate the network(s)

Solution

  1. In case of a network allocation problem, check the neutron logs and configuration. Correct the network configuration.
  2. Ensure that the environment fulfills hardware, software, and network requirements. The main requirements are listed in vMRF Infrastructure Requirements.
  3. Follow the deployment instruction described in Deployment Guide for OpenStack.
  4. If the problem still exists, contact Ericsson support.

3.1.2 Cyclic Kernel Restart

Cyclic kernel restart.

Solution

  1. Perform the manual recovery flow procedure.
  2. If the problem still exists, contact Ericsson support.

3.1.3 VM Stuck

A VM is inaccessible, or the VM console shows problems.

Diagnostics

  • ssh connection to VM is not possible:

    ssh mrsv-admin@192.160.112.15
    ssh: connect to host 192.160.112.15 port 22: No route to host
    
  • vMRF is disabled in vMTAS:

    >show  ManagedElement=1,MtasFunction=MtasFunction,⇒
    MtasMediaFramework=0,MtasMrf=0,MtasMpController=0,MtasMrfpNode=1
    MtasMrfpNode=1
       mtasMrfpNodeAdministrativeState=UNLOCKED
       mtasMrfpNodeMId="[10.52.58.222]:2944"
       mtasMrfpNodeOperationalState=DISABLED
    
  • VM console is inaccessible or console shows problems

Solution

  1. Perform the manual recovery flow procedure.
  2. If the problem still exists, contact Ericsson support.

3.1.4 No Console Connection in OpenStack

Console is not available.

Solution

  1. Restart OpenStack nova services.

3.1.5 No ssh Connection into VM

3.1.5.1 No ssh Connection into VM in OpenStack

No route to host.

ssh mrsv-admin@192.160.112.15
ssh: connect to host 192.160.112.15 port 22: No route to host

Cause

  • No IP available

  • Floating IP problem

  • Routing problem

  • Security group problem

Solution

  1. Check console connection.
  2. Check if there is an IP address for eth1.
  3. Check if the floating IP is associated with the VM.
  4. Check if ssh is enabled in the security groups of the VM.
  5. Check connectivity and routing firewalls towards the cloud environment.
  6. Perform the manual recovery flow procedure.
  7. If the problem still exists, contact Ericsson support.
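
To narrow down which of the listed causes applies, it can help to probe the ssh port from the management host before going through the checks. The following is a minimal sketch using only the Python standard library. The returned labels and their interpretation are assumptions: "no route" corresponds to the error shown above, while "timeout" and "refused" often point to filtering or to a service that is not listening, but the exact symptom depends on the cloud environment.

```python
import errno
import socket


def probe_tcp(host, port=22, timeout=3.0):
    """Try a TCP connection and classify the outcome as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route"
        return "error: %s" % exc
```

For example, probe_tcp("192.160.112.15") can be run against the address used in the printout above.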

3.1.6 Wrong cloud-init Syntax

The cloud-init process indicates a problem.

Solution

  1. Check the user-data.txt file delivered in the release package, and modify it if needed.
  2. If the problem still exists, contact Ericsson support.

3.1.7 No Running MRF Processes on SC VM

Some MRF processes on the SC VM are not in running state.

The status check shows the following printout:

Running command: "verify_vmrf_node_status.py" on localhost
                eth0: OK
                eth1: OK
                eth2: OK
             SC role: not_available
              CoreMW: ERROR
                 COM: OK, NOT RUNNING
         MrfDirector: OK, NOT RUNNING
           CliDaemon: OK
          IpPipeline: OK
              TC-MPD: OK
            MrfAgent: ERROR
           CloudInit: NOT RUN YET
            SEC-CERT: OK
  neighbourdetection: OK

Solution

  1. Perform the manual recovery flow procedure.
  2. If the problem still exists, contact Ericsson support.

3.1.8 VNF Restarts Unexpectedly

The VNF restarts unexpectedly.

Cause

VNF restart can be initiated by the watchdog device that monitors vital parameters of each VM instance.

Solution

For a watchdog-initiated VNF restart, no action is normally necessary. If the unexpected VNF restart is detected repeatedly, continue with this procedure:

  1. Collect troubleshooting data regarding the incident in case next level support needs to be contacted.

    For more information on how to collect information, see Data Collection Guideline for vMRF.

  2. Consult next level of maintenance support. Further actions are outside the scope of this instruction.

3.2 Scaling Related Trouble Cases

3.2.1 vMRF VM Joins Wrong Cluster

A vMRF VM joins a different network cluster.

journalctl shows the following printout:

mrsv-admin@42198368-bedd-898f-0d15-533ee8ad7dc4:~$ sudo journalctl | grep tipc
Jan  9 00:42:18 kontron-am4024e kernel: [   36.890082] tipc: Started in network mode
Jan  9 00:42:18 kontron-am4024e kernel: [   36.891668] tipc: Own node address <1.1.3>, network identity 4711
Jan  9 00:42:18 kontron-am4024e kernel: [   36.893952] tipc: Enabled bearer <eth:et<eth0>, ⇒
discovery domain <1.1.0>, priority 10
Jan  9 00:42:18 kontron-am4024e kernel: [   36.897451] tipc: Established link <1.1.3:eth0-1.1.1:eth0> ⇒
on network plane A
Jan  9 00:42:18 kontron-am4024e kernel: [   36.899559] tipc: Established link <1.1.3:eth0-1.1.2:eth0> ⇒
on network plane A

The link <1.1.3:eth0-1.1.1:eth0> indicates a problem. A correct tipc connection in vMRF is from eth0 to eth0: <1.1.10:eth0-1.1.15:eth0>

Cause

  • Problem in cloud networking

  • Incorrect network configuration

  • Open network

Solution

  1. Check and reconfigure cloud networking.
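
The tipc lines from journalctl can be summarized so that the established links are easy to compare with the VMs that are expected in the cluster. The sketch below extracts the own node address, the network identity, and the peer addresses from captured journal text. The set of expected peer addresses is supplied by the operator and is an assumption of this example.

```python
import re

OWN_RE = re.compile(r"tipc: Own node address <([\d.]+)>, network identity (\d+)")
LINK_RE = re.compile(r"tipc: Established link <([\d.]+):(\w+)-([\d.]+):(\w+)>")


def tipc_summary(journal_text):
    """Collect own address, network identity, and peer addresses from tipc log lines."""
    own = OWN_RE.search(journal_text)
    peers = sorted({m.group(3) for m in LINK_RE.finditer(journal_text)})
    return {
        "own_address": own.group(1) if own else None,
        "network_id": int(own.group(2)) if own else None,
        "peers": peers,
    }


def unexpected_peers(summary, expected):
    """Return peers that are not in the operator-supplied set of expected addresses."""
    return [peer for peer in summary["peers"] if peer not in expected]
```

Any address returned by unexpected_peers is a candidate for a VM that belongs to a different cluster.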

3.3 Signaling Related Trouble Cases

3.3.1 No Connection to NextHop

SCTP operational state of the MRF application is disabled.

Diagnostics

The mrf_appl status command shows the following printout:

mrsv-admin@fi2-vmrf-com-uplift-cl2:~$ cluster run cli_tool mrf_appl status
Running command: "cli_tool mrf_appl status" on host: 192.168.0.3 (fi2-vmrf-com-uplift-cl2)
[2017-01-09 11:50:55.383]

Signalling State:
===================

H248Interface-Id: 3 H248Interface-LDN: "MediaResourceFunction=1,MrfH248Control=1,MrfH248Interface=BLR2_16_2_mrf4" H248Interface association state: UNLOCKED
H248Interface Service Change state: NOT_STARTED
Sctp operational state: DISABLED
Remote IP Address: 10.52.60.8 Remote Port: 21614
===================

H248Interface-Id: 2 H248Interface-LDN: "MediaResourceFunction=1,MrfH248Control=1,MrfH248Interface=BLR2_16_2_mrf3" H248Interface association state: UNLOCKED
H248Interface Service Change state: NOT_STARTED
Sctp operational state: DISABLED
Remote IP Address: 10.52.60.8 Remote Port: 21613
===================

H248Interface-Id: 1 H248Interface-LDN: "MediaResourceFunction=1,MrfH248Control=1,MrfH248Interface=BLR2_16_2_mrf2" H248Interface association state: UNLOCKED
H248Interface Service Change state: COMPLETED
Sctp operational state: ENABLED
Remote IP Address: 10.52.60.8 Remote Port: 21612
===================

LocalEndpoint Id: 3
Dscp: 40
Local port: 2944
===================

Sctp socket state: INITIATED.
IP: 10.52.61.219
===================

MRF instance administrative state: UNLOCKED
===================

Running command: "cli_tool mrf_appl status" on host: 192.168.0.4 (fi2-vmrf-com-uplift-cl2-0)
[2017-01-09 11:50:54.517]

Signalling State:
===================

H248Interface-Id: 3 H248Interface-LDN: "MediaResourceFunction=1,MrfH248Control=1,MrfH248Interface=BLR2_16_2_mrf4" H248Interface association state: UNLOCKED
H248Interface Service Change state: NOT_STARTED
Sctp operational state: DISABLED
Remote IP Address: 10.52.60.8 Remote Port: 21614
===================

H248Interface-Id: 2 H248Interface-LDN: "MediaResourceFunction=1,MrfH248Control=1,MrfH248Interface=BLR2_16_2_mrf3" H248Interface association state: UNLOCKED
H248Interface Service Change state: NOT_STARTED
Sctp operational state: DISABLED
Remote IP Address: 10.52.60.8 Remote Port: 21613
===================

H248Interface-Id: 1 H248Interface-LDN: "MediaResourceFunction=1,MrfH248Control=1,MrfH248Interface=BLR2_16_2_mrf2" H248Interface association state: UNLOCKED
H248Interface Service Change state: ONGOING_COLD_BOOT
Sctp operational state: DISABLED
Remote IP Address: 10.52.60.8 Remote Port: 21612
===================

LocalEndpoint Id: 4
Dscp: 40
Local port: 2944
===================

Sctp socket state: INITIATED.
IP: 10.52.61.215
===================

MRF instance administrative state: UNLOCKED
===================

Cause

  • VLAN tagging problem in cloud

  • VLAN tagging problem in site switches or routers

  • Physical connectivity problem

  • Security group problem

Solution

  1. Check if the MRF H.248 Link Unavailable alarm is active. Clear the alarm using the alarm instruction.
  2. Check if the VM has an IP address for eth2.
  3. Check that SCTP is enabled in the security groups of the VM.
  4. Check connectivity and routing firewalls towards the cloud environment.
  5. Restart the mrf process from CLI:
    sudo systemctl restart mrf_appl.service
  6. Perform the manual recovery flow procedure.
  7. If the problem still exists, contact Ericsson support.
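
The mrf_appl status printout in Diagnostics is long when several VMs and H.248 interfaces are involved. The following is a minimal sketch that scans captured output and lists the interfaces whose SCTP operational state is not ENABLED, together with the host that reported them. It relies only on the line layout shown in the printout above; a different software version may print a different layout.

```python
import re

HOST_RE = re.compile(r"on host: (\S+)")
IFACE_RE = re.compile(r'H248Interface-Id: (\d+) .*MrfH248Interface=([^"]+)"')
SCTP_RE = re.compile(r"Sctp operational state: (\w+)")


def disabled_sctp_links(printout):
    """List (host, interface id, interface name) for links whose SCTP state is not ENABLED."""
    host, current, found = None, None, []
    for raw in printout.splitlines():
        line = raw.strip()
        m = HOST_RE.search(line)
        if m:
            host = m.group(1)  # remember which VM the following blocks belong to
            continue
        m = IFACE_RE.match(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            continue
        m = SCTP_RE.match(line)
        if m and current is not None:
            if m.group(1) != "ENABLED":
                found.append((host, current[0], current[1]))
            current = None
    return found
```

An interface that is reported for every host suggests a problem outside a single VM, such as VLAN tagging or security groups, while an interface reported for one host only points towards that VM.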

3.3.2 No Connection to O&M IP Address

The remote host key for the SSH connection to the O&M IP address has changed.

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
<fingerprint>.
Please contact your system administrator.
Add correct host key in ~/.ssh/known_hosts to get rid of this message.
Offending key in ~/.ssh/known_hosts: <line number of the offending key>
Permission denied (publickey,password).

Cause

  • After manual VNF upgrade, a new VM takes the ACTIVE SC role but has a different remote host SSH key than the old VM.

Solution

  1. Remove the offending key from ~/.ssh/known_hosts using the following command:
    sed -i '<line_number_of_the_offending_key>d' ~/.ssh/known_hosts

    Result: The offending key is deleted, and the warning message is not displayed as long as the ACTIVE SC role stays in the current VNF.
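
The sed command above edits the file in place without keeping a copy. As an alternative, the following minimal sketch removes the same line but first saves a backup. The .bak suffix and the function names are assumptions made for this example; the path must already be expanded, for example with os.path.expanduser.

```python
import shutil


def drop_line(text, line_number):
    """Return text without the given 1-based line, like sed '<n>d'."""
    lines = text.splitlines(keepends=True)
    if not 1 <= line_number <= len(lines):
        raise ValueError("line %d is outside 1..%d" % (line_number, len(lines)))
    del lines[line_number - 1]
    return "".join(lines)


def remove_known_hosts_entry(path, line_number):
    """Back up the file to <path>.bak (hypothetical naming), then rewrite it without the line."""
    shutil.copy2(path, path + ".bak")
    with open(path, "r", encoding="utf-8") as fh:
        text = fh.read()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(drop_line(text, line_number))
```

OpenSSH also provides ssh-keygen -R <host>, which removes all keys of a host from known_hosts and keeps the previous file as known_hosts.old.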

3.4 Media Related Trouble Cases

3.4.1 No Connection to Client

Client is not reachable. Ping request from the IP address ends in timeout.

The ipp conf command shows the following printout:

mrsv-admin@fi1-vmrf:~$ cli_tool ipp conf
Configuration:
Network (id:1)               default_network
    VLAN ID                  -
    UDP Port Range           1024..65535
    Media IP IF (id:1)
        Ethdev               em1 (id:0)
        MAC                  FA:16:EE:FC:14:7A
        Link                 UP
        IP                   10.52.58.133
        Status               STATIC
    Media IP IF (id:2)
        Ethdev               em1 (id:0)
        MAC                  FA:16:EE:FC:14:7A
        Link                 UP
        IP                   2001:1b70:8298:2038::5
        Status               STATIC
        Link local           fe80::f816:eeff:fefc:147a
    Static Route (id:4)
        IP                   0.0.0.0/0
        Nexthop (id:4)
            MAC              00:30:88:11:DB:83
            IP               10.52.58.129
    Static Route (id:6)
        IP                   ::/0
        Nexthop (id:6)
            MAC              00:30:88:11:DB:83
            IP               fe80::230:88ff:fe11:db83
mrsv-admin@fi1-vmrf:~$

mrsv-admin@fi1-vmrf:~$ cli_tool ipp ping -m 1 10.52.45.129
PING 10.52.45.129 56 bytes of data
Timeout (3000 ms)
mrsv-admin@fi1-vmrf:~$

Cause

  • Problem in static route

  • Problem in client

Solution

  1. Check static route in vMRF.
  2. Check routing on site.
  3. Check client using Wireshark.
  4. Check that connection is working normally.
    mrsv-admin@fi1-vmrf:~$ cli_tool ipp ping -m 1 10.52.45.129
    PING 10.52.45.129 56 bytes of data
    56 bytes from 10.52.45.129: icmp_seq=0 ttl=62 time=0 ms
    mrsv-admin@fi1-vmrf:~$
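
For the static route check in step 1, it can be useful to see which configured route covers the client address. The sketch below applies a longest-prefix match to a list of (prefix, nexthop) pairs copied by hand from the ipp conf printout. That vMRF selects routes by longest prefix is an assumption of this example and is not stated in this document.

```python
import ipaddress


def best_route(destination, routes):
    """Return the (prefix, nexthop) pair with the longest prefix covering destination, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, nexthop in routes:
        net = ipaddress.ip_network(prefix, strict=False)
        if net.version != dest.version or dest not in net:
            continue  # wrong address family or destination not covered
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, nexthop)
    return None if best is None else (str(best[0]), best[1])
```

With the routes from the printout above, best_route("10.52.45.129", [("0.0.0.0/0", "10.52.58.129"), ("::/0", "fe80::230:88ff:fe11:db83")]) selects the IPv4 default route, so the next hop 10.52.58.129 is the first place to check routing on site. A result of None means that no configured route covers the client.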

3.5 Load Related Trouble Cases

3.5.1 Disturbances in Traffic

Temporary or permanent stoppage of traffic.

Cause

  • Software problem

Solution

  1. Check crash dumps.

  2. Collect related data and contact Ericsson support.

    For more information on how to collect information, see Data Collection Guideline for vMRF.

3.5.2 Speech Quality Problem

3.5.2.1 Bandwidth Limitation

Bad speech quality.

Diagnostics

Bandwidth limitation can be checked by using the ipp discard-counters command:

mrsv-admin@fi8-mrs:~$ cluster run cli_tool ipp discard-counters
RX_BANDWIDTH_POLICING_DROP_TRAFFIC                      : 6286 

Cause

  • Bandwidth limitation

Solution

  1. Check that needed bandwidth is not limited.
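
The discard counters shown in Diagnostics can be filtered so that only counters with a non-zero value remain. The following is a minimal sketch that assumes every counter is printed as an upper-case name, a colon, and an integer, as in the single line shown above; other counter names are not listed in this document.

```python
import re

# Assumed line layout: COUNTER_NAME : <integer>
COUNTER_RE = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*:\s*(\d+)\s*$")


def nonzero_counters(printout):
    """Return a dict of counter name to value for all counters greater than zero."""
    counters = {}
    for line in printout.splitlines():
        m = COUNTER_RE.match(line)
        if m and int(m.group(2)) > 0:
            counters[m.group(1)] = int(m.group(2))
    return counters
```

A non-zero RX_BANDWIDTH_POLICING_DROP_TRAFFIC value in the result matches the bandwidth limitation case described here. Comparing two captures taken some time apart shows whether the counter is still increasing.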

3.5.2.2 Packet Loss in vSwitch

Bad speech quality.

Cause

  • Packet loss in vSwitch.

Solution

  1. Check packet loss in vSwitch by using the ipp internals command:
    cluster run cli_tool ipp internals -l 1
     portname dir   max burst       total    discards        lost
          em1 out          64  3168760208           0      300455
    vswitch lost: 390124
    
  2. Check packet loss in vSwitch by using the cloud management tool.
  3. Add more VMs to the cluster, if needed.
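
The ipp internals printout in step 1 can be turned into a loss ratio per port, which is easier to compare between VMs than raw counters. The sketch below parses the table rows and the vswitch lost line. The column meaning is taken from the header line only, the direction value "in" is an assumption (only "out" appears above), and the ratio lost/total is a rough indicator rather than a defined vMRF metric.

```python
import re

ROW_RE = re.compile(r"^\s*(\S+)\s+(in|out)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")
VSWITCH_RE = re.compile(r"vswitch lost:\s*(\d+)")


def loss_report(printout):
    """Parse port rows and the vswitch lost counter from an ipp internals printout."""
    ports = []
    for line in printout.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue  # header, prompt, or unrelated line
        total, discards, lost = int(m.group(4)), int(m.group(5)), int(m.group(6))
        ports.append({
            "port": m.group(1),
            "dir": m.group(2),
            "total": total,
            "discards": discards,
            "lost": lost,
            "lost_ratio": lost / total if total else 0.0,
        })
    m = VSWITCH_RE.search(printout)
    return {"ports": ports, "vswitch_lost": int(m.group(1)) if m else None}
```

For the printout above, the em1 row gives a lost ratio of roughly 0.009 percent of the total packets.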

3.5.2.3 Packet Loss on Site

Bad speech quality.

Cause

  • Packet loss on site.

  • Packet loss, error counters, or both are incremented in site switches or routers and cloud server switches.

Solution

  1. Check connectivity, configuration, VLANs, and routing firewalls towards cloud environment.

3.6 Announcement Related Trouble Cases

3.6.1 vMRF Cannot Play Announcement

vMRF cannot play announcement.

The h248-counters command shows the following printout:

mrsv-admin@fi2-vmrf-20170116-084636-cl1:~$ cli_tool mrf_appl h248-counters
[2017-01-17 09:36:08.190]

Modify Request total: 1472180 (Emergency: 0 IEPS: 0 Priority: 0)
        Pendings: 0
        Pending limit exceeded: 0
        Retransmissions: 0
        Retransmission limit exceeded: 0

        24 (Emergency: 0 IEPS: 0 Priority: 0) replied with error 514 (GCP_MEDIA_GATEWAY_CANNOT_SEND_THE_SPECIFIED_ANNOUNCEMENT)
        Originated from CRH at location 66 (visible as ERR_LOC_00066 in source code)
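
To spot error replies quickly in a long h248-counters printout, the lines of the form "<count> (...) replied with error <code> (<name>)" can be extracted. The following is a minimal sketch based only on the line layout shown above.

```python
import re

ERROR_RE = re.compile(
    r"^\s*(\d+) \(.*?\) replied with error (\d+) \((\w+)\)", re.MULTILINE
)


def error_replies(printout):
    """Return a list of (error code, error name, count) found in the printout."""
    return [
        (int(code), name, int(count))
        for count, code, name in ERROR_RE.findall(printout)
    ]
```

An entry with error code 514 (GCP_MEDIA_GATEWAY_CANNOT_SEND_THE_SPECIFIED_ANNOUNCEMENT) corresponds to the trouble case described in this section.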

Cause

Solution

  1. Run the following command:

    cli_tool mrf_appl announcement-status --status

    Table 1   Missing VariableAnnouncement MO Configuration Example
    mrsv-admin@vmrf-annc-demo:~$ cli_tool mrf_appl announcement-status -s
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    ANNOUNCEMENT STORAGE
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    storage type: INTERNAL
    storage status: AVAILABLE
    last update: 2018-05-08T11:18:19+00:00
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    ANNOUNCEMENTS
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    time                          faultId   category              announcementId      language       description
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    2018-05-08T11:26:38+00:00     1         CONFIGURATION FAULT   DATE                en-GB          Missing VariableAnnouncement MO configuration.
                                                                                                     Announcement requested in H.248 is not configured.
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    
  2. Choose the appropriate action based on the problem cause indicated in the resulting printout.

    An example of a status printout is shown in Table 1.

    Printout:
    • Missing BasicAnnouncement MO configuration
    • File cache failure
    • Missing VariableAnnouncement MO configuration
    Action: Continue with Step 3.

    Printout:
    Audio File copy to cache failure:
    fileCacherResult: FILE_CACHER_ERROR_SOURCE_FILE_INCORRECT_FORMAT
    Action: Continue with Step 4.

    Printout:
    File cache failure:
    File not found: ./cache/<filename>.wav
    Action: Continue with Step 5.


  3. Depending on the problem cause indicated in the description column, add missing MO configuration or correct the faulty logic file.

    For more information on announcement MO configuration and logic files, see vMRF Configuration Management.

  4. Create wav files in supported format.

    Verify that only fmt and data WAV chunks are present in the wav file with the following command:

    soxi -V4 <wav_file_name>
    soxi INFO formats: detected file format type `wav'
    soxi DBUG wav: WAV Chunk fmt
    soxi DBUG wav: WAV Chunk data
    soxi DBUG wav: Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
    soxi DBUG wav:         32000 byte/sec, 2 block align, 16 bits/samp, 224000 data bytes
    soxi DBUG wav:         112000 Samps/chans
    
    Input File     : 'en_1030.wav'
    Channels       : 1
    Sample Rate    : 16000
    Precision      : 16-bit
    Duration       : 00:00:07.00 = 112000 samples ~ 525 CDDA sectors
    File Size      : 224k
    Bit Rate       : 256k
    Sample Encoding: 16-bit Signed Integer PCM

    For more information on format requirements and the conversion process, see vMRF Configuration Management.

  5. Check if announcement storage is external or internal based on the ANNOUNCEMENT STORAGE section of the printout received in Step 1.
    • If the storage type is INTERNAL, run the following command to check if the announcement audio files are available in the local storage folder:

    cluster run ls -l /cluster/storage/announcements/

    • If the storage type is EXTERNAL, check if the announcement audio files are available on the Announcement Storage Server.

  6. If the announcement audio files are missing from the storage folder, do the following:

    Storage Type   Action
    INTERNAL       Copy the required announcement audio files to the /cluster/storage/announcements folder using the scp command.
    EXTERNAL       Copy the required announcement audio files to the Announcement Storage Server defined during deployment.
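
The WAV chunk check in Step 4 can also be done without soxi, for example when many files are verified before they are copied to the storage. The sketch below walks the RIFF chunk list of a file that has been read into memory and checks that only the fmt and data chunks are present. It inspects the chunk structure only; it does not verify the sample rate or encoding requirements, which are described in vMRF Configuration Management.

```python
import struct


def riff_chunks(data):
    """Return the list of chunk IDs found in a RIFF/WAVE byte string."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks, pos = [], 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append(chunk_id.decode("ascii", errors="replace"))
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks


def only_fmt_and_data(data):
    """True if the file contains exactly one fmt chunk and one data chunk."""
    return sorted(riff_chunks(data)) == ["data", "fmt "]
```

A file that contains additional chunks, such as LIST, makes only_fmt_and_data return False and is a candidate for conversion.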


3.6.2 Client Cannot Hear Announcement

Client cannot hear announcement while vMRF plays the announcement.

Cause

  • Early media handling settings in vSBG are incorrect.

Solution

  1. Check in vSBG whether the pemSupport attribute in the SipPa MO is true or false:

    show -v -r ManagedElement=1,SbgFunction=1,PCSCF=1,AccessNetPcscf=1,SipPa=1

    Attribute Value   Action
    true              Continue with Step 4.
    false             Continue with Step 2.

    For more information on vSBG MOs and attributes, see vSBG Managed Object Model.

  2. Set the pemSupport attribute to true, using the following commands:

    >configure

    >ManagedElement=1,SbgFunction=1,PCSCF=1,AccessNetPcscf=1,SipPa=1,pemSupport=true

  3. If the problem persists, continue with Step 4.
  4. Contact next level of Ericsson support.

Reference List

Managed Object Model, 387/155 54-LZN 765 0172-V1