
[multi-dut] - sanity checks for multi-duts #2478

Merged
merged 6 commits into sonic-net:master on Nov 15, 2020

Conversation

dipalipatel25
Contributor

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework (new/improvement)
  • Test case (new/improvement)

Approach

What is the motivation for this PR?

Need to support sanity_check against multi-dut

How did you do it?

checks.py:

  • do_checks modified to iterate over all the DUTs and return
    a dictionary keyed by dut.hostname, with the value being the list of
    sanity results for the specified items (like processes, interfaces, etc.)
    (see the sketch after this list)
  • check_bgp:
    • check bgp_facts for all the asics on a DUT and return the 'down_neighbors' per asic.
      If there are down_neighbors, then we return a dict with key being the bgp instance
      and value being the list of down_neighbors on that bgp instance.
      Examples:
        For single asic DUT with down_neighbor of 100.0.0.1:
           bgp: {
             down_neighbors : [ 100.0.0.1 ]
           }
        For multi-asic DUT with down_neighbor 100.0.0.1 on bgp0, no down_neighbors on bgp1 and down_neighbor 200.0.0.1 on bgp2
           bgp0: {
             down_neighbors : [ 100.0.0.1 ]
           },
           bgp2: {
             down_neighbors : [ 200.0.0.1 ]
          }
      
  • no change to check_services, check_processes, check_interfaces, check_dbmemory except for
    • changing the log message to include the DUT's hostname
  • check_interfaces and check_dbmemory do not support multi-asic yet, only multi-dut.
  • print_logs - iterate through all the duts in duthosts.
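
A minimal sketch of the result shape described above (the bgp_facts and num_asic calls appear in the review snippets further down; the neighbor-state filtering and the set of check items here are simplified assumptions, not the exact checks.py code):

# Sketch only: per-DUT / per-asic result shape for the sanity checks.
def check_bgp(dut):
    """Summarize down BGP neighbors for one DUT, keyed per bgp instance."""
    check_result = {'check_item': 'bgp', 'failed': False}
    bgp_facts = dut.bgp_facts(asic_index='all')
    for asic_index, a_asic_facts in enumerate(bgp_facts):
        a_asic_neighbors = a_asic_facts['ansible_facts']['bgp_neighbors']
        down_neighbors = [ip for ip, nbr in a_asic_neighbors.items()
                          if nbr['state'] != 'established']
        if down_neighbors:
            # 'bgp' for a single-asic DUT, 'bgp0', 'bgp1', ... for multi-asic
            key = 'bgp' if dut.facts['num_asic'] == 1 else 'bgp{}'.format(asic_index)
            check_result[key] = {'down_neighbors': down_neighbors}
            check_result['failed'] = True
    return check_result

def do_checks(duthosts, check_items):
    """Run the requested sanity items on every DUT and key the results by hostname."""
    check_functions = {'bgp': check_bgp}   # processes, interfaces, etc. omitted here
    results = {}
    for dut in duthosts:
        results[dut.hostname] = [check_functions[item](dut)
                                 for item in check_items if item in check_functions]
    return results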

sanity_check fixture modified to use duthosts instead of duthost:

  • call print_logs and do_checks with duthosts
  • iterate through the sanity results of each DUT in duthosts
    for sanity_check / recover (see the sketch below).
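
A rough sketch of the per-hostname iteration inside the fixture (the _evaluate helper is a hypothetical name; the real fixture in __init__.py also drives recovery and the post-test check):

from tests.common.helpers.assertions import pytest_assert as pt_assert

def _evaluate(check_results):
    """Fail the run if any check on any DUT reports failed=True."""
    failed_duts = []
    for hostname, dut_results in check_results.items():
        if any(result['failed'] for result in dut_results):
            failed_duts.append(hostname)
    pt_assert(not failed_duts,
              "Pre-test sanity check failed on: {}".format(failed_duts))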

How did you verify/test it?

Tested sanity_check fixture against

  • single-asic dut
  • multi-asic dut
  • single-asic multi-duts
  • multi-asic multi-dut

Sample output of sanity check against single-asic multi-dut with boards dut4 and dut3:
INFO     tests.common.plugins.sanity_check:__init__.py:125 !!!!!!!!!!!!!!!! Pre-test sanity check after recovery results: !!!!!!!!!!!!!!!!
{
    "dut4": [
        {
            "check_item": "services", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "database": true, 
                "bgp": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }
        }, 
        {
            "check_item": "bgp", 
            "failed": false
        }, 
        {
            "check_item": "interfaces", 
            "failed": false, 
            "down_ports": []
        }, 
        {
            "check_item": "processes", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "bgp": true, 
                "database": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }, 
            "processes_status": {
                "lldp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "lldp-syncd", 
                        "lldpd", 
                        "lldpmgrd"
                    ]
                }, 
                "database": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "redis"
                    ]
                }, 
                "bgp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "bgpcfgd", 
                        "bgpd", 
                        "fpmsyncd", 
                        "staticd", 
                        "zebra"
                    ]
                }, 
                "teamd": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "teammgrd", 
                        "teamsyncd", 
                        "tlm_teamd"
                    ]
                }, 
                "syncd": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "syncd"
                    ]
                }, 
                "swss": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "buffermgrd", 
                        "intfmgrd", 
                        "nbrmgrd", 
                        "neighsyncd", 
                        "orchagent", 
                        "portmgrd", 
                        "portsyncd", 
                        "vlanmgrd", 
                        "vrfmgrd", 
                        "vxlanmgrd"
                    ]
                }
            }
        }, 
        {
            "check_item": "dbmemory", 
            "failed": false
        }
    ], 
    "dut3": [
        {
            "check_item": "services", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "database": true, 
                "bgp": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }
        }, 
        {
            "check_item": "bgp", 
            "failed": false
        }, 
        {
            "check_item": "interfaces", 
            "failed": false, 
            "down_ports": []
        }, 
        {
            "check_item": "processes", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "bgp": true, 
                "database": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }, 
            "processes_status": {
                "lldp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "lldp-syncd", 
                        "lldpd", 
                        "lldpmgrd"
                    ]
                }, 
                "database": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "redis"
                    ]
                }, 
                "bgp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "bgpcfgd", 
                        "bgpd", 
                        "fpmsyncd", 
                        "staticd", 
                        "zebra"
                    ]
                }, 
                "teamd": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "teammgrd", 
                        "teamsyncd", 
                        "tlm_teamd"
                    ]
                }, 
                "syncd": {
                   "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "syncd"
                    ]
                }, 
                "swss": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "buffermgrd", 
                        "intfmgrd", 
                        "nbrmgrd", 
                        "neighsyncd", 
                        "orchagent", 
                        "portmgrd", 
                        "portsyncd", 
                        "vlanmgrd", 
                        "vrfmgrd", 
                        "vxlanmgrd"
                    ]
                }
            }
        }, 
        {
            "check_item": "dbmemory", 
            "failed": false
        }
    ]
}
INFO     tests.common.plugins.sanity_check:__init__.py:137 Done pre-test sanity check

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@ghost

ghost commented Nov 6, 2020

CLA assistant check
All CLA requirements met.

@yxieca yxieca requested review from wangxin and a team November 6, 2020 16:04
@dipalipatel25
Contributor Author

@wangxin @yxieca - I changed the code in __init__.py to use pytest_assert, but the vsimage tests are failing. We are getting:

PluginValidationError: unknown hook 'pytest_assert' in plugin <module 'tests.common.plugins.sanity_check' from '/data/sonic-mgmt/tests/common/plugins/sanity_check/__init__.pyc'>

We are not sure how to resolve this; could you please help out?

@wangxin
Collaborator

wangxin commented Nov 10, 2020

@wangxin @yxieca - I changed the code in __init__.py to use pytest_assert, but the vsimage tests are failing. We are getting:

PluginValidationError: unknown hook 'pytest_assert' in plugin <module 'tests.common.plugins.sanity_check' from '/data/sonic-mgmt/tests/common/plugins/sanity_check/__init__.pyc'>

We are not sure how to resolve this; could you please help out?

Because it is in a pytest plugin, pytest tries to interpret the top-level name pytest_assert as a hook function. You can try this:

from tests.common.helpers.assertions import pytest_assert as pt_assert
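
A minimal illustration of the difference (a sketch, not the actual plugin code): pytest validates every top-level pytest_* name in a registered plugin module as a hook, so importing the helper under a different name avoids that lookup.

# In tests/common/plugins/sanity_check/__init__.py (a registered pytest plugin):
# from tests.common.helpers.assertions import pytest_assert    # rejected: pytest treats the
#                                                               # top-level pytest_* name as
#                                                               # an unknown hook
from tests.common.helpers.assertions import pytest_assert as pt_assert   # safe alias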

@dipalipatel25
Contributor Author

@wangxin Thanks for the suggestion. That worked.

checks.py:
  - do_checks modified to iterate over all the DUTs and return
    a dictionary with key as dut.hostname and value being the list of
    results for sanity for the specified items (like processes, interfaces, etc.)
  - check_bgp:
      - check bgp_facts for all the asics on a DUT and return the 'down_neighbors' per asic.
        If there are down_neighbors, then we return a dict with key being the bgp instance
        and value being the list of down_neighbors on that bgp instance.
        E.g.
          For single asic DUT with down_neighbor of 100.0.0.1:
             bgp: {
               down_neighbors : [ 100.0.0.1 ]
             }
          For multi-asic DUT with down_neighbor 100.0.0.1 on bgp0, no down_neighbors on bgp1 and down_neighbor 200.0.0.1 on bgp2
             bgp0: {
               down_neighbors : [ 100.0.0.1 ]
             },
             bgp2: {
               down_neighbors : [ 200.0.0.1 ]
            }
  - no change to check_services, check_processes, check_interfaces, check_dbmemory except for
     - changing the log message to include the DUT's hostname
  - check_interfaces and check_dbmemory do not support multi-asic yet, only multi-dut.
  - print_logs - iterate through all the duts in duthosts.

sanity_check fixture modified to use duthosts instead of duthost:
  - call print_logs and do_checks with duthosts
  - iterate through the sanity results of each DUT in duthosts
    for sanity_check / recover.
- use pytest_assert
- use a variable for the asic check in check_bgp
- fixed logging in checks.py
Comment on lines +104 to +107
bgp_facts = dut.bgp_facts(asic_index='all')
for asic_index, a_asic_facts in enumerate(bgp_facts):
    a_asic_result = False
    a_asic_neighbors = a_asic_facts['ansible_facts']['bgp_neighbors']
Contributor

Can we add this to the function dut.get_bgp_neighbors()? It might be useful in other tests as well.

Contributor Author

get_bgp_neighbors() is in SonicHost. In our case, dut is an instance of MultiAsicSonicHost. SonicHost doesn't have a reference to MultiAsicSonicHost. And we probably don't want methods like this in MultiAsicSonicHost - as then we would have to replicate all get_* methods in SonicHost.

    else:
        a_asic_result = False
if dut.facts['num_asic'] == 1:
    if 'bgp' in check_result:
Contributor

If we have 2 devices, dut1 and dut2, where dut1 has some down_neighbors and dut2 has no down_neighbors,
the down_neighbors will be removed from check_results here. Is this correct?

Contributor Author

No. The final result is a dictionary keyed by each DUT's hostname, with the value being the list of all sanity check results on that DUT. So, for your scenario, check_results would be something like:

{
    "node1": [
        ....
        {
            "check_item": "bgp", 
            "failed": true, 
            "bgp": {
                "down_neighbors": [
                    "2064:101::1", 
                    "100.0.0.1", 
                    "101.0.0.1", 
                    "2064:100::1"
                ]
            }
        }, 
        .....
    ],
    "node2": [
        .....
        {
            "check_item": "bgp",
            "failed": false
        },
        .....
    ]
}

@sanmalho-git
Contributor

@yxieca - can you please review and approve this PR?

dipalipatel25 and others added 2 commits November 13, 2020 12:58
…r multi-duts

conftest.py
  - The collect_techsupport fixture logic is moved into a function called
    collect_techsupport_on_dut that is called by the fixtures collect_techsupport
    and collect_techsupport_all_duts.
  - The collect_techsupport fixture is added, which uses enum_dut_hostname
    to parameterize per DUT; the dump is collected on a per-DUT basis.
    This fixture is used for collecting dumps for test cases that use this approach.
  - The collect_techsupport_all_duts fixture is added, which uses the duthosts approach.
    All DUTs are looped over and the dump is collected for all failed DUTs.
    This fixture is used for collecting dumps for test cases that use duthosts
    (see the sketch after this commit description).
  - The dump timespan is changed to 2 hours.
reset_critical_services
  - Enhanced reset_critical_services_list fixture for multi-DUT support.
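
A rough sketch of the fixture split described in the conftest.py notes above (the bodies are illustrative assumptions; the real conftest.py failure detection, dump options, and file handling differ):

import pytest

def collect_techsupport_on_dut(request, duthost):
    """Collect a techsupport dump from a single DUT after a failed test."""
    if request.session.testsfailed:
        # the commit limits the dump timespan to the last 2 hours
        duthost.shell("show techsupport --since '2 hours ago'",
                      module_ignore_errors=True)

@pytest.fixture
def collect_techsupport(request, duthosts, enum_dut_hostname):
    # parameterized per DUT via enum_dut_hostname
    yield
    collect_techsupport_on_dut(request, duthosts[enum_dut_hostname])

@pytest.fixture
def collect_techsupport_all_duts(request, duthosts):
    # loops over every DUT in the testbed
    yield
    for duthost in duthosts:
        collect_techsupport_on_dut(request, duthost)
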
@yxieca
Collaborator

yxieca commented Nov 14, 2020

retest vsimage please

@yxieca yxieca merged commit c048d95 into sonic-net:master Nov 15, 2020