
[multi-dut] - sanity checks for multi-duts #2478

Merged
merged 6 commits into sonic-net:master on Nov 15, 2020

Conversation

dipalipatel25
Contributor

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework (new/improvement)
  • Test case (new/improvement)

Approach

What is the motivation for this PR?

Need to support sanity_check against multi-dut

How did you do it?

checks.py:

  • do_checks modified to iterate over all the DUTs and return
    a dictionary keyed by dut.hostname, with the value being the list of
    sanity results for the specified items (like processes, interfaces, etc.)
    (see the sketch after this list)
  • check_bgp:
    • check bgp_facts for all the asics on a DUT and return the 'down_neighbors' per asic.
      If there are down_neighbors, then we return a dict with key being the bgp instance
      and value being the list of down_neighbors on that bgp instance.
      Examples:
        For single asic DUT with down_neighbor of 100.0.0.1:
           bgp: {
             down_neighbors : [ 100.0.0.1 ]
           }
        For multi-asic DUT with down_neighbor 100.0.0.1 on bgp0, no down_neighbors on bgp1 and down_neighbor 200.0.0.1 on bgp2
           bgp0: {
             down_neighbors : [ 100.0.0.1 ]
           },
           bgp2: {
             down_neighbors : [ 200.0.0.1 ]
          }
      
  • no change to check_services, check_processes, check_interfaces, check_dbmemory except for
    • changing the log message to include the DUT's hostname
  • check_interfaces and check_dbmemory do not support multi-asic yet, only multi-dut.
  • print_logs - iterate through all the duts in duthosts.
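
A minimal sketch of the result shape described above (the bgp_facts and num_asic calls appear in the review snippets further down; the neighbor-state filtering and the set of check items here are simplified assumptions, not the exact checks.py code):

# Sketch only: per-DUT / per-asic result shape for the sanity checks.
def check_bgp(dut):
    """Summarize down BGP neighbors for one DUT, keyed per bgp instance."""
    check_result = {'check_item': 'bgp', 'failed': False}
    bgp_facts = dut.bgp_facts(asic_index='all')
    for asic_index, a_asic_facts in enumerate(bgp_facts):
        a_asic_neighbors = a_asic_facts['ansible_facts']['bgp_neighbors']
        down_neighbors = [ip for ip, nbr in a_asic_neighbors.items()
                          if nbr['state'] != 'established']
        if down_neighbors:
            # 'bgp' for a single-asic DUT, 'bgp0', 'bgp1', ... for multi-asic
            key = 'bgp' if dut.facts['num_asic'] == 1 else 'bgp{}'.format(asic_index)
            check_result[key] = {'down_neighbors': down_neighbors}
            check_result['failed'] = True
    return check_result

def do_checks(duthosts, check_items):
    """Run the requested sanity items on every DUT and key the results by hostname."""
    check_functions = {'bgp': check_bgp}   # processes, interfaces, etc. omitted here
    results = {}
    for dut in duthosts:
        results[dut.hostname] = [check_functions[item](dut)
                                 for item in check_items if item in check_functions]
    return results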

sanity_check fixture modified to use duthosts instead of duthost:

  • call print_logs and do_checks with duthosts
  • iterate through the sanity results of each DUT in duthosts
    for sanity_check / recover (see the sketch below).
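
A rough sketch of the per-hostname iteration inside the fixture (the _evaluate helper is a hypothetical name; the real fixture in __init__.py also drives recovery and the post-test check):

from tests.common.helpers.assertions import pytest_assert as pt_assert

def _evaluate(check_results):
    """Fail the run if any check on any DUT reports failed=True."""
    failed_duts = []
    for hostname, dut_results in check_results.items():
        if any(result['failed'] for result in dut_results):
            failed_duts.append(hostname)
    pt_assert(not failed_duts,
              "Pre-test sanity check failed on: {}".format(failed_duts))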

How did you verify/test it?

Tested sanity_check fixture against

  • single-asic dut
  • multi-asic dut
  • single-asic multi-duts
  • multi-asic multi-dut

Sample output of sanity check against single-asic multi-dut with boards dut4 and dut3:
INFO     tests.common.plugins.sanity_check:__init__.py:125 !!!!!!!!!!!!!!!! Pre-test sanity check after recovery results: !!!!!!!!!!!!!!!!
{
    "dut4": [
        {
            "check_item": "services", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "database": true, 
                "bgp": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }
        }, 
        {
            "check_item": "bgp", 
            "failed": false
        }, 
        {
            "check_item": "interfaces", 
            "failed": false, 
            "down_ports": []
        }, 
        {
            "check_item": "processes", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "bgp": true, 
                "database": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }, 
            "processes_status": {
                "lldp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "lldp-syncd", 
                        "lldpd", 
                        "lldpmgrd"
                    ]
                }, 
                "database": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "redis"
                    ]
                }, 
                "bgp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "bgpcfgd", 
                        "bgpd", 
                        "fpmsyncd", 
                        "staticd", 
                        "zebra"
                    ]
                }, 
                "teamd": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "teammgrd", 
                        "teamsyncd", 
                        "tlm_teamd"
                    ]
                }, 
                "syncd": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "syncd"
                    ]
                }, 
                "swss": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "buffermgrd", 
                        "intfmgrd", 
                        "nbrmgrd", 
                        "neighsyncd", 
                        "orchagent", 
                        "portmgrd", 
                        "portsyncd", 
                        "vlanmgrd", 
                        "vrfmgrd", 
                        "vxlanmgrd"
                    ]
                }
            }
        }, 
        {
            "check_item": "dbmemory", 
            "failed": false
        }
    ], 
    "dut3": [
        {
            "check_item": "services", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "database": true, 
                "bgp": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }
        }, 
        {
            "check_item": "bgp", 
            "failed": false
        }, 
        {
            "check_item": "interfaces", 
            "failed": false, 
            "down_ports": []
        }, 
        {
            "check_item": "processes", 
            "failed": false, 
            "services_status": {
                "lldp": true, 
                "bgp": true, 
                "database": true, 
                "teamd": true, 
                "syncd": true, 
                "swss": true
            }, 
            "processes_status": {
                "lldp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "lldp-syncd", 
                        "lldpd", 
                        "lldpmgrd"
                    ]
                }, 
                "database": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "redis"
                    ]
                }, 
                "bgp": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "bgpcfgd", 
                        "bgpd", 
                        "fpmsyncd", 
                        "staticd", 
                        "zebra"
                    ]
                }, 
                "teamd": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "teammgrd", 
                        "teamsyncd", 
                        "tlm_teamd"
                    ]
                }, 
                "syncd": {
                   "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "syncd"
                    ]
                }, 
                "swss": {
                    "status": true, 
                    "exited_critical_process": [], 
                    "running_critical_process": [
                        "buffermgrd", 
                        "intfmgrd", 
                        "nbrmgrd", 
                        "neighsyncd", 
                        "orchagent", 
                        "portmgrd", 
                        "portsyncd", 
                        "vlanmgrd", 
                        "vrfmgrd", 
                        "vxlanmgrd"
                    ]
                }
            }
        }, 
        {
            "check_item": "dbmemory", 
            "failed": false
        }
    ]
}
INFO     tests.common.plugins.sanity_check:__init__.py:137 Done pre-test sanity check

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@ghost

ghost commented Nov 6, 2020

CLA assistant check
All CLA requirements met.

@yxieca yxieca requested review from wangxin and a team November 6, 2020 16:04
@dipalipatel25
Contributor Author

@wangxin @yxieca - I changed the code in __init__.py to use pytest_assert, but the vsimage tests are failing. We are getting:

PluginValidationError: unknown hook 'pytest_assert' in plugin <module 'tests.common.plugins.sanity_check' from '/data/sonic-mgmt/tests/common/plugins/sanity_check/__init__.pyc'>

We are not sure how to resolve this; could you please help out?

@wangxin
Collaborator

wangxin commented Nov 10, 2020

@wangxin @yxieca - I changed the code in __init__.py to use pytest_assert, but the vsimage tests are failing. We are getting:

PluginValidationError: unknown hook 'pytest_assert' in plugin <module 'tests.common.plugins.sanity_check' from '/data/sonic-mgmt/tests/common/plugins/sanity_check/__init__.pyc'>

We are not sure how to resolve this; could you please help out?

Because it is in a pytest plugin, pytest tries to interpret the top-level name pytest_assert as a hook function. You can try this:

from tests.common.helpers.assertions import pytest_assert as pt_assert
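
A minimal illustration of the difference (a sketch, not the actual plugin code): pytest validates every top-level pytest_* name in a registered plugin module as a hook, so importing the helper under a different name avoids that lookup.

# In tests/common/plugins/sanity_check/__init__.py (a registered pytest plugin):
# from tests.common.helpers.assertions import pytest_assert    # rejected: pytest treats the
#                                                               # top-level pytest_* name as
#                                                               # an unknown hook
from tests.common.helpers.assertions import pytest_assert as pt_assert   # safe alias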

@dipalipatel25
Contributor Author

@wangxin Thanks for the suggestion. That worked.

checks.py:
  - do_checks modified to iterate over all the DUTs and return
    a dictionary with key as dut.hostname and value being the list of
    results for sanity for the specified items (like processes, interfaces, etc.)
  - check_bgp:
      - check bgp_facts for all the asics on a DUT and return the 'down_neighbors' per asic.
        If there are down_neighbors, then we return a dict with key being the bgp instance
        and value being the list of down_neighbors on that bgp instance.
        E.g.
          For single asic DUT with down_neighbor of 100.0.0.1:
             bgp: {
               down_neighbors : [ 100.0.0.1 ]
             }
          For multi-asic DUT with down_neighbor 100.0.0.1 on bgp0, no down_neighbors on bgp1 and down_neighbor 200.0.0.1 on bgp2
             bgp0: {
               down_neighbors : [ 100.0.0.1 ]
             },
             bgp2: {
               down_neighbors : [ 200.0.0.1 ]
            }
  - no change to check_services, check_processes, check_interfaces, check_dbmemory except for
     - changing the log message to include the DUT's hostname
  - check_interfaces and check_dbmemory do not support multi-asic yet, only multi-dut.
  - print_logs - iterate through all the duts in duthosts.

sanity_check fixture modified to use duthosts instead of duthost:
  - call print_logs and do_checks with duthosts
  - iterate through the sanity results of each DUT in duthosts
    for sanity_check / recover.
- use pytest_assert
- use a variable for the asic check in check_bgp
- fixed logging in checks.py
Comment on lines +104 to +107
bgp_facts = dut.bgp_facts(asic_index='all')
for asic_index, a_asic_facts in enumerate(bgp_facts):
    a_asic_result = False
    a_asic_neighbors = a_asic_facts['ansible_facts']['bgp_neighbors']
Contributor

Can we add this to the function dut.get_bgp_neighbors()? It might be useful in other tests as well.

Contributor Author

get_bgp_neighbors() is in SonicHost. In our case, dut is an instance of MultiAsicSonicHost. SonicHost doesn't have a reference to MultiAsicSonicHost. And we probably don't want methods like this in MultiAsicSonicHost - as then we would have to replicate all get_* methods in SonicHost.

    else:
        a_asic_result = False
if dut.facts['num_asic'] == 1:
    if 'bgp' in check_result:
Contributor

If we have 2 devices, dut1 and dut2, where dut1 has some down_neighbors and dut2 has no down_neighbors,
the down_neighbors will be removed from check_results here. Is this correct?

Contributor Author

No. The final result is a dictionary keyed by each DUT's hostname, with the value being the list of all sanity check results on that DUT. So, for your scenario, check_results would be something like:

{
    "node1": [
        ....
        {
            "check_item": "bgp", 
            "failed": true, 
            "bgp": {
                "down_neighbors": [
                    "2064:101::1", 
                    "100.0.0.1", 
                    "101.0.0.1", 
                    "2064:100::1"
                ]
            }
        }, 
        .....
    ],
    "node2": [
        .....
        {
            "check_item": "bgp",
            "failed": false
        },
        .....
    ]
}

@sanmalho-git
Contributor

@yxieca - can you please review and approve this PR?

dipalipatel25 and others added 2 commits November 13, 2020 12:58
…r multi-duts

conftest.py
  - The collect_techsupport fixture logic is moved into a function called
    collect_techsupport_on_dut that is called by the fixtures collect_techsupport
    and collect_techsupport_all_duts.
  - The collect_techsupport fixture is added, which uses enum_dut_hostname
    to parameterize per DUT; the dump is collected on a per-DUT basis.
    This fixture is used for collecting dumps for test cases that use this approach.
  - The collect_techsupport_all_duts fixture is added, which uses the duthosts approach.
    All DUTs are looped over and the dump is collected for all failed DUTs.
    This fixture is used for collecting dumps for test cases that use duthosts
    (see the sketch after this commit description).
  - The dump timespan is changed to 2 hours.
reset_critical_services
  - Enhanced reset_critical_services_list fixture for multi-DUT support.
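
A rough sketch of the fixture split described in the conftest.py notes above (the bodies are illustrative assumptions; the real conftest.py failure detection, dump options, and file handling differ):

import pytest

def collect_techsupport_on_dut(request, duthost):
    """Collect a techsupport dump from a single DUT after a failed test."""
    if request.session.testsfailed:
        # the commit limits the dump timespan to the last 2 hours
        duthost.shell("show techsupport --since '2 hours ago'",
                      module_ignore_errors=True)

@pytest.fixture
def collect_techsupport(request, duthosts, enum_dut_hostname):
    # parameterized per DUT via enum_dut_hostname
    yield
    collect_techsupport_on_dut(request, duthosts[enum_dut_hostname])

@pytest.fixture
def collect_techsupport_all_duts(request, duthosts):
    # loops over every DUT in the testbed
    yield
    for duthost in duthosts:
        collect_techsupport_on_dut(request, duthost)
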
@yxieca
Collaborator

yxieca commented Nov 14, 2020

retest vsimage please

@yxieca yxieca merged commit c048d95 into sonic-net:master Nov 15, 2020