ARP: Modified '/arp/test_neighbor_mac_noptf.py' testcase to skip 'bgp shutdown all' for modular chassis #3788

Merged
merged 1 commit into from Aug 11, 2021

@oxygen980 oxygen980 commented Jul 13, 2021

Description of PR

This PR skips 'bgp shutdown all' on modular chassis in arp/test_neighbor_mac_noptf.py.
Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911

Approach

What is the motivation for this PR?

Test case 'arp/test_neighbor_mac_noptf.py' brings all BGP sessions down and then verifies that the route count drops to zero. On a modular chassis the test case fails: a VOQ chassis has extra routes that the shutdown does not remove, so the check for zero routes never succeeds.

How did you do it?

To avoid this failure, we do not shut down all BGP neighbors on a modular chassis.

The reasons are:

  1. When we bring BGP down using 'sudo config bgp shutdown all' on a VOQ chassis linecard, it only brings down the eBGP neighbors, and not the BGP_VOQ_CHASSIS_NEIGHBORs (the iBGP neighbors to other ASICs in the chassis).
  2. The ASIC has routes that are learnt from other remote ASICs in the chassis.
  3. The VoQ architecture also adds static routes for inband interfaces and for all eBGP peers on the remote ASICs.

Getting all of these routes flushed could be very complex. Also, the BGP shutdown was originally added because some DUTs are overwhelmed by BGP updates, which caused this test to fail intermittently. However, we don't see such intermittent failures on linecards of a VoQ chassis with 12K routes in a T2 topology - possibly because the CPU is powerful.
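
The guard described above can be sketched as follows. This is a minimal illustration, not the merged test code: the `get_facts().get("modular_chassis")` check and the shutdown command are taken from this PR's diff, while `shutdown_bgp_if_not_modular`, the local `wait_until` helper, and `FakeDutHost` are hypothetical names introduced here so the snippet runs on its own.

```python
import time


def wait_until(timeout, interval, condition, *args):
    """Local stand-in: poll condition(*args) until True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def shutdown_bgp_if_not_modular(duthost, check_no_bgp_routes,
                                timeout=120, interval=2.0):
    """Shut down BGP and wait for routes to drain, except on a modular chassis.

    Returns True if the shutdown ran, False if it was skipped.
    """
    if duthost.get_facts().get("modular_chassis"):
        # On a VoQ chassis iBGP and static routes remain, so the
        # route count would never reach zero; skip the shutdown.
        return False
    duthost.command("sudo config bgp shutdown all")
    if not wait_until(timeout, interval, check_no_bgp_routes, duthost):
        raise TimeoutError(
            "BGP route removal exceeded %s seconds" % timeout)
    return True


class FakeDutHost:
    """Hypothetical stand-in for the duthost fixture, for illustration only."""

    def __init__(self, modular):
        self.modular = modular
        self.commands = []

    def get_facts(self):
        return {"modular_chassis": self.modular}

    def command(self, cmd):
        # Record the command instead of running it on a device.
        self.commands.append(cmd)
```

With a fake host whose facts report `modular_chassis` as true, the function returns without issuing any command; on any other host it issues the shutdown and then polls the supplied route check.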

How did you verify/test it?

Verified on a VOQ chassis.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@oxygen980 oxygen980 requested a review from a team as a code owner July 13, 2021 21:17
@oxygen980 oxygen980 changed the title ARP: In '/arp/test_neighbor_mac_noptf.py' testcase skip 'bgp shutdown all' for modular chassis ARP: Modified '/arp/test_neighbor_mac_noptf.py' testcase to skip 'bgp shutdown all' for modular chassis Jul 13, 2021
pytest.fail('BGP Shutdown Timeout: BGP route removal exceeded 120 seconds.')
if not duthost.get_facts().get("modular_chassis"):
    duthost.command("sudo config bgp shutdown all")
    if not wait_until(120, 2.0, self._check_no_bgp_routes, duthost):
Can _check_no_bgp_routes be updated for VOQ chassis to check only the eBGP routes and skip the other routes?

thanks !!
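
The alternative raised in this review thread, checking only eBGP routes, could look roughly like the sketch below. It was not part of this PR. The route records and their `protocol` and `peer_type` fields are hypothetical and do not correspond to the output of any SONiC command; `has_no_ebgp_routes` is likewise an illustrative name.

```python
def has_no_ebgp_routes(routes):
    """Return True when no route in `routes` was learned from an eBGP peer.

    `routes` is a list of dicts with hypothetical keys:
      protocol  -- e.g. "bgp" or "static"
      peer_type -- "external" for eBGP, "internal" for iBGP
    """
    return not any(
        route.get("protocol") == "bgp"
        and route.get("peer_type") == "external"
        for route in routes
    )
```

Under this filter, the iBGP routes learnt from remote ASICs and the static routes added by the VoQ architecture would be ignored, so only leftover eBGP routes would keep the check from passing.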

@anshuv-mfst
@Azure/sonic-chassis

@arlakshm arlakshm merged commit aa08416 into sonic-net:master Aug 11, 2021
vmittal-msft pushed a commit to vmittal-msft/sonic-mgmt that referenced this pull request Sep 28, 2021