Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[linksync] Netdev oper status determination using IFF_RUNNING #1568

Merged
merged 1 commit into from Mar 5, 2021

Conversation

vganesan-nokia
Copy link
Contributor

@vganesan-nokia vganesan-nokia commented Dec 24, 2020

Signed-off-by: vedganes vedavinayagam.ganesan@nokia.com

Changes in linksync to use IFF_RUNNING to determine netdev operational
status instead of using IFF_LOWER_UP.

What I did

Changed linksync.cpp to use IFF_RUNNING to determine the netdev operational status instead of using IFF_LOWER_UP.

Why I did it

The objective of this change is to make orchagent/linksync align with FRR/Zebra in handling interface states. Zebra uses IFF_RUNNING flag to determine interfaces' oper state while linksync uses IFF_LOWER_UP. Zebra rightly depends on IFF_RUNNING flag. As a routing daemon, zebra wants to know whether the interface is capable of passing packets or not. The flag IFF_RUNNING indicates exactly that (based on comment in if.h, this flag reflects the interface oper up state as defined in RFC2863). Since orchagent uses IFF_LOWER_UP, which comes earlier than IFF_RUNNING, there is interface state mismatch between zebra and orchagent for a window of time. Since with voq implementation we write static neigh/routes when the netdev (inband port) becomes oper up and bgp depends on these kernel entries, there is a need for this change to have the interface state timing same in both orchagent and zebra.

Refer issue sonic-net/sonic-buildimage#6295 for detailed information.

How I verified it

  • Currently the operational state of netdev of front panel ports are not used. It is used only for the management port. All vs tests passed.

Details if related

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Changes in linksync to use IFF_RUNNING to determine link operational
status instead of using IFF_LOWER_UP.
@vganesan-nokia
Copy link
Contributor Author

retest this please

@anshuv-mfst
Copy link

@eswaran (Arista), @abdosi - could you please review the PR, Thanks.

@lguohan
Copy link
Contributor

lguohan commented Jan 27, 2021

/Azurepipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -174,7 +174,7 @@ void LinkSync::onMsg(int nlmsg_type, struct nl_object *obj)

unsigned int flags = rtnl_link_get_flags(link);
bool admin = flags & IFF_UP;
bool oper = flags & IFF_LOWER_UP;
bool oper = flags & IFF_RUNNING;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For e.g, this interface has LOWER_UP but state is UNKNOWN. Will it have IFF_RUNNING set?

58: Loopback0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 96:a7:77:d1:73:34 brd ff:ff:ff:ff:ff:ff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Linux command "ip link show" does not show it here. The bgp "show interface" will show this flag. Here is an example of interface info got by linux command and bgp command.

admin@Linecard2:~$
admin@Linecard2:~$ ip link show | grep -A1 Loopback0
10: Loopback0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether f6:55:7a:eb:85:eb brd ff:ff:ff:ff:ff:ff
admin@Linecard2:~$
admin@Linecard2:~$ docker exec bgp vtysh -c "show interface" | grep -A12 Loopback0
Interface Loopback0 is up, line protocol is up
Link ups: 1 last: 2019/03/02 21:32:48.16
Link downs: 0 last: (never)
vrf: default
index 10 metric 0 mtu 65536 speed 0
flags: <UP,BROADCAST,RUNNING,NOARP>
Type: Unknown
HWaddr: f6:55:7a:eb:85:eb
inet 10.1.0.1/32 unnumbered
inet6 fe80::f455:7aff:feeb:85eb/64
inet6 fc00:10::1/128
Interface Type Other
Interface Slave Type None
admin@Linecard2:~$

@vganesan-nokia
Copy link
Contributor Author

/AzurePipelines run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1568 in repo Azure/sonic-swss

@vganesan-nokia
Copy link
Contributor Author

@prsunny, would please re-start the AzurePipelines? It seems they are stuck and not progressing. I tried to start it. It seems I do not have permission to start AzurePipelines.
Thanks

@prsunny
Copy link
Collaborator

prsunny commented Feb 23, 2021

/Azurepipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@vganesan-nokia
Copy link
Contributor Author

@prsunny, thanks for restarting tests.

@vganesan-nokia
Copy link
Contributor Author

@eswaran, @abdosi - A gentle remainder, would you please review and approve?

@lguohan
Copy link
Contributor

lguohan commented Mar 5, 2021

HERE IS LINUX MANUAL, wonder what the different is between iff_running and if_lower_up? why do we think iff_running is the right thing to do?

          IFF_RUNNING       Resources allocated.
          IFF_LOWER_UP      Driver signals L1 up (since Linux 2.6.17)

@vganesan-nokia
Copy link
Contributor Author

HERE IS LINUX MANUAL, wonder what the different is between iff_running and if_lower_up? why do we think iff_running is the right thing to do?

          IFF_RUNNING       Resources allocated.
          IFF_LOWER_UP      Driver signals L1 up (since Linux 2.6.17)

The objective is to make orchagent/linksync align with FRR/Zebra in handling interface states. Zebra uses IFF_RUNNING flag to determine interfaces' oper state while linksync uses IFF_LOWER_UP. This leads to interface state mismatch between zebra and orchagent for a window of time (IFF_LOWER_UP and IFF_RUNNING events come at different points of time). Since with voq implementation we write static neigh/routes when the netdev (inband port) becomes oper up and bgp depends on these kernel entries, there is a need for having the interface state timing same in both orchagent and zebra. I have described this in detail in issue sonic-net/sonic-buildimage#6295.

@lguohan
Copy link
Contributor

lguohan commented Mar 5, 2021

is iff_running after iff_lower_up? is zebra doing the right thing?

@vganesan-nokia
Copy link
Contributor Author

is iff_running after iff_lower_up?

Yes. IFF_LOWER_UP comes up earlier than IFF_RUNNING. But zebra is not acting on IFF_LOWE_UP.

is zebra doing the right thing?

In our SONiC linux, in the file if.h, the IFF_RUNNING has comment "@IFF_RUNNING: interface RFC2863 OPER_UP. Volatile."

As a routing daemon, zebra want to know whether the interface is capable of passing packets or not. So it rightly depends on this IFF_RUNNING flag which indicates whether the interface is able to pass packets or not.

So I guess zebra is doing the right thing.

@lguohan
Copy link
Contributor

lguohan commented Mar 5, 2021

make sense, can you add above discussion in the pr description. we will add it as part of the pr commit message.

@vganesan-nokia
Copy link
Contributor Author

make sense, can you add above discussion in the pr description. we will add it as part of the pr commit message.

Updated the PR comment as suggested.

@lguohan lguohan merged commit 0fa4d74 into sonic-net:master Mar 5, 2021
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
…net#1568)

The objective of this change is to make orchagent/linksync align with FRR/Zebra in handling interface states. Zebra uses IFF_RUNNING flag to determine interfaces' oper state while linksync uses IFF_LOWER_UP. Zebra rightly depends on IFF_RUNNING flag. As a routing daemon, zebra wants to know whether the interface is capable of passing packets or not. The flag IFF_RUNNING indicates exactly that (based on comment in if.h, this flag reflects the interface oper up state as defined in RFC2863). Since orchagent uses IFF_LOWER_UP, which comes earlier than IFF_RUNNING, there is interface state mismatch between zebra and orchagent for a window of time. Since with voq implementation we write static neigh/routes when the netdev (inband port) becomes oper up and bgp depends on these kernel entries, there is a need for this change to have the interface state timing same in both orchagent and zebra.

Refer issue sonic-net/sonic-buildimage#6295 for detailed information.

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
#### What I did

1. Add CLI support for port auto negotiation feature.
2. Add db_migrator change for auto negotiation feature
2. Add unit test cases for all changes

#### How I did it

1. Add new subcommands to "config interface" command group to allow user configuring port auto negotiation
2. Add new subcommands to "show interfaces" command group to allow user show auto negotiation status
3. In db_migrator.py, change auto negotiation related DB field to latest one
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
…rator test cases as well (sonic-net#1614)

- What I did
Originally, the method advance_version_for_expected_database was introduced (in sonic-net#1566) to handle the case the latest version in CONFIG_DB is greater than the latest version in mellanox_buffer_migrator.
Now there are other database migrators whose test cases can also encounter this situation, like port auto-negotiation (sonic-net#1568) and port-channel for LACP key (sonic-net#1473).
So I would like to make the method public, available for all database migrators.
Related database migrator test cases have been updated accordingly.

- How to verify it
Run the unit test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants