Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[voq/inbandif] Voq inbandif port #1602

Merged
merged 10 commits into from Apr 20, 2021

Conversation

vganesan-nokia
Copy link
Contributor

What I did

Changes to support inband interface configuration

Why I did it

The VOQ chassis needs inband interface in the kernel for cpu to cpu communication across different asic instance and remote neighbor programming in the kernel. This PR adds support for configuring inband interface so that kernel interface can be created via sonic configuration.

How I verified it
swss vs test (test_virtual_chassis.py) to verify the remote neigh programming.

Details if related

Reference: VOQ HLD: https://github.com/Azure/SONiC/blob/master/doc/voq/voq_hld.md

@anshuv-mfst
Copy link

@eswaran (Arista), @abdosi - could you please review the PR, thanks.

}

//Sync the neighbor to add to the CHASSIS_APP_DB
voqSyncAddNeigh(alias, ip_address, inband_mac, neighbor_entry);

return true;
}

bool NeighOrch::delInbandNeighbor(string alias, IpAddress ip_address)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will the local inband neigh be deleted? It should never happen, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. So far I do not have a compelling situation when we'll delete Inband neigh. Right now this API is not used anywhere since we do not delete the Inband interface once configured. We also do not support changing this once configured. May be in the future we may have a situation to dynamically change inband interface or ip address, when we'll use this.

@@ -380,6 +381,8 @@ void NbrMgr::doStateSystemNeighTask(Consumer &consumer)
{
SWSS_LOG_ERROR("Route entry add on dev %s failed for '%s'", nbr_odev.c_str(), kfvKey(t).c_str());
delKernelNeigh(nbr_odev, ip_address);
// Delete route to take care of deletion of exiting route of nbr for mac change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate why would this take care of mac change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there is mac change for a neighbor, the system neighbor in the chassis app db is updated with new mac. There is no removal of system neighbor. The neighorch (doVoqSystemNeighTask()) just adds the neighbor in SAI with updated mac. But in the kernel, the static route still exits. After updating the SAI, the state db is updated to signal kernel programming. The kernel programming in nbrmgr first adds neigh and adds static route in the kernel. Since kernel already has the route, it returns an error. So we delete the route entry here so that in the next re-try the route will be programmed.

Please also note that there will be error while trying to program the neighbor also since the kernel neighbor is tried to be programmed with same system mac (the mac of the inband port. For port type inband, in the kernel all neighbors will be programmed with the same mac - the system mac) even though there is mac change (only SAI is updated with chnaged mac). I'll fix this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}

// Check for existence of host interface. If does not exist, may create
// host if for the inband here
if(type == "port" && !port.m_hif_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this check? I remember setVoqInbandIntf() is called from Intfsorch::doTask, which will not process anything until all ports are ready (so hostIf must be created too). If inband port is processed just like front panel ports by portsorch, then hostIf of inban port must be created already before setVoqInbandIntf() is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. With the inband port is available in PORT table, by the time setVoqInbandIntf() is called, the host should have been created already. The check is here is kind of sanity check. Also, it gives the flexibility that you can have any port that is not part of the PORT table can be used for inband interface as long as we make the host if available, which can be done out side port config.

@@ -1282,6 +1282,15 @@ bool RouteOrch::addRoute(RouteBulkContext& ctx, const NextHopGroupKey &nextHops)

if (nexthop.ip_address.isZero())
{
if(gPortsOrch->isInbandPort(nexthop.alias))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is applied for both inband port and vlan, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. since static routes of system neighbors are added for both types of inband, this is applicable for both.

@@ -224,7 +273,79 @@ def test_chassis_system_neigh(self, vct):
assert encap_index == sys_neigh_encap_index, "Encap index not sync-ed correctly"

break

# Verify programming of remote neighbor in asic db and programming of static route and static
# neigh in the kernel for the remote neighbor. The neighbor created in linecard 1 will be a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easy to also cover the case of neighbor deletion too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll add a test case for delete neighbor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added test case for system neigh delete

@vganesan-nokia vganesan-nokia force-pushed the voq-inbandif-port branch 2 times, most recently from a10aa69 to 1f1fc0d Compare February 17, 2021 17:58
@vganesan-nokia
Copy link
Contributor Author

retest this please

@vganesan-nokia
Copy link
Contributor Author

retest vs please

@vganesan-nokia
Copy link
Contributor Author

@lguohan, would you please help restarting the Azure-sonic-swss(Test vstes)? It seems there is some cleanup issue. "retest this" is not responding. I do not have sufficient permission to start "AzurePipeline run"

ysmanman
ysmanman previously approved these changes Feb 24, 2021
@vganesan-nokia
Copy link
Contributor Author

Azurepipelines run

@vganesan-nokia
Copy link
Contributor Author

/Azurepipelines run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1602 in repo Azure/sonic-swss

@abdosi
Copy link
Contributor

abdosi commented Mar 7, 2021

/Azurepipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@vganesan-nokia
Copy link
Contributor Author

@abdosi, thanks for starting the Azurepipelines. Would you please review and approve so that we can proceed to merge?

@abdosi abdosi self-requested a review March 15, 2021 02:35
@abdosi
Copy link
Contributor

abdosi commented Mar 15, 2021

reviewing this.

memcpy(attr.value.mac, inband_mac.getMac(), 6);
neighbor_attrs.push_back(attr);

if(!addVoqEncapIndex(alias, ip_address, neighbor_attrs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually for local neighbors we let the SAI allocate encap index and the neighbor along with the allocated encap index is synced to all the remote asics. Here we call the encap index addition for local neighbor to give the flexibility for user given encap index (user has to make sure that the encap index unique within asic). Simply configure the voq neighbor table with encap index attribute (similar to remote neighbor encap index) and that encap index will be sent to the SAI. So far we have not used this approach. Currently since we do not have encap index for local neighbors in voq neigh table, we always let the SAI allocate the encap index.
This api is similar to the addNeighbor() api that handles adding encap index for both local and remote neighbors except that this does not add next hop and the host route addition is masked (specifically required for inband port type neighbors)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand
if gIntfsOrch->isRemoteSystemPortIntf(alias) is True we return from this API.
addVoqEncapIndex works only gIntfsOrch->isRemoteSystemPortIntf(alias) is True.
So why we are even calling it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The addVoqEncapIndex() is supposed to get encap index from voq neighbor table only for remote system port neighbors. The addVoqEncapIndex() is called for all neihbors (local and remote neighbors) from addNeighbor() function. We need this remote system port interface check so that only for remote neighbors, encap index attribute will be sent to SAI.
For addInbandNeighbor() since we are sure that this is a local neighbor, we can avoid calling addVoqEncapIndex() function. I'll fix this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been fixed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.


//For "vlan" type inband, the inband reachability info syncinng can be ARP learning of other
//asics inband or static configuration or through CHASSIS_APP_DB sync (this function)
status = sai_neighbor_api->create_neighbor_entry(&neighbor_entry, static_cast<uint32_t>(neighbor_attrs.size()), neighbor_attrs.data());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i forgot. Can you please tells us the reason why we are creating Neighbor record on SAI for the local Inband/Recycle Port interface ? How this will get used ? Which section of design document cover the need for doing this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my question what is the use of creating this information from SAI perspective. How this will be used later ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a brief info about inband interface and recycle port: Inband interface is for cpu-to-cpu communication. In this PR we use recycle port facility for inband interface. Think of recycle port as any other network port but the egress side is looped back to the ingress. Since host is reached via inband interface for all the protocol communications we need to add the self neighbor on the inband interface. This is like any other neighbor on the front panel port. Since the neighbor resides within the host of the asic itself, we use the recycle facility to loop back the packet and add neighbor to consume the packet.
After adding the neighbor we inform the neighbor to other asics in the same we do for any neighbor on any other front panel port. Other asics program this neighbor on the system port of inband interface of this asic, When remote asics sends packet to this asic, they inject packet on their cpu port and everything else is similar to regular routing handled in the hardware.
For more information on this please refer to the section 2.5.1 of the voq HLD https://github.com/Azure/SONiC/blob/master/doc/voq/voq_hld.md.

Copy link
Contributor

@abdosi abdosi Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vganesan-nokia thanks i had gone through this document before posting the question. I am not able to follow why there is need to create self neighbor for recycle port. Let me know what i am missing in below :
LC1 CPU want to send packet destined to LC2 Inband IP and need to be consumed by LC2 CPU.
For this my understanding is

  • Packet from LC1 CPU will reach LC1 Recycle Port via NetDev
  • Packet will get looped back do Host Lookup in LC1 Ingress ASIC and will send packet to LC2 Recylce Port
  • Packet will reach LC2 Recycle Port and looped back into Ingress ASIC of LC2
  • LC2 will have /32 route for its Inband IP and will go to LC2 CPU via Netdev of LC2 Recycle port.

So I am not sure where the self neighbor entry is coming into picture and how SAI will use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vganesan-nokia thanks i had gone through this document before posting the question. I am not able to follow why there is need to create self neighbor for recycle port. Let me know what i am missing in below :
LC1 CPU want to send packet destined to LC2 Inband IP and need to be consumed by LC2 CPU.
For this my understanding is

  • Packet from LC1 CPU will reach LC1 Recycle Port via NetDev
    [vganesan-nokia] >>> Here, the packet injection is on asic's cpu port with packet's destination information pointing to egress of recycle port.
  • Packet will get looped back do Host Lookup in LC1 Ingress ASIC and will send packet to LC2 Recylce Port
    [vganesan-nokia] >>> For here, when the packet is looped back, i.e., when packet is received on ingress of the same LC1 asic it has to be directed to router block. For this consumption to happen we need the neighbor info of self on the recycle port. Once packet is in routing operation block, the routing operation determines the destination information i.e., the system port of inband interface of LC2
  • Packet will reach LC2 Recycle Port and looped back into Ingress ASIC of LC2
    [vganesan-nokia] >>> Here from fabric, the packet reaches the egress of the recycle port in LC2. This is looped back to ingress where the host IP trap rule sends packet to CPU.
  • LC2 will have /32 route for its Inband IP and will go to LC2 CPU via Netdev of LC2 Recycle port.

So I am not sure where the self neighbor entry is coming into picture and how SAI will use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inband port IP address of the "my-asic" is the next-hop in the routing tables of the "other-asics" - for routes that point to "my-asic" host IP stack. So "my-asic" needs to create this neighbor and let the other asics know about this neighbor and its encap index. This will allow the other asics to host traffic destined to "my-asic"me via the inband port of "my-asic".

This is the only reason to create this neighbor. Please note that for this neighbor we do not create a host route in SAI. The neighbor attribute SAI_NEIGHBOR_ENTRY_ATTR_NO_HOST_ROUTE is used for this purpose. We do that becuase -

  1. The IP2ME trap rules can already send the packet to the CPU once it comes around to the ingress of the recyle port.
  2. If we have the host-route for this neighbor - it loop around the recyle port instead of being trapped to the CPU.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i forgot. Can you please tells us the reason why we are creating Neighbor record on SAI for the local Inband/Recycle Port interface ? How this will get used ? Which section of design document cover the need for doing this ?

[skeesara-nokia] This is needed so that we get an encap index that the other asics can use it for the case where the next-hop is the inband port IP of my-asic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed please update HLD with the information of L2Header (Src/Dest Mac) being used for the above packat flow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. A PR has been raised with the requested update.

sonic-net/SONiC#777

vedganes added 5 commits April 9, 2021 20:22
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Changes are done to: (1) Setup inband interface - applicable for both
port type and vlan type (2) Add neighbors for inband interface and sync
to chassis app db (3) Postpone neighbor programming both in kernel
and in asic for remote neighbors till inband interface is operationally
up. (4) Update "oper" state in STATE DB for Inband interface
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Changes done to handle procssing specific to port type inband interface
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

- Added tests for inband if (port type) and remote neigh programming in asic db
and kernel
- Fixed switch_id values in the system port config.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Fixed code review comments:
- Added comment to delInbandNeighbor() to indicate this is for local
inband interface
- Used hardcoded "Inband" prefix
- Reverted to net dev link up event detection using IFF_LOWER_UP flag
- Renamed "oper_status" of net dev to "netdev_oper_status" in the state
db PORT_TABLE.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Based on review comments Recyle port HLD
(sonic-net/SONiC#742) the inband port name prefix is
changed from Inband to "Ethernet-IB" similar to what we have for
Ethernet-BP (Ethernet-Backplane) in multi-asic design. Because of this
changes, we no longer need "INBAND_PREFIX" definition since Ethernet-IB
is covered in "INTFS_PREIFX"
vedganes added 5 commits April 9, 2021 20:22
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Added vs test case for system neigh delete
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Changes done to delete kernel neigh when there is failure in adding
kernel neigh for voq system neighbors if kernel entry already exists.
When there is mac change, system neighs are synced with changed mac.
With voq inband port type, static neighbor for this system neighbors
are added in kernel always with single chassis mac address. So to take
care of skpping adding neighbor we delete the kernel neighbor so that
next re-try will program the neighbor and hence the route addition will
proceed.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
VOQ System neighbor delete test is piggy backed to existing system neigh
test. This test also is the cleanup for the existing system neigh test

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
- Changes to avoid calling function to add encap index attribute for
inband interface host neighbor. Since encap index attribute is supplied
only for the remote neighbors and since the inband interface neighbor is
local, encap index will not be added even if this function is called. So
removed calling the api addVoqEncapIndex() while constructing neighbor
add message for inband interface host.
- Changes in vs test for system neigh delete to avoid timing issue

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
@abdosi abdosi merged commit 500e2e9 into sonic-net:master Apr 20, 2021
abdosi pushed a commit to sonic-net/SONiC that referenced this pull request Apr 20, 2021
…ort (#777)

This update is being made in response to code review comments on pull request - sonic-net/sonic-swss#1602
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
What I did

Changes to support inband interface configuration

Why I did it

The VOQ chassis needs inband interface in the kernel for cpu to cpu communication across different asic instance and remote neighbor programming in the kernel. This PR adds support for configuring inband interface so that kernel interface can be created via sonic configuration.

How I verified it
swss vs test (test_virtual_chassis.py) to verify the remote neigh programming.
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
Replace set_entry with mod_entry when setting the proxy_arp value for a VLAN.

Using set_entry will delete any other fields set for the key used, which is not desirable.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants