Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Egress Sflow Enhancement. #1268

Merged
merged 6 commits into from Aug 31, 2023
Merged

Egress Sflow Enhancement. #1268

merged 6 commits into from Aug 31, 2023

Conversation

rajkumar38
Copy link
Contributor

@rajkumar38 rajkumar38 commented Feb 20, 2023

Updated existing sFlow HLD for egress Sflow support.

Repo PR Title State
sonic-utilities [sflow] Added egress Sflow support Open
sonic-swss Egress Sflow Support Merged
sonic-buildimage [sflow] Add egress sflow support Merged

jeff-yin
jeff-yin previously approved these changes Feb 23, 2023
@gongjianLhr
Copy link

Hi rajkumar38,
I have a question about the encapsulated sample when egress psample is enabled.
If a flow get in the DUT on Ethernet1 and routed to Ethernet10. And Ethernet1 of the DUT is enabled for ingress psample, Ethernet10 is enabled for egress psample. And now the sflow collector received and parsed some encapsulated samples from this DUT with IIFINDEX (Ethernet1) and OIFINDEX(Ethernet10). How could it know if this encapsulated sample is sampled on Ethernet1 ingress sample or on Ethernet10 egress sample?

regards
Gongjian from alibaba

SAMPLE_RATE = 1*7DIGIT ; average number of packets skipped before the sample is taken
SAMPLE_RATE = 1*7DIGIT ; average number of packets
skipped before the sample is taken
direction can have value of rx | tx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both direction as well here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both direction as well here?

This schema is changed with new proposal.

@@ -68,9 +68,8 @@ sFlow will be implemented in multiple phases:
2. sFlow extended router support.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajkumar38 Do you know whether all 3 above phases are completed? if yes, mentioning the current status would be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajkumar38 Do you know whether all 3 above phases are completed? if yes, mentioning the current status would be helpful.

To our understanding, there is no progress on phase II and III.
It is not clear if anyone else is already started working on this.

@dgsudharsan
Copy link
Contributor

@vadymhlushko-mlnx FYI

@zhangyanzhao
Copy link
Collaborator

@sflow
Copy link

sflow commented Feb 28, 2023

Hello all,

The PSAMPLE netlink channel defines the integer attribute PSAMPLE_ATTR_SAMPLE_GROUP. The sFlow daemon on SONiC, hsflowd, expects ingress samples to appear with group=1, and egress samples to appear with group=2.

Starting in version 2.0.45-1, hsflowd for SONiC will automatically be ready to accept egress samples if they appear on the netlink channel. So no need to change anything else in the sflow container if you upgrade hsflowd to that release.

Regards,
Neil McKee, InMon Corp.

@pphaal
Copy link

pphaal commented Feb 28, 2023

Hi rajkumar38,

While sFlow supports bidirectional packet sampling, it requires that the sampling rate be the same in each direction, i.e. ingress_sampling_rate == egress_sampling_rate for bidirectionall sampled interfaces.

I would suggest an option 3 that would simplify the proposal and minimize issues with backward compatibility:

"SFLOW_SESSION": {
    "Ethernet0": {
       "admin_state": "down",
       "sample_rate": "40000",
       "sample_direction":"both"           
    },

The sflow sample-rate setting should remain unchanged. Instead, a new CLI settings should be added to configure direction, i.e.:
sflow interface direction {rx|tx|both}
Decoupling sampling rate and sampling direction in the CLI is simpler and leaving the existing sampling-rate command unchanged simplifies forward migration. If unspecified, the default direction should be "rx" for backward compatibility (and because this is the most common setting).

The "show sflow" command should also minimize changes, i.e. first three columns should remain the same, a fourth column could be added to indicate direction:

Interface     Admin Status   Sampling rate  Sampling direction
---------     ------------   -------------      --------------
Ethernet0     down           40000                   both

Similarly the DB schema can be extended with a SAMPLE_DIRECTION attribute added to SFLOW_SESSION_TABLE. When migrating from an older SONiC version, the default sampling direction should be set to "rx"

Regards,
Peter Phaal, InMon Corp.

1. Egress sampling support.
2. sFlow backoff mechanism (Drop the packets beyond configured CPU Queue rate limit).
3. sFlow over vlan interfaces.
1. sFlow backoff mechanism (Drop the packets beyond configured CPU Queue rate limit).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rajkumar38, Enabling both direction sample may double the number of packets trapped to CPU, meanwhile any drop of sample packet cause inaccuracy. Agree that backoff mechanism is too heavy, have you considered adding a monitor in sFlow container to alert sample packet loss due to Copp? That will help users to use it more easily.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The automated sampling rate backoff mechanism described in the sFlow spec tends to cause more problems than it solves. Instead, the sFlow agent should maintain the configured sampling rates and report internal drops using the drops counter in sFlow packet sample records (defined as, number of times that the sFlow agent detected that a packet marked to be sampled was dropped due to lack of resources). If excessive drops are detected then the network operator may decide to configure a higher sampling rate to improve accuracy during peak loads.

The psample PSAMPLE_ATTR_GROUP_SEQ allows the ASIC driver to indicate to the sFlow agent that samples have been dropped/rate limited. In addition to incrementing the counter each time a psample message is sent, it should also increment the counter by the number of packet samples lost to the policer. The hsflowd agent uses discontinuitites in the PSAMPLE_ATTR_GROUP_SEQ to increment the sFlow drops counters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, sounds good.

@rajkumar38
Copy link
Contributor Author

"sample_direction":"both"

SAI spec allows both ingress and egress sample-rate to be different.
Also your proposal deviates from open-config model (openconfig allows ingress-sample-rate and egress-sample-rate).
Hence we want to give same flexibility in SONiC.

@pphaal
Copy link

pphaal commented Mar 6, 2023

"sample_direction":"both"

SAI spec allows both ingress and egress sample-rate to be different. Also your proposal deviates from open-config model (openconfig allows ingress-sample-rate and egress-sample-rate). Hence we want to give same flexibility in SONiC.

The sFlow HLD is an implementation the sFlow specification, which requires symmetric sampling rates when performing bi-directional sampling.

The CLI and ConfigDB schema must faithfully implement the sFlow specification. Permitting asymmetric sampling to be configured puts the agent out of complience and will result in erroneous traffic data being reported by sFlow analyzers. The proposed, "sampling_direction":"both", ensures compliance with the sFlow specification, is simpler to implement, requires fewer HLD changes, and minimizes backward compatibility issues with implementations of the current HLD.

The sFlow agent abstracts the low level features of the ASIC exposed by SAI in order to implement the sFlow data model. The sFlow agent will consume SAI's ability to set ingress-sample-rate and egress-sample-rate in order to implement symmetric sampling, but sFlow configuration model protects the user from misconfiguring SAI.

The open-config model for sFlow configuration should not be used as a reference since it is flawed (for the reasons described above). Instead, the open-config model should be changed to comply with the sFlow specification.

@vadymhlushko-mlnx
Copy link
Contributor

Hi rajkumar38,

While sFlow supports bidirectional packet sampling, it requires that the sampling rate be the same in each direction, i.e. ingress_sampling_rate == egress_sampling_rate for bidirectionall sampled interfaces.

I would suggest an option 3 that would simplify the proposal and minimize issues with backward compatibility:

"SFLOW_SESSION": {
    "Ethernet0": {
       "admin_state": "down",
       "sample_rate": "40000",
       "sample_direction":"both"           
    },

The sflow sample-rate setting should remain unchanged. Instead, a new CLI settings should be added to configure direction, i.e.: sflow interface direction {rx|tx|both} Decoupling sampling rate and sampling direction in the CLI is simpler and leaving the existing sampling-rate command unchanged simplifies forward migration. If unspecified, the default direction should be "rx" for backward compatibility (and because this is the most common setting).

The "show sflow" command should also minimize changes, i.e. first three columns should remain the same, a fourth column could be added to indicate direction:

Interface     Admin Status   Sampling rate  Sampling direction
---------     ------------   -------------      --------------
Ethernet0     down           40000                   both

Similarly the DB schema can be extended with a SAMPLE_DIRECTION attribute added to SFLOW_SESSION_TABLE. When migrating from an older SONiC version, the default sampling direction should be set to "rx"

Regards, Peter Phaal, InMon Corp.

Agree with Peter Phaal. This approach is cleaner & simple than in the proposal.

@rajkumar38
Copy link
Contributor Author

rajkumar38 commented Mar 10, 2023

@pphaal @vadymhlushko-mlnx
Thanks for the inputs. We agree to this option3. I will update the HLD accordingly. In addition, we would like to introduce new "direction" command at global level. If not set default is "rx" for backward compatibility. On upgrade, db_migrator will set this to "rx" in CFG_SFLOW_TABLE and APP_SFLOW_TABLE_NAME . This allows operator to retain previous behavior like before upgrade.

config sflow sample-direction { rx | tx | both }

@rajkumar38
Copy link
Contributor Author

@pphaal @venkatmahalingam @shishao7sxm @jeff-yin
hi, Updated HLD with new option3 (preferred option) and address review comments.
Pls check and revert for any comments.

@pphaal
Copy link

pphaal commented Apr 13, 2023

Depending on the capabilities of the ASIC/driver, egress samples may be able to expose additional PSAMPLE attributes. I think it would be helpful to point this out by adding a paragraph after the paragraph discussing PSAMPLE_ATTR_SAMPLE_GROUP in section 6.9 - not to propose a change, but to raise awareness. Something along the lines:

Depending on hardware capabilities, driver may report additional attributes defined in https://github.com/torvalds/linux/blob/master/include/uapi/linux/psample.h. For example, PSAMPLE_ATTR_OUT_TC (egress queue), PSAMPLE_ATTR_OUT_TC_OCC (egress queue depth), and PSAMPLE_ATTR_LATENCY (transit delay) populate the sFlow Transit Delay Structures (https://sflow.org/sflow_transit.txt).

@rajkumar38
Copy link
Contributor Author

Depending on the capabilities of the ASIC/driver, egress samples may be able to expose additional PSAMPLE attributes. I think it would be helpful to point this out by adding a paragraph after the paragraph discussing PSAMPLE_ATTR_SAMPLE_GROUP in section 6.9 - not to propose a change, but to raise awareness. Something along the lines:

Depending on hardware capabilities, driver may report additional attributes defined in https://github.com/torvalds/linux/blob/master/include/uapi/linux/psample.h. For example, PSAMPLE_ATTR_OUT_TC (egress queue), PSAMPLE_ATTR_OUT_TC_OCC (egress queue depth), and PSAMPLE_ATTR_LATENCY (transit delay) populate the sFlow Transit Delay Structures (https://sflow.org/sflow_transit.txt).

Updated the HLD. Pls check

@pphaal
Copy link

pphaal commented Apr 20, 2023

Depending on the capabilities of the ASIC/driver, egress samples may be able to expose additional PSAMPLE attributes. I think it would be helpful to point this out by adding a paragraph after the paragraph discussing PSAMPLE_ATTR_SAMPLE_GROUP in section 6.9 - not to propose a change, but to raise awareness. Something along the lines:
Depending on hardware capabilities, driver may report additional attributes defined in https://github.com/torvalds/linux/blob/master/include/uapi/linux/psample.h. For example, PSAMPLE_ATTR_OUT_TC (egress queue), PSAMPLE_ATTR_OUT_TC_OCC (egress queue depth), and PSAMPLE_ATTR_LATENCY (transit delay) populate the sFlow Transit Delay Structures (https://sflow.org/sflow_transit.txt).

Updated the HLD. Pls check

Looks good. Thanks.

@rajkumar38
Copy link
Contributor Author

@pphaal @vadymhlushko-mlnx @jeff-yin @venkatmahalingam
If no other comments, can you please approve this HLD PR ?

@zhangyanzhao
Copy link
Collaborator

@shishao7sxm can you please help to review this PR? Alibaba registered this PR to review. Thanks.

@rajkumar38
Copy link
Contributor Author

@shishao7sxm @zhangyanzhao
If no further comments, can you help to merge this PR.

Updated existing sFlow HLD for egress Sflow support.

Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
@rck-innovium rck-innovium self-assigned this Jun 30, 2023
@zhangyanzhao zhangyanzhao merged commit 5b6f042 into sonic-net:master Aug 31, 2023
1 check passed
qiluo-msft pushed a commit to sonic-net/sonic-utilities that referenced this pull request Oct 11, 2023
### What I did
Added egress Sflow support.
HLD : sonic-net/SONiC#1268
#### How I did it
New commands and migrator scripts are updated as per HLD.
#### How to verify it
Run UT and validate the commands.
#### Previous command output (if the output of a command-line utility has changed)
```
show sflow
sFlow Global Information:
  sFlow Admin State:          up
  sFlow Polling Interval:     0
  sFlow AgentID:              default

  2 Collectors configured:
    Name: prod                IP addr: fe80::6e82:6aff:fe1e:cd8e UDP port: 6343   VRF: mgmt
    Name: ser5                IP addr: 172.21.35.15    UDP port: 6343   VRF: default

show sflow interface
+-------------+---------------+-----------------+
| Interface   | Admin State   |   Sampling Rate |
+=============+===============+=================+
| Ethernet0   | up            |            2500 |
+-------------+---------------+-----------------+
| Ethernet4   | up            |            1000 |
+-------------+---------------+-----------------+
| Ethernet112 | up            |            1000 |
+-------------+---------------+-----------------+
| Ethernet116 | up            |            5000 |
+-------------+---------------+-----------------+

````
#### New command output (if the output of a command-line utility has changed)
```
show sflow
sFlow Global Information:
  sFlow Admin State:          up
  sFlow Sample Direction:     both
  sFlow Polling Interval:     0
  sFlow AgentID:              default

  2 Collectors configured:
    Name: prod                IP addr: fe80::6e82:6aff:fe1e:cd8e UDP port: 6343   VRF: mgmt
    Name: ser5                IP addr: 172.21.35.15    UDP port: 6343   VRF: default

show sflow interface

+-------------+---------------+-----------------+----------------------+
| Interface   | Admin State   |   Sampling Rate | Sampling Direction   |
+=============+===============+=================+======================+
| Ethernet0   | up            |            2500 | both                 |
+-------------+---------------+-----------------+----------------------+
| Ethernet4   | up            |            1000 | tx                   |
+-------------+---------------+-----------------+----------------------+
| Ethernet112 | up            |            1000 | both                 |
+-------------+---------------+-----------------+----------------------+
| Ethernet116 | up            |            5000 | rx                   |
+-------------+---------------+-----------------+----------------------+

```
@skg-net
Copy link
Member

skg-net commented Feb 6, 2024

@rck-innovium Can you please update the Quality Metric (Alpha/Beta/GA) for the feature either in this PR comments or in HLD itself based on https://github.com/sonic-net/SONiC/blob/master/doc/SONiC%20feature%20quality%20definition.md
Thanks

@rck-innovium
Copy link
Collaborator

@rck-innovium Can you please update the Quality Metric (Alpha/Beta/GA) for the feature either in this PR comments or in HLD itself based on https://github.com/sonic-net/SONiC/blob/master/doc/SONiC%20feature%20quality%20definition.md Thanks

Quality metric: Beta

Test plan has been reviewed and partially implemented: https://github.com/sonic-net/sonic-mgmt/pull/8275/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet