New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Egress Sflow Enhancement. #1268
Conversation
Hi rajkumar38, regards |
doc/sflow/sflow_hld.md
Outdated
SAMPLE_RATE = 1*7DIGIT ; average number of packets skipped before the sample is taken | ||
SAMPLE_RATE = 1*7DIGIT ; average number of packets | ||
skipped before the sample is taken | ||
direction can have value of rx | tx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both direction as well here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both direction as well here?
This schema is changed with new proposal.
@@ -68,9 +68,8 @@ sFlow will be implemented in multiple phases: | |||
2. sFlow extended router support. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rajkumar38 Do you know whether all 3 above phases are completed? if yes, mentioning the current status would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rajkumar38 Do you know whether all 3 above phases are completed? if yes, mentioning the current status would be helpful.
To our understanding, there is no progress on phase II and III.
It is not clear if anyone else is already started working on this.
This is reviewed in sonic community and recording is https://zoom.us/rec/share/ftw0da4bBQdfjGpKWj0KpRUS-Jn0FZiiUCD2Ia3oxByheb02XoJfvsKTgYIbDrRP.MaLAuB2VyRHtcHap |
Hello all, The PSAMPLE netlink channel defines the integer attribute PSAMPLE_ATTR_SAMPLE_GROUP. The sFlow daemon on SONiC, hsflowd, expects ingress samples to appear with group=1, and egress samples to appear with group=2. Starting in version 2.0.45-1, hsflowd for SONiC will automatically be ready to accept egress samples if they appear on the netlink channel. So no need to change anything else in the sflow container if you upgrade hsflowd to that release. Regards, |
Hi rajkumar38, While sFlow supports bidirectional packet sampling, it requires that the sampling rate be the same in each direction, i.e. ingress_sampling_rate == egress_sampling_rate for bidirectionall sampled interfaces. I would suggest an option 3 that would simplify the proposal and minimize issues with backward compatibility:
The sflow sample-rate setting should remain unchanged. Instead, a new CLI settings should be added to configure direction, i.e.: The "show sflow" command should also minimize changes, i.e. first three columns should remain the same, a fourth column could be added to indicate direction:
Similarly the DB schema can be extended with a SAMPLE_DIRECTION attribute added to SFLOW_SESSION_TABLE. When migrating from an older SONiC version, the default sampling direction should be set to "rx" Regards, |
1. Egress sampling support. | ||
2. sFlow backoff mechanism (Drop the packets beyond configured CPU Queue rate limit). | ||
3. sFlow over vlan interfaces. | ||
1. sFlow backoff mechanism (Drop the packets beyond configured CPU Queue rate limit). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rajkumar38, Enabling both direction sample may double the number of packets trapped to CPU, meanwhile any drop of sample packet cause inaccuracy. Agree that backoff mechanism is too heavy, have you considered adding a monitor in sFlow container to alert sample packet loss due to Copp? That will help users to use it more easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The automated sampling rate backoff mechanism described in the sFlow spec tends to cause more problems than it solves. Instead, the sFlow agent should maintain the configured sampling rates and report internal drops using the drops counter in sFlow packet sample records (defined as, number of times that the sFlow agent detected that a packet marked to be sampled was dropped due to lack of resources). If excessive drops are detected then the network operator may decide to configure a higher sampling rate to improve accuracy during peak loads.
The psample PSAMPLE_ATTR_GROUP_SEQ allows the ASIC driver to indicate to the sFlow agent that samples have been dropped/rate limited. In addition to incrementing the counter each time a psample message is sent, it should also increment the counter by the number of packet samples lost to the policer. The hsflowd agent uses discontinuitites in the PSAMPLE_ATTR_GROUP_SEQ to increment the sFlow drops counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, sounds good.
SAI spec allows both ingress and egress sample-rate to be different. |
The sFlow HLD is an implementation the sFlow specification, which requires symmetric sampling rates when performing bi-directional sampling. The CLI and ConfigDB schema must faithfully implement the sFlow specification. Permitting asymmetric sampling to be configured puts the agent out of complience and will result in erroneous traffic data being reported by sFlow analyzers. The proposed, "sampling_direction":"both", ensures compliance with the sFlow specification, is simpler to implement, requires fewer HLD changes, and minimizes backward compatibility issues with implementations of the current HLD. The sFlow agent abstracts the low level features of the ASIC exposed by SAI in order to implement the sFlow data model. The sFlow agent will consume SAI's ability to set ingress-sample-rate and egress-sample-rate in order to implement symmetric sampling, but sFlow configuration model protects the user from misconfiguring SAI. The open-config model for sFlow configuration should not be used as a reference since it is flawed (for the reasons described above). Instead, the open-config model should be changed to comply with the sFlow specification. |
Agree with Peter Phaal. This approach is cleaner & simple than in the proposal. |
@pphaal @vadymhlushko-mlnx config sflow sample-direction { rx | tx | both } |
@pphaal @venkatmahalingam @shishao7sxm @jeff-yin |
Depending on the capabilities of the ASIC/driver, egress samples may be able to expose additional PSAMPLE attributes. I think it would be helpful to point this out by adding a paragraph after the paragraph discussing PSAMPLE_ATTR_SAMPLE_GROUP in section 6.9 - not to propose a change, but to raise awareness. Something along the lines: Depending on hardware capabilities, driver may report additional attributes defined in https://github.com/torvalds/linux/blob/master/include/uapi/linux/psample.h. For example, PSAMPLE_ATTR_OUT_TC (egress queue), PSAMPLE_ATTR_OUT_TC_OCC (egress queue depth), and PSAMPLE_ATTR_LATENCY (transit delay) populate the sFlow Transit Delay Structures (https://sflow.org/sflow_transit.txt). |
Updated the HLD. Pls check |
Looks good. Thanks. |
@pphaal @vadymhlushko-mlnx @jeff-yin @venkatmahalingam |
@shishao7sxm can you please help to review this PR? Alibaba registered this PR to review. Thanks. |
@shishao7sxm @zhangyanzhao |
Updated existing sFlow HLD for egress Sflow support. Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
### What I did Added egress Sflow support. HLD : sonic-net/SONiC#1268 #### How I did it New commands and migrator scripts are updated as per HLD. #### How to verify it Run UT and validate the commands. #### Previous command output (if the output of a command-line utility has changed) ``` show sflow sFlow Global Information: sFlow Admin State: up sFlow Polling Interval: 0 sFlow AgentID: default 2 Collectors configured: Name: prod IP addr: fe80::6e82:6aff:fe1e:cd8e UDP port: 6343 VRF: mgmt Name: ser5 IP addr: 172.21.35.15 UDP port: 6343 VRF: default show sflow interface +-------------+---------------+-----------------+ | Interface | Admin State | Sampling Rate | +=============+===============+=================+ | Ethernet0 | up | 2500 | +-------------+---------------+-----------------+ | Ethernet4 | up | 1000 | +-------------+---------------+-----------------+ | Ethernet112 | up | 1000 | +-------------+---------------+-----------------+ | Ethernet116 | up | 5000 | +-------------+---------------+-----------------+ ```` #### New command output (if the output of a command-line utility has changed) ``` show sflow sFlow Global Information: sFlow Admin State: up sFlow Sample Direction: both sFlow Polling Interval: 0 sFlow AgentID: default 2 Collectors configured: Name: prod IP addr: fe80::6e82:6aff:fe1e:cd8e UDP port: 6343 VRF: mgmt Name: ser5 IP addr: 172.21.35.15 UDP port: 6343 VRF: default show sflow interface +-------------+---------------+-----------------+----------------------+ | Interface | Admin State | Sampling Rate | Sampling Direction | +=============+===============+=================+======================+ | Ethernet0 | up | 2500 | both | +-------------+---------------+-----------------+----------------------+ | Ethernet4 | up | 1000 | tx | +-------------+---------------+-----------------+----------------------+ | Ethernet112 | up | 1000 | both | +-------------+---------------+-----------------+----------------------+ | Ethernet116 | up | 5000 | rx | +-------------+---------------+-----------------+----------------------+ ```
@rck-innovium Can you please update the Quality Metric (Alpha/Beta/GA) for the feature either in this PR comments or in HLD itself based on https://github.com/sonic-net/SONiC/blob/master/doc/SONiC%20feature%20quality%20definition.md |
Quality metric: Beta Test plan has been reviewed and partially implemented: https://github.com/sonic-net/sonic-mgmt/pull/8275/files |
Updated existing sFlow HLD for egress Sflow support.