
Meeting 2020 02

Tomislav Janjusic edited this page Jan 6, 2024 · 1 revision

Open MPI Developer's Face-to-Face Meeting

Piggybacking on an MPI Forum meeting in order to reduce travel for people.

Location: Portland, Oregon (US Pacific time zone)
Overall dates: Feb 17-18, 2020

  • Open MPI
    • Mon, Feb 17: 2pm-6pm
      • Please don't show up before 1pm. Thanks!
    • Tue, Feb 18: 9am-2pm (or up to 6pm, if we need it)
  • MPI Forum

Specific location

Cisco Portland office (on the 3rd floor)

For reference: route from the Cisco Portland office to the Microsoft office where the MPI Forum meeting is occurring.

Attendance

Please put your name and affiliation down so that Cisco can register you as a guest.

  1. Jeff Squyres, Cisco
  2. Brian Barrett, AWS
  3. William Zhang, AWS
  4. Ralph Castain, Intel
  5. Thomas Naughton, ORNL
  6. Josh Hursey, IBM
  7. Austen Lauria, IBM
  8. Brice Goglin, INRIA
  9. Howard Pritchard, LANL
  10. Artem Polyakov, Mellanox [Tuesday (Feb 18) only]
  11. Shinji Sumimoto, Fujitsu
  12. Takafumi Nose, Fujitsu
  13. Nathan Hjelm
  14. Akshay Venkatesh, NVIDIA
  15. George Bosilca, UTK

Agenda

Meeting Notes

  • PMIx common library triplet issue
  • We screwed up by not changing the .so number of the common lib when we changed the number of the main lib
  • We will get it right going forward, starting with v4.x
  • Will not adjust it mid-stream for v3.1.5
  • Also need to discuss PRRTE vs. ORTE (e.g., proposal to effectively re-create PRRTE in ORTE ourselves)
  • Dropped - PRRTE has already replaced ORTE
  • MTT Database cleanup:
    • How much history do we need to keep?
    • Archive the tables or just drop them?
  • Hitting storage limits on the server. Currently dropping the oldest data to stay under the limit.
  • Previous proposal was to keep it around in S3, but do we really need it?
  • Anyone wanting to keep older stuff can export it and store it on their own (few hundred Mbytes/year)
  • Keep max of 12 months
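The 12-month retention rule can be sketched as a date cutoff. This is a minimal illustration only, not the MTT schema: the row layout `(submit_date, payload)` and the choice to cut at the first day of a month are assumptions made for the example.

```python
from datetime import date

def retention_cutoff(today, months=12):
    """Return the first day of the month `months` months before `today`."""
    # Count months on a single axis so the subtraction can cross year boundaries.
    total = today.year * 12 + (today.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)

def split_by_retention(rows, today, months=12):
    """Partition (submit_date, payload) rows into (keep, drop) lists."""
    cutoff = retention_cutoff(today, months)
    keep = [r for r in rows if r[0] >= cutoff]
    drop = [r for r in rows if r[0] < cutoff]
    return keep, drop
```

Anyone who wants the older data would export the `drop` rows before they are deleted.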
  • Python Client transition? Current status.
  • In production at Intel and LANL
  • Lack of motivation and time for others to transition
  • Shall we set a minimum supported version for PMIx, as we do for hwloc, libevent, and autotools?
  • Kind of need to, but what is the level for v5?
    • Definitely not going back to v1.x
    • Definitely supporting v3.x and above
    • Ralph: will build PRRTE vs PMIx v2.x to see how big a job it would be to make PRRTE work with it
      • Conclusion: supporting PMIx v2.x is not possible, as that version doesn't include support for IOF. Resolving this would require prun to become a PRRTE program (i.e., it would have to call prrte_init) instead of being a PMIx program, breaking one of the architectural goals. This could be changed but would require some thought/effort.
  • Slurm is a sticking point
    • 2016 only supports v1.x
    • 2017 only supports v2.x and v1.x
    • 2018 supports v3 and above (configure issue that checked for equality fixed)
    • No option for PMI-1 or PMI-2 any more starting with OMPI v5
    • Ralph: Ask Danny about posting the Slurm 2017 patch updating PMIx plugin configure to accept PMIx v3 and above libraries, ask Slurm mailing list if anyone cares
      • Danny: Slurm 17.02 and 17.11 are out of support. Slurm 18.02 will fall out of support this month. SchedMD only supports back two years on releases, so don't worry about the 17.x releases
    • OMPI Web page: recommend updating Slurm to 2018 or above for better PMIx support
  • Two separate limits to be identified:
    • OMPI direct launch requires PMIx v2.2 or above
      • Someone could potentially backport to v1.2 if they really want to, but may lose some MPI capabilities (defer to ask Mellanox tomorrow)
    • OMPI mpirun launch requires PMIx v3.x and above
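The two limits above amount to a "this version or newer" check per launch mode. The sketch below is illustrative only: the table and function names are hypothetical, and "v3.x" is read as 3.0 or newer. It compares with `>=` rather than `==`, which is the kind of equality check the Slurm configure fix mentioned above had to remove.

```python
# Minimum PMIx version per launch mode, taken from the two limits in the notes.
MIN_PMIX = {"direct": (2, 2), "mpirun": (3, 0)}

def parse_version(text):
    """Turn '3.1.5' or 'v3' into a tuple of ints, e.g. (3, 1, 5) or (3,)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))

def pmix_ok(version, launch_mode):
    """Return True if `version` meets the minimum for `launch_mode`."""
    required = MIN_PMIX[launch_mode]
    have = parse_version(version)
    # Pad short versions so that "3" compares as (3, 0).
    have = have + (0,) * (len(required) - len(have))
    return have >= required
```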
  • From the ORTE/PRRTE work, Ralph raises a question:
    • Do we still want to claim the mca_ prefix/"namespace"?
    • We already claim MPI_, opal_, orte_ (soon to die, but mentioned anyway), ompi_, oshmem_, prrte_ -- do we really still need mca_?
    • Does it cause logistical problems for projects that use parts of the OMPI code base and not others, but then are used in MPI applications? (e.g., PRRTE had an mca_ namespace conflict)
  • Probably should prefix the mca_ names with opal_mca_
  • What about MCA component names (i.e., the symbol we look in for the MCA component libraries)?
    • mca_btl_tcp_component ---> opal_btl_tcp_component
    • Create a macro that automatically creates the symbol name? (Nathan)
  • Conclusion: we should give up the mca_ namespace and replace it with the project name
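The renaming rule for component symbols can be written down in a few lines. The notes propose a C macro for this; the Python function below only illustrates the naming pattern shown in the example above, and its name is hypothetical.

```python
def component_symbol(project, framework, component):
    """Build the component symbol name using the project as the prefix."""
    return f"{project}_{framework}_{component}_component"

def legacy_component_symbol(framework, component):
    """Build the current mca_-prefixed symbol name for comparison."""
    return f"mca_{framework}_{component}_component"
```

For the TCP BTL, the first function yields `opal_btl_tcp_component` and the second yields `mca_btl_tcp_component`.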
  • Someone needs to track down and document somewhere the reason why we dlopen these libraries to GLOBAL scope
    • Need a solid proposal for changing MCA parameter format
      • If there is no ambiguity (i.e., all frameworks have unique names across projects), then magically make the correlation under-the-covers
      • If there is ambiguity, generate an error explaining that the user needs to prefix it with the project name to resolve the conflict
      • Other details? Await the proposal
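Pending the actual proposal, the ambiguity rule described above could look roughly like this. Everything here is an assumption for illustration: the registry contents, the function name, and the exception type are invented, and the real parameter format is still to be defined.

```python
class AmbiguousFrameworkError(Exception):
    """Raised when a framework name exists in more than one project."""

def resolve_project(framework, registry, project=None):
    """Return the project that owns `framework`.

    `registry` maps project name -> set of framework names.
    """
    if project is not None:
        # The user supplied the project prefix, so there is nothing to guess.
        if framework not in registry.get(project, ()):
            raise KeyError(f"{project} has no framework named {framework}")
        return project
    owners = sorted(p for p, names in registry.items() if framework in names)
    if not owners:
        raise KeyError(f"unknown framework: {framework}")
    if len(owners) > 1:
        raise AmbiguousFrameworkError(
            f"'{framework}' exists in {', '.join(owners)}; "
            "prefix it with the project name"
        )
    return owners[0]

# Illustrative registry only; not the actual framework layout.
EXAMPLE_REGISTRY = {
    "opal": {"btl", "shared"},
    "ompi": {"pml", "coll"},
    "prrte": {"plm", "shared"},
}
```

With this registry, `resolve_project("btl", EXAMPLE_REGISTRY)` resolves silently, while `"shared"` raises the error unless a project is given.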
  • Stagnant command line/MCA param options
    • Our code base is plagued with options that someone thought would be interesting a long time ago. However, the person who developed it has long departed the project, or did it in grad school and has no intention of supporting it. This legacy code is a major burden to those of us left to deal with it. It is frequently scattered over multiple places in the code base, and almost always brittle. We need a way of dealing with this stuff - something better than "well it works now, so don't break it".
    • New code should be self-contained, not scattered over the code base.
    • When someone leaves the project or no longer wants to maintain something, an assessment should be made as to the value of the option, and a volunteer must step forward to maintain it
    • Anything failing to have a clearly identified maintainer shall be removed, regardless of how "cute" or "widely used" it may be.
    • Developers working on PRs that might impact a code should make an effort to resolve any induced problems. This includes requesting assistance from the identified maintainer. If that maintainer doesn't choose to assist, then the community should consider removing the option to clear the way for progress.
  • Proposal: PRs impacting community code shall be reviewed by at least one community member (i.e., a person from outside the proposing organization)
  • Ensure broader understanding of the code to gain broader capability of support, shared understanding of how it works
  • Provide an opportunity to ensure code is properly contained into an appropriate area of the repo (i.e., don't hit all over the code, but try to keep the code modularized)
  • Action: Announce intent to go this direction later this week, give those not attending a chance to chime in
  • Git submodule status
    • How are we doing? How's it going?
    • Can we submodule libevent? Or do we have any custom mods such that we can't just refer to an upstream hash?
    • One negative observation: given the length of time that OMPI CI now takes (Mellanox CI takes > 74min to complete), working on the code referenced via a submodule is agonizing as you cannot directly edit it in OMPI - you have to first make a change in the upstream repo, run that thru its CI, commit it, and then update the OMPI reference pointer...and wait over an hour to see if you actually fixed the problem. Development/bug fixes are reduced to a snail's pace.
  • Look at ways to improve CI cycle time (subject of other bullet)
  • Libevent - no real motivation to switch to submodule at this time. If someone finds a motivation, please put forward a proposal
  • Issue: libevent and hwloc are pulled into opal statically, but pmix is linked against libopal to get these components. Even with name shifting (under opal names), it can call down into opal.
    • Ralph noticed that a call into pmix_error_log ended up in opal_output with an uninitialized hostname (SEGV).
    • Need a way to link directly to pmix, hwloc.
    • disable dlopen case?
    • How should we split these out?
      • Make libtool convenience libraries of them?
      • prrte rather than linking against libtool, links against convenience libraries.
      • convenience libraries then just get sucked into the code.
      • Where this approach fails is that you can't link against both these convenience libraries and libopal?
    • configury - doesn't prrte need to know if we're linking embedded or external?
  • Move hwloc, libevent, and PMIx to the top-level directory so they are no longer inside of MCA
  • Brian and Ralph will work out a way for each to generate a library that can be slurped up by the appropriate user (e.g., PRRTE)
  • Ralph has some PRRTE questions.
    • Question: what should the default behavior of mpirun (i.e., prun) be w.r.t. the DVM? Here's an image to help understand the question.
  • Have "prun" quickly check for persistent PRRTE available to this user
    • Report and error out if multiple found
    • Use if single one is found unless cmd line option precludes
  • Do not have "mpirun" support connection to system environment server for launch
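The decision for "prun" can be summarized as a small selection function. This is a sketch under assumptions: the function name and the `allow_existing` flag are hypothetical, and returning `None` (meaning "do not attach to an existing DVM") for the case where none is found is a guess, since the notes do not state that case.

```python
def choose_dvm(found, allow_existing=True):
    """Pick the persistent DVM to use, or None to not attach to one.

    `found` is the list of persistent PRRTE instances visible to this user.
    `allow_existing` is False when a command line option precludes reuse.
    """
    if not allow_existing or not found:
        return None  # assumption: fall back to not using a persistent DVM
    if len(found) > 1:
        # Multiple candidates: report and error out rather than guess.
        raise RuntimeError(
            "multiple persistent DVMs found: " + ", ".join(found)
        )
    return found[0]
```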
  • How to deal with PRRTE vs OMPI cmd line differences
  • Schizo can only go so far
    • allows us to add options for OMPI but cannot remove options otherwise used by PRRTE
      • example: --system-server-first
    • add ability for schizo to "prune" options from other components?
      • Ralph will update schizo to ensure "mpirun" only sees OMPI-appropriate options
    • emit deprecation warnings for old options?
      • Really need to do this as this will be major impact to users
      • Jeff will provide cmd-line deprecation warning infrastructure, if needed
        • Josh/Ralph to explore implementation and determine if this is desirable
      • Josh will cross-connect the old cmd line options with the infrastructure
    • deprecate - output and continue? output and error out?
      • Brian will check with ISVs about how this impacts them
      • How to deal with someone who waits for a week to get their allocation and then immediately errors out
        • George: add "dryrun" option to test cmd line before execution?
        • Decided not to go this route for now - not clear a separate dryrun program is better
      • Brian raised concern of injecting warning messages into user's output breaking their automated results parsing
      • Decided that we will warn and continue since that is what we have been doing for MCA params for quite some time without user complaint
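The "warn and continue" decision can be sketched as a translation pass over the command line. The option names in the table are placeholders, not real Open MPI options, and sending the warnings to stderr is only one possible way to limit the impact on tools that parse a job's stdout.

```python
import sys

# Hypothetical mapping from a deprecated option to its replacement.
DEPRECATED = {"--old-flag": "--new-flag"}

def translate_deprecated(argv, table=DEPRECATED, stream=sys.stderr):
    """Replace deprecated options, warn about each one, and keep going."""
    out, warnings = [], []
    for arg in argv:
        replacement = table.get(arg)
        if replacement is None:
            out.append(arg)
            continue
        warnings.append(f"warning: {arg} is deprecated; use {replacement} instead")
        out.extend(replacement.split())
    for message in warnings:
        print(message, file=stream)
    return out, warnings
```

Switching later from "warn and continue" to "reject" would only require raising an error where the warning is collected.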
  • Single vs double-dash
  • Drop notion of concatenating single char options - no longer parse "-abc" as "-a -b -c". Treat it as "-abc"
  • Treat as deprecated cmd line option - warn and continue for now, reject later
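One reading of the single-dash decision is sketched below: "-abc" is taken as the single option "abc" and is never expanded to "-a -b -c", and a multi-character option given with one dash is flagged as deprecated so the caller can warn and continue. The function name and the return shape are assumptions for the example.

```python
def parse_option(token):
    """Return (name, deprecated) for a single command line option token."""
    if token.startswith("--") and len(token) > 2:
        return token[2:], False
    if token.startswith("-") and not token.startswith("--") and len(token) > 1:
        name = token[1:]
        # No concatenation: the whole remainder is one option name.
        # A multi-character name behind a single dash is the deprecated form.
        return name, len(name) > 1
    raise ValueError(f"not an option: {token!r}")
```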
  • Include PRRTE params in output from ompi_info [Jeff]
  • It is assumed that v5.0 will be a major discussion point of this meeting (Artem: can we do Tuesday?)
    • Discuss Schedule for branching, testing and release.
    • Discuss Release Managers for v5.0
      • IBM is willing to help (Geoff + Austen)
      • Mellanox-NVIDIA are volunteering as well (Josh L.)
    • See the wiki page for v5.0
    • Also keep v6.0 plans in mind (because some things are planned to sunset in v6.0, ...etc.)
  • Downside of CI (Artem: can we do Tuesday?)
    • Software community is becoming increasingly aware of the downside of CI - developers code to pass CI even when the CI itself contains an error
    • OMPI CI has been found to contain errors - tests are added by getting a "signature" of current behavior vs writing a test based on a desired behavior. As a result, we are perpetuating errors by "baking them in" via CI
    • We need to scrub our CI tests to ensure that we are testing the correct behavior vs taking a "point-in-time" signature
  • Length of CI time for a single PR -- getting far too long (75-90 minutes for Mellanox CI).
    • Can we make this shorter?
  • Josh will send note to list setting up issue to coordinate CI operations
    • Push non-context-sensitive testing to AWS
    • Let vendors run tests that are hardware/environment sensitive
  • Should help reduce time as AWS can run more in parallel
  • Support --with-external-prrte builds?
  • Consensus: not needed as you can always build PRRTE separately and run against it
  • Current minimum external hwloc version is v1.5 (enforced in opal/mca/hwloc/external/configure.m4).
  • Will not adjust as CentOS 6 still ships HWLOC v1.5
  • Current Status of Mellanox work on UCX/OMPI integration [Tuesday (Feb 18)].
  • Presented by Artem - will be attached
  • Yes, Nathan will remove the "sync" builtin atomics (woohoo!)
  • Update on tool vendor support for OMPI v5
  • Update on RM support for OMPI direct launch now that PMI-1/2 is gone (Artem: can we do Tuesday?)
    • Slurm PMIx support - when/how will they track? Does anyone know?
    • PBSPro?
    • Cray?
    • Others?
  • Ralph will create a new "ompi_deprecated_options.m4" that will include the "--with-pmi" option
    • Use a generic "OMPI_CHECK_DEPRECATED_OPTIONS" func
    • Output an error message and abort the config
    • Call it early in configure.ac
  • Status report on "instant on" for OFI (Artem: can we do Tuesday?)
  • Brian requesting update from Raghu
  • Impact of recent Linux change for "device renaming" (Artem: can we do Tuesday?)
    • Replaces the usual device names (mlx5_0, hfi1_0, etc.) with longer names that indicate how the device is connected
    • Already identified issues in libpsm2 and UCX MTL
    • Other places? rml/oob? OFI providers?
  • Everything in libibverbs has been modified to conform.
    • Example: this has already been back-ported to RHEL 7.7
  • Doesn't seem to affect anywhere else in Open MPI.

Update SPI on Ralph's status?

  • Yes - Ralph will send a note indicating that Jeff is now primary, Brian secondary contact
  • PRRTE
  • messaging overhead? Flagged by PMIx dynamic workflow team
  • resilience? Waiting for UTK commit?
  • Deferred due to lack of time
  • PMIx
    • Moving ownership of the Standard and OpenPMIx domains to UTK
  • Josh, George and Ralph will coordinate via email
  • Can we somehow get rid of the name "vader"?
  • Nathan will revive his "component synonym" PR
    • Remove sm component
    • Rename "vader" to "sm", alias "vader" to point to it
  • Current Status of Fujitsu MPI Development
  • Presented by Shinji - to be attached
  • Put label on issues that have no activity for some defined period of time
    • Don't apply to PRs, as it might be an inhospitable, passive-aggressive way of killing something
    • Perhaps flag as something requiring review
    • Warn of impending closure if no activity within given time
    • Auto-close is a non-starter
    • Thomas: quick search of "openib" turns up 70-90 issues
      • Can be closed as "won't fix" due to moving out of service, won't back port fixes to the release branches
    • William will go thru the 10-15 issues filed against TCP device selection
    • Review issues on case-by-case basis to determine what is no longer applicable, close with polite comment
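The labeling policy discussed above could be expressed as a small classifier. The thresholds and action names below are invented for illustration, since the notes leave the period undefined. The function never returns a "close" action, matching the point that auto-close is a non-starter.

```python
from datetime import date, timedelta

def stale_action(last_activity, today, is_pr=False,
                 label_after=timedelta(days=180),
                 warn_after=timedelta(days=365)):
    """Suggest what to do with an item given its last activity date."""
    if is_pr:
        return "none"  # the policy applies to issues only, not PRs
    idle = today - last_activity
    if idle >= warn_after:
        return "warn-of-closure"  # a human still decides whether to close
    if idle >= label_after:
        return "label-needs-review"
    return "none"
```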
  • Lock closed issues so people don't comment on them
    • Counter - sometimes people see their problem (or similar one) in old issue and comment on it, ask "how was this fixed or worked around"
    • Is it better to continue an existing thread or open a new one every time?
      • Reasonable probability of them commenting on something that isn't really relevant to the original problem
      • However, there are times it is helpful because we see that the original fix wasn't comprehensive, missed some edge case
    • Informal poll is to leave things as-is
  • Perhaps use "draft" PRs to solicit input before opening a full PR?
  • Maybe use WIP-DNM label as an option to "draft"
    • Set up Jenkins to detect removal of WIP-DNM label and trigger CI
  • How to deal with MCA variables in the common scope: common is removed after its last usage by the components themselves, which means that something can still be referencing a variable after it gets removed. There is a small window for a race condition. See #7284
  • Nathan will investigate - should have been fixed

Revisit: "tune" command line option introduced ordering sensitivity on cmd line

  • Overwriting of variables results in left-to-right evaluation precedence
  • Josh: IBM relies on passing amca file and overriding specific values on cmd line
  • Mellanox: Artem will investigate their use case - do they need order-sensitive parsing?
  • Ralph: initial sense is that we should allow override, but that explicit value on cmd line should override something in file regardless of position on the cmd line
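The difference between the current left-to-right behavior and Ralph's suggested rule can be shown with two merge functions. This is a sketch only: the `("file" | "cli", params)` representation and the parameter names are made up for the example and do not reflect the actual parser.

```python
def merge_left_to_right(sources):
    """Current behavior: later entries on the cmd line overwrite earlier ones."""
    merged = {}
    for _kind, params in sources:
        merged.update(params)
    return merged

def merge_cli_wins(sources):
    """Suggested behavior: explicit cmd line values beat file values,
    regardless of where the file option appears on the cmd line."""
    merged = {}
    for wanted in ("file", "cli"):  # apply files first, then explicit values
        for kind, params in sources:
            if kind == wanted:
                merged.update(params)
    return merged

# An explicit value given before the file that also sets it.
EXAMPLE = [
    ("cli", {"param_a": "explicit"}),
    ("file", {"param_a": "from_file", "param_b": "from_file"}),
]
```

With `EXAMPLE`, the first function leaves `param_a` set from the file, while the second keeps the explicit value and still picks up `param_b` from the file.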