Hebbe to CentOS 7 - what will change for users?

Background

The operating system on Hebbe is quite old by now (we first started using it for Glenn). We also need to replace Cstor, since no good solution for extending its support contract has been found.

Considerations

  • Running Cstor without a support contract, while still being under contract with SNIC to run Hebbe during 2020, is not an option
  • The replacement for Cstor, Cephyr, does not have good enough support for CentOS 6
  • CentOS 8 came out this summer, so we could have gone to CentOS 8 instead. After some investigation we abandoned this path, because a few of the HPC-specific tools we need (filesystems etc.) lack support for it
  • Since we need to replace Cstor (serving /c3se) during the spring, we decided to move users' home directories over to Cephyr (serving /cephyr) at the same time as the OS upgrade, to minimize the number of interruptions for users

So what will change for the user?

  • New OS (CentOS 7) with a more modern kernel, tools etc.
  • New login servers: hebbe1.c3se.chalmers.se and hebbe2.c3se.chalmers.se replace hebbe.c3se.chalmers.se
    • ThinLinc will now connect directly to hebbe1 or hebbe2
    • For Mstud, hebbe-mstud.c3se.chalmers.se (and the Mstud nodes) work as before.
  • New home directory on Cephyr (/cephyr/users/$CID/Hebbe)
    • 30GiB, 60k files (backed up)
    • The $SNIC_NOBACKUP area will be deprecated. Additional storage on Cephyr must be applied for via storage projects! See https://supr.snic.se/round/storage/
    • The old $SNIC_NOBACKUP on /c3se/NOBACKUP/users/$CID/ will be removed later in 2020. More information on this will come in early 2020. You will be able to access this old area from new Hebbe.
    • Existing storage projects on Cstor will be given a new area on Cephyr in the near future, if applicable.
    • No automatic transfer of old data: users must move what they want to keep off the old area! (And please delete stuff no longer needed!)
    • The Vera home directory will also migrate to this area in the near future (/cephyr/users/$CID/Vera). The area is already reachable from Vera
    • The quota is shared between the Hebbe and Vera $HOME, in the same way as on Cstor today
    • The quota is enforced directly by the OS, i.e. jobs will be prevented from writing beyond the quota
  • New module tree
    • Most used toolchains and software in place now
    • Old tree still accessible by loading old environment, but software not guaranteed to work
      • source /apps/old_c6_modules.sh to switch to the old software tree.
      • source /apps/new_c7_modules.sh to switch back to the new software tree.
      • When the migration to the new system is complete, no more software will ever be added to the old tree.
    • User-built software may need recompilation
    • Not all software and/or versions will be recompiled for the new tree. Please update to newer versions if possible, and get back to us when this isn’t possible
  • New instructions for job submission for GPU nodes when they are moved to the new system (similar to Vera)
  • Projects and partitions will remain the same as on current Hebbe
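
As a rough way to see how close an existing directory is to the new home directory limits (30GiB, 60k files), something like the following sketch could be used. The function name usage_report and the variables LIMIT_BYTES and LIMIT_FILES are made up for illustration, "60k" is assumed to mean 60000, and the way Cephyr itself counts files and bytes may differ from what find and du report (GNU du is assumed).

```shell
# Limits taken from the list above: 30 GiB and 60k files.
# Assumption: "60k" means 60000 files.
LIMIT_BYTES=$((30 * 1024 * 1024 * 1024))
LIMIT_FILES=60000

# usage_report DIR
# Prints the number of regular files and the apparent size in bytes of DIR,
# then "within quota" (exit status 0) or "over quota" (exit status 1).
usage_report() {
    local dir="$1"
    local files bytes
    # Count regular files only; directories and links are not counted here.
    files=$(find "$dir" -type f | wc -l)
    # Apparent size in bytes (GNU du options).
    bytes=$(du -s --apparent-size --block-size=1 "$dir" | cut -f1)
    echo "files=$files bytes=$bytes"
    if [ "$files" -gt "$LIMIT_FILES" ] || [ "$bytes" -gt "$LIMIT_BYTES" ]; then
        echo "over quota"
        return 1
    fi
    echo "within quota"
}
```

For example, running usage_report "$HOME" on the old system gives an idea of whether the current home directory would fit in the new one as it is, or whether some cleaning up is needed first.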

Procedure

  • We have set up servers and have used a few nodes for testing for some time
  • hebbe2.c3se.chalmers.se is in place and ready for users together with 10 nodes.
  • We will now let users in, to get a feel for the new environment and check their software, scripts etc.
  • Machines will be gradually migrated from the old system to the new, slowly in the beginning but quite aggressively towards the end
  • Private partitions will be migrated after a time has been agreed upon with the owner (but no later than the last general nodes!)
  • When all nodes have been moved, hebbe.c3se.chalmers.se will be shut down and hebbe-gui will become hebbe1
  • During Q1 2020 we will start moving project areas over to Cephyr, as well as shut down private storage areas on Cstor!
  • During Q1 2020 we will remove all $SNIC_NOBACKUP areas on Cstor. If you need additional storage, a project PI should apply for it!
  • During Q2 2020 all remaining data will be removed from Cstor
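
Since no data is transferred automatically, anything worth keeping has to be copied by hand before the old areas are removed. A minimal sketch of how that could look, assuming a hypothetical helper called copy_tree (not something provided on the system); it uses rsync when available and falls back to cp otherwise, then compares the two trees:

```shell
# copy_tree SRC DEST
# Copies the contents of directory SRC into DEST (created if missing),
# preserving permissions and timestamps, then verifies the copy.
# Exit status is 0 only if the copy succeeded and the trees are identical.
copy_tree() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    if command -v rsync >/dev/null 2>&1; then
        # Trailing slashes: copy the contents of SRC, not SRC itself.
        rsync -a "$src"/ "$dest"/ || return 1
    else
        # "SRC/." also picks up hidden files.
        cp -a "$src"/. "$dest"/ || return 1
    fi
    # Compare the two trees file by file.
    diff -r "$src" "$dest" >/dev/null
}
```

A call could then look like copy_tree /c3se/NOBACKUP/users/$CID/my_results /cephyr/users/$CID/Hebbe/my_results, where my_results is a placeholder directory name and $CID stands for your own CID rather than a variable that is necessarily set. Keep in mind that the new home directory is limited to 30GiB and 60k files, so larger data sets belong in a storage project area instead.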