System Administrator
Job Description
Our client is seeking a System Administrator to join their team on a remote basis. To be considered for this position, you will need experience maintaining and troubleshooting HPC clusters. Keep on reading to learn more.
About you
To be considered, you will need:
- Hands-on experience with HPC clusters, including hardware, image management, local networking, and schedulers.
- A strong background in troubleshooting HPC environments to resolve incidents efficiently.
- The ability to assess scientists' HPC support needs and develop task plans accordingly.
- Proficiency in building, installing, and troubleshooting applications (GNU, Intel, Fortran, Nvidia).
- Familiarity with open-source and commercial software such as Python, Anaconda, Bash scripts, EasyBuild, Spack, and MPI implementations (MPICH, Open MPI, Intel MPI, HP-MPI).
- System administration skills for Linux OS, user account management, and configuration tools (Git, MS DevOps, Ansible Playbooks).
- Knowledge of RPM/DEB packages, environment modules, and ThinLinc troubleshooting.
- Expertise in job schedulers (PBS Pro/Torque, SLURM, SGE) and CUDA installations, including GPU troubleshooting.
- Experience with hardware management, including memory upgrades, storage arrays, and power and network cabling.
- Strong documentation skills to ensure knowledge continuity.
- Secret-level security clearance (or eligibility to obtain it).
If hired, you will:
- Oversee and maintain an HPC cluster, managing hardware, networking, and scheduler configurations.
- Troubleshoot HPC environments to restore operations quickly in case of incidents.
- Work with scientists to evaluate their HPC needs and develop task plans.
- Install and support applications, resolve runtime issues, and assist with in-house software.
- Manage Linux system operations, including patching, account management, and configuration via Git and Ansible.
- Support and troubleshoot job schedulers and CUDA installations.
- Handle hardware maintenance, including memory upgrades, storage management, and networking.
- Document processes and best practices to ensure knowledge continuity.
Only qualified candidates will be contacted. Please note that positions may be filled prior to the closing deadline. You may contact a Consultant to confirm availability.