DevOps position in the High Performance Computing Section

Job title: DevOps position in the High Performance Computing Section
Opening date: 07/03/2023
Faculty / Division: High Performance Computing Section, Computing Infrastructure Branch, Information Technology Division
 
Job number: 62646

Job description:

We are seeking a senior, highly motivated DevOps professional with at least 3 years of experience in GPU/AI operations to play a key role in operating HPC/AI/hybrid cloud systems and evaluating new technologies that support the frontier research of WIS scientists.

This individual will be part of a group that designs and builds HPC/AI/cloud solutions, ensuring that upgrades and changes comply with product and project management guidelines.

He/she will work under the supervision of the head of the HPC section on the planning and development of a robust and scalable infrastructure for AI/ML/DL workloads, the integration of DL/ML frameworks, application profiling, and researcher support.

 

Education and required skills:

Required skills:

* B.A./B.Sc. in information technology or an equivalent academic degree.

* Experience with GPU technologies and AI/ML/DL frameworks such as TensorFlow, MXNet, PyTorch, and Keras.

* Experience supporting centralized systems at the core of the data center.

* Familiarity and experience with system performance analysis and benchmarking of standalone machines, HPC clusters, and GPU workloads.

* Strong shell scripting knowledge and experience installing and maintaining clustered environments, including automated installation, patch updates, and monitoring (Chef, Jenkins, Puppet, Ansible).

* Container automation and orchestration (experience with Docker, Kubernetes).

* Service- and customer-oriented attitude.

* Strong troubleshooting skills.

* Strong interpersonal and communication skills.

* Ability to work as a team player.

* Proactive and solution-oriented problem solver.

Desired skills:

* Experience working with public cloud service providers – AWS, GCP, Azure.

* M.Sc. degree in information technology.

* Experience with any of the following HPC schedulers or similar: Slurm, SGE, Torque/PBS, LSF.

* Experience with CI/CD in complex distributed systems.

* Experience documenting system administration procedures for routine and complex tasks.

* Knowledge of storage operations, in particular performance-oriented parallel filesystems (GPFS, Lustre, OrangeFS, BeeGFS).

* Experience with InfiniBand technology.