Position title | Opening date | Faculty / Division | |
---|---|---|---|
DevOps position in the High Performance Computing Section | 07/03/2023 |
High Performance Computing Section, Computing Infrastructure Branch, Information Technology Division
|
|
Position number:
62646
Position description:
We are seeking a senior, highly motivated DevOps professional with at least 3 years of experience in GPU/AI operations to play a key role in operating HPC/AI/hybrid cloud systems and evaluating new technologies that support the frontier research of WIS scientists. This individual will be part of a group that designs and builds HPC/AI/cloud solutions, ensuring that upgrades and changes comply with product and project management guidelines. He/she will work under the supervision of the head of the HPC section on the planning and development of a robust and scalable infrastructure for AI/ML/DL workloads, DL/ML framework integration, application profiling, and researcher support.
Required education and skills:
Required skills:
* B.A./B.Sc. in information technology or an equivalent academic degree.
* Experience with GPU technologies and AI/ML/DL frameworks such as TensorFlow, MXNet, PyTorch, and Keras.
* Experience supporting centralized systems at the core of the data center.
* Familiarity and experience with systems performance analysis and with benchmarking standalone machines, HPC clusters, and GPU workloads.
* Strong shell scripting knowledge and experience installing and maintaining clustered environments, including automated installation, patch updates, and monitoring methods (Chef, Jenkins, Puppet, Ansible).
* Container automation and orchestration (experience with Docker and Kubernetes).
* Service- and customer-oriented attitude.
* Strong troubleshooting skills.
* Strong interpersonal and communication skills.
* Ability to work as a team player.
* Proactive and solution-oriented problem solver.

Desired skills:
* Experience working with public cloud service providers (AWS, GCP, Azure).
* M.Sc. degree in information technology is an advantage.
* Experience with HPC schedulers such as Slurm, SGE, Torque/PBS, or LSF.
* Experience with CI/CD in complex distributed systems.
* Experience documenting system administration procedures for routine and complex tasks.
* Knowledge of storage operations, in particular performance-oriented parallel file systems (GPFS, Lustre, OrangeFS, BeeGFS).
* Experience with InfiniBand technology.
|