MLOps (DevOps)

Europe, Cyprus, Remote

$5,500

  • Prohibited locations: Russia, Ukraine, Belarus

  • English: B2

  • Experience: ML 6+ months, DevOps 3+ years of commercial experience

Key Responsibilities

  • Development and support of end-to-end ML pipelines (training, validation, deployment, monitoring, retraining)

  • Construction and operation of CI/CD for models (test automation, packaging, and deployment)

  • Design of LLM/RAG pipelines: context management, embedding quality and dynamics dashboards, index regeneration, prompt and fact-check testing (grounding/citation)

  • MLOps platform setup: experiment tracking, model registry, feature store, monitoring

  • Management of ML infrastructure and environments (GPU/CPU pools, Kubernetes/EKS, Docker)

  • Implementation of deployment strategies: canary, shadow, A/B testing

  • Ensuring model quality monitoring (accuracy drift, data drift, PSI, SLO/SLA)

  • Artifact management (data, models, metadata, versions)

  • Security compliance (encryption, access control, auditing, operation in private VPCs)

  • Integrating ML models into backend services (API, gRPC, REST)

  • Collaborating with Data Engineering and Data Science teams

  • Documenting processes and best practices for ML infrastructure

  • Managing the cost and scaling of ML infrastructure in AWS

  • Data governance: storage policies (S3 lifecycle), dataset versioning (DVC/LakeFS), data lineage (OpenLineage), quality gates in CI/CD

Requirements

MLOps Tools

  • MLflow or Kubeflow (experiments, registry)

  • Feature Store (Feast, Tecton, or custom)

  • Airflow, Prefect, or Kubeflow Pipelines (ML workflow orchestration)

Infrastructure and Containerization

  • Docker, Kubernetes/EKS

  • AWS S3, ECR, EKS, IAM, KMS, VPC

  • Terraform or Pulumi (IaC)

  • GitHub Actions, GitLab CI, or Jenkins (CI/CD)

  • Autoscaling, AWS Batch/Step Functions for offline processing and retrieval

Monitoring and Observability

  • Prometheus, Grafana, CloudWatch, CloudTrail

  • Model quality metrics (AUC, F1, Brier score, log loss)

  • Stability metrics (drift detection, PSI)

  • LLM-specific metrics: tokens/sec, context length, prompt/response size, grounding rate, citation coverage, hallucination rate

Key Competencies

  • Building a stable and secure ML infrastructure

  • Full-cycle ML automation: from data to inference services

  • Quality control and stability of models in production

  • Effective collaboration with data science and data engineering teams

Joining Valletta Software Development means:

  • 🌍 A global, thriving team: Join 100+ specialists from 20+ countries, united by a passion for outstanding IT solutions.

  • 🚀 Diverse projects: Fintech, MedTech, AI/ML, e-commerce, and more. Switch teams or industries to broaden your skills.

  • 💡 Support at every step: Client interview prep. We train you to succeed and give actionable feedback.

  • ✔️ Strategic stability: Well-structured processes, strong management, and long-term vision.

  • ✔️ Core values: Honesty, flexibility, innovation, and a people-first approach.

  • 💸 Regular salary review based on your personal results

  • ✨ Paid rest days and sick leave

Published on: 5/7/2026

Valletta Software

Valletta Software - custom mobile/web software developer in the US and Europe.