Principal, AI Platform Engineering
Company: Ares Operations
Location: New York City
Posted on: April 1, 2026
Job Description:
Over the last 20 years, Ares’ success has been driven by our
people and our culture. Today, our team is guided by our core
values – Collaborative, Responsible, Entrepreneurial, Self-Aware,
Trustworthy – and our purpose to be a catalyst for shared
prosperity and a better future. Through our recruitment, career
development and employee-focused programming, we are committed to
fostering a welcoming and inclusive work environment where
high-performance talent of diverse backgrounds, experiences, and
perspectives can build careers within this exciting and growing
industry.

Job Description Overview

We are seeking an exceptional Principal AI Platform Engineer to
design and build an enterprise-grade generative AI platform from
the ground up. This is a leadership role that combines deep
technical expertise in AI systems architecture with the strategic
vision to shape how our organization scales AI capabilities across
all business domains.
You will architect a comprehensive platform spanning model
gateways, retrieval services, model registries, prompt libraries,
and deployment pipelines, enabling teams across the firm to build,
deploy, and operationalize AI applications with confidence,
compliance, and security.

Key Responsibilities

Platform Architecture & Design

- Design and build a foundational AI platform that enables secure,
  scalable, and compliant generative AI across the enterprise
- Architect multi-LLM gateway capabilities to support diverse model
  providers, allowing teams to leverage best-of-breed models for
  different use cases
- Establish platform standards and patterns that balance
  flexibility, safety, governance, and performance

Core Platform Components

- Develop multi-LLM gateway: unified interface for accessing
  multiple LLM providers with load balancing, fallback handling, and
  cost optimization
- Build RAG (Retrieval-Augmented Generation) retrieval services:
  enterprise search, semantic indexing, and document retrieval at
  scale
- Create model registry and governance: centralized catalog of
  models, versions, fine-tuning metadata, performance metrics, and
  compliance tracking
- Design prompt library and version control: organizational
  repository for prompts with testing, evaluation, and A/B testing
  capabilities
- Implement Model Context Protocol (MCP) gateway: enable secure
  integration between AI applications and external tools, APIs, and
  data sources
- Build FinOps infrastructure: cost tracking, optimization, and
  allocation across models, usage patterns, and
business units

Agent-to-Agent (A2A) Workflows

- Design orchestration framework for complex, multi-step AI
  workflows across applications
- Enable reliable, scalable execution of chained AI operations with
  state management and error recovery
- Integrate with broader data ecosystem for workflow triggers and
  data pipelines

Data Gateway Integration

- Partner with data platform teams to design AI-native data access
  patterns
- Enable secure, governed access to enterprise data for RAG and
  model training
- Build metadata and lineage tracking for AI-consumed data

Deployment & DevOps

- Design sandbox-to-production pipelines: safe, repeatable
  processes for testing and deploying AI applications
- Implement CI/CD for AI models: versioning, testing, promotion, and
  rollback capabilities
- Build observability and monitoring: telemetry, performance
  metrics, cost tracking, and compliance auditing
- Establish disaster recovery
and high-availability patterns

Collaboration & Enablement

- Work closely with Data Products team to align platform
  capabilities with data governance and analytics infrastructure
- Partner with AI Enablement teams to provide tools, SDKs,
  documentation, and best practices that democratize AI development
- Lead technical discussions on platform strategy, roadmap, and
  trade-offs across the organization
- Build internal developer experience and platform adoption

Security Architecture & Implementation

- Design and implement comprehensive security architecture aligned
  with firm cyber and information security guidelines
- Build authentication and authorization frameworks: role-based
  access control (RBAC), attribute-based access control (ABAC), and
  service-to-service authentication
- Implement encryption standards: encryption at rest (AES-256 or
  equivalent) and in transit (TLS 1.2) for all sensitive data
- Design secure API gateways and service boundaries with rate
  limiting, request validation, and DDoS protection
- Implement secrets management: secure storage and rotation of
  credentials, API keys, and certificates
- Build comprehensive audit logging and monitoring: all access,
  modifications, and security events logged with immutable audit
  trails
- Partner with Infosec and Security Operations to implement
  continuous security monitoring and threat detection

Governance, Compliance & Risk Management

- Ensure platform compliance with regulatory requirements: SOC 2
  Type II, data residency, and audit trails
- Implement data governance: classify data sensitivity levels,
  enforce data handling policies, and ensure appropriate access
  controls
- Build model governance: track model provenance, versioning,
  training data lineage, and approval workflows for production
  deployment
- Prevent data exfiltration and prompt injection attacks through
  input validation, output filtering, and rate limiting
- Establish responsible AI practices: bias detection, fairness
  assessment, and explainability requirements
- Manage third-party vendor security: assess LLM provider security
  postures, data processing agreements, and compliance
  certifications
- Create model risk assessment framework: evaluate models for
  regulatory, market, and operational risks before production
  deployment
- Work with Compliance, Legal, and Risk teams to ensure platform
  meets all
governance requirements and documentation standards

Required Qualifications

- 10 years of software engineering experience, with 5 years
  building large-scale, distributed systems or platform
  infrastructure
- 3 years of hands-on experience with generative AI, LLMs, RAG
  systems, or AI infrastructure, either in production systems or
  applied research
- Deep expertise in one or more: Python, Go, Rust, or Java;
  experience building APIs and orchestration systems
- Strong understanding of LLM architectures, prompting strategies,
  fine-tuning, and RAG design patterns
- Demonstrated experience with: model serving (vLLM, Ollama,
  TensorFlow Serving), vector databases, and embedding models
- Proficiency in cloud platforms (AWS, GCP, Azure) and
  containerization/orchestration (Docker, Kubernetes)
- Experience designing and building multi-tenant, secure platform
  systems with strong governance and observability
- Demonstrated expertise in security: architecture, secure coding
  practices, authentication/authorization, encryption, and threat
  modeling
- Experience with compliance frameworks and security
  certifications: SOC 2, ISO 27001, GDPR, or similar
- Track record of leading technical initiatives from architecture
  through production deployment
- Excellent communication skills; ability to explain complex
  technical and security concepts to executives and
cross-functional teams

Preferred Qualifications

- Experience in financial services, private equity, or alternative
  assets technology environments
- Familiarity with LangChain, LlamaIndex, or similar AI
  orchestration frameworks
- Experience with MLOps tools and practices: model versioning,
  feature stores, experiment tracking
- Knowledge of eval frameworks, retrieval evaluation, or AI model
  benchmarking
- Experience with data governance platforms or metadata management
  systems
- Experience building zero-trust architectures or implementing
  security controls in cloud-native environments
- Contributions to open-source AI/ML projects or publications in
  the AI/ML space
- Experience in building developer platforms or internal tools that
  drive organizational adoption

Reporting Relationships

Partner, Chief Information Officer

Compensation

The anticipated
base salary range for this position is listed below. Total
compensation may also include a discretionary performance-based
bonus. Note, the range takes into account a broad spectrum of
qualifications, including, but not limited to, years of relevant
work experience, education, and other relevant qualifications
specific to the role.

$300,000 - $350,000

The firm also offers
robust Benefits offerings. Ares U.S. Core Benefits include
Comprehensive Medical/Rx, Dental and Vision plans; 401(k) program
with company match; Flexible Savings Accounts (FSA); Healthcare
Savings Accounts (HSA) with company contribution; Basic and
Voluntary Life Insurance; Long-Term Disability (LTD) and Short-Term
Disability (STD) insurance; Employee Assistance Program (EAP), and
Commuter Benefits plan for parking and transit. Ares offers a
number of additional benefits including access to a world-class
medical advisory team, a mental health app that includes coaching,
therapy and psychiatry, a mindfulness and wellbeing app, financial
wellness benefit that includes access to a financial advisor, new
parent leave, reproductive and adoption assistance, emergency
backup care, matching gift program, education sponsorship program,
and much more.

There is no set deadline to apply for this job
opportunity. Applications will be accepted on an ongoing basis
until the search is no longer active.