About the Job
Wells Fargo is seeking an Observability Subject Matter Expert (SME) to design, build, and evolve enterprise-scale distributed tracing and Application Performance Monitoring (APM) platforms. The role involves deep, hands-on work with tracing products such as SPLOC, AppDynamics, and Dynatrace, and applies AI/ML and agentic architectures to autonomous observability, root-cause analysis, and self-healing.
Key Responsibilities:
* Architect and implement end-to-end distributed tracing solutions across microservices and monolithic applications.
* Lead the onboarding and instrumentation of applications using OpenTelemetry and proprietary APM agents.
* Design scalable architectures for high-volume trace ingestion, storage, correlation, and visualization.
* Define standards for traces, spans, service maps, and SLIs/SLOs.
* Design and build AI-powered observability solutions for anomaly detection, pattern discovery, and closed-loop automation.
* Integrate LLMs and ML models with observability platforms to enable natural-language exploration and insights.
* Enable platform extensibility through custom plugins, APIs, and data pipelines.
* Ensure solutions are secure, compliant, resilient, and cost-efficient.
Required Qualifications:
* 5+ years of software engineering experience.
Desired Qualifications:
* 8+ years of experience in software or platform engineering, with a focus on observability or APM.
* Hands-on experience with distributed tracing and APM platforms (SPLOC, AppDynamics, Dynatrace, Grafana).
* Deep understanding of microservices, cloud-native architectures, and distributed systems.
* Proficiency in Python, Java, or Go.
* Practical experience applying AI/ML or LLMs to production systems.
* Experience designing scalable, reliable platforms.
* Experience with Prometheus-based metrics and Grafana on Kubernetes/OpenShift.
* Experience building or integrating agentic AI systems.
* Strong analytical, problem-solving, and communication skills.