Copilot platform for SRE teams  

Use interactive AI models to automate reliability workflows, predict outages, and improve observability.

Auto-generate SLOs

Automatically generate optimal SLOs for each service and user journey. 

Predict outages

Predict outages at the level of individual user journeys, services, and applications.

Generate insights using AI

Use natural language to generate custom analytic reports. 

 

OccamsHub Platform 

How OccamsHub improves SRE team productivity 

SRE Copilot

Our SRE Copilot is a set of trained AI models that automatically:

      1. Discovers services, user journeys, service maps, and other entities, along with the links between them

      2. Generates optimal SLOs and monitors the SLIs for each user journey and service

      3. Provides an interactive interface for generating SLO recommendations across hundreds of change scenarios

      4. Predicts outages at the level of individual user journeys, services, and applications

Trained locally on each customer’s historical data, SRE Copilot gives all stakeholders a common framework for monitoring and managing service reliability. The result is a measurable improvement in both reliability and SRE team productivity.
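The SLO and SLI terms above follow standard SRE definitions. As a minimal sketch, assuming a request-based availability SLI (good requests divided by total requests), the snippet below computes an SLI, the remaining error budget, and a multi-window burn-rate paging check. The function names and the 99.9% target are hypothetical illustrations; this is not OccamsHub's SLO-generation model, which is trained on each customer's data.

```python
def sli(good: int, total: int) -> float:
    """Request-based SLI: the fraction of requests that were good."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total


def burn_rate(good: int, total: int, target: float) -> float:
    """Speed of error-budget spend; 1.0 means exactly on pace to use the whole budget."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    error_budget = 1.0 - target  # e.g. 0.001 for a 99.9% target
    return (1.0 - sli(good, total)) / error_budget


def remaining_budget(good: int, total: int, target: float) -> float:
    """Fraction of the error budget left, given counts for the whole SLO period."""
    return 1.0 - burn_rate(good, total, target)


def should_page(long_window: tuple, short_window: tuple,
                target: float, threshold: float = 14.4) -> bool:
    """Multi-window burn-rate alert: fire only when both windows burn too fast.

    Each window is a (good, total) pair. A threshold of 14.4 over a 1h/5m
    window pair is a common choice for a 30-day SLO: it means 2% of the
    monthly budget is being spent per hour.
    """
    return (burn_rate(*long_window, target) > threshold
            and burn_rate(*short_window, target) > threshold)


# 30-day period: 1,000,000 requests, 999,500 good, 99.9% target
print(round(remaining_budget(999_500, 1_000_000, 0.999), 3))  # 0.5
```

The long window ensures enough budget has actually been spent to matter, and the short window ensures the problem is still ongoing, so the alert clears soon after recovery.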

Observability Copilot

Turbocharge your observability:

      1. Use natural language queries to create custom analytics, ML models, and correlations within seconds

      2. Slice and dice logs, events, metrics, and traces by any number of high-cardinality dimensions and attributes

      3. Benchmark the performance of complex microservices and identify bottlenecks with built-in AI models

      4. Create proactive workflows that address issues before they impact customers
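Slicing by high-cardinality dimensions amounts to grouping telemetry by an arbitrary combination of attributes. The in-memory sketch below illustrates only that idea: it groups events by any list of attribute names and reports a count and error rate per group. The event fields (`service`, `region`, `customer_tier`, `status`) are hypothetical, and the sketch says nothing about how OccamsHub stores or indexes data at scale.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def slice_by(events: Iterable[Dict[str, Any]],
             dims: List[str]) -> Dict[Tuple, Dict[str, float]]:
    """Group events by any combination of attributes; report count and error rate.

    Events missing an attribute are grouped under None for that dimension.
    """
    counts: Dict[Tuple, int] = defaultdict(int)
    errors: Dict[Tuple, int] = defaultdict(int)
    for event in events:
        key = tuple(event.get(d) for d in dims)
        counts[key] += 1
        if event.get("status", 200) >= 500:  # treat 5xx responses as errors
            errors[key] += 1
    return {key: {"count": n, "error_rate": errors[key] / n}
            for key, n in counts.items()}


events = [
    {"service": "checkout", "region": "us-east", "customer_tier": "gold", "status": 200},
    {"service": "checkout", "region": "us-east", "customer_tier": "gold", "status": 503},
    {"service": "checkout", "region": "eu-west", "customer_tier": "free", "status": 200},
    {"service": "search", "region": "us-east", "customer_tier": "gold", "status": 200},
]

# Any dimension combination works without being predefined.
for key, stats in slice_by(events, ["service", "region"]).items():
    print(key, stats)
```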

OccamsHub Copilot is built as a highly scalable data processing system optimized for two distinct workloads: analyzing high-cardinality cloud data in real time and training AI models to automate complex SRE processes.

The Copilot automatically normalizes operational and business data into a common data model and maintains a real-time map of all entity linkages, enabling SRE teams to rapidly create new analytics, discover new correlations, and train ML models. This boosts productivity, letting users predict outages, accelerate investigations, and surface unique operational insights.
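One common way to discover links between services is to derive them from distributed traces: whenever a span's parent belongs to a different service, the parent's service called the child's. The sketch below shows that generic technique; it is not necessarily how OccamsHub builds its entity map. The `Span` type is hypothetical, span IDs are assumed to be unique, and a real pipeline would also have to handle spans that arrive late or out of order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None for a trace's root span
    service: str


def service_map(spans: List[Span]) -> Dict[Tuple[str, str], int]:
    """Derive caller -> callee service edges, with call counts, from spans."""
    by_id = {s.span_id: s for s in spans}
    edges: Dict[Tuple[str, str], int] = {}
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id is not None else None
        # Only a parent in a *different* service represents a service-to-service call.
        if parent is not None and parent.service != span.service:
            edge = (parent.service, span.service)
            edges[edge] = edges.get(edge, 0) + 1
    return edges


spans = [
    Span("s1", None, "frontend"),
    Span("s2", "s1", "frontend"),  # internal work in the same service: no edge
    Span("s3", "s2", "checkout"),
    Span("s4", "s3", "payments"),
    Span("s5", "s3", "inventory"),
    Span("s6", "s2", "checkout"),
]
print(service_map(spans))
# {('frontend', 'checkout'): 2, ('checkout', 'payments'): 1, ('checkout', 'inventory'): 1}
```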

Application Performance Monitoring (APM)

Gain granular visibility and spot problems early with out-of-the-box monitoring capabilities such as:

      1. Golden-signal monitoring of latency, traffic, errors, and saturation for each service

      2. An early warning system with predictive alerts

      3. Synthetic monitoring that probes services out of band, independent of real user traffic

      4. Infrastructure monitoring and forecasting

With pre-configured alerts and dashboards, these capabilities help application teams quickly find and fix slow or broken apps.
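The golden signals have standard definitions in SRE practice. As a minimal sketch, assuming per-request latency and status records plus a CPU utilization reading as the saturation measure, the snippet below summarizes one window into the four signals and then forecasts when utilization would cross a threshold. The p99 uses the nearest-rank method and the forecast is a plain least-squares line: deliberately simple stand-ins for the predictive alerts and forecasting listed above, not OccamsHub's models. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Request:
    latency_ms: float
    status: int


def golden_signals(requests: List[Request], window_s: float,
                   cpu_utilization: float) -> Dict[str, float]:
    """Summarize one time window into latency, traffic, errors, and saturation."""
    if not requests or window_s <= 0:
        raise ValueError("need at least one request and a positive window")
    latencies = sorted(r.latency_ms for r in requests)
    rank = math.ceil(0.99 * len(latencies))  # nearest-rank p99 (1-based)
    return {
        "latency_p99_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": sum(r.status >= 500 for r in requests) / len(requests),
        "saturation": cpu_utilization,
    }


def hours_until_threshold(samples: List[Tuple[float, float]],
                          threshold: float) -> Optional[float]:
    """Fit a least-squares line to (hour, utilization) samples and extrapolate.

    Returns hours from the last sample until the line reaches `threshold`,
    or None if utilization is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    breach_t = (threshold - intercept) / slope
    return max(0.0, breach_t - samples[-1][0])


window = [Request(100, 200), Request(120, 200), Request(80, 500), Request(2000, 200)]
print(golden_signals(window, window_s=2.0, cpu_utilization=0.7))

# Utilization rising 5 points per hour from 50%: about 5 hours until 90%.
print(hours_until_threshold([(0, 0.50), (1, 0.55), (2, 0.60), (3, 0.65)], 0.90))
```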

OpenTelemetry Support

OpenTelemetry (OTel) is an open-source project that provides a vendor-agnostic way to instrument applications so they generate metrics, logs, and traces. It also provides a collector that receives, processes, and exports telemetry data, removing the need to run and maintain multiple vendor-specific agents and collectors.

OccamsHub supports OpenTelemetry through an OccamsHub distribution of the upstream OTel Collector; refer to our GitHub repository for details. This enables customers to:

      1. Break APM vendor lock-in for data generation and collection

      2. Switch easily between analytics and monitoring backends

      3. Adopt the latest community standards for consistent instrumentation across all applications
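Because the Collector accepts the vendor-neutral OTLP protocol, an application can send it telemetry without any vendor-specific agent. As a minimal, standard-library-only sketch, the snippet below builds an OTLP/HTTP JSON metrics payload and prepares a POST to a collector's default OTLP/HTTP endpoint (`localhost:4318/v1/metrics`). It assumes a collector with the OTLP HTTP receiver enabled; the service, metric, and scope names are hypothetical, and production code would normally use an official OpenTelemetry SDK rather than hand-built JSON.

```python
import json
import time
import urllib.request


def otlp_gauge_payload(service: str, name: str, value: float, unit: str = "1") -> dict:
    """Build a single-gauge OTLP/JSON payload: resourceMetrics -> scopeMetrics -> metrics."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "example.manual-instrumentation"},  # hypothetical scope name
                "metrics": [{
                    "name": name,
                    "unit": unit,
                    "gauge": {"dataPoints": [{
                        "asDouble": value,
                        # The proto3 JSON mapping encodes 64-bit integers as strings.
                        "timeUnixNano": str(time.time_ns()),
                    }]},
                }],
            }],
        }],
    }


def build_request(payload: dict,
                  endpoint: str = "http://localhost:4318/v1/metrics") -> urllib.request.Request:
    """Prepare (but do not send) an OTLP/HTTP JSON POST to a collector."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(otlp_gauge_payload("checkout", "queue.depth", 42.0))
print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` against a running collector. Because the application only targets the collector, changing which backend receives the data is a matter of the collector's exporter configuration rather than application code.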

Enterprise Scale

OccamsHub offers several enterprise capabilities out of the box:

      1. Streaming ingestion at multi-TB/day scale

      2. OpenTelemetry support for instrumentation and data collection

      3. Bidirectional integration with ITSM tools

      4. Single sign-on (SSO) integration