
Monitoring Tabnine

Overview

This document describes how to monitor Tabnine services deployed on-premises and walks through a few examples of monitoring them locally. You can also enable Tabnine telemetry, which uses the principles shown in this document and reports the data to Tabnine’s servers.
Because Tabnine’s self-hosted solution runs in a Kubernetes cluster, it relies on standard tools for logs and metrics: logs are written to stdout, and metrics are exposed over HTTP endpoints in Prometheus format.
Both writing logs to stdout and exposing metrics endpoints for scraping are standard practice in the Kubernetes ecosystem, so an extensive collection of tools and platforms supports those formats. This document covers the configuration options for scraping metrics and provides examples of setting up a simple Prometheus server to scrape the metrics and FluentBit to collect the logs into a centralized endpoint.

Logs

All Tabnine services write their logs to stdout. Kubernetes picks them up and manages them, which allows integration with standard tools for log management and retention.
In Kubernetes, the standard way to deal with logs is to run a collection service, such as FluentD or FluentBit, which collects the logs from the pods and forwards them to a centralized location. Cloud providers usually have an official way of integrating the logs with their native logging platforms. However, they all use FluentD or FluentBit under the hood.
When Tabnine’s telemetry is enabled, we install and use FluentD to forward logs from the cluster to Tabnine’s servers.
Log messages are in the following format:
{
  "timestamp": "2023-01-15T03:46:06.861Z",
  "level": "error/warning/info/debug",
  "message": "msg content"
}
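As a rough illustration of consuming this format, the following sketch parses log lines and keeps only those at or above a chosen level. The function names, the severity ordering, and the second sample line are assumptions made for this example; they are not part of Tabnine.

```python
import json

# Severity order assumed from the "level" values listed in the format above.
LEVELS = {"debug": 0, "info": 1, "warning": 2, "error": 3}

def parse_log_line(line):
    """Parse one JSON log line; return None if it is malformed or lacks a field."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if not all(key in record for key in ("timestamp", "level", "message")):
        return None
    return record

def filter_by_level(lines, minimum="warning"):
    """Keep records whose level is at or above `minimum`."""
    threshold = LEVELS[minimum]
    kept = []
    for line in lines:
        record = parse_log_line(line)
        if record is not None and LEVELS.get(record["level"], -1) >= threshold:
            kept.append(record)
    return kept

# Hypothetical sample input: one error, one info, one non-JSON line.
sample_lines = [
    '{"timestamp": "2023-01-15T03:46:06.861Z", "level": "error", "message": "msg content"}',
    '{"timestamp": "2023-01-15T03:46:07.120Z", "level": "info", "message": "hypothetical info message"}',
    "not json",
]

for record in filter_by_level(sample_lines):
    print(record["timestamp"], record["level"], record["message"])
```

In practice a log collector such as FluentBit would do this filtering; the sketch only shows that each line can be handled as an independent JSON object.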
How to send logs to an external log management system

Metrics

Tabnine services export Prometheus metrics and rely on having Prometheus Operator installed on the cluster. If you are unfamiliar with installing Prometheus Operator, please follow the Prometheus Operator install article.
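For readers who have not seen the Prometheus text format before, here is a minimal sketch of what a metrics endpoint returns and how it can be read. The metric names `example_requests_total` and `example_up` are hypothetical and are not actual Tabnine metrics, and the parser is deliberately simplified (it ignores optional timestamps and does not unescape label values).

```python
import re

# Hypothetical payload in Prometheus text exposition format.
SAMPLE = """\
# HELP example_requests_total Hypothetical counter, not a real Tabnine metric.
# TYPE example_requests_total counter
example_requests_total{method="get",code="200"} 1027
example_requests_total{method="post",code="500"} 3
example_up 1
"""

# metric_name, optional {labels}, value, optional timestamp (ignored).
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

A real deployment would leave this work to Prometheus itself; the sketch is only meant to make the scraped format concrete.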

Enable monitoring of metrics

In order to enable Tabnine metrics monitoring, edit the following sections in values.yaml.
global:
  monitoring:
    enabled: true
    # labels -- by default. If your Prometheus server requires specific labels to be present for the monitors to be picked up, add them here
    labels: {}
    # annotations -- by default. Some platforms require specific annotations to be present, this setting will apply the annotation to all monitor objects
    annotations: {}
  tabnine:
    telemetry:
      # enabled -- Send telemetry data to Tabnine backend
      enabled: false
Now that values.yaml is updated, it is time to install the chart on the cluster.
helm upgrade --install -n tabnine --create-namespace tabnine oci://registry.tabnine.com/self-hosted/tabnine-cloud --values values.yaml

Prometheus example

Values file examples

The following example adds a release=prom-example label to all PodMonitors and ServiceMonitors created by Tabnine as part of the installation.
global:
  monitoring:
    enabled: true
    labels:
      release: prom-example
  image:
    imagePullSecrets:
      - name: regcred
  tabnine:
    [...]

Prometheus configuration file

The following configuration:
  1. Scrapes only PodMonitors and ServiceMonitors with a release=prom-example label
  2. Keeps the data for 14 days
  3. Requires 50Gi of storage
  4. Requires 6G of RAM to operate
For the full list of available configuration options, please check the Prometheus (CRD) documentation.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prom-example
  namespace: monitoring
spec:
  evaluationInterval: 30s
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prom-example
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: prom-example
  replicas: 1
  resources:
    limits:
      cpu: 1
      memory: 6G
    requests:
      cpu: 1
      memory: 6G
  retention: 14d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: prom-example
  scrapeInterval: 30s
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prom-example
  shards: 1
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 50Gi
  version: v2.42.0
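The label set in the values file and the matchLabels selectors in the Prometheus resource have to agree, otherwise the monitors are not picked up. The sketch below illustrates the matchLabels rule as it is generally understood in Kubernetes (every key/value pair in the selector must be present on the object). The monitor names and the `app` label are hypothetical and used only for illustration.

```python
def matches(match_labels, object_labels):
    """matchLabels semantics: every selector key/value must be present on the object.

    An empty selector matches every object.
    """
    return all(object_labels.get(key) == value for key, value in match_labels.items())

# Selector taken from the Prometheus example above.
selector = {"release": "prom-example"}

# Hypothetical monitor objects and their labels.
monitors = {
    "tabnine-monitor-a": {"release": "prom-example", "app": "hypothetical"},
    "unrelated-monitor": {"release": "other"},
    "unlabeled-monitor": {},
}

selected = [name for name, labels in monitors.items() if matches(selector, labels)]
print(selected)
```

If a Prometheus server does not scrape Tabnine's monitors, comparing the labels under global.monitoring.labels with the server's selectors is a reasonable first check.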