Tabnine's private and protected Universal models

Tabnine’s Universal models: State-of-the-art, permissive, open source LLMs

These models are the result of our involvement in the open source LLM community.

We're regular contributors to many of the leading open source LLMs. This gives us an advantage when selecting the best ones available.

Our clients trust us to evaluate the options on the market for them. Each time we update the Universal model, we assure them that it is built on the best open source models available at that time. Being good open source citizens also allows us to have our proprietary add-on ready for any model, because our enhancement is model-agnostic.

Tabnine’s Universal models are trained on open source code with permissive licenses

Tabnine’s Universal model, which is trained on public code, uses only open source code released under permissive licenses.

The licenses are as follows: MIT, MIT-0, Apache-2.0, BSD-2-Clause, BSD-3-Clause, Unlicense, CC0-1.0, CC-BY-3.0, CC-BY-4.0, RSA-MD, 0BSD, WTFPL, ISC.

We attribute all code suggested to the full list of codebases used for training our model. This list is publicly available and regularly updated in our Trust Center.

We can guarantee that, at the time the model was trained, all training code was under a license from the above list only. We never train on code that isn't permissively licensed, so that our clients aren’t exposed to license risk.
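To make the idea of an allowlist concrete, here is a minimal sketch of how a license filter over the list above could look. It is an illustration only and does not describe Tabnine's actual training pipeline. The function names and the shape of the repository records (dictionaries with `name` and `license` keys) are assumptions made for this example.

```python
# Illustrative sketch only: NOT Tabnine's actual pipeline.
# The record layout ({"name": ..., "license": ...}) is a hypothetical assumption.

# SPDX identifiers taken from the license list in this document.
PERMISSIVE_LICENSES = frozenset({
    "MIT", "MIT-0", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause",
    "Unlicense", "CC0-1.0", "CC-BY-3.0", "CC-BY-4.0", "RSA-MD",
    "0BSD", "WTFPL", "ISC",
})

# Pre-normalized copy so lookups can ignore letter case.
_ALLOWED = frozenset(lic.casefold() for lic in PERMISSIVE_LICENSES)


def is_permissive(spdx_id: str) -> bool:
    """Return True if the SPDX identifier is on the allowlist."""
    return spdx_id.strip().casefold() in _ALLOWED


def filter_repos(repos: list[dict]) -> list[dict]:
    """Keep only repositories whose declared license is on the allowlist.

    Repositories with a missing or empty license are dropped, since their
    licensing status cannot be confirmed.
    """
    return [
        repo for repo in repos
        if repo.get("license") and is_permissive(repo["license"])
    ]


if __name__ == "__main__":
    candidates = [
        {"name": "example/a", "license": "MIT"},
        {"name": "example/b", "license": "GPL-3.0-only"},
        {"name": "example/c", "license": None},
    ]
    for repo in filter_repos(candidates):
        print(repo["name"], repo["license"])
```

An allowlist fails closed: any license that is unknown, missing, or simply not on the list is excluded by default, which matches the "above list only" guarantee.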

This decision has painful implications for Tabnine, since it limits the amount of data with which our models can be trained. But our users’ peace of mind comes first. Because we use only permissively licensed code, we guarantee our users can safely use the code that Tabnine generates in commercial projects without any license compliance uncertainty. Moreover, we're good open source users. By training our model on code with permissive licenses only, we adhere to and fully respect the original intent of those who contributed code to open source.

This applies to all the ways you can use Tabnine: code completions or chat, for any deployment (secure SaaS or private installations), and for any model (Universal or private model).

With regard to code privacy, the same principles apply to Chat. We use a data set licensed under Apache-2.0, free from copyrighted materials. This, again, makes all Tabnine Chat suggestions safe to use.

For Chat, we additionally use an English-language data set. If the model happens to recommend code that was already in its training data, Tabnine users don't need to worry. The code is under a permissive license, making Tabnine's code suggestions safe to use.
