Provenance and Attribution
Minimize IP liability of third-party models
Last updated
Popular LLMs like Claude 3.5 Sonnet and GPT-4o are trained on vast amounts of data, including code that may have restrictions on how it can be used, introducing the risk of IP infringement.
With Provenance and Attribution, you can drastically reduce the risk of IP liability when using models like Anthropic’s Claude, OpenAI’s GPT-4o, and Cohere’s Command R+ for software development.
Tabnine checks the code generated within our AI chat against the publicly visible code on GitHub, flags any matches it finds, and references the source repository and its license type. This critical information makes it easier for you to review code suggestions and decide if they meet your specific requirements and policies.
This feature enables you to leverage the performance gains of powerful LLMs while minimizing the likelihood of copyleft-licensed code entering your codebase.
Tabnine performs provenance tracing for all code generated by the AI chat. If a code block has origins in any publicly visible code on GitHub, a code provenance attribution section will appear below the block detailing the source and corresponding licenses.
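To illustrate the kind of information an attribution section surfaces, here is a minimal sketch in Python. The field names, repository, and policy check are hypothetical, not Tabnine's actual schema; the point is that each match pairs the generated code with its source repository and license so you can review it against your policies.

```python
from dataclasses import dataclass

# Hypothetical shape of a provenance attribution entry; the field names
# are assumptions for illustration, not Tabnine's real API.
@dataclass
class ProvenanceAttribution:
    matched_code: str   # the generated code block that matched public code
    source_repo: str    # public GitHub repository containing the match
    license_type: str   # license declared by the source repository

attribution = ProvenanceAttribution(
    matched_code="def quicksort(arr): ...",
    source_repo="github.com/example/algorithms",  # hypothetical repo
    license_type="GPL-3.0",
)

# An organization's policy might flag copyleft licenses for review:
COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0"}
needs_review = attribution.license_type in COPYLEFT
print(needs_review)
```

With the license type in hand, a review step like this can be automated or handled manually, depending on how strict your organization's policy is.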
Tabnine offers two levels of protection to minimize the risk of IP liability:
Training time protection: We have trained the Tabnine Protected 2 model exclusively on code that does not have any restrictions on use. This ensures that when using this model, every recommendation from Tabnine can be accepted without the risk of IP infringement. This level of protection is valuable for companies that have strict policies and zero tolerance for using nonpermissive code. This is a critical need, particularly for software companies and others who sell or relicense the code they produce.
Inference time protection: Tabnine informs you if the output of the LLM matches any publicly visible code on GitHub and identifies the source repo and its license type. By adding guardrails around Tabnine’s output, we minimize the risk of IP liability of third-party models.
Provenance and Attribution is currently in private preview and is available to any Tabnine Enterprise customer. Existing Tabnine Enterprise customers should contact our Support team to request access. Once enabled, the Provenance and Attribution capability works on all available models: Anthropic, OpenAI, Cohere, Llama, Mistral, and Tabnine.
Supported form factors: Provenance and Attribution is currently supported for AI chat. Support for inline actions and code completions will be available in 2025.
System requirements: Provenance and Attribution requires up to 10TB of free storage.
Supported languages: Python, Java, JavaScript, TypeScript, Rust, C, C++.
Code match criteria: A match must be at least 150 characters long and span multiple lines.
Matching index: GitHub is fully indexed, with periodic updates ensuring coverage extends beyond the model’s training data.
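The match criteria above can be expressed as a simple filter. The function below is an illustrative sketch, not Tabnine's implementation; only the two documented thresholds (150 characters, multiline) come from the docs.

```python
# Documented threshold: a match must be at least 150 characters.
MIN_MATCH_CHARS = 150

def meets_match_criteria(snippet: str) -> bool:
    """Illustrative check of the documented match criteria:
    at least 150 characters AND spanning multiple lines."""
    return len(snippet) >= MIN_MATCH_CHARS and "\n" in snippet.strip()

short = "x = 1"  # too short and single-line: would not be flagged
long_multiline = "\n".join(f"line_{i} = {i}" for i in range(20))

print(meets_match_criteria(short))           # False
print(meets_match_criteria(long_multiline))  # True
```

Requiring both length and a multiline span filters out trivial, coincidental overlaps (short idioms, boilerplate one-liners) so that only substantial matches trigger an attribution.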