This is something you should know about, use, and understand the workings of, probably in that order.

https://twitter.com/dan_fried/status/1514265047761043456

https://sites.google.com/view/incoder-code-models

InCoder
InCoder: A Generative Model for Code In-Filling and Synthesis
Daniel Fried*,  Armen Aghajanyan*,  Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis

arXiv, 2022



Paper: https://arxiv.org/abs/2204.05999 

Demo: https://huggingface.co/spaces/facebook/incoder-demo 

Model weights and instructions: https://github.com/dpfried/incoder/blob/main/README.md 

Examples: https://sites.google.com/view/incoder-code-models/home/examples 

Inserting and completing code in a single model
We train a generative, decoder-only Transformer using a causal-masking training objective (from CM3, Aghajanyan et al. 2022), which trains a model to generate entire code files in arbitrary orderings via masking.

[Figure: an example where a single region is masked.]
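To make the causal-masking idea concrete, here is a minimal sketch of how one training example could be built from a document with a single masked region. The helper name `causal_mask_example` is hypothetical, and the sentinel spellings `<|mask:0|>` and `<|endofmask|>` are assumptions for illustration; the real pipeline works on tokens and samples the spans, which this string-level sketch does not attempt.

```python
MASK = "<|mask:0|>"    # assumed spelling of the mask sentinel
EOM = "<|endofmask|>"  # assumed spelling of the end-of-mask marker


def causal_mask_example(document: str, start: int, end: int) -> str:
    """Move document[start:end] to the end, leaving a sentinel in its place.

    The result can be trained on left-to-right, yet the masked span is
    generated only after both its left and right context have been seen.
    """
    if not 0 <= start < end <= len(document):
        raise ValueError("span must be a non-empty range inside the document")
    left, span, right = document[:start], document[start:end], document[end:]
    # Layout: left context, sentinel, right context, sentinel, span, end marker.
    return left + MASK + right + MASK + span + EOM


doc = "def add(a, b):\n    return a + b\n"
start = doc.index("return")
end = start + len("return a + b")
example = causal_mask_example(doc, start, end)
print(example)
```

Because the span is appended after the right-hand context, an ordinary left-to-right language-modeling loss is enough to teach infilling.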


Zero-shot generation for code tasks
At inference time, we can prompt our model with a document containing mask tokens wherever we want it to insert code. This lets us perform many code tasks without any task-specific fine-tuning, including docstring generation, type hint prediction, variable renaming, and cloze tasks.

[Figure: real outputs from our model on these tasks.]




See more examples here: https://sites.google.com/view/incoder-code-models/home/examples
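As a rough illustration of the prompting scheme, the sketch below builds an infilling prompt from the pieces of a document surrounding the gaps, then cuts a generated continuation at the end-of-mask marker. The function names and the sentinel spellings are assumptions made for this example, and the continuation shown is a hand-written stand-in, not an actual model output.

```python
from typing import List

EOM = "<|endofmask|>"  # assumed end-of-mask marker


def make_sentinel(i: int) -> str:
    """Assumed spelling of the i-th mask sentinel."""
    return f"<|mask:{i}|>"


def build_infill_prompt(parts: List[str]) -> str:
    """Join the known parts with sentinels, then ask for the first gap."""
    if len(parts) < 2:
        raise ValueError("need at least two parts (one gap) to infill")
    prompt = ""
    for i, part in enumerate(parts):
        prompt += part
        if i < len(parts) - 1:
            prompt += make_sentinel(i)  # marks where gap i sits
    # Repeating the first sentinel at the end requests the text for gap 0.
    return prompt + make_sentinel(0)


def extract_infill(continuation: str) -> str:
    """Keep only what was generated before the end-of-mask marker."""
    return continuation.split(EOM, 1)[0]


def fill(parts: List[str], infills: List[str]) -> str:
    """Splice the infills back between the known parts."""
    if len(infills) != len(parts) - 1:
        raise ValueError("need exactly one infill per gap")
    out = parts[0]
    for infill, part in zip(infills, parts[1:]):
        out += infill + part
    return out


# Docstring generation framed as infilling: the gap sits between the quotes.
parts = [
    'def count_words(path):\n    """',
    '"""\n    with open(path) as f:\n        return len(f.read().split())\n',
]
prompt = build_infill_prompt(parts)

# Hand-written stand-in for a model continuation (NOT a real output).
fake_continuation = "Count the words in a file." + EOM + "ignored"
completed = fill(parts, [extract_infill(fake_continuation)])
print(completed)
```

The same prompt builder covers the other tasks listed above: type hint prediction or variable renaming only changes where the document is split into parts.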

Trained on open-source code and StackOverflow
Unlike past work, our model's training data consists only of permissively licensed code (Apache 2.0, MIT, BSD-2, and BSD-3 licenses) from online sources such as GitHub and GitLab, as well as StackOverflow. We focus on Python and JavaScript but include 28 languages in total, amounting to ~200GB of data after deduplication, filtering, and decontamination. See our paper for details.

Demo available
Demo Link: https://huggingface.co/spaces/facebook/incoder-demo 

Model available in HuggingFace's Transformers
6.7B parameter version: https://huggingface.co/facebook/incoder-6B

1.3B parameter version: https://huggingface.co/facebook/incoder-1B 



See our README (https://github.com/dpfried/incoder/blob/main/README.md) for instructions on the required versions of transformers and tokenizers, and for examples of how to do infilling.
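For orientation, here is a minimal sketch of loading a checkpoint through the standard Transformers auto classes and sampling a left-to-right completion. The checkpoint names come from the URLs above; the `complete` helper and the sampling settings (`top_p`, `temperature`, `max_new_tokens`) are assumptions for illustration rather than recommended values, and the README remains the authority on required library versions.

```python
# Checkpoint names taken from the Hugging Face URLs listed above.
CHECKPOINTS = {
    "6.7B": "facebook/incoder-6B",
    "1.3B": "facebook/incoder-1B",
}


def complete(prompt: str, size: str = "1.3B", max_new_tokens: int = 64) -> str:
    """Sample a continuation of `prompt` (hypothetical helper).

    Downloads the weights on first use, so this needs network access
    and enough memory for the chosen checkpoint.
    """
    # Imported lazily so the heavy dependencies load only when needed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = CHECKPOINTS[size]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,      # assumed sampling settings
            top_p=0.95,
            temperature=0.2,
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0, input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, clean_up_tokenization_spaces=False)


# Example call (commented out because it downloads the model):
# print(complete("def fibonacci(n):\n"))
```

Starting with the 1.3B checkpoint keeps the download and memory footprint manageable while experimenting.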

Credits
Thanks to Lucile Saulnier, Leandro von Werra, Nicolas Patry, Suraj Patil, Omar Sanseviero, and others at HuggingFace for help with the model release, and to Naman Goyal and Stephen Roller for the code our demo was based on!