Sweep Next Edit
Locally running next-edit autocomplete.
Blog: https://blog.sweep.dev/posts/oss-next-edit
A 7B-parameter model that predicts the next edit a developer will make. Given the current file, recent diffs, and the cursor position, it predicts which code block the developer will change next and how they will change it.
Install the dependencies and run the example script:

```bash
pip install transformers torch accelerate
python inference.py
```
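See `inference.py` for a complete working example. The core calls look like this; `edited_contents`, `cursor_position`, and `recent_changes` are assumed to already hold your editor state, and the model path is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from inference import build_prompt, generate, FileChunk, DIFF_FORMAT

# Placeholder path: point this at your local copy of the Sweep Next Edit checkpoint.
model_id = "path/to/sweep-next-edit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")

# edited_contents, cursor_position and recent_changes come from the editor;
# see inference.py for how they are constructed.
prompt, code_block, block_start, relative_cursor = build_prompt(
    file_path="example.py",
    file_contents=edited_contents,
    cursor_position=cursor_position,
    recent_changes=recent_changes,
    retrieval_chunks=[FileChunk("utils.py", "def helper(): ...")],  # extra context files
    changes_above_cursor=False,
)
completion = generate(model, tokenizer, prompt, device="cuda")
```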
The model uses `<|file_sep|>` delimiters and a `<|cursor|>` marker:

```
<|file_sep|>{file_path}
{file_contents}
{retrieval_chunks}
{recent_changes_as_diffs}
<|file_sep|>original/{file_path}:{start}:{end}
{code_block_before_last_edit}
<|file_sep|>current/{file_path}:{start}:{end}
{code_block_with_cursor_marker}
<|file_sep|>updated/{file_path}:{start}:{end}
{prefill}
```

The model completes the `updated/` section with the predicted new code block.
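As a rough illustration of how the template is filled in, the sketch below concatenates the sections in order. The helper names (`insert_cursor`, `assemble_prompt`), the newline joining, treating the cursor as a character offset into the code block, and passing retrieval chunks and diffs as pre-rendered strings are all assumptions; `build_prompt` in `inference.py` remains the reference implementation.

```python
FILE_SEP = "<|file_sep|>"
CURSOR = "<|cursor|>"


def insert_cursor(code_block: str, offset: int) -> str:
    """Insert the cursor marker at a character offset (assumed) in the block."""
    return code_block[:offset] + CURSOR + code_block[offset:]


def assemble_prompt(file_path, file_contents, retrieval_text, diffs_text,
                    start, end, original_block, current_block,
                    cursor_offset, prefill=""):
    """Hypothetical helper: lay out the template sections in order.

    retrieval_text and diffs_text are assumed to be pre-rendered strings;
    empty sections are skipped. Exact whitespace may differ from build_prompt.
    """
    parts = [
        f"{FILE_SEP}{file_path}\n{file_contents}",
        retrieval_text,
        diffs_text,
        f"{FILE_SEP}original/{file_path}:{start}:{end}\n{original_block}",
        f"{FILE_SEP}current/{file_path}:{start}:{end}\n"
        f"{insert_cursor(current_block, cursor_offset)}",
        # The model continues generating after this header and prefill.
        f"{FILE_SEP}updated/{file_path}:{start}:{end}\n{prefill}",
    ]
    return "\n".join(p for p in parts if p)
```

Because the prompt ends inside the `updated/` section, whatever the model generates next is read as the new version of the block.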
Notes on the format:

- `{recent_changes_as_diffs}` renders each recent edit in an `original:`/`updated:` format (layout below).
- `{code_block_with_cursor_marker}` is the current code block with `<|cursor|>` inserted at the cursor position.

The `updated/` section is seeded with a prefill to constrain generation:

- `changes_above_cursor=False`: prefill everything up to the cursor line. The model only generates from the cursor line onward.
- `changes_above_cursor=True`: prefill only the first line plus trailing blank lines. This gives the model freedom to rewrite lines between the insertion point and the cursor.

Each recent change uses this layout:

```
<|file_sep|>{file_path}:{start_line}:{end_line}
original:
{old_code}
updated:
{new_code}
```
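A minimal sketch of the two prefill modes and the recent-change layout, assuming the code block is a list of lines and `cursor_line` is a 0-based index into it. `make_prefill` and `format_change` are hypothetical helpers, and reading "trailing blank lines" as the blank lines directly after the first line is an interpretation; `build_prompt` and `DIFF_FORMAT` in `inference.py` define the actual behavior.

```python
FILE_SEP = "<|file_sep|>"


def format_change(file_path: str, start_line: int, end_line: int,
                  old_code: str, new_code: str) -> str:
    """Hypothetical helper: render one recent edit in original:/updated: form."""
    return (f"{FILE_SEP}{file_path}:{start_line}:{end_line}\n"
            f"original:\n{old_code}\nupdated:\n{new_code}")


def make_prefill(block_lines: list[str], cursor_line: int,
                 changes_above_cursor: bool) -> str:
    """Hypothetical helper: text seeded into the updated/ section.

    cursor_line is assumed to be a 0-based index into block_lines.
    """
    if not changes_above_cursor:
        # Keep every line before the cursor line; the model starts there.
        kept = block_lines[:cursor_line]
    else:
        # Keep the first line plus the blank lines right after it
        # (one reading of "first line + trailing blank lines").
        kept = block_lines[:1]
        for line in block_lines[1:]:
            if line.strip():
                break
            kept.append(line)
    return "".join(line + "\n" for line in kept)
```

The trade-off is visible in the two branches: a longer prefill pins down more of the output, while the short one lets the model revise lines above the cursor.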
Fine-tuned from Qwen2.5-Coder-7B on developer editing traces using supervised fine-tuning (SFT), then GRPO (Group Relative Policy Optimization), then DPO (Direct Preference Optimization).
| Property | Value |
|---|---|
| Base model | Qwen2.5-Coder-7B |
| Fine-tuning | SFT → GRPO → DPO |
| Parameters | 7B |
| Precision | bfloat16 |
| Context length | 32,768 tokens |
| Architecture | Qwen2 (28 layers, hidden dim 3584) |
| Stop tokens | `<\|endoftext\|>`, `<\|file_sep\|>` |
| Max output tokens | 1024 |
| Decoding | Greedy (temperature=0) |
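The decoding settings in the table map onto Hugging Face `transformers` generation arguments roughly as follows. This is a sketch: the helper names are hypothetical, and `truncate_at_stop` only matters when a serving stack returns text past a stop token.

```python
STOP_TOKENS = ("<|endoftext|>", "<|file_sep|>")
MAX_NEW_TOKENS = 1024


def generation_kwargs(stop_token_ids):
    """Hypothetical helper: model.generate() kwargs mirroring the table."""
    return {
        "max_new_tokens": MAX_NEW_TOKENS,
        "do_sample": False,  # greedy decoding, equivalent to temperature=0
        "eos_token_id": list(stop_token_ids),  # stop at either stop token
    }


def truncate_at_stop(text, stop_tokens=STOP_TOKENS):
    """Cut decoded text at the earliest stop token, if one appears."""
    positions = [i for i in (text.find(t) for t in stop_tokens) if i != -1]
    return text[:min(positions)] if positions else text
```

The token IDs can be looked up with `tokenizer.convert_tokens_to_ids(list(STOP_TOKENS))` and the result passed as `model.generate(**inputs, **generation_kwargs(ids))`. Stopping on `<|file_sep|>` keeps the model from starting a new file section after the predicted block.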