---
license: cc-by-nc-4.0
language:
- en
base_model: Qwen/Qwen3-8B
tags:
- bargaining
- negotiation
- reinforcement-learning
- grpo
- bilateral-bargaining
---

# Qwen3-8B Bargaining Seller (RL)

A Qwen3-8B model trained via **reinforcement learning (GRPO)** to play as the **seller** in bilateral bargaining negotiations.

## Overview

This model was trained as part of the [LLM Bilateral Bargaining](https://github.com/ZhuoranYang/llm_bilateral_bargaining) project, which studies how LLM agents negotiate in structured buyer-seller bargaining games.

**Training method**: Group Relative Policy Optimization (GRPO) with a multi-component reward function covering parsing correctness, execution success, constraint compliance, and negotiation utility. Initialized from the [SFT checkpoint](https://huggingface.co/yale-cadmy/qwen3-8B-bargaining-sft).

**Role**: Seller agent — negotiates to sell items at the highest price while respecting a private minimum acceptable price.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "yale-cadmy/qwen3-8B-bargaining-seller-rl",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("yale-cadmy/qwen3-8B-bargaining-seller-rl")
```

## License

CC-BY-NC-4.0. See the [LLM Bilateral Bargaining](https://github.com/ZhuoranYang/llm_bilateral_bargaining) repository for details.