---
license: cc-by-nc-4.0
language:
- en
base_model: Qwen/Qwen3-8B
tags:
- bargaining
- negotiation
- reinforcement-learning
- grpo
- bilateral-bargaining
---

# Qwen3-8B Bargaining Seller (RL)

A Qwen3-8B model trained via **reinforcement learning (GRPO)** to play as the **seller** in bilateral bargaining negotiations.

## Overview

This model was trained as part of the [LLM Bilateral Bargaining](https://github.com/ZhuoranYang/llm_bilateral_bargaining) project, which studies how LLM agents negotiate in structured buyer-seller bargaining games.

**Training method**: Group Relative Policy Optimization (GRPO) with a multi-component reward function covering parsing correctness, execution success, constraint compliance, and negotiation utility. Initialized from the [SFT checkpoint](https://huggingface.co/yale-cadmy/qwen3-8B-bargaining-sft).

**Role**: Seller agent that negotiates to sell items at the highest price while respecting a private minimum acceptable price.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "yale-cadmy/qwen3-8B-bargaining-seller-rl",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("yale-cadmy/qwen3-8B-bargaining-seller-rl")
```

## License

CC-BY-NC-4.0. See the [LLM Bilateral Bargaining](https://github.com/ZhuoranYang/llm_bilateral_bargaining) repository for details.
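
## Illustrative sketch: GRPO reward and advantages

The training method named in the Overview combines several reward components and then scores each sampled response relative to the other responses in its group. The sketch below illustrates that general idea only; it is not the project's training code. The function names, the weights, the assumption that utility lies in `[0, 1]`, and the use of the population standard deviation are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev

# Hypothetical weights for the four reward components named in the Overview.
# The actual weighting used in training is not stated in this model card.
ASSUMED_WEIGHTS = (0.1, 0.1, 0.3, 0.5)


def combined_reward(parse_ok, exec_ok, constraint_ok, utility,
                    weights=ASSUMED_WEIGHTS):
    """Weighted sum of parsing, execution, constraint, and utility terms.

    `utility` is assumed to be a number in [0, 1]; the three flags are booleans.
    """
    w_parse, w_exec, w_constraint, w_utility = weights
    return (w_parse * float(parse_ok)
            + w_exec * float(exec_ok)
            + w_constraint * float(constraint_ok)
            + w_utility * utility)


def group_relative_advantages(rewards, eps=1e-6):
    """Center each reward on its group's mean and scale by the group's std.

    `eps` avoids division by zero when every response in the group
    received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: three sampled seller responses to the same negotiation state.
group = [
    combined_reward(True, True, True, 0.8),    # valid, good deal
    combined_reward(True, True, False, 0.9),   # sold below the private minimum
    combined_reward(False, False, False, 0.0), # unparseable output
]
advantages = group_relative_advantages(group)
```

Because advantages are computed within a group, a response is rewarded for being better than its siblings rather than for its absolute score, which is the property that gives GRPO its name.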