I want to do SFT and DPO at the same time.
What I mean is that I want to train the model with SFT for a few steps and then run DPO on that same model.
Alternatively, I could create a data collator with about 15 samples and run SFT on it. Once SFT on all 15 samples is finished, I would create another data collator with about 5 samples and run DPO, then repeat this process until the whole dataset has been used.
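To make the schedule concrete, here is a minimal sketch of the alternation I have in mind, in plain Python. The names `alternating_chunks` and `run_schedule` are hypothetical and not part of any library, and the chunk sizes 15 and 5 are just the rough numbers from above. The actual SFT and DPO updates are left as a placeholder comment, since that is the part I am unsure how to implement.

```python
from typing import Iterator, List, Sequence, Tuple


def alternating_chunks(
    sft_data: Sequence,
    dpo_data: Sequence,
    sft_chunk: int = 15,  # assumed SFT chunk size
    dpo_chunk: int = 5,   # assumed DPO chunk size
) -> Iterator[Tuple[str, List]]:
    """Yield ("sft", chunk) and ("dpo", chunk) in turn until both datasets are used up."""
    i = j = 0
    while i < len(sft_data) or j < len(dpo_data):
        if i < len(sft_data):
            yield "sft", list(sft_data[i:i + sft_chunk])
            i += sft_chunk
        if j < len(dpo_data):
            yield "dpo", list(dpo_data[j:j + dpo_chunk])
            j += dpo_chunk


def run_schedule(sft_data: Sequence, dpo_data: Sequence) -> List[Tuple[str, int]]:
    """Walk the schedule and record which phase ran on how many samples."""
    log = []
    for phase, chunk in alternating_chunks(sft_data, dpo_data):
        # Placeholder: this is where the same model would be updated,
        # with an SFT loss when phase == "sft" and a DPO loss when phase == "dpo".
        log.append((phase, len(chunk)))
    return log


if __name__ == "__main__":
    for phase, size in run_schedule(list(range(30)), list(range(10))):
        print(phase, size)
```

If one dataset runs out first, this sketch simply keeps going with the other one; whether that is the right behavior is part of what I am asking.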
Does anyone have suggestions on how to set this up?