The French training data are a combination of Lexique and of data that also refer to Lexique but stem from LIMSI: https://perso.limsi.fr/anne/OLDlexique.txt The original Lexique download site is no longer accessible. I think these sources contain the same data: https://github.com/WhiteFangs/lexique.sql The licence is unknown.