I often write text-generation scripts for various purposes: game- or story-related work, or the automated, Oulipo-esque modification of existing texts to produce surreal results.
Visuals and sound can easily be generated by pure synthesis in a consistent way, without much concern for an arbitrary grammar. Text is different: if your goal is to make sentences that make sense in a given language, at some point you have to define where the expressions or words come from.
I found some word databases on the Internet, but they are often either too human-oriented or not precise enough for what I want.
An ideal database would include a lot of information about each word, the contexts in which it can be used, and the grammar behind it.
So I've started to build my own XML dictionary. I chose French because it's my main language and because it has many particularities (e.g. grammatical gender) that make it tricky to represent in the form of a database.
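To give an idea of why gender has to be stored with each word, here is a minimal Python sketch. The entry layout (a `<mot>` element with `forme`, `type` and `genre` attributes) is invented for this example and is not the actual schema of dictionnaire.xml; the elision rule is also deliberately simplified.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry layout, invented for illustration only;
# it is NOT the actual schema of dictionnaire.xml.
SAMPLE = """
<dictionnaire>
  <mot forme="maison" type="nom" genre="f"/>
  <mot forme="jardin" type="nom" genre="m"/>
  <mot forme="arbre" type="nom" genre="m"/>
</dictionnaire>
"""

# Simplification: every initial "h" is treated as mute, which is wrong
# for words with an aspirated h (e.g. "le hibou").
ELISION_LETTERS = "aeiouyàâéèêëîïôûùh"


def load_nouns(xml_text):
    """Return a {form: gender} mapping for every noun entry."""
    root = ET.fromstring(xml_text)
    return {
        mot.get("forme"): mot.get("genre")
        for mot in root.iter("mot")
        if mot.get("type") == "nom"
    }


def with_definite_article(forme, genre):
    """Prefix a singular noun with le / la / l' according to its gender."""
    if forme[0].lower() in ELISION_LETTERS:
        return "l'" + forme
    return ("la " if genre == "f" else "le ") + forme


nouns = load_nouns(SAMPLE)
phrases = [with_definite_article(forme, genre) for forme, genre in nouns.items()]
print(phrases)
```

Without the `genre` attribute, a generator could not choose between "le" and "la", which is the kind of grammatical information a plain word list does not provide.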
I built it by crawling websites and various text files, and wrote some scripts to sort the mass of information as best I could. As I don't own any rights to the content of the websites I used, I suppose that's not legally correct, but writing a dictionary alone is obviously too much work for one man.
The current version has many defects: it lacks word variations (e.g. conjugated verbs), and some words have only one entry despite having several meanings or types.
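As a rough sketch of how a future layout might accommodate both points, the Python snippet below uses a hypothetical structure (nested `<variation>` elements for inflected forms, and one `<mot>` entry per type of a given form). None of these element or attribute names come from the current dictionnaire.xml; they are assumptions made for the example. It also flags verbs that have no variations yet.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical layout, invented for illustration only;
# it is NOT the actual schema of dictionnaire.xml.
SAMPLE = """
<dictionnaire>
  <mot forme="manger" type="verbe">
    <variation forme="mange" temps="present" personne="1s"/>
    <variation forme="mangeons" temps="present" personne="1p"/>
  </mot>
  <mot forme="finir" type="verbe"/>
  <mot forme="ferme" type="nom"/>
  <mot forme="ferme" type="adjectif"/>
</dictionnaire>
"""


def verbs_without_variations(root):
    """List verb entries that carry no inflected forms at all."""
    return [
        mot.get("forme")
        for mot in root.iter("mot")
        if mot.get("type") == "verbe" and mot.find("variation") is None
    ]


def types_by_form(root):
    """Group entries by written form, so one form can have several types."""
    grouped = defaultdict(list)
    for mot in root.iter("mot"):
        grouped[mot.get("forme")].append(mot.get("type"))
    return dict(grouped)


root = ET.fromstring(SAMPLE)
missing = verbs_without_variations(root)
types = types_by_form(root)
print(missing)
print(types["ferme"])
```

A check like `verbs_without_variations` only finds entries that are structurally incomplete; it cannot tell that a word is missing a meaning, which still needs an external source or human review.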
I've had new ideas about how to deal with that in upcoming versions. Human computation could also be a theoretical option, but I doubt many people would participate.
I've included the 10 MB dictionnaire.xml itself, along with a text file containing some ideas about what the ideal "augmented dictionary" of my dreams would contain.
If you share the same interest, know of any other database, or think you can help me in any way, feel free to contact me.