Predictive Text

Lately I've been discussing the best way for many musicians to play together over the internet.
There are several known ways to handle the transmission delay, but as far as I know there is no way to avoid it completely.
One thing we imagined was to use neural nets to predict the next notes played by each musician and play "forward" in time to compensate for the delay and keep everyone together.
That means the predicting systems would have to constantly readjust themselves (at sample rate?) and the musicians would never "actually" play together.
Since music usually involves repetitive patterns at various scales, the idea might have some chance of success.
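
To make the idea a bit more concrete, here is a minimal sketch in Python. It is not a neural net: as a stand-in it uses a naive pattern matcher that looks for the last time the most recent notes occurred and reads off what came after them. The names (predict_ahead, simulate, context_len) and the representation of notes as strings are assumptions made for illustration only.

```python
def predict_ahead(history, delay, context_len=2):
    """Guess the note that comes `delay` steps after the end of `history`.

    Finds the most recent earlier occurrence of the last `context_len`
    notes and returns the note found `delay` steps after that occurrence.
    Falls back to repeating the last note when there is no usable match.
    """
    if not history:
        return None
    context = history[-context_len:]
    n = len(context)
    # `end` is the (exclusive) index where an earlier copy of the context ends.
    for end in range(len(history) - 1, n - 1, -1):
        target = end + delay - 1
        if history[end - n:end] == context and target < len(history):
            return history[target]
    return history[-1]


def simulate(played, delay, context_len=2):
    """Fraction of notes guessed correctly when everything arrives `delay` steps late."""
    hits = 0
    total = 0
    for t in range(delay, len(played)):
        received = played[: t - delay + 1]  # notes 0 .. t - delay have arrived
        guess = predict_ahead(received, delay, context_len)
        hits += guess == played[t]
        total += 1
    return hits / total


if __name__ == "__main__":
    riff = ["C", "E", "G", "A"] * 8  # a strictly repeating pattern
    print(simulate(riff, delay=3))
```

The predictor is re-run on every new note, which is the "constantly readjusting" part. On a strictly repeating riff it locks in after about one repetition; real playing is obviously far less regular.
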

As I'm writing this (2020), using machine learning to generate sound is fairly common, but I don't think I've heard a system that behaves the way I'm describing here.
Testing this would take me some work, so first I wanted to try the same thing with text, using only my phone's predictions.
I started by typing all the words of an existing text (an extract from T. S. Eliot's The Waste Land) and wrote down all of my phone's suggestions to create a new text out of them.

Original:
A woman drew her long black hair out tight
And fiddled whisper music on those strings
And bats with baby faces in the violet light
Whistled, and beat their wings
And crawled head downward down a blackened wall
And upside down in air were towers
Tolling reminiscent bells, that kept the hours
And voices singing out of empty cisterns and exhausted wells.

Generated:
A good is a to enough and in of
And she her her and the who and
And the in the names and the united of
And is the the own
To make with in in the of
With then down the the and the
In and and on the the same
Of the in in loud the rooms and the all

It doesn't work well at all.
Note that this is not the usual way to generate a text from a phone's suggestions, where you repeatedly accept the suggested word and use it to get the next one; here I write down each suggestion but keep correcting the phone by typing the original word every time.
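
The difference between the two procedures can be sketched with a toy model. My phone's actual predictor is unknown to me, so this uses a simple bigram table (most frequent next word) as a stand-in; the function names, the tiny corpus and the default fallback word are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(words):
    """Count, for every word, which words followed it in the training text."""
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def suggest(model, prev_word, default="the"):
    """Most frequent follower of `prev_word`, or `default` if it was never seen."""
    candidates = model.get(prev_word)
    if not candidates:
        return default
    return candidates.most_common(1)[0][0]


def corrected_suggestions(model, original_words):
    """What I did: type each original word, write down the suggestion, ignore it."""
    out = [original_words[0]]  # assumption: the first word is kept as typed
    for typed in original_words[:-1]:
        out.append(suggest(model, typed))
    return out


def free_running(model, first_word, length):
    """The usual game: keep accepting the suggestion and feed it back in."""
    words = [first_word]
    while len(words) < length:
        words.append(suggest(model, words[-1]))
    return words


if __name__ == "__main__":
    model = train_bigrams("the cat sat on the mat and the cat ran".split())
    print(corrected_suggestions(model, "the dog sat on the mat".split()))
    print(free_running(model, "the", 5))
```

In the corrected version every suggestion is conditioned on the real text, so mistakes do not pile up, but neighbouring suggestions have nothing to do with each other, which probably explains part of the word salad.
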
Since my phone is probably more used to French, here is another example with Jean de La Fontaine's Le Corbeau et le Renard.
This time I also accepted the cases where the phone predicted several words at once.

Original:
Maître Corbeau, sur un arbre perché,
Tenait en son bec un fromage.
Maître Renard, par l'odeur alléché,
Lui tint à peu près ce langage :
Et bonjour, Monsieur du Corbeau.
Que vous êtes joli ! que vous me semblez beau !
Sans mentir, si votre ramage
Se rapporte à votre plumage,
Vous êtes le Phénix des hôtes de ces bois.

Generated:
Maître de de le terrain de la
De à la tête jardin et petit
Il a de de de
De fait la main servi de la que je :
Ne il je le président président.
Et je vous êtes toujours et vous êtes êtes manquez de
Alix doute je ne vous chanson ne
Est trouve à à la chien
Il pouvez partants seul à de la hommes de votre services

The result is still bad, but then grammar is a complex and arbitrary thing.
I'm still curious about what this principle would do with sound.