So, as the project is winding down and I have no more time to run experiments (also, I have to write a 10-page paper on Thompson sampling for tomorrow), I thought I would write a brief post recapping what I did and learned from this project.
The common architecture throughout my project is the variational autoencoder. While the results the rest of the class got using LSTMs were markedly better, I decided to explore the VAE in a bit more depth rather than trying to replicate those results. I guess this is because I find playing with latent variable models more interesting than tuning the hyperparameters of an LSTM. I suspect this is also because the nature of this project (as opposed to the Cats & Dogs project) was more qualitative – I was content with trying to ‘make interesting things happen’ rather than ‘get the best score’, since nobody was posting their negative log-likelihood scores (indeed, when I tried to submit mine it didn’t work).
In the end, while the audio I generated wasn’t terrible, it was still quite far off from what many others got using large LSTMs. I suspect that if I had increased the capacity of my decoder significantly by adding more layers, the generated audio would have been quite a bit better. Now that I think about it, what probably happened was that the model resorted to some kind of averaging in order to minimize the L2 reconstruction error – this points to another solution, which would be to use a better measure of loss (maybe some sort of adversarial training?). This kind of averaging is also somewhat similar to what we see in the dialogue problem, where our decoders generate very generic sentences in order to minimize the NLL. Maybe adversarial training could help there as well…
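To make that averaging intuition concrete, here is a tiny toy sketch (not something from the project itself – the waveforms are made up for illustration): if the decoder has to commit to a single output when two very different targets are equally plausible, the output that minimizes the expected L2 error is their average, which for audio means a washed-out, generic signal.

```python
import numpy as np

# Two equally likely "ground truth" waveforms for the same context:
# identical sine waves, 180 degrees out of phase.
t = np.linspace(0, 1, 1000)
target_a = np.sin(2 * np.pi * 5 * t)
target_b = np.sin(2 * np.pi * 5 * t + np.pi)

candidates = {
    "predict A": target_a,
    "predict B": target_b,
    "predict the average": 0.5 * (target_a + target_b),  # ~ silence
}

for name, pred in candidates.items():
    # Expected L2 error when each target occurs half the time.
    expected_l2 = 0.5 * np.mean((pred - target_a) ** 2) + \
                  0.5 * np.mean((pred - target_b) ** 2)
    print(f"{name}: expected L2 = {expected_l2:.3f}")

# The averaged prediction wins on L2 even though it is a flat, generic signal,
# which is exactly the kind of output an L2-trained decoder drifts towards.
```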
Looking back on my samples, it seems like the audio I generated using iterative refinement and a 50-dimensional Z (https://drive.google.com/open?id=0B-OCmk1sbIXRWXFacW1RN3Bzbkk) gave the best results, even though there was quite a bit of noise. I had forgotten this when I ran my latent sampling models, so I had reduced my Z to 10 dimensions for convenience. I would have liked to run a couple more experiments, using a 100-dimensional Z and a sampling rate of 4000, but I will have to do that another day.
I’m happy that the project gave me an opportunity to play around with the VAE, something I’ve been wanting to do for a while. A few of the lessons I learned from the project:
- If your learning curves are fishy, always suspect your learning rate!
- Tuning hyperparameters is pretty important. No surprises there.
- It is always useful to come up with toy examples to debug your model before you actually train and test on a real dataset. I suspect that if I had fed a simple sine wave into my model early on, I would have noticed that I was binarizing everything and figured out my problems much sooner (a rough sketch of what I mean follows this list).
- If you are under a tight deadline and need the best results, it’s easier to build off of the results of others, rather than writing everything from scratch. That’s why we have open source code!
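For the sine-wave point above, here is roughly the kind of sanity check I have in mind (the `preprocess` function is a hypothetical stand-in for whatever your real pipeline does, not my actual code): push a signal you know by heart through your preprocessing and look at what comes out. A binarization bug like mine shows up immediately.

```python
import numpy as np

def preprocess(x):
    # Hypothetical placeholder for the real preprocessing pipeline.
    # My bug was roughly equivalent to this: everything got binarized.
    return (x > 0).astype(np.float32)

# A 440 Hz sine wave sampled at 8 kHz -- a signal whose shape you know by heart.
sr = 8000
t = np.arange(sr) / sr
sine = np.sin(2 * np.pi * 440 * t)

processed = preprocess(sine)

# Even without plotting, simple statistics reveal the problem:
print("original  min/max:", sine.min(), sine.max())            # ~ -1.0 / 1.0
print("processed min/max:", processed.min(), processed.max())  # 0.0 / 1.0
print("unique processed values:", np.unique(processed))        # only {0, 1} -> binarized!
```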
Thanks for reading!