G-RIPS Research Musing #4

4 minute read

Published:

Stalled on Gaussian process techniques this week. It has been hard to pin down what exactly the GP-regression could achieve – if we already have an aligned shape, we can of course calculate its deformation field and use it to transform the reference shape into the target shape. That suggests GP-regression should be able to go from an arbitrary shape to an aligned shape, which may indeed be the case. I do still have some confusion about what the actual training and test data for a GP-regression should be – ultimately I suppose it will have to output coefficients for a weighted linear combination of the eigenbasis derived from the eigenfunctions of the GP kernel.
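
To make my own confusion concrete, here is a minimal numpy sketch of what "coefficients in the kernel's eigenbasis" could mean in practice; the choice of kernel, the point counts, and the per-coordinate treatment of the deformation field are all simplifying assumptions on my part, not the project's actual model.

```python
import numpy as np

# Toy sketch of the low-rank GP idea: build an eigenbasis from the kernel evaluated
# on the reference shape's points, then represent a deformation field as a weighted
# linear combination of those modes. Everything here is illustrative, not project code.

def gaussian_kernel(X, Y, sigma=1.0):
    # squared-exponential kernel between two point sets (n x d and m x d)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
ref_points = rng.uniform(size=(50, 2))        # reference shape: 50 points in 2D

K = gaussian_kernel(ref_points, ref_points)   # 50 x 50 kernel (Gram) matrix
eigvals, eigvecs = np.linalg.eigh(K)          # discrete analogue of the kernel's eigenfunctions
order = np.argsort(eigvals)[::-1][:10]        # keep the 10 largest modes
Phi = eigvecs[:, order] * np.sqrt(eigvals[order])   # 50 x 10 basis of deformation modes

# If we already have an aligned target, its deformation field is just target - reference,
# and the "regression output" would be the coefficients of that field in the basis
# (applied per coordinate here, which is a simplification of a vector-valued kernel):
target_points = ref_points + 0.05 * rng.standard_normal(ref_points.shape)
deformation = target_points - ref_points                     # 50 x 2 deformation field
coeffs, *_ = np.linalg.lstsq(Phi, deformation, rcond=None)   # 10 x 2 coefficients

reconstructed = ref_points + Phi @ coeffs                    # reference warped toward the target
print(np.abs(reconstructed - target_points).max())           # residual left after 10 modes
```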

Otherwise we have made a lot of progress on the neural network approaches. Greg, the 5th year from UC Davis, has very impressively reparametrized the FlowSSM model to begin with a voxelized representation and then use an IM-Net-style architecture to compute the probability that a given point belongs to the shape. We think that we may be able to proceed further and deform a mean shape to a target shape by giving FlowSSM the voxelized latent representation of the target and sampling points in the mean shape.
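
To fix the idea in my own head, here is a rough Keras sketch of what such an occupancy decoder could look like; the layer widths, activations, and the plain concatenation of latent code and query point are my own guesses, not Greg's actual architecture.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of an IM-Net-style occupancy decoder: an MLP takes a shape's latent code
# plus a query point and outputs the probability that the point lies inside the shape.
# All sizes below are placeholder assumptions.

LATENT_DIM = 128   # current latent dimension; trying 256 is one of next week's tasks
POINT_DIM = 3

latent_in = keras.Input(shape=(LATENT_DIM,), name="shape_code")
point_in = keras.Input(shape=(POINT_DIM,), name="query_point")

x = layers.Concatenate()([latent_in, point_in])
for width in (512, 256, 128):
    x = layers.Dense(width, activation="relu")(x)
occupancy = layers.Dense(1, activation="sigmoid", name="inside_probability")(x)

decoder = keras.Model([latent_in, point_in], occupancy)
decoder.compile(optimizer="adam", loss="binary_crossentropy")
decoder.summary()
```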

My contributions so far to this effort have been minimal – I have been reading Ian Goodfellow’s Deep Learning book, which has very good discussions of network architecture. Montufar 2014 is cited frequently in the discussion of why deep networks can outperform shallow ones. I also really like the XOR example: a single-layer perceptron cannot compute XOR, but one hidden layer of ReLU units applied to Wx + c, followed by a linear readout, computes it exactly (written out in numpy below). Greg went on to extrapolate from this that width scales with the number of if-statements you need and depth with the number of weight multiplications you have to chain. It was also useful to learn the motivation for convolutional neural networks as a sort of “standard” architecture – the gains in computational efficiency are so large that they overwhelm whatever expressiveness is lost by not being fully connected.

I also spent a good deal of this week simply getting Keras and TensorFlow up and running on my laptop – we were slightly worried beforehand that I had irreversibly screwed up my Python installation, which may yet turn out to be the case but as of now seems to have been averted. At least I know going forward that Conda plus a new environment for everything is the way to go.
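
Here is that XOR construction written out with the weights given in the book:

```python
import numpy as np

# Goodfellow's XOR example: one ReLU hidden layer on top of Wx + c, followed by a
# linear output layer, reproduces XOR exactly, which a single-layer perceptron cannot.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # all four input pairs

W = np.array([[1, 1],
              [1, 1]])
c = np.array([0, -1])
w = np.array([1, -2])
b = 0

h = np.maximum(0, X @ W + c)   # hidden layer: ReLU(XW + c)
y = h @ w + b                  # linear output layer

print(y)                       # [0 1 1 0] -- exactly XOR of the two inputs
```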

The IM-Net architecture on which FlowSSM, and thus our new model, is based uses a variational autoencoder. Autoencoders, per Goodfellow, are networks that first compute a latent representation of the input, h = f(x), and then decode it to get a reconstruction, r = g(h); the training target is the input itself. Autoencoders in which the dimension of h is less than the dimension of x are called undercomplete. If the decoder is linear and we minimize mean squared error, an undercomplete autoencoder learns the same subspace as PCA of the input. If the encoder and decoder are nonlinear, we get a more powerful, nonlinear generalization of PCA. If the autoencoder is given too much capacity, however, it will simply learn to copy the input without extracting anything interesting: even if h were 1-dimensional, a powerful enough encoder could just memorize each input’s index in the training set and the decoder could map that index back to the example. To fix this, we regularize the autoencoder by enforcing properties like sparsity in the representation, smallness of the derivatives of the representation, or robustness to noise.
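
As a sanity check on my own understanding (and on the newly working Keras install), here is a toy undercomplete autoencoder; the input and code dimensions are placeholders of my choosing, not anything from our voxel pipeline.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Toy undercomplete autoencoder, just to pin down the h = f(x), r = g(h) structure
# from Goodfellow. Dimensions below are purely illustrative assumptions.

INPUT_DIM = 784   # e.g. a flattened 28x28 image
CODE_DIM = 32     # dim(h) < dim(x), so the autoencoder is undercomplete

x_in = keras.Input(shape=(INPUT_DIM,))
h = layers.Dense(CODE_DIM, activation="relu", name="code")(x_in)     # encoder: h = f(x)
r = layers.Dense(INPUT_DIM, activation="sigmoid", name="recon")(h)   # decoder: r = g(h)

autoencoder = keras.Model(x_in, r)
autoencoder.compile(optimizer="adam", loss="mse")  # the output target is the input itself
autoencoder.summary()
```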

Feeling as I do more grounded in the basics of scientific computing with neural networks, I will spend next week investigating the latent space and the voxel representations. More specifically, I will (1) investigate the impact of increasing the latent dimension from 128 to 256, (2) explore the pointwise correlations in the voxel space, and (3) investigate the relationship between the same shape in the ambient voxel space and in the latent space.