Protein ML Colab Notebooks
Seven Google Colab notebooks made for the CSBERG Synthetic Biology course. The content was delivered in Summer 2021.
The table of contents Colab notebook is here.
1. Introduction
- Basic numpy and pytorch vectorized operations, .backward(), .grad, manual gradient optimization
- Model saving and loading
- Curse of dimensionality exercise
- Loading .csv and .fasta files of sequences, one-hot encoding
- PyTorch Dataset and DataLoader
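A minimal numpy-only sketch of the FASTA loading and one-hot encoding steps listed above. It is not the notebook's code: the alphabet ordering and the helper names `parse_fasta` and `one_hot` are assumptions made for illustration. The resulting array is the kind of input a PyTorch Dataset would typically wrap.

```python
import numpy as np

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs (hypothetical helper)."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def one_hot(sequence):
    """Return a (len(sequence), 20) float32 array with a single 1.0 per row (hypothetical helper)."""
    indices = np.array([AA_TO_INDEX[aa] for aa in sequence], dtype=np.int64)
    encoding = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
    encoding[np.arange(len(sequence)), indices] = 1.0
    return encoding


fasta = ">seq1\nMKV\n>seq2\nMKA\n"
records = parse_fasta(fasta)
X = np.stack([one_hot(seq) for _, seq in records])
print(X.shape)  # (2, 3, 20)
```

Stacking like this assumes all sequences have the same length; variable-length sequences would need padding first.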
2. Discriminative Models
- Two layer fully-connected neural network for catalytic activity prediction
- Rough Mount Fuji model
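One common formulation of the Rough Mount Fuji (RMF) landscape adds independent random noise to a smooth additive term that decreases with Hamming distance from a reference genotype. The sketch below enumerates every binary genotype of length L under that formulation. The all-zeros reference, Gaussian noise, and the names `rough_mount_fuji` and `count_local_maxima` are assumptions for illustration, not necessarily how the notebook defines the model.

```python
import itertools

import numpy as np


def rough_mount_fuji(L, theta=1.0, noise_sd=1.0, seed=0):
    """Map every binary genotype of length L to a fitness value.

    fitness = -theta * HammingDistance(genotype, reference) + Normal(0, noise_sd)
    Assumption: the reference (peak of the additive "mountain") is all zeros.
    """
    rng = np.random.default_rng(seed)
    reference = np.zeros(L, dtype=int)
    landscape = {}
    for genotype in itertools.product((0, 1), repeat=L):
        distance = int(np.sum(np.array(genotype) != reference))
        landscape[genotype] = -theta * distance + rng.normal(0.0, noise_sd)
    return landscape


def count_local_maxima(landscape):
    """Count genotypes fitter than all of their single-mutant neighbours."""
    count = 0
    for genotype, fitness in landscape.items():
        neighbours = [
            genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(len(genotype))
        ]
        if all(fitness > landscape[n] for n in neighbours):
            count += 1
    return count


smooth = rough_mount_fuji(4, theta=1.0, noise_sd=0.0)
rugged = rough_mount_fuji(8, theta=0.5, noise_sd=1.0, seed=1)
print(count_local_maxima(smooth), count_local_maxima(rugged))
```

With `noise_sd=0` the landscape is purely additive and has a single peak; raising `noise_sd` relative to `theta` makes it more rugged, typically with more local maxima.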
3. Generative Models
- Representing multiple sequence alignments as matrices
- Variational Auto-Encoders trained on Pfam aligned sequences
- Sampling sequences from VAEs and visualizing results with sequence logos
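A minimal sketch of representing a multiple sequence alignment as a matrix, assuming an alphabet of the 20 amino acids plus "-" for gaps. Each aligned sequence becomes an (L, 21) one-hot block; flattening gives a vector suitable for a fully-connected VAE encoder, and the per-column frequencies are what a sequence logo visualizes. The alphabet and helper names are illustrative assumptions, and characters outside the alphabet (for example lowercase letters) raise a `KeyError` here.

```python
import numpy as np

# Assumption: 20 amino acids plus "-" for alignment gaps (q = 21).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}


def msa_to_tensor(aligned_sequences):
    """One-hot encode an alignment into an (N, L, 21) array (hypothetical helper)."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    indices = np.array([[CHAR_TO_INDEX[c] for c in s] for s in aligned_sequences])
    # Indexing the identity matrix with an (N, L) integer array yields (N, L, 21).
    return np.eye(len(ALPHABET), dtype=np.float32)[indices]


def position_frequencies(one_hot_msa):
    """Per-column residue frequencies, shape (L, 21): the input for a sequence logo."""
    return one_hot_msa.mean(axis=0)


msa = ["MK-V", "MKAV", "MRAV"]
X = msa_to_tensor(msa)
flat = X.reshape(len(msa), -1)  # (N, L * 21) input for a fully-connected encoder
freqs = position_frequencies(X)
print(X.shape, flat.shape, freqs.shape)
```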
4. Model-based Optimization
- Latent space optimization
- Conditioning by Adaptive Sampling (CbAS)
5. Inductive Bias
- Potts model implementation in PyTorch
- Attention (WIP) and nn.Embedding
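A Potts model scores an integer-encoded sequence s of length L over q states with single-site fields h and pairwise couplings J. The sketch below uses the convention E(s) = -Σ_i h[i, s_i] - Σ_{i<j} J[i, j, s_i, s_j] with probability proportional to exp(-E). It is written in numpy rather than PyTorch, and the function names, array layout, and sign convention are assumptions for illustration. Brute-force normalization is only feasible for tiny L and q.

```python
import itertools

import numpy as np


def potts_energy(seq, h, J):
    """Energy of an integer-encoded sequence; lower energy means higher probability.

    h: (L, q) fields; J: (L, L, q, q) couplings, of which only i < j entries are used.
    """
    seq = np.asarray(seq)
    L = len(seq)
    field_term = h[np.arange(L), seq].sum()
    # pair[i, j] = J[i, j, seq[i], seq[j]] via broadcasting, shape (L, L).
    idx = np.arange(L)
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    coupling_term = np.triu(pair, k=1).sum()  # keep each pair i < j once
    return -(field_term + coupling_term)


def potts_probabilities(h, J):
    """Exact Boltzmann probabilities over all q**L sequences (tiny models only)."""
    L, q = h.shape
    seqs = list(itertools.product(range(q), repeat=L))
    energies = np.array([potts_energy(s, h, J) for s in seqs])
    weights = np.exp(-(energies - energies.min()))  # shift for numerical stability
    return seqs, weights / weights.sum()


L, q = 3, 2
h = np.zeros((L, q))
h[0, 1] = 1.0
J = np.zeros((L, L, q, q))
J[0, 2, 1, 1] = 2.0
print(potts_energy([1, 0, 1], h, J))  # -3.0
seqs, probs = potts_probabilities(h, J)
```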
6. Language Models
- bio_embeddings
- Exploratory code to benchmark random embeddings for protein property prediction
7. Structure-based Models
- py3Dmol for visualizing structures in Colab
- Distance matrix, orientograms from trRosetta
- Molecular dynamics with OpenMM
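A minimal sketch of computing an inter-residue distance matrix and a binary contact map from 3D coordinates, for example one atom per residue parsed from a PDB file. The toy coordinates (residues spaced 3.8 Å apart along a line), the 8 Å contact threshold, and the helper names are assumptions for illustration.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances for an (L, 3) coordinate array, shape (L, L)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]  # (L, L, 3) by broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_map(dist, threshold=8.0):
    """Boolean (L, L) map of residue pairs closer than `threshold` angstroms."""
    return dist < threshold


coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0],
                   [11.4, 0.0, 0.0]])
D = distance_matrix(coords)
C = contact_map(D)
print(np.round(D, 1))
```

The diagonal is trivially a "contact" here; analyses usually mask it, and often near-diagonal pairs as well.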
Slides
Accompanying slides for notebooks 1, 2 and 3. Slides 1-28 can be delivered in about 2 hours.