The Path to AGI - Pt 3
I'm making progress on the synthetic auditory model. I had to make some design changes to the architecture along the way, as one would expect, but the core principles remain unchanged.
Right now the model is able to distinguish very well between noise and speech, find boundaries in sounds, segment sounds into syllables, and find stable sequences of these syllables.
As it stands, I've completed training of 4 heads. Here is the full list with the status of each:
- Contrastive Encoder Head - this is the entry point for the audio which outputs frame-level embeddings (T,D) - TRAINED AND FROZEN
- Segmentation Head - Detects fine-grained, syllable-like segment boundaries. It takes the frame-level embeddings from the contrastive encoder and outputs boundary probabilities (T,) - TRAINED AND FROZEN
- Segment Pooling Head - Pools variable-length segments into symbolic units (only used when doing proto-vocab training)
- Symbolic Transformer Proto-Vocab Head - Learns stable token IDs for segment sequences, takes pooled segment embeddings - TRAINING NOW, Never Locked. This will be part of the self-play unsupervised learning protocol to improve existing syllables/sounds and learn new ones.
- Proto-Word Learner Head - Discovers symbolic units across time (e.g. 12 47 47 = BA NA NA) - TO DO - Never Locked and will also be part of the self-play learning protocol.
- Proto-Grammar Head - I've already started on this, as you'll note from the previous post. It's almost the same as the Proto-Word Learner Head, but rather than taking in symbolic syllable units, it will take in proto-word token sequences to learn grammatical structures. Like the Proto-Vocab and Proto-Word heads, this is also a transformer-based language model that will not be frozen and will be part of the self-play protocol.
- Symbol Merger Head - Compresses and merges similar token IDs. Keeps the model fast and helps it learn that "BA" from different voices is "BA" - TO DO
- MemoryBank for vocab and grammar rules. TO DO
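To make the hand-off between the Segmentation Head and the Segment Pooling Head concrete, here is a minimal sketch in plain Python. It is not the actual implementation: the function names, the 0.5 threshold, and the choice of mean pooling are all assumptions for illustration. It only shows how boundary probabilities of shape (T,) could cut frame-level embeddings of shape (T,D) into variable-length segments, with one pooled vector per segment.

```python
from typing import List, Tuple

Frame = List[float]  # one D-dimensional frame embedding


def boundaries_to_segments(
    probs: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Turn per-frame boundary probabilities (T,) into [start, end) spans.

    Assumption: a frame with probability >= threshold starts a new segment.
    """
    # Frame 0 always starts the first segment, so it is never a cut point.
    cuts = [t for t, p in enumerate(probs) if t > 0 and p >= threshold]
    edges = [0] + cuts + [len(probs)]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


def mean_pool(
    frames: List[Frame], segments: List[Tuple[int, int]]
) -> List[Frame]:
    """Average the frames in each span: (T, D) -> (num_segments, D).

    Assumption: mean pooling stands in for whatever the real head learns.
    """
    pooled = []
    for start, end in segments:
        span = frames[start:end]
        dim = len(span[0])
        pooled.append([sum(f[d] for f in span) / len(span) for d in range(dim)])
    return pooled


# Toy input: T=6 frames, D=2 dimensions.
frames = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0], [8.0, 8.0], [10.0, 10.0]]
probs = [0.1, 0.2, 0.9, 0.1, 0.8, 0.3]

segments = boundaries_to_segments(probs)
pooled = mean_pool(frames, segments)
print(segments)  # [(0, 2), (2, 4), (4, 6)]
print(pooled)    # [[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
```

Each pooled vector is what a downstream head such as the Proto-Vocab head would take in as one syllable-like unit. A learned pooling (attention-weighted, for example) would slot into `mean_pool` without changing the surrounding flow.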
The hard part is keeping up with what I am doing. I keep losing track of what changes I have made and where I am, as I get sidetracked with ideas and tests along the way. Stopping and putting things in writing here helps me pause and assess where I am with everything. It's also sometimes painfully slow waiting for training to complete, or battling late-night coding mistakes the next morning, but all progress is progress.