I recently watched the first lecture of Stanford's CS25 V4 Transformers course, presented by Div Garg, Steven Feng, Emily Bunnapradist, and Seonghee L. It was packed with insightful information, and I took some interesting notes.
One of the challenges in NLP is precision; some models also lack reasoning ability and have short context lengths. An interesting fact I learned is that the first chatbot, ELIZA, was created in 1966.
Semantic parsing
Semantic parsing converts the words in a text into a machine-understandable representation. I didn’t delve into the details, but I plan to explore it later, as it seems to be a significant topic.
The lecture covered various topics like transformers, RNNs, embeddings, and more, including the architecture of the transformer.
Cross-attention
Take a translation example from English to French. Attention comes from the encoder: the English input text is encoded, and this encoded input is computed together with the output (the French sentence) to capture the relation between the original input and its translation. More technically, the keys and values from the encoder are provided to the decoder; this is called cross-attention. The second source is the self-attention among the output tokens.
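As a rough sketch, cross-attention is scaled dot-product attention where the queries come from the decoder and the keys and values come from the encoder. The NumPy snippet below is a simplified illustration only; the learned projection matrices (W_Q, W_K, W_V) and the multi-head splitting used in a real transformer are omitted.

```python
import numpy as np

def cross_attention(decoder_states, encoder_states):
    # Queries come from the decoder (the French sentence being generated);
    # keys and values come from the encoder (the English input).
    # Learned projections are omitted here for brevity.
    Q, K, V = decoder_states, encoder_states, encoder_states
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (tgt_len, src_len)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over source tokens
    # Each decoder position gets a weighted mix of encoder values.
    return weights @ V, weights

# Toy sizes: 2 French tokens attending over 3 English tokens, hidden size 4.
out, attn = cross_attention(np.random.randn(2, 4), np.random.randn(3, 4))
```

Each row of `attn` tells you how strongly one output token attends to each input token, which is exactly the input-output relation described above.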
Emergent abilities
Abilities that are present in larger models but not in smaller models.
AGI
Reducing computation complexity, thinking like humans, multi-modality, emotional intelligence, ethical reasoning, and more are necessary to achieve AGI.
Transformers applications
Text, speech, vision (analyzing images with vision transformers), image generation (DALL-E), video generation (Sora), robotics (Voyager), and game playing (AlphaGo).
Some of the challenges that LLMs face
1) Continual learning: infinite and permanent self-improvement. How can LLMs learn the way we do? We don’t spend every two months re-reading all the text online to update our knowledge, so how can LLMs do the same as humans?
2) Model editing: can we edit specific nodes in the model without updating all the weights? I found this idea interesting and have linked a related paper in the comments.
3) Chain of thought: CoT works effectively for models of approximately 100B parameters or more. Smaller models seem to forget some steps during reasoning or fail to understand what is required.
4) Socratic questioning: taken from their slides -> “Divide-and-conquer fashion algorithm that simulates the self-questioning and recursive thinking process. Self-questioning module using a large-scale LM to propose subproblems related to the original problem as intermediate steps and recursively backtracks and answers the sub-problems to reach the original problem.”
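To make point 3 concrete, here is a minimal sketch of what chain of thought looks like at the prompt level: the only difference from a direct prompt is an extra instruction nudging the model to write out intermediate steps (the well-known zero-shot trigger phrase). The prompt format is just an illustration, and the actual model call is omitted.

```python
def make_prompts(question):
    # Direct prompt: ask for the answer immediately.
    direct = f"Q: {question}\nA:"
    # Chain-of-thought prompt: the extra sentence asks the model to spell out
    # its intermediate reasoning before giving the final answer.
    cot = f"Q: {question}\nA: Let's think step by step."
    return direct, cot

direct_prompt, cot_prompt = make_prompts(
    "If I have 3 apples and buy 2 more, how many do I have?"
)
```

Per the lecture's point, this nudge reliably helps only once the model is large enough (roughly 100B+ parameters) to carry out the intermediate steps.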
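The Socratic questioning process quoted in point 4 can be sketched as a recursive divide-and-conquer loop. Everything below is hypothetical scaffolding of my own: `ToyLM` stands in for the large-scale LM, with nested lists of numbers playing the role of problems and sub-problems.

```python
def socratic_solve(problem, lm, depth=0, max_depth=3):
    # Base case: the problem is simple enough to answer directly.
    if depth >= max_depth or lm.is_simple(problem):
        return lm.answer_directly(problem)
    # Self-questioning step: propose sub-problems as intermediate steps.
    subproblems = lm.propose_subproblems(problem)
    # Recursively answer each sub-problem, then backtrack and combine
    # the sub-answers to resolve the original problem.
    sub_answers = [socratic_solve(p, lm, depth + 1, max_depth) for p in subproblems]
    return lm.combine(problem, sub_answers)

class ToyLM:
    # Stand-in for a real LM: a "problem" is a nested list of numbers,
    # a simple problem is a bare int, and combining sub-answers means summing.
    def is_simple(self, p): return isinstance(p, int)
    def answer_directly(self, p): return p
    def propose_subproblems(self, p): return list(p)
    def combine(self, problem, sub_answers): return sum(sub_answers)

result = socratic_solve([[1, 2], [3, [4, 5]]], ToyLM())  # → 15
```

In the real method the three `ToyLM` operations would each be calls to the language model itself; the toy version just shows the recursive decompose-answer-backtrack shape.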
They also gave information about agents and multi-agents, such as agent-computer interaction: using an API, or direct interaction through browser or desktop control. I found this part even more interesting.
If you like what you see, hit the follow button! You can also find me on LinkedIn, and we can follow each other there too. 😊