Context Window
About Context Windows:
- In general terms, the amount of text an AI can read and write at any given time is called its context window.
- Context windows are measured in units called tokens.
- At OpenAI's Dev Day in 2023, Sam Altman announced GPT-4 Turbo with a massive context window of 128K tokens, which translates to roughly 300 pages of a book.
- In Large Language Models, tokens are the basic units of data the model processes.
- Put simply, the context window is the maximum number of tokens that a model can consider at once when generating text (see the sketch below).
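Token counts can be checked directly. Below is a minimal sketch assuming OpenAI's open-source tiktoken library is installed (any tokeniser would do); as a rough rule of thumb, one token is about three-quarters of an English word, which is how 128K tokens works out to roughly 300 pages.

```python
# A minimal sketch of counting tokens, assuming the open-source
# tiktoken library (pip install tiktoken); "cl100k_base" is the
# encoding used by GPT-4-class models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The context window is measured in tokens, not words."
tokens = enc.encode(text)

print(len(tokens))         # token count, usually a bit higher than the word count
print(enc.decode(tokens))  # decoding the tokens recovers the original text
```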
Importance of Context Windows:
- As Google DeepMind researchers define them, context windows are crucial because they help AI models recall information during a session.
- Context windows are what let LLMs capture the contextual nuances of language, enabling these models to understand and generate human-like responses.
How do context windows work?
- Context windows operate by creating a sliding window over the input text, focussing on a limited stretch of words at a time.
- The size of the context window is a key parameter: it determines how much contextual information the AI system can take in at once.
- Context windows in Large Language Models work like reading a book through a moving frame: the window slides over the text, analysing a few words at a time.
- Each word is represented by a code (an embedding) that captures its meaning, and the programme considers the words within the window to work out the relationships between them, as the sketch below illustrates.
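The sliding mechanism can be made concrete with a toy sketch. Here words stand in for tokens and the window size is illustrative; real models slide over token IDs, not words.

```python
# A toy illustration of a sliding context window: at each step,
# only the most recent `window` items are visible to the model.
def sliding_windows(tokens, window=5):
    """Yield the slice of tokens visible at each generation step."""
    for end in range(1, len(tokens) + 1):
        yield tokens[max(0, end - window):end]

words = "the quick brown fox jumps over the lazy dog".split()
for visible in sliding_windows(words):
    print(visible)
# Once the text outgrows the window, the earliest words drop out of
# view: the model can no longer "recall" them.
```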
Importance of the Size:
- Months after Sam Altman announced the 128K-token window for GPT-4 Turbo, Google announced its AI model Gemini 1.5 Pro with a context window of up to 1 million tokens.
- Even though larger windows can mean better performance or accuracy, the benefits may hit a plateau, and too big a window may mean that irrelevant information gets included.
- The main benefits of a bigger context window are that it allows a model to reference more information, understand the flow of a narrative, maintain coherence across longer passages, and generate contextually enriched responses.
- On the other hand, the most apparent disadvantage of a large window is the massive computational power it requires during training and inference (see the sketch after this list).
- Escalating hardware requirements and costs are also an issue.
- With large context windows, AI models may even end up repeating or contradicting themselves.
- Apart from that, greater computational power spells an increased carbon footprint, which is a looming concern in sustainable AI development.
- Besides, training models with large context windows also translates to heavy usage of memory bandwidth and storage.
- This would mean that only large corporations can afford to invest in the costly infrastructure required.
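Why the cost escalates is easy to see with back-of-the-envelope arithmetic: standard self-attention compares every token in the window with every other token, so the work grows roughly with the square of the context length. The sketch below is illustrative arithmetic, not a benchmark of any real system.

```python
# Back-of-the-envelope: standard self-attention builds a score for
# every pair of tokens, so the matrix grows quadratically with the
# context length (per layer and per head; constants omitted).
for context_len in (4_000, 32_000, 128_000, 1_000_000):
    pairwise = context_len ** 2
    print(f"{context_len:>9,} tokens -> {pairwise:,} pairwise scores")
# A 32x longer input (4K to 128K tokens) means a roughly 1,024x larger
# attention matrix, hence the hardware, memory, and energy costs.
```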