Google I/O, the tech giant’s highly anticipated annual developer conference, concluded recently with numerous announcements, placing artificial intelligence (AI) at the forefront. The event drew the world’s attention towards the Gemini AI model and its integration into various applications, including Workspace and Chrome.
Here’s a look at all the announcements made by Google at the event:
Google Lens gets video search
One of the announcements was Google’s expansion of Lens capabilities to include video search. While Lens was already renowned for its image-based search functionalities, the company added video search to enhance user experience further.
Users can now record videos of objects or scenes and ask questions in real time, with AI retrieving relevant information from the web.
Gemini integration into Google Photos
Google also unveiled several enhancements to its Gemini AI models, catering to diverse user needs. ‘Ask Photos,’ a new feature slated for a summer release, empowers users to query their extensive Google Photos library with natural language commands.
“With Ask Photos, you can ask for what you’re looking for naturally, like: ‘Show me the best photo from each national park I’ve visited.’ Google Photos can show you what you need, saving you from all that scrolling,” said Google.
This feature extends beyond simple image retrieval, showcasing Gemini’s ability to understand context and provide meaningful responses.
Gemini 1.5 Flash
A highlight of Google I/O was the introduction of Gemini 1.5 Flash, an AI model optimised for high-frequency, low-latency tasks. This new addition to the Gemini lineup promises faster responses and improved translation, reasoning, and coding.
Google also announced that it is doubling Gemini 1.5 Pro’s context window from 1 million to 2 million tokens, enabling the model to process more information at once.
Gemini 1.5 Pro integration across Workspace
Google announced Gemini integration across Workspace applications. This integration brings a versatile AI assistant to users’ fingertips, offering contextual insights, generating personalised emails, and facilitating seamless collaboration across Docs, Sheets, Slides, Drive, and Gmail.
The company said the integration is still in testing and that it plans to roll it out to all paid Gemini subscribers in the coming months.
Project Astra
Google showcased Astra’s capabilities in understanding visual inputs, organising personal data, and executing tasks autonomously.
The company aims to develop this AI agent as a challenger to GPT-4o. However, details on the project remain scarce, and the company has yet to share more.
In the demo, the company showed the AI identifying a sound, but offered no further details.
Veo: Text-to-video platform
Google also introduced Veo, a generative AI model that turns text, image, and video prompts into high-quality videos. Creators can use Veo to produce content across various styles and genres.
“Veo generates high-quality 1080p resolution videos in a wide range of cinematic and visual styles that can go beyond a minute. With an advanced understanding of natural language and visual semantics, it generates video that closely represents a user’s creative vision,” explained the company.
OpenAI has also unveiled Sora, a text-to-video AI tool, and Veo is Google’s answer to it. It will be interesting to see how the two models fare against each other.
Gems: Personalised chatbot for Gemini users
Gems, a custom chatbot creator, lets Gemini users tailor AI interactions to their specific needs. Like OpenAI’s custom GPTs, this feature enables users to create personalised chatbots with specialised functionalities.
“Whether you need a yoga bestie or calculus tutor, in the coming months, you’ll be able to customise Gemini, saving time when you have specific ways you interact with Gemini again and again,” Google tweeted.
Gemini Live: Human-like conversations with AI
Gemini Live introduces natural voice interactions and real-time camera feedback, making conversations with AI assistants more engaging and dynamic. Gemini will also be integrated more deeply with Google Calendar, Tasks, and Keep to help keep track of users’ activities across those apps.
“And in the coming months, we’re rolling out Live for Gemini Advanced subscribers, a new mobile conversational experience that uses our state-of-the-art speech technology to make speaking with Gemini more intuitive. With Gemini Live, you can talk to Gemini and choose from a variety of natural-sounding voices it can respond with,” said Google.
Additionally, Google announced multimodal updates for Gemini on Android, enabling users to ask questions about on-screen videos and ingest PDFs for advanced information retrieval.
AI Overviews: AI-generated search results
Google rolled out ‘AI Overviews,’ formerly known as Search Generative Experience, to everyone in the United States.
With AI Overviews, a specialised version of Google’s Gemini model curates and displays summarised answers from the web. The tool lets users ask complex questions and plan ahead.
“With AI Overviews, people are visiting a greater diversity of websites for help with more complex questions. And we see that the links included in AI Overviews get more clicks than if the page had appeared as a traditional web listing for that query,” explained Google. “As we expand this experience, we’ll continue to focus on sending valuable traffic to publishers and creators. As always, ads will continue to appear in dedicated slots throughout the page, with clear labelling to distinguish between organic and sponsored results.”
This is similar to how other AI search tools like Perplexity or Arc Search work, where complex information is distilled into easily understandable summaries directly on the search results page.
AI-powered scam detection
Android users can look forward to scam detection powered by Gemini Nano, along with other enhanced AI capabilities aimed at improving user experience and security. The smartphone will show a real-time pop-up warning if the user appears to be in a conversation with a scammer.
The tool will flag the conversation if the other person asks for personal details such as passwords or PINs, or requests gift cards or money transfers.
Chrome AI assistant
Google is also adding Gemini Nano to Chrome on desktops with the Chrome 126 update. The assistant runs on-device and can help users in several ways, including generating text for social media posts and product reviews.
Upgraded SynthID AI watermarking
Google has also upgraded SynthID to watermark videos created with the Veo text-to-video generator.
“SynthID isn’t a silver bullet for identifying AI-generated content, but is an important building block for developing more reliable AI identification tools and can help millions of people make informed decisions about how they interact with AI-generated content,” said Google.
Google also plans to open-source SynthID text watermarking later, to help developers build the technology into their own models.