How 2024 will be multimodal AI’s leap forward; meet OpenAI's VP of global affairs; CES embraces AI; Waymo’s robotaxis are hitting the highway; AI replaces the metaverse as Zuckerberg’s top priority
The existential race to fund AI companies; chatbots might teach your children; the Rabbit R1 is this generation's Tamagotchi; inside Anthropic’s unusual fundraise
While we’ve spent the first two weeks of 2024 talking about large language models that not only “hallucinate” but also “regurgitate,” I believe we’re going to shift our attention more towards multimodal AI as the technology becomes more capable and starts appearing in real-world applications. That’s because, in order to unlock the next level of utility, AI will require broader sensory perception - integrating sight and sound - and richer ways to communicate that are closer to the human experience.
Large language models perform well on text, but cannot see, hear, or feel. Their comprehension of our world is limited by their text-only inputs and their capability of expression is further constrained by their outputs. And while image or video generators are a step in the right direction, they are still mostly one-dimensional: you can prompt Midjourney to create amazing images or get ElevenLabs to generate lifelike speech, but their usefulness largely stops there.
Real human communication and collaboration seamlessly interpret images, videos, voices, even gestures in formulating responses or completing tasks. We don't just talk in one-off exchanges - we infer meaning from a flood of sensory signals, decide the best medium of expression on the fly, and iterate and negotiate until we reach our goal.
Multimodal AI aspires to mimic this by combining computer vision, speech recognition, and natural language processing. This allows AI systems to be taught through more natural modes of information exchange, rather than through text data alone. We're seeing some early glimpses of multimodal AI’s potential in models like Google’s Gemini, which can parse verbal instructions to aid its visual perception on specified tasks. This maps more closely onto real-world human needs than text-based chatbots do. Integrating additional sensory input modalities will only grow more valuable in applications such as healthcare, education, or finance.
And while language is a powerful mechanism to acquire or impart knowledge, it's far from the only (or most efficient) way to learn or communicate. Emotions, humor, and intent are often conveyed through tone, body language, and context. Imagine misinterpreting a sarcastic remark because you only read the words and did not hear the tone of voice. Large language models and people can easily stumble in such situations; multimodal AI is better equipped to grasp and express these subtleties, leading to more natural and empathetic interactions between humans and AI, and more productive and efficient exchanges between the digital and the physical worlds.
The future of AI is multimodal - in and out.
And now, here is this week’s news:
❤️Computer loves
Our top news picks for the week - your essential reading from the world of AI
The Last Frontier of Machine Translation [The Atlantic]
Robots Learn, Chatbots Visualize: How 2024 Will Be AI’s ‘Leap Forward’ [New York Times]
Meet the woman who transformed Sam Altman into the avatar of AI [Washington Post]
The Rabbit R1 is an AI-powered gadget that can use your apps for you [The Verge]
Waymo’s Robotaxis Are Hitting The Highway, A First For Self-Driving Cars [Forbes]
AI can transform education for the better [The Economist]
How AI Replaced the Metaverse as Zuckerberg’s Top Priority [Bloomberg]
Will Chatbots Teach Your Children? [New York Times]
Inside AI Unicorn Anthropic’s Unusual $750 Million Fundraise [Forbes]
From voice synthesis to fertility tracking, here are some actually helpful AI products at CES [TechCrunch]
⚙️Computer does
AI in the wild: how artificial intelligence is used across industry, from the internet, social media, and retail to transportation, healthcare, banking, and more
AI Tool Helps Fix Faulty Trades Amid Shift to Faster Settlement Times [Bloomberg]
Now You Can Play ‘Trivial Pursuit’ Online With an Infinite Number of AI-Generated Questions [The Messenger]
Japan's Bellsystem24 to offer AI-supported call centers in Taiwan [Nikkei]
Kids robot Moxie gets AI upgrade and tutoring features [Fast Company]
Valve now allows the “vast majority” of AI-powered games on Steam [Ars Technica]
AI starts to show promise as tool to sift mountain of sustainability research [FT]
AI robot to identify and fill in potholes in Hertfordshire [BBC]
Omnicom is leveraging AI to optimize workflows with a virtual assistant [Business Insider]
Mercedes-Benz’s best-in-class voice assistant is getting an AI boost [The Verge]
Sam's Club will stop checking receipts at the door — and instead use AI to snap photos of your shopping cart [Business Insider]
Not even Notepad is safe from Microsoft’s big AI push in Windows [The Verge]
Amazon’s Alexa gets new generative AI-powered experiences [TechCrunch]
Alibaba’s DingTalk updates workplace collaboration app with new AI agent amid growing client demand [South China Morning Post]
Mayo Clinic pairs with Cerebras Systems to help develop AI for health care [Reuters]
Mobileye Unveils Customizable Operating System for Self-Driving Cars [Bloomberg]
Walmart Expands Rollout of Generative AI Shopping Search, Tech [Bloomberg]
New material found by AI could reduce lithium use in batteries [BBC]
Volkswagen says it’s putting ChatGPT in its cars for ‘enriching conversations’ [The Verge]
How Amazon Fashion is using AI to help you find the perfect fit [Amazon]
AI sheds light on the ancient origins of England's place names [New Scientist]
Samsung is betting your home needs an AI robot with a projector [Washington Post]
Infineon teams up with Aurora Labs on predictive maintenance for driver safety [VentureBeat]
Deloitte rolls out artificial intelligence chatbot to employees [FT]
Oxfordshire based firm Oxa designs the AI behind self-driving cars [BBC]
Meet the eerie AI clones that can replace you on the web [The Times]
🧑‍🎓Computer learns
Interesting trends and developments from various AI fields, companies and people
OpenAI Signs Up 260 Businesses for Corporate Version of ChatGPT [Bloomberg]
ABB buys tech company to give industrial robots eyes and brains [Reuters]
Microsoft and SAP unveil new AI solutions for retail ahead of NRF 2024 [VentureBeat]
Toyota's Robots Are Learning to Do Housework—By Copying Humans [Wired]
Exclusive: India data centre firm Yotta's Nvidia AI chip orders to reach $1 bln [Reuters]
Generative AI isn’t a home run in the enterprise [TechCrunch]
OpenAI In Talks With CNN, Fox and Time to License Content [Bloomberg]
Google Cloud launches new generative AI tools for retailers [CNBC]
One of the world’s largest AI training datasets is about to get bigger and ‘substantially better’ [VentureBeat]
The Flaw That Could Ruin Generative AI [The Atlantic]
Stagwell's Mark Penn says companies should use AI to connect holistically with consumers [Business Insider]
AMD and Intel bet on AI PCs to challenge Nvidia chip dominance [Nikkei]
OpenAI debuts ChatGPT subscription aimed at small teams [TechCrunch]
Singapore keeping its eye on data centers and data models as AI adoption grows [ZDNet]
Watch Qualcomm's CES 2024 keynote on its highly anticipated AI chip [Engadget]
The Twelve Startups Battling For a Slice of Nvidia’s Pie [The Information]
Automation driving AI adoption, but lack of right skillsets slowing down returns [ZDNet]
AI becoming part of people’s jobs could make salaries rise, Randstad CEO says [CNBC]
Intel acquires car chip firm and plans to bring AI PCs to software-defined cars [VentureBeat]
Chinese companies resort to repurposing Nvidia gaming chips for AI [FT]
DeepMind spin-off aims to halve drug discovery times following Big Pharma deals [FT]
OpenAI debuts GPT Store for users to buy and sell customized chatbots [The Guardian]
4 careers where workers will have to change jobs by 2030 due to AI and shifts in how we shop, a McKinsey study says [Business Insider]
Amazon's biggest generative AI product launch was rushed and flawed, leaving insiders looking for answers [Business Insider]
Intuition Robotics is giving its social bot a generative AI upgrade, and it makes so much sense [ZDNet]
AI is not everything for car chip innovations, NXP tech officer says [Nikkei]
Siemens teams up with Microsoft on cross-industry AI adoption [VentureBeat]
Why Writer’s Palmyra LLM is the little AI model that could for enterprises [VentureBeat]
Microsoft DragNUWA pushes the bar in AI video with trajectory-based generation [VentureBeat]
Microsoft Says AI Service Will Accelerate Scientific Discovery [Bloomberg]
AI isn’t coming for your job, but it’s definitely going to be your new coworker [Fortune]
GitHub execs say coders are rushing to use its AI coding tool despite copyright concerns [Business Insider]
What to expect from the coming year in AI [MIT Technology Review]
It’s No Wonder People Are Getting Emotionally Attached to Chatbots [Wired]
Can artificial intelligence pave the way for greener cement and steel? [Reuters]
OpenAI’s new app store could democratize AI and set up a clash with Apple [Semafor]
Despite free access to GPT-4, Microsoft’s Copilot app hasn’t impacted ChatGPT installs or revenue [TechCrunch]
Getty is adding an AI image generator to iStock with legal protections for users [VentureBeat]
AMD launches next-gen GPUs and AI processors [VentureBeat]
AI-driven drug discovery is poised to boom in 2024 [VentureBeat]
Nvidia showcases automotive partners and generative AI for robotics [VentureBeat]
AI for everything: 10 Breakthrough Technologies 2024 [MIT Technology Review]
The economy is entering a new 'super cycle' driven by AI and decarbonization, Goldman Sachs analyst says [Business Insider]
'Elvis AI show more like time travel than Abba hologram' - creator [BBC]
Subscribe to Computerspeak by Alexandru Voica to keep reading this post and get 7 days of free access to the full post archives.