Updated 8/05/24:
Have you ever chatted with a bot and had no idea it wasn’t a human on the other end? That’s the power of conversational AI. It’s come a long way from the early days of chatbots that could barely string a sentence together. Now, with LLMs (Large Language Models) like ChatGPT, we can have natural, human-like conversations with machines. But where is conversational AI headed next, and what role will natural language processing (NLP) play in shaping the future of user interfaces? Can it become, as some have suggested, a universal interface?
A UI, or User Interface, is everything a person uses to control a device. It’s how you make things work, whether it’s a keyboard, a TV remote control, a car’s steering wheel and pedals, a touchscreen, or the microphone you speak to your voice assistant with and the software it uses to understand you. There are a lot of related acronyms:
- TUI – Text User Interface or Terminal User Interface e.g. DOS, Unix
- GUI – Graphical User Interface (pronounced “gooey”) e.g. Windows, grocery store self-checkout, smartphone
- VUI – Usually Voice User Interface, though you’ll occasionally see it used for Video User Interface
- SUI – Speech User Interface e.g. a microphone, an intercom, a synthesizer
- UX – User Experience. Does the interface help the user accomplish what they set out to do, simply and efficiently? Is it nice to work with? Fun to use?
Interface, Schminterface. Who Cares?
User interfaces are important! If you think about how we interact with our technology, we’ve come a long way. Early electronics were all hard physical buttons, knobs, dials, sliders. The physical size of the objects had a lot to do with how we designed interfaces. Think about the handsets of older phones and office phones: that banana-dumbbell shape let your hand grip the middle while the speaker sat near your ear and the microphone near your mouth.
As technology started getting smaller, so did the buttons and knobs. Eventually we replaced many physical controls with software controls operated through keyboards and, later, mice; the buttons had become separate devices themselves.
Microphones got smaller, speakers got smaller, screens became flat liquid crystals and then LEDs instead of giant cathode ray tubes. The buttons moved back off physical devices onto the screens themselves as touchscreens took over. Now we can use our voices, our gestures, and even our thoughts.
Voice and AI Voice Assistants
A lot of the growth in alternative UIs over the last several years has been in voice assistants. Think Siri, Alexa, or Google Assistant. These AI-powered assistants can help you with everything from setting reminders to playing music to answering trivia questions. Basically, web search and media playback. As these virtual assistants become more sophisticated, they may be able to handle even more complex tasks and become more integrated into our daily lives.
Voice is not going to be the universal user interface though. While voice control offers a cool “I tell my phone what I want it to do” vibe, it has some real drawbacks too. For instance, in crowded spaces, speaking to a device can be really awkward sometimes. For the speaker or the listener. Who hasn’t wished they could hear less of the personal details of that person obliviously talking into their headset on the bus, on the subway, waiting in line somewhere? “Hey Google, what are these warts on my…” AHEM. Anyway.
And there are security risks; user inputs can be easily overheard and recorded by those nearby, which isn’t ideal when dealing with sensitive information. “Enter my PIN number, it’s…”
And let’s face it, for critical tasks an interface that can be rendered ineffective by loud background noises could be downright dangerous. Imagine this nightmare. You’re on a train, it’s about to crash, you yell “COMPUTER: EMERGENCY, STOP THE TRAIN!” It didn’t hear you though. Hard to make out over all those terrified screaming passengers. Could you please use the keyboard?
ChatGPT and the LLMs
So where do GPT and other LLMs fit into all of this? As the most advanced conversational AIs out there, they’re likely to play a big role in the future of interfaces. Their ability to generate natural, human-like responses means that they could be used for nearly any interaction.
ChatGPT’s conversational ability almost immediately exposed how tame and limited the existing virtual assistants are when it comes to actually understanding you. OpenAI first released a truly conversational mode for GPT in their mobile apps, where users can press a button to have a completely hands-free, relatively natural conversation with GPT as they go about their business. It’s pretty darn cool. But ChatGPT still can’t actually DO many things.
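To make the “LLM as interface” idea concrete, here’s a minimal sketch of a text chat loop using OpenAI’s Python client (the model name and system prompt are just placeholders); you can think of the hands-free voice mode as wrapping a loop like this in speech recognition and speech synthesis.

```python
# A minimal text chat loop against an LLM API (OpenAI's Python client here;
# the model name is illustrative). The whole "interface" is just a running
# list of messages the model responds to.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print("Assistant:", reply)
```

Notice that nothing in this loop touches the outside world. The model only produces text, which is exactly the limitation the next section is about.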
AI Agents
Enter the “Large Action Models.” Agentic models built into physical devices that focus on using APIs to perform actions for the user: book tickets, send emails, find things, order things. Anything you can do to affect the physical world through an API should be possible.
Should be. Recent attempts at multi-modal AI interfaces, like the Humane AI Pin and the Rabbit R1, have flopped both financially and with customers. Marketed as the next big step in wearable technology, these devices promised a seamless user experience that combined voice interaction with real-world usefulness. Yet they have largely failed to deliver on the hype. Users report underwhelming performance and none of the smooth, intuitive feel the marketing promised. Multi-modal AI user interfaces have a lot of promise, but evidently the execution still has a long way to go.
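For the curious, here is roughly what the “action” part looks like in code. This is a minimal sketch of the tool-calling pattern using OpenAI’s Python client; book_ticket is a made-up stand-in for a real booking API, and a real agent would loop over many tools and check whether the model chose to call one at all.

```python
# A rough sketch of the action pattern: the model decides which tool to call
# and with what arguments, and the device executes it. book_ticket() is a
# hypothetical stand-in for a real ticketing API.
import json
from openai import OpenAI

client = OpenAI()

def book_ticket(destination: str, date: str) -> str:
    # Hypothetical helper -- a real device would call an actual booking service.
    return f"Booked a ticket to {destination} on {date}."

tools = [{
    "type": "function",
    "function": {
        "name": "book_ticket",
        "description": "Book a travel ticket for the user.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["destination", "date"],
        },
    },
}]

messages = [{"role": "user", "content": "Get me a train ticket to Boston for the 14th."}]
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=messages,
    tools=tools,
)

# Assume the model decided to call our tool; parse its arguments and run it.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print(book_ticket(**args))
```

The hard part isn’t the plumbing, it’s making the model reliably pick the right action with the right arguments, which is where the current crop of devices has stumbled.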
Where I Once Was Blind, I Now Can See
Interestingly, AI is making strides in some areas beyond traditional interfaces, particularly in providing “sight” to the blind. Developers are building tools that use image-to-text models to describe a blind user’s surroundings in real time. This application demonstrates how AI can break down barriers and enhance accessibility, showcasing the potential for NLP and LLMs to serve as a universal interface for a wide range of user needs.
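A toy version of that idea is a single-frame captioner. This sketch assumes the Hugging Face transformers library and one particular captioning model; a real assistive app would stream camera frames continuously and read the captions aloud.

```python
# A toy "describe my surroundings" example: caption one camera frame with an
# off-the-shelf image-to-text model. The model choice and file path are
# illustrative only.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
result = captioner("camera_frame.jpg")  # placeholder path to a captured frame
print(result[0]["generated_text"])      # a short natural-language description of the scene
```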
While the future of conversational AI as a universal interface is filled with promise, the transition from graphical interfaces to conversational ones is far from over, and the limitations above mean it will never be 100%.