In the past few months, advances in large language models (LLMs) have shown what could be the next big computing paradigm. ChatGPT, the latest LLM from OpenAI, has taken the world by storm, reaching 100 million users in record time.
Developers, web designers, writers, and people in all kinds of professions are using ChatGPT to generate human-readable text that previously required intense human labor. And now, Microsoft, OpenAI’s main backer, is trialing a version of its Bing search engine that is enhanced by ChatGPT, posing the first real threat to Google’s $283-billion monopoly in the online search market.
Other tech giants are not far behind. Google is taking hasty measures to release Bard, its rival to ChatGPT. Amazon and Meta are running their own experiments with LLMs. And a host of tech startups are building new business models around LLM-powered products.
We’re at a critical juncture in the history of computing, which some experts compare to the huge shifts caused by the internet and mobile. Soon, conversational interfaces will become the norm in every application, and users will become comfortable with—and in fact, expect—conversational agents in websites, mobile apps, kiosks, wearables, etc.
The limits of current AI systems
As attractive as conversational UX is, it is not as simple as adding an LLM API on top of your application. We’ve seen this in the limited success of the first generation of voice assistants such as Siri and Alexa, which tried to build one solution for all needs.
As in human-to-human conversations, the space of possible actions in conversational interfaces is unlimited, which leaves room for mistakes. Application developers and product managers need to build trust with their users by minimizing that room for mistakes and exerting control over the responses the AI gives to users.
As LLM products go through their growing pains, we’re also seeing how uncontrolled use of conversational AI can damage the user’s experience and the developer’s reputation. In Google’s Bard demo, the AI made a false claim about the James Webb Space Telescope. Microsoft’s ChatGPT-powered Bing has been caught making egregious mistakes. A reputable news website had to retract and correct several articles that were written by an LLM after they were found to be factually wrong. And numerous similar cases are being discussed on social media and tech blogs every day.
The limits of current LLMs can be boiled down to the following:
- They “hallucinate” and can state falsehoods with high confidence
- They become inconsistent in long conversations
- They are hard to integrate with existing applications and only take a textual input prompt as context
- Their knowledge is limited to their training data and updating them is slow and expensive
- They can’t interact with external data sources
- They don’t have analytics tools to measure and enhance user experience
Multimodal conversational UX
We believe that multimodal conversational AI is the way to overcome these limits and bring trust and control to everyday applications. As the name implies, multimodal conversational AI brings together voice, text, and touch-type interactions with several sources of information, including knowledge bases, GUI interactions, user context, and company business rules and workflows.
This multimodal approach makes sure the AI system has a more complete user context and can make more precise and explainable decisions.
Users can trust the AI because they can see exactly how and why the AI reached a decision and which data points were involved. For example, in a healthcare application, users can make sure the AI is making inferences based on their health data and not just on its own training corpus. In aviation maintenance and repair, technicians using multimodal conversational AI can trace suggestions and results back to specific parts, workflows, and maintenance rules.
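To make the idea of traceable answers concrete, here is a minimal Python sketch of a response that carries its supporting data points along with the text. It is an illustration only, not the Alan AI Platform's API: the `Evidence` and `TracedAnswer` classes, their field names, and the part and rule identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """One data point that contributed to an answer (hypothetical structure)."""
    source: str     # where it came from, e.g. "maintenance_rule" or "part_record"
    reference: str  # identifier the user can look up
    excerpt: str    # the relevant content, quoted or summarized


@dataclass
class TracedAnswer:
    """An answer bundled with the evidence it was derived from."""
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    def explain(self) -> str:
        """Render the supporting data points so a UI can display them."""
        if not self.evidence:
            return "No supporting data points were recorded for this answer."
        lines = [f"- [{e.source}] {e.reference}: {e.excerpt}" for e in self.evidence]
        return "Based on:\n" + "\n".join(lines)


# Hypothetical aviation-maintenance example; the IDs and values are made up.
answer = TracedAnswer(
    text="Replace the hydraulic seal before the next flight.",
    evidence=[
        Evidence("part_record", "PART-0042", "seal installed 26 months ago"),
        Evidence("maintenance_rule", "RULE-7", "seals are replaced every 24 months"),
    ],
)
print(answer.text)
print(answer.explain())
```

Keeping the evidence next to the answer, rather than in a separate log, lets the interface show the "why" in the same place as the suggestion.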
Developers can control the AI and make sure the underlying LLM (or other machine learning models) remains reliable and factual by integrating the enterprise knowledge corpus and data records into the training and inference processes. The AI can be tied into the broader business rules to make sure it stays within the boundaries of decision constraints.
Multimodality means that the AI will surface information to the user not only through text and voice but also through other means such as visual cues.
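One way to picture keeping the AI within decision constraints is a rule check that runs on whatever the model proposes before anything reaches the user. The Python sketch below assumes a hypothetical `ProposedAction` type and a single made-up maintenance rule. It shows the pattern only and does not describe how any particular platform implements it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class ProposedAction:
    """An action the model wants to take (hypothetical structure)."""
    name: str
    params: dict = field(default_factory=dict)


# A rule returns a rejection reason, or None when the action is acceptable.
Rule = Callable[[ProposedAction], Optional[str]]


def require_signed_inspection(action: ProposedAction) -> Optional[str]:
    """Hypothetical rule: a work order cannot be closed without a signed inspection."""
    if action.name == "close_work_order" and not action.params.get("inspection_signed", False):
        return "work order cannot be closed before the inspection is signed"
    return None


def violations(action: ProposedAction, rules: list[Rule]) -> list[str]:
    """Collect the reasons, if any, why the action breaks a rule."""
    reasons = []
    for rule in rules:
        reason = rule(action)
        if reason is not None:
            reasons.append(reason)
    return reasons


def respond(action: ProposedAction, rules: list[Rule]) -> str:
    """Only let the action through when no rule objects to it."""
    problems = violations(action, rules)
    if problems:
        return "Blocked: " + "; ".join(problems)
    return f"Approved: {action.name}"


rules = [require_signed_inspection]
print(respond(ProposedAction("close_work_order", {"inspection_signed": False}), rules))
print(respond(ProposedAction("close_work_order", {"inspection_signed": True}), rules))
```

Because the rules are plain functions that live outside the model, they can be reviewed, tested, and changed without retraining anything.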
The most advanced multimodal conversational AI platform
Alan AI was developed from the ground up with the vision of serving the enterprise sector. We have designed our platform to use LLMs as well as other necessary components to serve applications in all kinds of domains, including industrial, healthcare, transportation, and more. Today, thousands of developers are using the Alan AI Platform to create conversational user experiences ranging from customer support to smart assistants for field operations in oil & gas, aviation maintenance, etc.
Alan AI is platform-agnostic and supports deep integration with your application on different operating systems. It can be incorporated into your application’s interface and tie into your business logic and workflows.
The Alan AI Platform provides rich analytics tools that can help you better understand the user experience and discover new ways to improve your application and create value for your users. Along with the easy-to-integrate SDK, the Alan AI Platform makes sure that you can iterate much faster than the traditional application lifecycle allows.
As an added advantage, the Alan AI Platform has been designed with enterprise technical and security needs in mind. You have full control of your hosting environment and generated responses to build trust with your users.
Multimodal conversational UX will break the limits of existing paradigms and is the future of mobile, web, kiosks, etc. We want to make sure developers have a robust AI platform to provide this experience to their users with accuracy, trust, and control of the UX.