A voice assistant is a digital assistant that uses voice recognition, natural language processing, and speech synthesis to listen for specific voice commands and then return relevant information or perform functions the user requests.
By listening for specific keywords and filtering out ambient noise, voice assistants recognize the commands a user speaks, sometimes called intents, and return the relevant information.
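To make the idea of intents concrete, here is a minimal Python sketch that maps an already-transcribed command to an intent by counting keyword matches. The intent names and keyword sets are hypothetical, and the sketch assumes speech has already been converted to text; production assistants use trained language-understanding models rather than fixed keyword lists.

```python
import re
from typing import Optional

# Hypothetical intents, each triggered by a small set of keywords.
INTENT_KEYWORDS = {
    "get_weather": {"weather", "forecast", "rain"},
    "set_timer": {"timer", "countdown"},
    "play_music": {"play", "playlist", "song"},
}

def detect_intent(transcript: str) -> Optional[str]:
    """Return the intent whose keywords best match the transcript, or None."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    best_intent, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)  # number of this intent's keywords heard
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

print(detect_intent("Play my driving playlist"))  # play_music
```

Anything that matches no keyword returns `None`, which is where a real assistant would ask the user to rephrase.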
While voice assistants can be entirely software-based and integrated into most devices, some are designed for a single device, such as the Amazon Alexa Wall Clock.
Today, voice assistants are integrated into many of the devices we use daily, such as cell phones, computers, and smart speakers. Because they appear in so many places, some voice assistants offer a narrow, specific feature set, while others are open-ended and can help with almost any situation at hand.
History of voice assistants
Voice assistants have a surprisingly long history that goes back nearly 100 years, which might seem odd since apps such as Siri were only released within the past ten years.
The first voice-activated product, Radio Rex, was released in 1922. This simple toy was a dog that stayed inside its dog house until the user called its name, “Rex”, at which point it jumped out. The trick was an electromagnet tuned to a frequency similar to the vowel sound in “Rex”, and the toy predated modern computers by more than 20 years.
At the 1952 World’s Fair, Bell Labs announced Audrey, the Automatic Digit Recognizer. Audrey was no small, simple device: its casing stood six feet tall just to house all the hardware needed to recognize ten digits!
IBM began its long history in speech recognition in 1962 at the World’s Fair in Seattle, where it showed the IBM Shoebox. The device recognized the digits 0-9 and six simple commands such as “plus” and “minus”, so it could be used as a simple calculator. Named for its size, about that of an average shoebox, it contained a microphone connected to three audio filters that matched the electrical frequencies of what was said against values already assigned to each digit.
DARPA then funded five years of speech recognition R&D beginning in 1971, known as the Speech Understanding Research (SUR) program. One of the biggest innovations to come out of it was Carnegie Mellon’s Harpy, which could understand over 1,000 words.
The following decade brought major progress in speech recognition, taking most systems from understanding a few hundred words to understanding thousands, and slowly bringing them into consumers’ homes.
Then, in 1990, Dragon Dictate arrived in consumers’ homes at the shocking price of $9,000! It was the first consumer-oriented speech recognition program designed for home PCs. The user dictated one word at a time, pausing after each word while the computer processed it before moving on. Seven years later, Dragon NaturallySpeaking brought more natural conversation: it understood continuous speech at up to 100 words per minute, at a much lower price of $695.
In 1994, IBM released Simon, the first smart voice assistant. Simon was a PDA and, arguably, the first smartphone in history, predating Android phones such as HTC’s Droid by well over a decade!
Around 2008, when Android was first released, Google began rolling out voice search in its mobile apps on various platforms, followed by a dedicated Google Voice Search application in 2011. More and more advanced features followed, eventually leading to Google Now and Google Assistant.
Siri followed in 2010. Developed by SRI International, with speech recognition provided by Nuance Communications, the original app was released on the iOS App Store in 2010 and acquired by Apple two months later. With the release of the iPhone 4S, Siri became an integrated voice assistant within iOS. Since then, Siri has made its way onto most Apple devices and helps link them together in a single ecosystem.
Shortly after Siri’s debut, IBM Watson was introduced publicly in 2011. Named after IBM’s first CEO, Thomas J. Watson, it was originally conceived in 2006 to compete against human champions on the quiz show Jeopardy!. Today, Watson is one of the best-known natural language computing systems available.
Amazon announced Alexa in 2014. Its name was inspired by the Library of Alexandria and by the hard consonant “X”, which helps with more accurate voice recognition. Alongside Alexa, Amazon announced the Echo line of smart devices, an inexpensive way to bring smart integration into consumers’ homes.
Alan was publicly announced in 2017, aiming to take the enterprise application world by storm. Originally born as “Synqq”, Alan was created by the minds behind “Qik”, the very first video messaging and conferencing mobile app. Alan is the first voice AI platform aimed at enterprise applications: while it can be found in many consumer applications, it is designed to let enterprises develop and integrate voice quickly and efficiently!
At the bottom of the post we’ve included a timeline that summarizes the history of voice assistants!
Technology behind Voice Assistants
Voice assistants use artificial intelligence and voice recognition to deliver the result the user is looking for accurately and efficiently. While asking a computer to set a timer may seem simple, the technology behind it is fascinating.
Voice recognition works by converting the analog signal of a user’s voice into a digital signal. The computer then tries to match that digital signal to words and phrases in order to recognize the user’s intent. To do this, it needs a database of known words and syllables in a given language to compare the signal against. Comparing the input signal with this database is known as pattern recognition, and it is the primary force behind voice recognition.
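The two steps described above can be illustrated with a toy Python sketch under heavy simplifying assumptions: an "analog" signal is modeled as a function of time, digitized by sampling and quantizing it, and then compared against a small database of stored templates using cosine similarity. The pure tones standing in for words, the 8 kHz sample rate, and the template labels are all hypothetical; real recognizers compare spectral features against acoustic and language models rather than raw waveforms.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)

def digitize(analog, duration_s, sample_rate=SAMPLE_RATE, bits=16):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1
    samples = []
    for i in range(int(duration_s * sample_rate)):
        value = max(-1.0, min(1.0, analog(i / sample_rate)))  # clip to [-1, 1]
        samples.append(round(value * max_level))               # quantize
    return samples

def similarity(a, b):
    """Cosine similarity between two equal-length digital signals."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(signal, templates):
    """Pattern recognition: return the label of the most similar stored template."""
    return max(templates, key=lambda label: similarity(signal, templates[label]))

def tone(freq_hz, amplitude=0.5):
    """Hypothetical stand-in for a spoken word: a pure sine tone."""
    return lambda t: amplitude * math.sin(2 * math.pi * freq_hz * t)

# Hypothetical "database" of digitized templates.
TEMPLATES = {
    "rex": digitize(tone(500), 0.1),
    "yes": digitize(tone(300), 0.1),
    "no": digitize(tone(800), 0.1),
}

# Input: the "rex" tone plus some 60 Hz hum as background noise.
spoken = digitize(lambda t: tone(500)(t) + 0.2 * math.sin(2 * math.pi * 60 * t), 0.1)
print(recognize(spoken, TEMPLATES))  # rex
```

The hum lowers the similarity score but does not change which template wins, which is the basic idea behind matching noisy input against stored patterns.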
Artificial intelligence is using machines to simulate and replicate human intelligence.
In 1950, Alan Turing (the namesake of our company) published his paper “Computing Machinery and Intelligence”, which asked the question: can machines think? In the same paper he proposed the imitation game, now known as the Turing Test, a method of evaluating whether a computer can exhibit human-like thinking. Four approaches were later developed to define AI: thinking humanly, thinking rationally, acting humanly, and acting rationally. The first two deal with reasoning, while the second two deal with actual behavior. Modern AI is typically seen as a computer system designed to accomplish tasks that normally require human intelligence. These systems can improve themselves through a process known as machine learning.
Machine learning is the subset of artificial intelligence in which programs are not written entirely by hand. Instead of writing out every rule themselves, programmers give the AI large amounts of data to study; the AI searches for patterns within that data and uses them to improve its existing functions. For voice AI, one useful application of machine learning is training the algorithm on hours of speech from various accents and dialects.
While traditional programs take an input and a set of rules to produce an output, machine learning tools are given inputs and outputs and use them to create the program itself. Two common approaches to machine learning are supervised learning and unsupervised learning. In supervised learning, the model is trained on labeled data, meaning each example is already tagged with the correct answer; these labels guide the model as it learns to categorize new data. In unsupervised learning, none of the data is labeled, so it is up to the model to find the patterns on its own. This is useful because the model can discover patterns its creators might never have found, but its results are much less predictable.
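The difference between the two approaches can be sketched in a few lines of Python on made-up one-dimensional data. The supervised half learns one average ("centroid") per label from labeled examples and classifies new values by the nearest centroid; the unsupervised half runs a basic k-means clustering on the same values with the labels removed and finds the groups by itself. Nearest-centroid classification and k-means are simply common illustrative choices, not methods the article prescribes, and the data and labels are hypothetical.

```python
from statistics import mean

def train_supervised(examples):
    """Supervised: learn one centroid per label from (value, label) pairs."""
    labels = {label for _, label in examples}
    return {label: mean(v for v, lab in examples if lab == label) for label in labels}

def classify(centroids, value):
    """Predict the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: split unlabeled values into k clusters; return sorted centers."""
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centers = ordered[::step][:k]  # spread the initial centers across the data
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical labeled data: (feature value, label).
labeled = [(1.0, "low"), (1.2, "low"), (0.8, "low"), (5.0, "high"), (5.5, "high"), (4.5, "high")]

centroids = train_supervised(labeled)
print(classify(centroids, 4.1))            # high
print(kmeans_1d([v for v, _ in labeled]))  # two centers, near 1.0 and 5.0
```

The supervised model can name its groups ("low", "high") because the labels told it what they were; k-means finds the same two groups but only returns their positions.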
Different Voice Assistant approaches
Many conversational assistants today combine a task-oriented and a knowledge-oriented workflow to carry out almost any task a user throws at them. A task-oriented workflow might include filling out a form, while a knowledge-oriented workflow might include answering what the capital of a state is or listing a product’s technical specifications.
A task-oriented approach uses goals and tasks to achieve what the user needs, and it often integrates with other apps to complete them. For example, if you ask a voice assistant to set an alarm for 3PM, it recognizes this as a task request and communicates with your default Clock application to set an alarm for 3PM. It then checks with the app whether anything else is needed, such as a name for the alarm, and relays that request back to you. This approach does not require an extensive online database, as it mainly relies on the knowledge and existing capabilities of other installed applications.
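The alarm example above can be sketched as a tiny slot-filling loop in Python. The `ClockApp` class is a hypothetical stand-in for the device's default Clock application (real assistants hand tasks off through platform-specific APIs that are not shown here), and the time pattern only covers simple phrases such as "3PM" or "7:30 am".

```python
import re
from dataclasses import dataclass, field

@dataclass
class ClockApp:
    """Hypothetical stand-in for the device's default Clock application."""
    alarms: list = field(default_factory=list)

    def set_alarm(self, time: str, label: str) -> str:
        self.alarms.append((time, label))
        return f"Alarm '{label}' set for {time}."

# Matches simple times such as "3PM", "3 pm" or "7:30 am".
TIME_PATTERN = re.compile(r"\b(\d{1,2}(?::\d{2})?)\s*(am|pm)\b", re.IGNORECASE)

PROMPTS = {
    "time": "What time should the alarm go off?",
    "label": "What would you like to name this alarm?",
}

def parse_alarm_request(utterance: str) -> dict:
    """Extract any slots the user already provided in the request."""
    slots = {}
    match = TIME_PATTERN.search(utterance)
    if match:
        slots["time"] = match.group(1) + match.group(2).upper()
    return slots

def fulfill(slots: dict, clock: ClockApp) -> str:
    """Ask for the first missing slot, or hand the completed task to the app."""
    for slot in ("time", "label"):
        if slot not in slots:
            return PROMPTS[slot]
    return clock.set_alarm(slots["time"], slots["label"])

clock = ClockApp()
slots = parse_alarm_request("Set an alarm for 3PM")
print(fulfill(slots, clock))  # What would you like to name this alarm?
slots["label"] = "Gym"        # the user's answer to the follow-up question
print(fulfill(slots, clock))  # Alarm 'Gym' set for 3PM.
```

Keeping the "what is still missing" logic separate from the app call is what lets the assistant relay the app's needs back to the user one question at a time.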
A knowledge-oriented approach uses analytical data to help users with their tasks, relying on online databases and previously recorded knowledge. For example, whenever a user asks for an internet search, the assistant uses the available online databases to return relevant results and recommend the top result. Looking up the answer to a trivia question is also knowledge-oriented, because the assistant is searching for data rather than working with other apps to complete a task.
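Because many assistants combine both workflows, a single entry point has to decide which one a request needs. The Python sketch below routes a transcribed request either to the task workflow or to a knowledge lookup. The task keywords and the two-entry dictionary standing in for an online knowledge source are hypothetical; a real assistant would use trained intent classification and query actual search or database services.

```python
import re
from typing import Optional

# Hypothetical keywords that signal a task request.
TASK_KEYWORDS = {"set", "alarm", "timer", "remind", "turn"}

# Hypothetical local stand-in for an online knowledge source.
KNOWLEDGE_BASE = {
    "capital of california": "Sacramento",
    "capital of texas": "Austin",
}

def route(utterance: str) -> str:
    """Decide whether a request needs the task- or knowledge-oriented workflow."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    return "task" if words & TASK_KEYWORDS else "knowledge"

def lookup(utterance: str) -> Optional[str]:
    """Knowledge-oriented workflow: search stored facts instead of calling an app."""
    text = utterance.lower()
    for topic, answer in KNOWLEDGE_BASE.items():
        if topic in text:
            return answer
    return None

question = "What is the capital of Texas?"
if route(question) == "knowledge":
    print(lookup(question) or "Sorry, I couldn't find that.")  # Austin
```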
Benefits of Voice Assistants
Some examples of what a Voice Assistant can do include:
- Check the weather
- Turn on/off connected smart devices
- Search databases
One of the main reasons for the growing popularity of Voice User Interfaces (VUIs) is that mobile software keeps growing more complex while screen sizes do not, which puts the Graphical User Interface (GUI) at a big disadvantage. As new generations of phones come out, screen sizes stay relatively the same, leading to cramped interfaces and frustrating user experiences, which is why more and more developers are turning to Voice User Interfaces.
Efficiency and Safety
While typing has become much faster as people have gotten used to standard keyboards, speaking is generally quicker and more natural, and it leads to fewer spelling errors. The result is a more efficient and natural workflow.
Quick learning curve
One of the greatest benefits of voice assistants is a quick learning curve. Instead of learning to use specific input devices such as mice and touch screens, you can simply rely on your natural conversational habits and use your voice.
Wider Device Integration
Since a screen or keyboard isn’t necessary, voice can easily be built into a much wider array of devices. In the future, smart glasses, furniture, and appliances will increasingly come with voice assistants already built in.
Why and When to use Voice Assistants
There are many use cases for voice assistants in today’s world, for example when your hands are full and you can’t use a touch screen or keyboard, or when you are driving. Say you are driving and want to change your music: you could just ask a voice assistant to “play my driving playlist”. This makes for a safer driving experience and helps avoid the risk of distracted driving.
To further understand voice assistants, it helps to look at the overall user experience: what a user interface is, and how a VUI differs from the more traditional graphical user interface that modern apps currently use.
Graphical User Interface (GUI)
A Graphical User Interface is what is most commonly used today. For example, the internet browser you’re using to read this article is a graphical user interface. Using graphical icons and visual indicators, the user is able to interact with machines quicker and easier than before.
A Graphical User Interface can also be used for something like a chatbot, where the user communicates with the device over text and the machine responds in natural conversational text. The big downside is that, since everything is done in text, it can feel cumbersome and inefficient, and it can take longer than voice in certain situations.
Voice User Interface (VUI)
An example of a VUI is something like Siri, where there is an audio cue that the device is listening, followed by a verbal response.
Most apps today combine elements of both graphical and voice user interfaces. For example, in a maps application you can use your voice to search for destinations, and the application will show you the most relevant results, placing the most important information at the top of the screen.
Some examples of popular smart assistants today are Alan, Amazon Alexa, Siri by Apple, and Google Assistant.
Popular Voice Assistants
Siri is one of the most popular voice assistants today. Created in 2010 by SRI International and purchased by Apple that same year, Siri quickly became an integral part of the Apple ecosystem, bringing Apple devices and applications together to work in tandem with one another.
Created by Amazon in 2014, Alexa was named after the Library of Alexandria. Alexa was originally inspired by the conversational computer on board the U.S.S. Enterprise in Star Trek. It was released alongside the Amazon Echo, a smart speaker meant to help consumers dive into home automation, which uses the Alexa platform to let users interact with the Amazon ecosystem and connect a plethora of smart devices.
Originally unveiled in 2016, Google Assistant was the spiritual successor to Google Now, with the main improvement being two-way conversation. Where Google Now returned answers as a Google search results page, Google Assistant answers in natural sentences and returns recommendations as feature cards.
Microsoft began work on Cortana in 2009, making it one of the longest-running efforts to bring voice assistants into people’s daily lives. Microsoft shipped Cortana with all Windows 10 and Xbox devices, which led to a huge increase in registered Cortana users; in 2018, it was reported that Cortana had over 800 million users.
In 2017, Alan set out to take voice assistants to the next level by enabling voice AI for all applications. Using domain-specific language models and contextual understanding, Alan is focused on creating a new generation of enterprise voice AI applications. With the Alan Platform, developers can take control of voice and build the voice-driven workflow that best fits their users.
Future of Voice Assistants
As AI becomes more advanced and voice technology becomes more widely accepted, voice-controlled digital assistants will be integrated into more of our daily devices. Conversations will also become more natural, emulating human conversation, which will open the door to more complex task flows. Adoption keeps growing as well: in early 2019, it was estimated that 111.8 million people in the US would use a voice assistant at least monthly, up 9.5% from the previous year.
In the future, more devices will be integrated with voice, and searching by voice will keep getting easier. For example, Amazon has already released a wall clock with Amazon Alexa built in, so you can ask it to set a timer or tell you the time. While devices like this aren’t full-blown voice-activated personal assistants, they show a lot of promise for the coming years, when we will be able to work with our devices just by talking.
Today, while users are still getting used to talking to their digital devices, conversations can feel broken and awkward. In the future, as digital processing becomes quicker and people grow accustomed to voice assistants in their everyday devices, users won’t have to pause and wait for the assistant to catch up; instead, we will be able to have natural conversations with our voice assistants, creating a more soothing and natural experience.
More complex task flows
As conversations with voice assistants become more natural and voice recognition and digital processing become quicker, it won’t be uncommon to see users adopt more advanced tasks into their daily routines. For example, instead of asking a voice assistant how long your commute is and then asking about different options, you might say, “If Uber is quicker than taking the bus to work, can you reserve an Uber ride from home to work, and how long will it take?”
How to make your own voice assistant
As the number of publicly available voice assistants grows, tools are appearing that let you create your own, making it as easy as possible to get a voice assistant that fits your needs!
For example, if you just wanted to create a specific skill, or command for a voice assistant, it might be more efficient to look into integrating a skill into an already existing voice assistant, such as Alexa.
Amazon has made it remarkably simple to add your own command to the fast-growing collection of publicly available Alexa Skills. You can sign in to the Alexa Skills Kit developer console with the same Amazon account your Echo is linked to, and use its tools to create a free Alexa Skill!
Using Alan Studio, the completely browser based Voice AI IDE, you can develop, test, and push voice integration straight from your browser.
Alan is a highly customizable voice AI platform designed to work with any pre-existing application. Built with enterprise use in mind, it makes security and business functionality a top priority. You can leverage visual and voice context to support any workflow and improve efficiency today, and since Alan Studio runs entirely in the browser, you can edit your scripts on the go whenever the need arises. Long gone are the days of maintaining separate versions of a script for each platform: with Alan, you can write a single script and embed it into any iOS, Android, or web app. You can sign up for Alan Studio today and see how to create an AI voice assistant solution that improves your quality of life!
Voice Assistant Timeline
- 1922 – First voice-activated consumer product hits store shelves as “Radio Rex”
- 1952 – Audrey, or the Automatic Digit Recognition Machine, is announced
- 1962 – IBM Shoebox is shown for the first time at the World’s Fair in Seattle
- 1971 – DARPA funds five years of speech recognition research and development
- 1976 – Harpy is shown at Carnegie Mellon
- 1984 – IBM releases “Tangora” the first voice activated typewriter
- 1990 – Dragon Dictate is released
- 1994 – Simon by IBM is the first modern voice assistant released
- 2010 – Siri is released as an app on the iOS app store
- 2011 – IBM Watson is released
- 2012 – Google Now is released
- 2014 – Amazon Alexa and Echo are released
- 2015 – Microsoft Cortana is released
- 2017 – Alan is developed and released with the Alan Platform