In my last post I ranted a bit about Apple, their new iOS design and their place within the changing interaction ecosystem. In this post I want to focus on the future of interaction and interfaces: where are we headed, why, and will it be better than today?
“If you want to know where technology is headed,
look at how artists and criminals are using it.”
– William Gibson
If you look at current and past science fiction movies, some elements of the interaction with computers keep recurring: voice commands, hand gestures and 3D navigation. The first two are well on their way in today's interaction environment, but the third is remarkably absent, though the layered multitasking environment of today's computers can be seen as semi-3D.
But let’s examine the first two elements in some more depth:
1. Voice Controlled Devices (VCD)
The past 20 years have introduced everything from washing machines that let consumers operate the controls through vocal commands to mobile phones with voice-activated dialing. Modern VCDs are speaker-independent: they can respond to multiple voices, regardless of accent or dialect, instead of having to analyze one voice thoroughly through a set of test sentences. They are also capable of responding to several commands at once, separating vocal messages, and providing “appropriate” feedback in an attempt to imitate a natural conversation. VCDs can be found in computer operating systems (Windows, Mac OS X, Android), commercial software for computers, mobile phones (iOS, Windows Phone, Android, BlackBerry), cars (Ford, Chrysler, Honda, Lexus, GM), call center “agents”, and internet search engines such as Google.
Among the future cross-platform players are Google, which has built speech recognition into Android (where it sits alongside Pico TTS, a text-to-speech engine for spoken output rather than a recognition engine), and Apple, which has released Siri. Apple's use of Siri in the iPhone and Google's use of speech recognition in, for example, Google Glass have not been received without sarcasm or frustration. Both let you give a set of commands: dictate, search for information, get directions, send an email, message or tweet, open apps, and set a reminder or meeting.
Siri hasn’t been as big a success as anticipated, mostly because it often fails to understand your commands correctly. But Siri’s technical solution is not a simple one. It is made up of two parts: the virtual assistant and the speech-recognition software (made by Nuance). The assistant actually works pretty well, while the speech-recognition engine works… occasionally. This has to do with how the two parts interact, and with the quality and speed at which the sound file can be delivered to the online speech-recognition engine, which then has to send the text back to your phone for the virtual assistant to act on. Sound complicated? Basically, if you articulate well while you’re connected to Wi-Fi, you should be fine. Looking ahead, apart from improving Siri, Nuance has mentioned developing advanced voice recognition software for use in cars (Dragon Drive), for getting directions or searching for nearby restaurants, and also in TVs (Dragon TV).
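To make the two-part setup a bit more concrete, here is a rough Python sketch of that round trip: the audio goes to a recognizer, text comes back, and the assistant acts on it. Everything in it is made up for illustration (the names `Transcript`, `make_fake_recognizer`, `assistant_act` and `handle_utterance`, the confidence threshold, the two commands); it says nothing about how Siri or Nuance actually implement this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    text: str
    confidence: float  # 0.0 to 1.0, an assumed quality score from the recognizer


def make_fake_recognizer(confidence: float) -> Callable[[bytes], Transcript]:
    """Hypothetical stand-in for the online speech-recognition engine.

    It pretends the audio bytes already are the spoken text and reports a
    fixed confidence, so good and bad conditions can be simulated.
    """
    def recognize(audio: bytes) -> Transcript:
        return Transcript(text=audio.decode("utf-8"), confidence=confidence)
    return recognize


def assistant_act(transcript: Transcript, min_confidence: float = 0.6) -> str:
    """The virtual-assistant half: turn recognized text into an action."""
    if transcript.confidence < min_confidence:
        return "Sorry, I didn't catch that."
    lowered = transcript.text.lower()
    for prefix, reply in (("remind me to ", "Reminder set: "),
                          ("search for ", "Searching for: ")):
        if lowered.startswith(prefix):
            return reply + transcript.text[len(prefix):]
    return "No matching command."


def handle_utterance(audio: bytes,
                     recognize: Callable[[bytes], Transcript]) -> str:
    # In a real product this call would be a network round trip.
    transcript = recognize(audio)
    return assistant_act(transcript)


clear = make_fake_recognizer(0.9)    # articulate speaker on Wi-Fi
mumbled = make_fake_recognizer(0.3)  # poor audio or a slow connection
print(handle_utterance(b"Remind me to buy milk", clear))
print(handle_utterance(b"Remind me to buy milk", mumbled))
```

The point of the split is that the assistant can be perfectly capable and still look stupid: if the recognizer hands back a poor transcript, nothing downstream can rescue it.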
Among other prominent devices, voice commands were given a lot of room when Microsoft revealed the new Xbox One. Voice is used for starting, ending and switching between services, but also for giving specific commands within games.
So this is the present situation but wherein lies the future for voice commands? Vlad Sejnoha, chief technology officer of Nuance Communications, believes that within a few years, mobile voice interfaces will be much more pervasive and powerful. “I should just be able to talk to it without touching it,” he says in an article in Technology Review. “It will constantly be listening for trigger words, and will just do it — pop up a calendar, or ready a text message, or a browser that’s navigated to where you want to go.”
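The “constantly listening for trigger words” idea can be sketched in a few lines of Python. This is only a toy under obvious assumptions: the speech is already available as a stream of words, and the trigger words and actions in `TRIGGERS` are invented for the example, not taken from Nuance.

```python
from typing import Callable, Dict, Iterable, List


def listen_for_triggers(words: Iterable[str],
                        triggers: Dict[str, Callable[[], str]]) -> List[str]:
    """Scan a stream of words and fire an action for each trigger word heard."""
    results = []
    for word in words:
        # Normalize case and strip trailing punctuation before the lookup.
        action = triggers.get(word.lower().strip(".,!?"))
        if action is not None:
            results.append(action())
    return results


# Hypothetical trigger words mapped to hypothetical actions.
TRIGGERS: Dict[str, Callable[[], str]] = {
    "calendar": lambda: "calendar opened",
    "text": lambda: "text message ready",
    "browse": lambda: "browser opened",
}

stream = "Could you show my calendar and then text Anna".split()
print(listen_for_triggers(stream, TRIGGERS))
# prints ['calendar opened', 'text message ready']
```

Note that the loop sees every word, not only the triggers, which is exactly where the privacy question comes in.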
This future scenario sounds both intriguing and disturbing. A silent spying assistant that is always on call, ready to do your bidding even before you ask for it. Hopefully the privacy and security settings will be as well developed and intelligent as the voice-recognition itself.
2. Gestures UI
Kevin Kelly, founding executive editor of Wired magazine but also technical consultant to the fictional interfaces designed by Jorge Almeida for the iconic movie Minority Report, recently gave a speech where he described the future impact of different disruptive technologies. Among them was gesture-based interaction, Gestures UI, something that was featured in Minority Report when Tom Cruise orchestrated rather than navigated and clicked to find information within a computer (Tom Cruise's interaction was, by the way, a lot more realistic than Keanu Reeves's quite ridiculous 3D/VR-glove attempts in Johnny Mnemonic). Kelly states that as screens and displays can be anything and everywhere (something he also managed to put into Minority Report), an easy and accessible way to interact would be to use a type of sign language rather than typing your commands. Kelly gives eye tracking as an example of an existing technology that scans your body language to get information. Eye tracking could also be used to identify your mood and level of interest and adapt the presentation accordingly, for example by noticing that you don’t understand a word and subtly explaining it to you. Iris identification software could also be used to identify people to a larger extent than today, possibly even for advertising purposes.
From the gaming industry, PlayStation and Xbox have introduced some gesture-based features and developed them further with their next-generation consoles, which offer even more possibilities for commands, navigation and in-game interaction through gestures.
The touch-based revolution that Apple initiated has been a great building block in preparing the public for even more physical interaction patterns in the future.
The two disruptive interaction technologies described above will change the design of interfaces massively. For example, with voice-based interaction, an effective, intelligent and attentive servant could remove the need for an actual interface or menu. Gestures, on the other hand, would bring us closer to the content and make us more active, which would require a totally different design approach. Xbox One combines both and will be an interesting experience.
Both technologies could restore some of our humanity within the digital environment once we use human language, whether voice-based or body-based, as the primary tool for interacting with machines.