To improve the functionality of Cortana, its voice-powered virtual assistant, Microsoft is turning to bilinguals. Of specific interest is the practice of code switching, in which speakers switch back and forth between multiple languages in a single sentence or conversation.

Project Mélange, run by Microsoft researchers in India, is studying the use of code switching among Indians online. The researchers are trying to figure out how virtual assistants might be taught to respond to a user switching between, for example, English and Hindi in a conversation.

“Compartmentalization of mixed languages (by multilinguals) have gone away with each coming generation,” Kalika Bali, a researcher at Microsoft, said in an interview. “So younger people use mixed languages in more and more phases of their lives.”

“To have a digital assistant, something like Cortana, you have to be able to understand [the user base],” she said, adding that current systems were not trained to pick up multiple languages in a single conversation.

Bali said the researchers’ code-switching study was inspired by observations from an anthropologist who was looking at the use of technology among Urdu-speaking youth in the poor areas of Hyderabad in southern India. The youths were found using the internet to befriend and interact with girls from Brazil, who primarily spoke Portuguese. However, English was the primary language for that communication, which interested Bali.

“When I started seeing this data, I saw that not only were they using this very pidgin English kind of thing, but [they were] effectively communicating with each other,” Bali said.

The team looks at every aspect of code mixing, including text, speech, understanding, and recognition. Bali said they also look at generational variations and at why people switch between languages in a single conversation: sometimes it is for humor, and at other times it is to change topics.

It will take years before a virtual assistant can eloquently switch back and forth between multiple languages, but the biggest challenge now is getting access to adequate data sets for study.

Currently, the team uses data collected from Twitter to study how users switch between languages. Studies, Bali said, have already shown that Indian men who spoke English and Hindi tended to switch to Hindi when they had to express negative sentiments or abuse. Women who were having conversations in English, however, tended to stick with that language even when the content took a negative turn.

Teaching machines to interpret code switching could also lead to developments in opinion mining and customization, and to better interpretation of nuance and context, Bali said.