Monday, 2 March 2015

Hot Topics: Translation, Localization, Language Industry and Science



Twitter Chatter About Putin and Poroshenko: The Language Breakdown

Posted by Tetyana Lokot on February 19, 2015


When we started collecting our Twitter data last October, we were predominantly interested in what Russians and Ukrainians were saying about their presidents. But we decided to cast a wider net and collected all the tweets containing the last names of the heads of state in Russian (Путин and Порошенко), Ukrainian (Путін and Порошенко), and English (Putin and Poroshenko). We ended up with over six million tweets (6,342,294, to be exact). Once we had our data, we faced a problem: how can we tell when a Russian or Ukrainian tweets about Putin or Poroshenko, as opposed to a Brit or a Korean? Several attributes of tweets and Twitter accounts help indicate a user's country and language. First, there is the location a user chooses to add to their profile. Second, there is the language a user sets for their account and interface. Third, each tweet has its own language indicator, determined from the keyboard setting and the tweet content. Finally, some users choose to turn on geolocation on their smartphones, and in this case... Link to the full article HERE.


---------------------------------------------------------------------------------------------------------------------------

 

Military could be using high-tech speech software by 2017
Posted by Ray Locker


WASHINGTON — The Pentagon could be able to listen in on voice communications in difficult environments and then quickly translate and transcribe them for use by intelligence analysts and combat troops by 2017, according to the Defense Advanced Research Projects Agency.
  
Newly released DARPA documents show that the agency is continuing the next two stages of its Robust Automatic Transcription of Speech program, which aims to separate speech from background noise, determine which language is being spoken and then isolate key words from that speech for analysis.
  
The Air Force, DARPA says, is now testing the third phase of the program in the field, while "the research division of a government agency will be testing the speech activity detection algorithm to incorporate into their platform." References to "a government agency" usually point to a part of the intelligence community, such as... Link to the full article HERE

---------------------------------------------------------------------------------------------------------------------------



A language family tree - in pictures

Minna Sundberg’s illustration maps the relationships between the Indo-European and Uralic languages. Sundberg, the creator of the webcomic Stand Still. Stay Silent, put the illustration together to show why some of the characters in her comic were able to understand each other despite speaking different languages. She wanted to show how closely related Swedish, Danish, Norwegian and Icelandic are to each other, and how Finnish came from distinct... Link to the full article HERE

---------------------------------------------------------------------------------------------------------------------------



A massive database now translates news in 65 languages in real time
Posted by Derrick Harris on February 19, 2015


I have written quite a bit about GDELT (the Global Database of Events, Languages and Tone) over the past year, because I think it’s a great example of the type of ambitious project made possible only by the advent of cloud computing and big data systems. In a nutshell, it’s a database of more than 250 million socioeconomic and geopolitical events and their metadata dating back to 1979, all stored (now) in Google’s cloud and available to analyze for free via Google BigQuery or... Link to the full article HERE

---------------------------------------------------------------------------------------------------------------------------


Twitter introduces hashtags in 13 Indian languages
Posted by Riddhi Mukherjee on February 23, 2015



Twitter has introduced hashtags in 13 Indian languages, including Hindi, Kannada, Marathi, Sanskrit, Nepali, Bengali, Assamese, Punjabi, Gujarati, Oriya, Tamil, Telugu and Malayalam.


Last month, Twitter also introduced Hindi and Urdu translations for tweets through Microsoft’s Bing Translator. Notably, Twitter itself cautioned that the results will “vary and often fall below the accuracy and fluency of translations provided by a professional translator.” So, why didn’t they tie up with... Link to the full article HERE