When we started collecting our Twitter data last October, we were
predominantly interested in
what Russians and Ukrainians were saying about their presidents. But we decided
to cast a wider net and collected all the tweets containing the last names of
the heads of state in Russian (Путин and Порошенко), Ukrainian (Путін and
Порошенко), and English (Putin and Poroshenko). We ended up with over six
million tweets - 6,342,294, to be exact.
Once we had our data, we faced a problem: how can we tell when a Russian or
Ukrainian tweets about Putin or Poroshenko as opposed to a Brit or a Korean?
There are several attributes of tweets and Twitter accounts that help indicate
a user's country and language. First, there is the location a user chooses to
add to their profile. Then there is the language a user sets for their account
and interface. Third, each tweet also has a language indicator, determined from
the keyboard setting and tweet content. Finally, some users choose to turn on
geolocation on their smartphones, and in this case... Link to the full article HERE.
Military could be using high-tech speech software by 2017
Posted by Ray Locker
WASHINGTON — The Pentagon could be able to listen in on voice communications in difficult environments and then quickly translate and transcribe them for use by intelligence analysts and combat troops by 2017, according to the Defense Advanced Research Projects Agency.
Newly released DARPA documents show it is continuing the next two stages of its Robust Automatic Transcription of Speech program, which is aimed at separating speech from background noise, determining which language is being spoken and then isolating key words from that speech for analysis.
The Air Force, DARPA says, is testing the third phase of the program in the field now, while "the research division of a government agency will be testing the speech activity detection algorithm to incorporate into their platform." References to "a government agency" usually refer to a part of the intelligence community, such as... Link to the full article HERE
Minna Sundberg’s illustration maps the relationships between Indo-European and Uralic languages. The creator of the webcomic Stand Still. Stay Silent put the illustration together to show why some of the characters in her comic were able to understand each other despite speaking different languages. She wanted to show how closely related Swedish, Danish, Norwegian, and Icelandic were to each other, and how Finnish came from distinct... Link to the full article HERE
I have written quite a bit about GDELT (the Global Database of Events, Languages and Tone) over the past year, because I think it’s a great example of the type of ambitious project only made possible by the advent of cloud computing and big data systems. In a nutshell, it’s a database of more than 250 million socioeconomic and geopolitical events and their metadata dating back to 1979, all stored (now) in Google’s cloud and available to analyze for free via Google BigQuery or... Link to the full article HERE
Twitter has introduced hashtags in 13 Indian languages, including Hindi, Kannada, Marathi, Sanskrit, Nepali, Bengali, Assamese, Punjabi, Gujarati, Oriya, Tamil, Telugu and Malayalam.
Last month, Twitter also introduced Hindi and Urdu translations for tweets through Microsoft’s Bing Translator. It’s worth noting that Twitter mentioned that the results will “vary and often fall below the accuracy and fluency of translations provided by a professional translator.” So, why didn’t they tie up with... Link to the full article HERE