Seeing the world through AI

Exploring multi-modal language models

By Ky Le, 8 Oct 2024

The introduction of AI has greatly enhanced our ability to broaden our perspectives and empathize with other cultures by bridging cultural gaps through translation and historical insight.

Before AI could help us translate, I was my family’s translator. Growing up in Vietnam, my parents thought it would be beneficial for their children to learn English, so they enrolled us in English classes starting in kindergarten. Now that we’re bilingual, my siblings and I have become our parents’ translators.

This responsibility has only grown since my English partner joined our family. My mother doesn’t speak or understand English, and my partner doesn't understand Vietnamese. As a result, I’ve become the bridge at every family event to help them connect. Fortunately, a new era of translation may finally be upon us.

Using LLMs for translation

On May 13th, 2024, OpenAI released GPT-4o, demonstrating the model's advanced speech capabilities and its ability to translate in real time. I couldn't wait to try it out at thirteen23. Here's a video of GPT-4o acting as an intermediary translator, translating a conversation between Vietnamese and English.

While machine translation technology dates back to experiments in the 1930s, early systems relied on dictionaries and hand-crafted rules. Later approaches, such as Statistical Machine Translation (SMT), generated translations one word or phrase at a time and often failed to capture the nuances of native languages. Even Neural Machine Translation (NMT), the immediate predecessor of today's large language models, can struggle with context and idiom.

Not long ago, I had difficulty getting a Timekettle translation device to translate for my family. Though more recent technology, it struggled to keep up with our speech, and its translations were often grammatically incorrect. As a result, the conversation felt unnatural and slow. GPT-4o's ability to converse and translate like a human is much more promising, increasing the chances that cross-cultural connections will feel natural.
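For the curious, here's a rough sketch of what a translation request to a chat-style LLM looks like under the hood. The model name, system prompt, and helper function are my own illustrative assumptions, not the exact setup from the demo; the snippet only builds the request payload, and the actual API call (which needs an API key) is shown commented out.

```python
# Sketch: building a translation request payload for a chat-style LLM API.
# The system prompt and model name below are illustrative assumptions.

def build_translation_messages(text: str, source: str, target: str) -> list[dict]:
    """Build a chat message list asking the model to act as an interpreter."""
    system = (
        f"You are an interpreter. Translate everything the user says "
        f"from {source} to {target}, preserving tone and meaning."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

messages = build_translation_messages(
    "Chúc bạn một ngày tốt lành!", "Vietnamese", "English"
)

# Sending it would look roughly like this (requires an OpenAI API key):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(reply.choices[0].message.content)
```

Real-time voice translation adds streaming audio on top of this, but the core idea is the same: the model is simply prompted to interpret between two languages.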

Translating across modalities

Language is the primary source of human connection. People connect through a variety of modalities, including speech, writing, and gesture. Verbal communication can be especially challenging to translate because it happens in real time.

Recently, Meta partnered with Ray-Ban to create smart glasses that can capture images, identify locations, and perform real-time translation. Similarly, Google's concept glasses bridge communication barriers by translating speech and displaying live subtitles as people talk.

While both concepts use large language models, I'm particularly excited about their form factor. Not only do they support multiple modalities, they're also more inclusive.

A collage showing a man wearing in-ear Google headphones, a woman wearing brown Ray-Ban speaker sunglasses, and a text blurb that translates “Hi, where would you like to go?” from English to Vietnamese.

While exploring other scenarios where large language models can deepen our understanding of different cultures, we've also investigated ChatGPT's ability to "see," providing contextual insights about any image.

Imagine trying to immerse yourself in a new culture. ChatGPT can now help by translating menus, converting currencies, and identifying specific dishes in photos, even going so far as to give you recipes!
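Under the hood, these vision features work by pairing an image with a text question in a single request. Here's a minimal sketch of such a multimodal payload, following the image-as-base64-data-URL convention used by OpenAI's chat API; the function name and question are my own illustrative assumptions.

```python
import base64

def build_vision_messages(image_bytes: bytes, question: str) -> list[dict]:
    """Build a multimodal chat payload pairing an image with a question."""
    # Images are commonly sent inline as a base64-encoded data URL.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ]

# Example: asking about a photographed menu (placeholder bytes stand in
# for a real JPEG here).
msgs = build_vision_messages(
    b"\xff\xd8placeholder-jpeg-bytes",
    "Translate this menu into English and describe each dish.",
)
```

Sending `msgs` to a multimodal model (e.g. via `client.chat.completions.create`) would return the translated menu as ordinary text, which is all the app layer needs to render.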

The future of translation

It’s exciting to see how AI is bringing more inclusivity and accessibility to the industry. As mentioned in our article Lost in Translation, AI promises to change the way we interact with a global audience by making content more accessible than ever before. This shift will open up new possibilities for designers, encouraging them to think creatively about how to integrate AI, hopefully in ways that enrich our culture, not diminish it.

As AI begins to provide instant translations, we may see a shift in how cultural knowledge is acquired and shared. So much so that it will be important to remember the benefits of traditional language learning. My own journey of learning a new language has given me a deeper understanding of both another culture and my own. While I'll continue to be the translator for my family, I'm keeping an eye on advances in AI in hopes of being even more present with my family in the future.

Chúc bạn một ngày tốt lành!

Have an idea or interested in learning more? Feel free to reach out to us on Instagram or Twitter!

Special thanks to Anna Glenn and Morgan Gerber

Ky Le

Designer

Ky Le is a Designer whose anatomical design thinking helps create design systems and user experiences for partners like Bose, Cigent, and Visa. A native of Vietnam, she enjoys baking birthday cakes, watching food competitions, and hanging out with her dog, Axel, at food trucks.
