Senior Intel Research Scientist Dr. Ilke Demir shares her thoughts on the future of the metaverse, artificial intelligence development and media, in an interview with TRT World.
Dr. Ilke Demir is a Senior Staff Research Scientist at Intel Corporation, where she works on responsible artificial intelligence and the metaverse, and is tasked with developing new solutions to old challenges in computer vision, deep learning and 3D vision.
We spoke to Dr. Demir on the sidelines of the ‘TRT Metaverse & Broadcasting Forum’ held in Istanbul on June 10.
TRT World: What’s the current major focus for you and your team at Intel?
Ilke Demir: AI responsibility is a priority for us, especially in the field of media. In our work, we ask how we can increase trust in the media as deepfakes become a larger problem, and more importantly, how we can detect them.
Recent years have seen significant changes to the AI industry in terms of growth and method. Could you give us an idea of what major changes the domain is seeing with next-gen AI infrastructure?
ID: I would say that multimodal approaches are taking on harder challenges, solving problems in complex spaces. We’re seeing solid progress with generative models and with 3D shape understanding methods. Natural language processing (NLP) is also undergoing significant growth.
One of the more exciting developments in the field is the use of multimodal neural networks that combine vision and natural language processing, for instance, or 2D and 3D data, delivering better outputs, insights and information distribution. It’s the next level.
Is the metaverse already here?
ID: In 1994, my former post-doctoral supervisor wrote a paper about the office of the future, describing augmented, virtual and mixed reality in the workplace. As you know, Microsoft has Teams for the Metaverse. Facebook had Spaces.
If we already knew it was coming 20 years ago, why didn’t it happen? Primarily because we lacked the bandwidth, hardware and form factor. VR hardware has come a long way thanks to 5G and Wi-Fi, and now features better rendering and resolution. Motion tracking is also on the rise. Between hardware, AI and implementations of the metaverse, we can actually use the metaverse now.
Do you expect any changes as younger generations grow up connected to the Internet and metaverse?
ID: I think the most important change will be in how we interact with technology. With the keyboard, input was letter by letter. The mouse helped, of course.
These devices are going away, however, and what our hands and eyes can do is being redefined instead. I expect more tracking, where eye movement could scroll a page, for instance.
The new generation growing up in this immersive environment will be much more precise with their movements, compensating so that machines can understand them, while machines will also compensate in order to create a space for self-expression. I think we’ll be evolving in that direction.
Could you share three forecasts or concerns on AI and media going forward?
ID: First, deepfakes are coming, and they are coming strong. The more they spread, the more difficult it becomes to believe anything online, leading to an erosion of trust not only in the media, but in society as well.
If you reshare a deepfake from someone you trust, and find out later that it’s fake, it could have a social impact. Even for media companies, we are rapidly approaching the point where dedicated efforts are needed to fact-check and vet third-party material for deepfakes.
Down the road, media companies could choose not to use third-party content at all, relying instead on their own in-house content for security. The arms race between deepfake generation and detection will only become more important in the near future.
Second, media production is also headed towards 3D, as resolution and immersive environments continue to improve. If you’re using real-world capture, rather than a cartoonish avatar, in the metaverse, a lot of detail can be included, down to hair and even wrinkles. While this will be very important for media production, people will have to answer questions like: how much detail is too much?
Lastly, interactions. Even though we have the technology for hand, eye and body tracking, most of the time these actions are not translated into the 3D world. For example, there’s no consensus on what blinking or hand gestures, like pinching, do in a 3D environment, unlike the agreement on what traditional keyboard and mouse interfaces do. An important upcoming milestone for making the metaverse work for us is combining and collating all those interactions into one universal interface.
In this context, what’s a challenge facing metaverse adoption currently?
ID: The time spent on the interface is critical for mass adoption. If users need to spend a lot of time just interfacing with the platform, it will probably never be the next Facebook. That’s why we are trying to simplify all possible interactions.
Artificial intelligence has been used successfully to upscale video quality. Do you expect that AI will eventually close the gap you’re trying to bridge between 3D and natural language processing by itself, or does it remain a human-centric effort?
ID: As long as the creative aspect is necessary, a human still needs to be in the loop, because our multimodal neural networks are still not that good at storytelling. You can create a 3D model and animate it randomly, but without a story it’s just another walking model.
Without a story, background and motivation, could it ever be believable to humans?