PixelAural
A multimodal application that transforms images into spatial sound narratives.
PixelAural is a software tool that addresses the gap in accessible spatial audio tools.
A noticeable discrepancy in storytelling mediums exists in a time when text and image-based narratives dominate popular culture: there are relatively few accessible tools for auditory representation, especially in spatial audio. For creators looking to incorporate immersive audio elements into their stories—a vital component in the development of storytelling—this gap poses a formidable challenge. With its innovative approach to bridging the visual and auditory domains, PixelAural redefines auditory storytelling through immersive virtual production.
Collaborator:
Selin Dursun (Harvard MDes)
Merve Akdogan (MIT SMArchS)
Instructor: Jose Luis Garcia del Castillo Lopez
2024
Harvard University Graduate School of Design
?
How can we enhance the way people experience and share their memories?
Preserve the immersive, interactive spatial sound information
that may evoke a deeper emotional connection.
Findings
Spatial sound can enhance memories


Unlike traditional photos or videos, sound can capture the ambiance and emotional nuances of a moment. PixelAural leverages this by enabling users to create detailed soundscapes that represent their memories, making recollections more vivid and emotionally engaging.
Pipeline and Workflow
Interpreting an image into a 3D sound experience

Multimodal decode-to-encode system

?
Why HRTF, Why not VR?
We use HRTF to create a web-based solution that can be experienced on any edge device that supports spatial audio. It is important that our design remains accessible to most users without imposing an extra technical burden.

The HRTF spatial sound system code begins by setting up an audio context and initializing the HRTF environment using High Fidelity Audio Nodes. It then loads and decodes sound files into audio buffers. Each sound source is assigned specific 3D coordinates for accurate spatial positioning. HRTF input nodes filter the sounds based on their position, simulating how sound waves interact with the listener's head and ears for a realistic audio experience. The system dynamically updates sound positions in real-time based on user interactions and manages playback control, creating an immersive auditory experience that responds to the listener’s environment.
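The pipeline above can be sketched with the browser's standard Web Audio API, whose `PannerNode` supports HRTF filtering. This is a minimal illustrative sketch, not the PixelAural implementation (which the description says uses High Fidelity Audio Nodes); the function names `sphericalToCartesian` and `setupHrtfSource` are assumptions for illustration.

```javascript
// Convert a listener-relative spherical position (azimuth/elevation in
// degrees, distance in meters) to Web Audio's Cartesian space, where the
// listener faces -Z, +X is to the right, and +Y is up.
function sphericalToCartesian(azimuthDeg, elevationDeg, distance) {
  const az = (azimuthDeg * Math.PI) / 180;
  const el = (elevationDeg * Math.PI) / 180;
  return {
    x: distance * Math.sin(az) * Math.cos(el),
    y: distance * Math.sin(el),
    z: -distance * Math.cos(az) * Math.cos(el),
  };
}

// Wire one decoded AudioBuffer through an HRTF-filtered PannerNode.
function setupHrtfSource(ctx, buffer, azimuthDeg, elevationDeg, distance) {
  const source = ctx.createBufferSource();
  source.buffer = buffer;

  const panner = ctx.createPanner();
  panner.panningModel = "HRTF";     // head-related transfer function filtering
  panner.distanceModel = "inverse"; // natural volume falloff with distance

  const { x, y, z } = sphericalToCartesian(azimuthDeg, elevationDeg, distance);
  panner.positionX.value = x;
  panner.positionY.value = y;
  panner.positionZ.value = z;

  source.connect(panner).connect(ctx.destination);
  return { source, panner };
}

// Browser usage (audio playback requires a user gesture):
//   const ctx = new AudioContext();
//   const resp = await fetch("rain.wav"); // hypothetical sound file
//   const buf = await ctx.decodeAudioData(await resp.arrayBuffer());
//   const { source } = setupHrtfSource(ctx, buf, 45, 0, 2); // front-right, 2 m
//   source.start();
```

Updating `panner.positionX/Y/Z` on user interaction is what lets the soundscape respond to the listener in real time, as described above.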
Prototypes
A web app prototype based on ESC-50 sound dataset
User interface for image decoding and sound matching.
User interface for sound position and volume encoding.