
Naoya Takahashi, PhD

Naoya Takahashi is a senior research scientist at Sony. He received his PhD in Computer Science from the University of Tsukuba, Japan. Previously, he was a visiting researcher in the Computer Vision Lab and the Speech Processing Group at ETH Zurich, Switzerland. In 2018, he received the Sony Outstanding Engineer Award, the highest form of individual recognition for Sony Group engineers. He has published at top-tier conferences including CVPR, ICLR, and ICASSP. His research interests lie in machine learning and its applications, including multi-modal event recognition, highlight detection, style transfer, source separation, speech recognition, and voice conversion.


Experience

Selected papers

  1. Hao-Wen Dong, Naoya Takahashi*, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick, “CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos,” International Conference on Learning Representations (ICLR), 2023, *corresponding author [OpenReview][arXiv][demo][code]

  2. Naoya Takahashi, Yuki Mitsufuji, “Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks,” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [CVF][IEEE][arXiv][code]

  3. Naoya Takahashi, Michael Gygli, Luc Van Gool, “AENet: Learning deep audio features for video analysis,” IEEE Transactions on Multimedia (Trans. MM), Vol. 20, Issue 3, 2017 [IEEE][arXiv]

  4. Naoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool, “Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition,” Interspeech, 2016 [arXiv]

Awards

  • Local Commendation for Invention Award 2022, Kanto region, Encouragement Award, for audio source separation technology using multiple AI models [URL]

  • Sony Outstanding Engineer Award, 2018: the highest form of individual recognition for Sony Group engineers [URL]

  • Ranked 1st at the international challenge on “Detection and Classification of Acoustic Scenes and Events (DCASE) 2021”, task 3, Sound Event Localization and Detection with Directional Interference [URL]

  • Ranked 1st at the international challenge on music separation, SiSEC 2018, presented in LVA/ICA 2018 [URL]

  • Ranked 1st at the international challenge on music separation, SiSEC 2016, presented in LVA/ICA 2016 [URL]

Invited talks


  • MIRU 2021 “D3Net: Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks” [URL]

  • Queen Mary University of London, 2021, “Audio Source Separation: Industry No.1 technology and its applications”

  • Tokyo BISH Bash #3, 2020, “Audio Source Separation” [URL]

  • Open Data Science Conference India (ODSC) 2018, “Auditory intelligence ~beyond speech recognition~” [video]

Research

multi-modal event recognition, highlight detection, style transfer, source separation, acoustic event detection, voice conversion, speech recognition
