Junbum Cha

AI Research Scientist

Kakao Brain

scriiv [AT] gmail.com

About

Hello! I'm an applied research scientist at Kakao Brain, working on various machine learning models for real-world applications. My interest lies in building robust machine learning (ML) systems that excel in practical environments.

One of the key challenges in real-world ML systems is dealing with out-of-distribution data. My research interests lie in addressing this out-of-distribution problem, regardless of the specific domain. Specifically, I have worked on few-shot adaptation [DMFont, LFFont, MXFont], out-of-domain generalization [SWAD, MIRO], open-world segmentation and detection [TCL, PLAC], and multimodal large language models [Honeybee].

I am also passionate about engineering and strive to write readable, well-structured, and optimized code to develop sustainable and scalable systems. I have actively contributed to various product-oriented projects from both modeling and engineering perspectives, such as Handol, font generation, OCR, and KARA.

Publications

Most recent publications on Google Scholar.
* indicates equal contribution.

Honeybee: Locality-enhanced Projector for Multimodal LLM

Junbum Cha*, Wooyoung Kang*, Jonghwan Mun*, Byungseok Roh

CVPR, 2024 (Highlight).

Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

arXiv, 2023.

Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

Junbum Cha, Jonghwan Mun, Byungseok Roh

CVPR, 2023.

Domain Generalization by Mutual-Information Regularization with Pre-trained Models

Junbum Cha, Kyungjae Lee, Sungrae Park, Sanghyuk Chun

ECCV, 2022.

Few-shot Font Generation with Weakly Supervised Localized Representations

Song Park*, Sanghyuk Chun*, Junbum Cha, Bado Lee, Hyunjung Shim

TPAMI, 2022.

SWAD: Domain Generalization by Seeking Flat Minima

Junbum Cha, Sanghyuk Chun*, Kyungjae Lee*, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park

NeurIPS, 2021.

HoughCL: Finding Better Positive Pairs In Dense Self-supervised Learning

Yunsung Lee, Teakgyu Hong, Han-Cheol Cho, Junbum Cha, Seungryong Kim

ICMLW, 2021.

Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts

Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim

ICCV, 2021.

Few-shot Font Generation with Localized Style Representations and Factorization

Song Park*, Sanghyuk Chun*, Junbum Cha, Bado Lee, Hyunjung Shim

AAAI, 2021.

Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model

Sungrae Park, Geewook Kim, Junyeop Lee, Junbum Cha, Ji-Hoon Kim, Hwalsuk Lee

COLING, 2020.

Few-shot Compositional Font Generation with Dual Memory

Junbum Cha, Sanghyuk Chun, Gayoung Lee, Bado Lee, Seonghyeon Kim, Hwalsuk Lee

ECCV, 2020.

Toward High-quality Few-shot Font Generation with Dual Memory

Junbum Cha, Sanghyuk Chun, Gayoung Lee, Bado Lee, Seonghyeon Kim, Hwalsuk Lee

CVPRW, 2020.

GRiD: Gathering Rich Data from PubMed Using One-class SVM

Junbum Cha, Jeongwoo Kim, Sanghyun Park

IEEE International Conference on Systems, Man and Cybernetics (SMC), 2016.

A method for obtaining rich data from PubMed using SVM

Junbum Cha, Jeongwoo Kim, Yunku Yeu, Sanghyun Park

ACM Symposium On Applied Computing (SAC), 2016.

Projects

Product-oriented and unpublished projects.
Published academic projects are not included here (see Publications).

KARA

Jun 2023 - Present

KARA (KAkao brain Radiologic Assistant) is an AI-powered project that generates radiology reports from chest X-rays or CT scans, aiming to alleviate the shortage of radiologists. The KARA-CXR model, tailored for chest X-rays, exhibits performance competitive with junior radiologists and significantly surpasses general large multimodal models like GPT-4V in generating accurate and coherent reports.

Explainable AI for Trustworthy Healthcare System

Apr 2023 - Jun 2023

Reliability is crucial in healthcare systems. This project enhances reliability by introducing explainability, which helps clarify the reasoning behind a model's predictions. More specifically, we focus on our medical report generation model, which generates reports based on medical images. Our aim is to provide a visual representation that highlights the specific areas of the image upon which the generated report relies.

CLOVA OCR

Sep 2020 - Jul 2021

The goal of our team is to create a world-leading OCR engine. My main contribution lies in the development of the scene text recognition engine. By tackling the out-of-distribution problem, our OCR system effectively addresses real-world scenarios. As a result, our model achieves unparalleled performance in Korean, Chinese, and Japanese, which are the primary languages aligned with our business objectives.

Handwritten Font Generation

Mar 2019 - Dec 2020

A Korean font comprises 11,172 characters, not even counting other symbols, which makes creating a new one a labor-intensive and expensive task. This project has revolutionized the process by using deep generative models, specifically generative adversarial networks (GANs), to generate new fonts with about 0.1% of the effort. This project opens the era of personalized Korean fonts where anyone can have their own font.

Handol

2018 - Feb 2019

Handol is a project aiming at a superhuman-level Go AI for business use. We've successfully reproduced AlphaGo, AlphaGo Zero, and AlphaZero, and our system achieved the runner-up place in the world Go AI competition. We've also won several matches against professional players, including handicap matches. Using this system, we offered additional services such as AI odds prediction, AI hints, and AI matches on the company's Go game platform. We've also expanded the system to create automated level design for various turn-based games beyond Go.

Duplicated Image Search

2017

This project aims to cluster duplicate images within user-uploaded images on a cloud service. Specifically, a duplicate image is defined as any image that has been manipulated in some way, such as applying photo effect filters, adding stickers, resizing, or cropping. We utilized a SIFT-based image correspondence algorithm for image matching and an approximate kNN algorithm for search efficiency.

GANs Comparison without Cherry-picking

2017

The motivation behind this project is to tackle the issue of overestimation resulting from cherry-picking in generative adversarial networks (GANs). In 2017, GANs were the most trending topic in deep learning, yet the absence of quantitative evaluation metrics led to a proliferation of diverse GAN algorithms. Each algorithm claimed superiority over others based on subjective qualitative evaluations. To address this, this project offers open-source implementations of various GAN algorithms and provides an unbiased and objective comparison without cherry-picking.

iOS Applications

2012 - 2015

I began my career as a software engineer. Within the first three years, I co-founded a start-up and developed 9 iOS applications. Out of these 9 applications, I contributed to 4 of them across all stages of productization, including initial ideation, product planning, business modeling, design, marketing, sales, and operation, beyond just application development.

Vitæ