FonTS: Text Rendering with Typography and Style Controls

[ICCV 2025]

Wenda Shi¹, Yiren Song², Dengming Zhang³, Jiaming Liu⁴, Xingxing Zou¹*

¹The Hong Kong Polytechnic University, ²National University of Singapore, ³Zhejiang University, ⁴Tiamat AI

*Corresponding author


Text rendering with typography and style controls. The desired style is indicated by an image, and the prompt defines the text content, including the font and word-level attributes. Modifier tokens enclose a word to apply an effect to it: <b*> and </b*> for bold, with corresponding token pairs for italic and underline. Results show that our method effectively supports (a) combined word-level and style control, (b) style control alone, and (c) word-level control without compromising scene text rendering performance.
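To make the token syntax concrete, here is a minimal Python sketch of how a prompt with enclosing modifier tokens could be split into word-level spans and their attributes. It is not the authors' code: the helper `parse_etc_prompt`, the `Span` class, and the `<i*>`/`<u*>` spellings for italic and underline are hypothetical; only the bold pair (as in `<b*>Find</b*>`) appears in the examples on this page.

```python
import re
from dataclasses import dataclass, field

# Letter inside an enclosing token -> attribute name.
# Only "b" (bold) is confirmed by the examples on this page;
# "i" (italic) and "u" (underline) are hypothetical spellings.
TOKEN_ATTRS = {"b": "bold", "i": "italic", "u": "underline"}

# Matches <x*>...</x*> where the closing letter must equal the opening one
# (enforced by the backreference \1). Other tags such as <font:3> are ignored.
ETC_PATTERN = re.compile("<([" + "".join(TOKEN_ATTRS) + r"])\*>(.*?)</\1\*>")


@dataclass
class Span:
    text: str
    attributes: set = field(default_factory=set)


def parse_etc_prompt(prompt: str) -> list:
    """Split a prompt into plain spans and token-enclosed spans (hypothetical helper)."""
    spans = []
    pos = 0
    for match in ETC_PATTERN.finditer(prompt):
        # Plain text between the previous match and this one.
        if match.start() > pos:
            spans.append(Span(prompt[pos:match.start()]))
        letter, inner = match.group(1), match.group(2)
        spans.append(Span(inner, {TOKEN_ATTRS[letter]}))
        pos = match.end()
    # Trailing plain text after the last match.
    if pos < len(prompt):
        spans.append(Span(prompt[pos:]))
    return spans


if __name__ == "__main__":
    for span in parse_etc_prompt("<b*>Find</b*> your path in Font: <font:3>."):
        print(repr(span.text), sorted(span.attributes))
```

In this sketch, nested tokens (for example bold and italic on the same word) are not resolved, and mismatched pairs such as `<b*>x</i*>` stay as plain text; a real tokenizer would need an explicit rule for both cases.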

Abstract

Our proposed pipeline trains distinct components for different objectives to achieve a unique balance between content accuracy and stylization. The proposed parameter-efficient fine-tuning method with enclosing typography control tokens (ETC-tokens) provides word-level control under resource constraints. Meanwhile, training style control adapters (SCA) addresses content leakage in style control.

Through extensive experiments, we demonstrate that FonTS achieves superior typography and style control while maintaining compatibility with various existing pipelines, including artistic text rendering, scene text rendering, and basic text rendering frameworks.

Framework Overview

Framework Overview: In the training phase, (a) illustrates typography control (TC) fine-tuning with paired TC datasets, and (b) presents the training process for the style control adapters (SCA). For inference, (c) shows the integrated operation of the TC-finetuned backbone and the SCA. For simplicity, CLIP is not depicted in the figure. The prompt in (a) is "<b*>Find</b*> your path in Font: <font:3>.", and the prompt in (b) is "Artistic Text: 'Jade', the letters are composed of jade, 3d render, minimalist, high resolution, typography".
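The two prompt formats quoted in this caption can be reproduced with simple string templates. The sketch below is an illustration only: it assumes the font is selected by an integer index written as `<font:N>` and that words are joined by single spaces, and the function names (`build_tc_prompt`, `font_id_of`, `build_artistic_caption`) are hypothetical rather than taken from the FonTS code.

```python
import re
from typing import List, Optional, Tuple

# Assumed form of the font-selection tag, e.g. "<font:3>".
FONT_TOKEN = re.compile(r"<font:(\d+)>")


def wrap(word: str, token: str) -> str:
    """Enclose a word in a modifier token pair, e.g. wrap("Find", "b") -> "<b*>Find</b*>"."""
    return f"<{token}*>{word}</{token}*>"


def build_tc_prompt(words: List[Tuple[str, Optional[str]]], font_id: int) -> str:
    """Build a typography-control prompt in the style of panel (a).

    Each entry is (word, token letter or None); words are joined by spaces.
    """
    body = " ".join(wrap(w, t) if t else w for w, t in words)
    return f"{body} in Font: <font:{font_id}>."


def font_id_of(prompt: str) -> Optional[int]:
    """Return the integer font index from a <font:N> tag, or None if absent."""
    match = FONT_TOKEN.search(prompt)
    return int(match.group(1)) if match else None


def build_artistic_caption(word: str, descriptors: List[str]) -> str:
    """Build a style caption in the style of panel (b)."""
    return f"Artistic Text: '{word}', " + ", ".join(descriptors)


if __name__ == "__main__":
    print(build_tc_prompt([("Find", "b"), ("your", None), ("path", None)], 3))
    print(build_artistic_caption(
        "Jade",
        ["the letters are composed of jade", "3d render",
         "minimalist", "high resolution", "typography"],
    ))
```

With these inputs the two printed lines match the prompts quoted for panels (a) and (b); how the real pipeline tokenizes `<font:N>` is not specified on this page.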

Qualitative Comparison

Qualitative comparison of style consistency and content accuracy in artistic text rendering against baselines. In all rows except the last, the input consists of a text prompt together with the style images shown at the top left. In the top three rows, the text prompts are simple captions of the form "Text:'Word'", while the remaining rows use style captions.

Artistic Letters with Different Styles

The results of artistic letters with different styles.

Logo Design Applications

Logo designs that combine stylized scene text images with artistic text images at different image scales.

More Artistic Text Rendering Results

More qualitative results of our method on artistic text rendering.

Results Across Different Rendering Types

Results of our method in (a) basic text rendering, (b) artistic text rendering, and (c) scene text rendering.

Series Work

Latest Work: WordCon: Word-level Typography Control in Scene Text Rendering (arXiv 2025)

BibTeX

@article{shi2024fonts,
      title={FonTS: Text Rendering with Typography and Style Controls},
      author={Shi, Wenda and Song, Yiren and Zhang, Dengming and Liu, Jiaming and Zou, Xingxing},
      journal={arXiv preprint arXiv:2412.00136},
      year={2024},
}