• Home
  • Blog
  •  +62 (811) 8751 555
  • Contact
Latest
How to reduce peak VRAM usage and increase Volatile GPU-Util?
TL;DR: split the intermediate matrix so it fits in the GPU L2 cache.
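The TL;DR above points at a tiling technique. A minimal sketch of the idea (NumPy only; the function name, chunk size, and chunking along the row dimension are illustrative assumptions, not the post's actual implementation) that computes a matrix product one tile at a time so the intermediate rows stay small enough to remain resident in fast memory:

```python
import numpy as np

def chunked_matmul(a, b, chunk_size=256):
    # Compute a @ b in row chunks so only one small tile of the
    # intermediate result is live at a time, instead of the full matrix.
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    for start in range(0, a.shape[0], chunk_size):
        end = min(start + chunk_size, a.shape[0])
        out[start:end] = a[start:end] @ b  # one tile per iteration
    return out
```

On a GPU the same pattern would let each tile fit in cache; the sketch only demonstrates the splitting, and the result is identical to the unsplit product.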
Jason Rich Darmawan
Multi-head Attention is the same as a Linear transformation with less computation
Multi-head Attention is the same as a Linear transformation with less computation. 1) Multi-head Attention is the same as a Linear transformation because it has the same property of "Every output depend...
Apr 10, 2025 Large Language Model
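The excerpt above claims multi-head attention acts like a single linear transformation. One standard equivalence in that spirit (a NumPy sketch with random data and hypothetical shapes, not the post's own derivation): concatenating the heads and applying one output projection gives the same result as projecting each head through its own slice of that matrix and summing.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, seq, d_head, d_model = 4, 8, 16, 64

# Hypothetical per-head attention outputs and a shared output projection.
heads = [rng.standard_normal((seq, d_head)) for _ in range(n_heads)]
w_o = rng.standard_normal((n_heads * d_head, d_model))

# Standard formulation: concatenate heads, then one linear map.
concat = np.concatenate(heads, axis=-1) @ w_o

# Equivalent view: each head through its own slice of w_o, summed.
split = sum(h @ w_o[i * d_head:(i + 1) * d_head]
            for i, h in enumerate(heads))

assert np.allclose(concat, split)
```

Every output element is a weighted sum of every head's features, which is exactly the "every output depends on every input" property a dense linear layer has.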
Jason Rich Darmawan
Business value of Artificial Intelligence in classifying Fruits.
The main message is why building AI to classify fruits has business value. The target audience is the management, so no jargon is used in the presentation. The presentation overuses animation to make it ...
May 23, 2024 Blog
Useful Links
  • Home
  • Contact

Connect with Jason
  • Contact
  • +62 (811) 8751-555
  • +86 (132) 5993-9392
Copyright © Jason Rich Darmawan