Sasank Chilamkurthy’s Post


JOHNAIC | Qure.ai | PyTorch | AI

We have shipped on-premise ChatGPT. We started our journey with a tiny node, the JOHNAIC 16, late last year. But our customers turned out to be pretty demanding: they are making us grow with ever more intense requirements. We have designed two new systems:

1. JOHNAIC 140: a 7 x 20 GB GPU machine suitable for on-prem AI inference. It can run OpenAI's GPT-OSS-120B with 100 concurrent users, and the model is as good as or better than GPT-4o. Even after deploying this model, we have GPU RAM left over for embeddings, speech-to-text, and text-to-speech.

2. JOHNAIC DataBank 64: 64 TB of high-performance NVMe storage. The storage can approach in-memory performance: we have benchmarked 64 GB/s disk read speeds. We have deployed highly available Postgres across two of these nodes.

The idea is that the JOHNAIC 140 should be able to generate SQL queries for the data stored in the DataBank 64. A private coding assistant is also available via the OpenAI-compatible API deployed on the cluster.
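For readers wondering what "OpenAI-compatible API" means in practice: any standard OpenAI-style client can be pointed at the cluster's endpoint instead of OpenAI's servers. A minimal sketch using only the Python standard library — the base URL, model name, and API key here are hypothetical placeholders, not the actual JOHNAIC deployment details:

```python
import json
import urllib.request

# Hypothetical endpoint and model name -- placeholders for
# illustration, not the actual JOHNAIC deployment details.
BASE_URL = "http://johnaic.local:8000/v1"
MODEL = "gpt-oss-120b"

def build_chat_request(prompt: str) -> dict:
    """Build a standard OpenAI chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask(prompt: str, api_key: str = "not-needed-on-prem") -> str:
    """POST the request to the OpenAI-compatible endpoint and
    return the assistant's reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the on-prem server to be reachable):
# print(ask("Write a SQL query that counts orders per customer."))
```

Because the wire format matches OpenAI's, existing tools and SDKs work against the on-prem cluster by changing only the base URL.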

Vagmi Mudumbai 🌈

CTO, Urai - Building ML/AI products

2w

This is amazing. It is unfair to just post this photo without the config. 🤓 Give us the deets. :) 7 x 20 GB is impressive; it is particularly good for MoE models like gpt-oss-120b. Is it a CUDA-enabled card? Some of the newer attention mechanisms use FP8 cores to drive amazing performance on vLLM. How do you cool this, and how much power does it draw? Are these blower-style cards? Isn't it better to have storage and compute separated? Wouldn't the NVMe drives and GPUs compete for the same PCIe lanes? The latency from the model would in any case be higher than the latency of a network connection, especially if you can colocate them on the same rack. 🤔
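On the PCIe-lane question, a rough back-of-envelope is useful. A minimal sketch, assuming PCIe 4.0 x4 NVMe drives at roughly 8 GB/s usable sequential read each — the actual drive count and PCIe generation in the DataBank 64 are not stated in the post:

```python
# Back-of-envelope: how many PCIe 4.0 x4 NVMe drives does a
# 64 GB/s aggregate read speed imply? All numbers are assumptions
# for illustration; the post does not state the hardware details.

GB_PER_S_PER_DRIVE = 8.0   # ~usable sequential read, PCIe 4.0 x4 NVMe
TARGET_GB_PER_S = 64.0     # benchmark figure from the post

drives_needed = TARGET_GB_PER_S / GB_PER_S_PER_DRIVE
lanes_needed = drives_needed * 4  # x4 lanes per drive

print(f"~{drives_needed:.0f} drives striped, ~{lanes_needed:.0f} PCIe 4.0 lanes")
# → ~8 drives striped, ~32 PCIe 4.0 lanes
```

Roughly 32 lanes for storage alone is a substantial share of a single-socket server's lane budget, which is one argument for the design the post describes: putting storage (DataBank 64) and GPU compute (JOHNAIC 140) in separate nodes on the same rack.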

Keval A.

Finance Professional | Passionate About Growth in UAE & India

3d

Now I am imagining what could be possible with this great power, and it's mind-boggling!


Quite a big milestone. Is this for someone in a regulated industry?

Nikhil Shinde

ML Engineer | Specialising in LLMs, MLOps & Scalable AI | MS AI @QMUL | NLP · CV · Speech | AWS · K8s · Hugging Face

2w

Amazing work! 👏 Truly impressive to see on-prem AI at this scale. Quick question, what strategies are you using to optimize memory and GPU utilization to support 100 concurrent users on GPT-OSS-120b?
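One way to sanity-check the 100-concurrent-user question is a GPU-memory budget. A rough sketch, assuming gpt-oss-120b's published shape (~117B total parameters in MXFP4, i.e. roughly 60-65 GB of weights) and the post's 7 x 20 GB = 140 GB aggregate VRAM — the actual serving stack, quantization, and KV-cache settings are not stated in the post:

```python
# Rough GPU-memory budget for serving gpt-oss-120b on a 7 x 20 GB node.
# All numbers are assumptions for illustration, not measured figures.

total_gpu_gb = 7 * 20   # aggregate VRAM from the post
weights_gb = 63         # ~117B params in MXFP4 (~4.25 bits/param)
overhead_gb = 10        # runtime, activations, CUDA context (guess)

kv_budget_gb = total_gpu_gb - weights_gb - overhead_gb
per_user_mb = kv_budget_gb * 1024 / 100  # split across 100 concurrent users

print(f"KV-cache budget: {kv_budget_gb} GB, ~{per_user_mb:.0f} MB per user")
# → KV-cache budget: 67 GB, ~686 MB per user
```

Because gpt-oss-120b is a mixture-of-experts model with only ~5.1B active parameters per token, per-request compute is modest; the binding constraint at high concurrency is usually KV-cache memory, which paged-attention serving engines such as vLLM are designed to manage.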

CA Nitish Reddy

Chartered Accountant | ICANN81 NextGEN Fellow | Startup Advisory | Valuation | NRI & NRO Financial Services Expert | Financial Reporting & Audit Consultant

2w

Sasank Chilamkurthy sounds crazy! What does this cost?

Mohamed Faheem Thanveer

Building scalable AI solutions for enterprises / Master of Computer Vision / UoS UK / Engineer / NIT Trichy.

2w
Siddharth Joshi, PMP

Business Development I Pharma & CRO I Generative AI I MBA - IIM Ahmedabad I Indian Navy Veteran

2w

Impressive.
