Simplismart

Software Development

Fastest inference with Terraform-like orchestration

About us

Fastest inference for generative AI workloads. Simplismart simplifies orchestration through a declarative language similar to Terraform. Deploy any open-source model and take advantage of Simplismart’s optimised serving. As workloads grow and diversify, one size does not fit all; use our building blocks to personalise an inference engine for your needs.

**API vs in-house**

Renting AI via third-party APIs has apparent downsides: data security risks, rate limits, unreliable performance, and inflated cost. Every company has different inferencing needs: *one size does not fit all.* Businesses need control to manage their cost/performance trade-offs. Hence the movement towards open source: businesses prefer small, niche models trained on relevant datasets over large generalist models that do not justify the ROI.

**Need for an MLOps platform**

Deploying large models comes with its own hurdles: access to compute, model optimisation, scaling infrastructure, CI/CD pipelines, and cost efficiency, all requiring highly skilled machine learning engineers. Just as tooling eased the transitions to cloud and mobile, we need tooling to support this shift towards generative AI. MLOps platforms simplify orchestration workflows for in-house deployment cycles. Two off-the-shelf options are readily available:

1. Orchestration platforms with a model-serving layer: *these do not offer optimised performance for all models, limiting the user’s ability to squeeze out performance.*
2. GenAI cloud platforms: *GPU brokers that offer no control over cost.*

Enterprises need control. Simplismart’s MLOps platform gives them the building blocks to assemble the inference stack they need, and its inference engine lets businesses run each model at performant speed. The engine is optimised at three levels: the model-serving layer, the infrastructure layer, and the model-GPU-chip interaction layer, and is further enhanced with a known model compilation technique.
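As a rough illustration of what a Terraform-like declarative deployment could look like, here is a hypothetical spec. The resource names, fields, and values below are assumptions for the sake of the sketch, not Simplismart’s actual configuration schema:

```yaml
# Hypothetical declarative deployment spec (illustrative only; all field
# names and values are assumptions, not Simplismart's real schema).
deployment:
  name: whisper-transcription
  model:
    source: openai/whisper-large-v3    # any open-source model
    quantization: int8                 # optimisation applied at deploy time
  infrastructure:
    gpu: nvidia-a100
    autoscaling:
      min_replicas: 1
      max_replicas: 8
      target_latency_ms: 200           # scale to hold an SLA, not just utilisation
```

The appeal of this style is the same as Terraform’s: you declare the desired end state (model, optimisations, scaling policy) and the platform reconciles infrastructure to match it.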

Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco
Type
Privately Held
Founded
2022
Specialties
Machine learning, Artificial Intelligence, Deep Learning, Predictive analysis, Supervised Learning, Distributed learning, Workflow management, Auto-scale deployments, Audio Transcription, Summarization, Entity Extraction, Entity Classification, LLM, RAG, Diffusion Pipelines, and Voice

Products

Locations

Employees at Simplismart

Updates

  • We’re excited to share that our CTO, Devansh Ghatak, will be speaking at the Ray Summit on “Orchestrating the GenAI Lifecycle with KubeRay: Training, Inference and Benchmarking.”

    In his talk, Devansh will walk through how KubeRay simplifies distributed model workflows by providing a unified, scalable layer for managing them on Kubernetes. We’ll explore real-world examples, including:
    - Distributed fine-tuning: cut training time from 600 days to 60 hours using Ray DDP + Simplismart’s optimization layer.
    - Batch inference at scale: transcribe a month of video for just $1 with Ray-orchestrated Whisper jobs.
    - Automated benchmarking: discover 2× cost-efficient configs through intelligent SLA-based tuning powered by Ray.

    📍 Ray Summit 2025, San Francisco
    🗓️ 12:30 PM PT | November 4, 2025

  • 🚀 Simplismart is a Gold Sponsor at Ray Summit 2025!

    We’re turning our Booth G2 into a mini-lab: demos, puzzles, problem-solving, and a few surprises. Stop by to:
    • See Simplismart Copilot find the best inference setups in seconds.
    • Compete in the GenAI Performance Prediction Challenge, flex your config instincts, and win cool prizes.
    • Drop your toughest inference issue at our Inference Clinic; we’ll brainstorm practical fixes.
    • Score exclusive swag (while supplies last).

    If you’re shipping GenAI to production (or trying to), come talk trade-offs, real metrics, and practical fixes.

    📍 Marriott Marquis, San Francisco
    🗓️ November 4-5, 2025

    #Simplismart #RaySummit2025 #GenAI #GenAIOptimization

  • We’re excited to share that our CTO, Devansh Ghatak, will be speaking at ODSC AI West 2025!

    🎤 Talk: Tailor-made Inference: Managing Trade-offs Across Performance & Cost

    Optimizing inference goes far beyond choosing the right engine. It’s about navigating real-world challenges like:
    • Latency spikes at the wrong time
    • Throughput issues that limit scaling
    • Cloud costs that keep growing

    Devansh will break down what it takes to design an inference system built for your specific workload, balancing latency, throughput, and cost without compromise. He’ll share how modular architectures can help teams stay within SLAs while scaling efficiently across diverse production environments. See you there!

  • 🚀 Simplismart is heading to ODSC AI West 2025! We’ll be at Booth #23; come say hi to the team and dive into how we make production AI actually scale.

    Here’s what you’ll find at our booth:
    • Inference Clinic: Bring your toughest deployment or inference challenges; our engineers will help you brainstorm ways to optimize and scale your workloads.
    • Simplismart Copilot: See how our Copilot helps you optimize your inference stack and boost model performance.
    • GenAI Performance Prediction Challenge: Put your intuition to the test and win some exciting prizes!

    📍 Hyatt Regency, Burlingame, CA
    🗓️ October 28-30, 2025

    #Simplismart #ODSC2025 #ODSCWest #GenAI #DataScience

  • We’ve introduced the Simplismart Benchmarking Suite, a practical way to evaluate GenAI models in real-world conditions. It helps teams test GenAI models the way they actually run in the real world, not just for accuracy, but for speed, reliability, and consistency under load. If you’ve ever wondered which model really performs better in production, this is for you. Read the full post here 👇

  • What are the usual roadblocks you face when you’re deploying an LLM in production? Our team at Simplismart wrote about how you’d approach an LLM deployment and how to overcome the challenges.

    In this complete guide on deploying GPT-OSS 120B on NVIDIA H100 GPUs using vLLM, we’ve covered:
    - Single-GPU deployment with memory optimization
    - Multi-GPU setup using tensor parallelism for production workloads
    - Configuration strategies for high-throughput vs. low-latency scenarios

    The results? Multi-GPU deployment achieved ~200ms median TTFT (Time To First Token). Check out the full technical walkthrough!
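    For flavour, the multi-GPU setup described above could be launched with something like the following vLLM invocation. This is a sketch under assumptions: the flag values are illustrative defaults, not the exact configuration verified in the guide.

    ```
    # Illustrative vLLM server launch for GPT-OSS 120B across 4 H100s.
    # Flag values are assumptions, not the guide's verified configuration.
    # --tensor-parallel-size shards the model weights across GPUs;
    # --gpu-memory-utilization leaves headroom for the KV cache;
    # --max-model-len caps context length to bound memory use.
    vllm serve openai/gpt-oss-120b \
      --tensor-parallel-size 4 \
      --gpu-memory-utilization 0.90 \
      --max-model-len 8192
    ```

    Tensor parallelism is what makes a 120B-parameter model fit at all: each GPU holds a shard of every layer, so the four H100s act as one large device at the cost of inter-GPU communication per token.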

  • We’re all set up at PyTorch Conference 2025! 🚀

    Swing by Booth G9 to:
    • See Simplismart Co-pilot in action: our tool that helps you find the best inference setup in seconds.
    • Play the GenAI Performance Prediction Challenge and win some awesome gifts!
    • Drop by our Inference Clinic to brainstorm your ongoing GenAI deployment or inference challenges with our team.

    Come say hi; we’d love to chat about making inference faster, smarter, and tailor-made for your product.

    #PyTorch #Simplismart #GenAI #Inference #AIInfrastructure #PyTorchConference
  • We’re delighted to share that our CTO, Devansh Ghatak, will be speaking at the PyTorch Conference 2025 on “Optimizing Model Inference with PyTorch 2.0.”

    He’ll explore how to push PyTorch inference performance to the next level by combining dynamic compilation, CUDA graph capture, quantization, AOT compilation, and custom fused operators to achieve low-latency, production-grade deployments. If you’re working on scaling GenAI workloads or optimizing inference pipelines, this is one session you won’t want to miss.

    🗓️ October 22 | 4:05 PM PDT

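    The first of those techniques fits in a few lines; here is a minimal sketch with a toy model (not Simplismart’s code): `torch.compile` provides the dynamic compilation, and `mode="reduce-overhead"` additionally enables CUDA-graph capture when running on a GPU.

    ```python
    # Minimal PyTorch 2.x compilation sketch with a toy stand-in model.
    # torch.compile = dynamic compilation; mode="reduce-overhead" captures
    # CUDA graphs on GPU (it silently falls back on CPU).
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(16, 32),
        torch.nn.ReLU(),
        torch.nn.Linear(32, 4),
    ).eval()

    # Compile once; later calls with same-shaped inputs reuse the optimized graph.
    compiled = torch.compile(model, mode="reduce-overhead")

    with torch.inference_mode():
        out = compiled(torch.randn(8, 16))
    print(out.shape)  # torch.Size([8, 4])
    ```

    The first call pays the compilation cost; steady-state serving then runs the fused, low-overhead graph, which is where the latency wins come from.
    
    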
  • Simplismart reposted this

    We’re excited to announce that Simplismart is a Gold Sponsor at PyTorch Conference 2025!

    The PyTorch community has been at the heart of our innovation journey, powering how we build, optimize, and scale AI models in production. We’re proud to support the ecosystem that continues to shape the future of GenAI.

    👉 Drop by our booth (#G9) to:
    • Experience the Inference Clinic and see Simplismart Copilot in action
    • Chat with our engineering team about real-world AI scaling challenges
    • Grab some exclusive swag 🎁

    📍 Moscone West, San Francisco
    🗓️ October 22-23, 2025


Similar pages

Browse jobs

Funding