Learning about LLM token counters

Just one of the things I'm learning. https://github.com/hchiam/learning

This could be used to warn the user ahead of time that there are too many tokens in the input (see the sketch at the end of the Notes section below).

The demo in this repo lets you check token counts for a few different LLMs.

Notes

For example, here’s an OpenAI token counter that could be implemented in JS with js-tiktoken:

import { getEncoding, encodingForModel } from "js-tiktoken";
// encodingForModel expects a model name like "gpt-4"; getEncoding expects an encoding name like "cl100k_base"
const tokenCount = encodingForModel(modelName).encode(text).length;
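
A minimal runnable sketch of the above (the model name and text here are just illustrative):

import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-4"); // illustrative model name
console.log(enc.encode("Hello, world!").length); // prints the token count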

Or maybe for other models, use @xenova/transformers:

import { AutoTokenizer } from "@xenova/transformers";
const tokenizer = await AutoTokenizer.from_pretrained(modelName);
const { input_ids } = await tokenizer(text);
// input_ids is a Tensor; its size is the total number of elements (may include special tokens)
const tokenCount = input_ids.size;
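
Alternatively, the tokenizer's encode() method returns a plain array of token ids, which avoids dealing with the tensor. A sketch, assuming a tokenizer-only repo on the Hugging Face Hub (the repo name "Xenova/llama-tokenizer" is an assumption for illustration):

import { AutoTokenizer } from "@xenova/transformers";

// assumed Hub repo name, for illustration only
const tokenizer = await AutoTokenizer.from_pretrained("Xenova/llama-tokenizer");
const tokenCount = tokenizer.encode("Hello, world!").length; // encode() returns an array of token ids
console.log(tokenCount);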

Or maybe use llama-tokenizer-js for Meta Llama:

import llamaTokenizer from "llama-tokenizer-js";
const tokenCount = llamaTokenizer.encode(text).length;
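
Putting it together with the warning idea above: a minimal sketch of checking input length before sending a request, using js-tiktoken (the context-limit numbers below are assumptions for illustration, not authoritative):

import { encodingForModel } from "js-tiktoken";

// illustrative limits only; check each provider's docs for real context window sizes
const CONTEXT_LIMITS = { "gpt-4": 8192, "gpt-3.5-turbo": 16385 };

function checkInputLength(text, modelName) {
  const tokenCount = encodingForModel(modelName).encode(text).length;
  const limit = CONTEXT_LIMITS[modelName];
  return { tokenCount, tooLong: limit !== undefined && tokenCount > limit };
}

// example: warn the user before sending the request
const userInput = "some long prompt text"; // placeholder input
const { tokenCount, tooLong } = checkInputLength(userInput, "gpt-4");
if (tooLong) console.warn(`Input is ${tokenCount} tokens, over the model's limit.`);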

Demos

To run this repo's demo locally, you need yarn and vite, so you can run cd demo; yarn dev; and then open http://localhost:5173/

Or just go to this live demo: https://hchiam-llm-token-count.surge.sh/
