{"mistralai/Mistral-Nemo-Instruct-2407":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.02,"outputPrice":0.04,"description":"12B model trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size."},"meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.4,"outputPrice":0.4,"description":"Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes"},"Bria/fibo_edit":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"meta-llama/Llama-4-Scout-17B-16E-Instruct":{"maxTokens":327680,"contextWindow":327680,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.08,"outputPrice":0.3,"description":"The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout, a 17 billion parameter model with 16 experts"},"Bria/erase":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"moonshotai/Kimi-K2-Instruct-0905":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.4,"outputPrice":2,"cacheReadsPrice":0.15000000000000002,"description":"Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k.  This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training."},"ByteDance/Seedream-4.5":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"google/gemini-2.5-pro":{"maxTokens":1000000,"contextWindow":1000000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":1.25,"outputPrice":10,"description":"Gemini 2.5 Pro is Google's the most advanced thinking model, designed to tackle increasingly complex problems. Gemini 2.5 Pro leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities.  Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.  The Gemini 2.5 Pro model is now available on DeepInfra."},"sentence-transformers/multi-qa-mpnet-base-dot-v1":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-235B-A22B-Instruct-2507":{"maxTokens":262144,"contextWindow":262144,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.071,"outputPrice":0.1,"description":"Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage.  "},"deepseek-ai/DeepSeek-V3.2":{"maxTokens":163840,"contextWindow":163840,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.26,"outputPrice":0.38,"cacheReadsPrice":0.13,"description":"DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments."},"black-forest-labs/FLUX-1-schnell":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Bria/Bria-3.2-vector":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-Max":{"maxTokens":256000,"contextWindow":256000,"supportsImages":false,"supportsPromptCache":true,"inputPrice":1.2,"outputPrice":5.999999999999999,"cacheReadsPrice":0.24,"description":"The latest flagship model in the Qwen family. State-of-the-art results across a comprehensive suite of benchmarks — including knowledge, reasoning, coding, instruction following, human preference alignment, agent tasks, and multilingual understanding."},"black-forest-labs/FLUX-pro":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-14B":{"maxTokens":40960,"contextWindow":40960,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.12000000000000001,"outputPrice":0.24000000000000002,"description":"Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support. "},"Bria/remove_background":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"intfloat/e5-base-v2":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"thenlper/gte-base":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"black-forest-labs/FLUX-1-Redux-dev":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Gryphe/MythoMax-L2-13b":{"maxTokens":4096,"contextWindow":4096,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.4,"outputPrice":0.4,"description":""},"meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8":{"maxTokens":1048576,"contextWindow":1048576,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.15,"outputPrice":0.6,"description":"The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick, a 17 billion parameter model with 128 experts"},"PrunaAI/p-image":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"shibing624/text2vec-base-chinese":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Bria/gen_fill":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"google/gemma-3-12b-it":{"maxTokens":131072,"contextWindow":131072,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.04,"outputPrice":0.13,"description":"Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2"},"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.02,"outputPrice":0.030000000000000002,"description":"Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes"},"meta-llama/Meta-Llama-3.1-8B-Instruct":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.02,"outputPrice":0.05,"description":"Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes"},"Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo":{"maxTokens":262144,"contextWindow":262144,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.21999999999999997,"outputPrice":1,"cacheReadsPrice":0.022,"description":"Qwen3-Coder-480B-A35B-Instruct is the Qwen3's most agentic code model, featuring Significant Performance on Agentic Coding, Agentic Browser-Use and other foundational coding tasks, achieving results comparable to Claude Sonnet."},"PaddlePaddle/PaddleOCR-VL-0.9B":{"maxTokens":16384,"contextWindow":16384,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.14,"outputPrice":0.8,"description":"PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios."},"intfloat/multilingual-e5-large":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-Coder-480B-A35B-Instruct":{"maxTokens":262144,"contextWindow":262144,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.4,"outputPrice":1.6,"description":"Qwen3-Coder-480B-A35B-Instruct is the Qwen3's most agentic code model, featuring Significant Performance on Agentic Coding, Agentic Browser-Use and other foundational coding tasks, achieving results comparable to Claude Sonnet."},"deepseek-ai/DeepSeek-R1-Distill-Llama-70B":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.7,"outputPrice":0.8,"description":"DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks. "},"Qwen/Qwen3-VL-30B-A3B-Instruct":{"maxTokens":262144,"contextWindow":262144,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.15,"outputPrice":0.6,"description":"Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.  This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities."},"sentence-transformers/all-MiniLM-L12-v2":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"black-forest-labs/FLUX.1-Kontext-dev":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-Embedding-0.6B":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"openai/gpt-oss-20b":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.030000000000000002,"outputPrice":0.14,"description":"gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs."},"black-forest-labs/FLUX-1.1-pro":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"BAAI/bge-base-en-v1.5":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"deepseek-ai/DeepSeek-OCR":{"maxTokens":8192,"contextWindow":8192,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.030000000000000002,"outputPrice":0.1,"description":"DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder. Specifically, DeepEncoder serves as the core engine, designed to maintain low activations under high-resolution input while achieving high compression ratios to ensure an optimal and manageable number of vision tokens. Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10x), the model can achieve decoding (OCR) precision of 97%. Even at a compression ratio of 20x, the OCR accuracy still remains at about 60%. This shows considerable promise for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs."},"nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL":{"maxTokens":131072,"contextWindow":131072,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.2,"outputPrice":0.6,"description":"The model is an auto-regressive vision language model that uses an optimized transformer architecture. The model enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A and summarization capabilities."},"black-forest-labs/FLUX-2-klein-4b":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-Embedding-8B-batch":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-Embedding-4B-batch":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"deepseek-ai/DeepSeek-R1-0528":{"maxTokens":163840,"contextWindow":163840,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.5,"outputPrice":2.15,"cacheReadsPrice":0.35,"description":"The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528."},"google/gemma-3-27b-it":{"maxTokens":131072,"contextWindow":131072,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.08,"outputPrice":0.16,"description":"Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2"},"deepseek-ai/Janus-Pro-1B":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"black-forest-labs/FLUX-2-max":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"sentence-transformers/clip-ViT-B-32-multilingual-v1":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"allenai/Olmo-3.1-32B-Instruct":{"maxTokens":65536,"contextWindow":65536,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.2,"outputPrice":0.6,"description":"Olmo is a series of Open language models, developed by Allen Institute for AI (Ai2), designed to enable the science of language models. "},"Qwen/Qwen3-Next-80B-A3B-Instruct":{"maxTokens":262144,"contextWindow":262144,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.09,"outputPrice":1.1,"description":"Over the past few months, we have observed increasingly clear trends toward scaling both total parameters and context lengths in the pursuit of more powerful and agentic artificial intelligence (AI). We are excited to share our latest advancements in addressing these demands, centered on improving scaling efficiency through innovative model architecture. We call this next-generation foundation models Qwen3-Next."},"sentence-transformers/all-mpnet-base-v2":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"meta-llama/Meta-Llama-3.1-70B-Instruct":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.4,"outputPrice":0.4,"description":"Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes"},"Qwen/Qwen2.5-VL-32B-Instruct":{"maxTokens":128000,"contextWindow":128000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.2,"outputPrice":0.6,"description":""},"Sao10K/L3.3-70B-Euryale-v2.3":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.85,"outputPrice":0.85,"description":"L3.3-70B-Euryale-v2.3 is a model focused on creative roleplay from Sao10k"},"Bria/fibo":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"meta-llama/Llama-3.2-11B-Vision-Instruct":{"maxTokens":131072,"contextWindow":131072,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.048999999999999995,"outputPrice":0.048999999999999995,"description":"Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.  Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research."},"ClarityAI/flux":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"black-forest-labs/FLUX-2-klein-9b":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"deepseek-ai/DeepSeek-V3.1-Terminus":{"maxTokens":163840,"contextWindow":163840,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.21,"outputPrice":0.7899999999999999,"cacheReadsPrice":0.1300000002,"description":"DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs  The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows."},"Qwen/Qwen3-Embedding-8B":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"black-forest-labs/FLUX-2-dev":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Bria/erase_foreground":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"moonshotai/Kimi-K2.5":{"maxTokens":262144,"contextWindow":262144,"supportsImages":true,"supportsPromptCache":true,"inputPrice":0.45,"outputPrice":2.25,"cacheReadsPrice":0.070000002,"description":"Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms."},"zai-org/GLM-4.6V":{"maxTokens":131072,"contextWindow":131072,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.3,"outputPrice":0.9,"description":"This model is part of the GLM-V family of models, introduced in the paper GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning."},"ClarityAI/creative":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"allenai/olmOCR-2-7B-1025":{"maxTokens":16384,"contextWindow":16384,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.09,"outputPrice":0.19,"description":"olmOCR is a specialized AI tool that converts PDF documents into clean, structured text while preserving important formatting and layout information. What makes olmOCR particularly valuable for developers is its ability to handle challenging PDFs that traditional OCR tools struggle with—including complex layouts, poor-quality scans, handwritten text, and documents with mixed content types. Built on a fine-tuned 7B vision-language model, olmOCR provides enterprise-grade PDF processing at a fraction of the cost of proprietary solutions."},"Bria/blur_background":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"BAAI/bge-m3-multi":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Bria/replace_background":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-30B-A3B":{"maxTokens":40960,"contextWindow":40960,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.08,"outputPrice":0.28,"description":"Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support"},"thenlper/gte-large":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-VL-235B-A22B-Instruct":{"maxTokens":262144,"contextWindow":262144,"supportsImages":true,"supportsPromptCache":true,"inputPrice":0.2,"outputPrice":0.8799999999999999,"cacheReadsPrice":0.11000000000000001,"description":"Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.  This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities."},"openai/gpt-oss-120b":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.039,"outputPrice":0.19,"description":"gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation."},"nvidia/Nemotron-3-Nano-30B-A3B":{"maxTokens":262144,"contextWindow":262144,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.05,"outputPrice":0.2,"description":"NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built with a hybrid MoE and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads."},"google/gemma-3-4b-it":{"maxTokens":131072,"contextWindow":131072,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.04,"outputPrice":0.08,"description":"Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2"},"mistralai/Mistral-Small-24B-Instruct-2501":{"maxTokens":32768,"contextWindow":32768,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.05,"outputPrice":0.08,"description":"Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.  The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware."},"mistralai/Mixtral-8x7B-Instruct-v0.1":{"maxTokens":32768,"contextWindow":32768,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.54,"outputPrice":0.54,"description":"Mixtral is mixture of expert large language model (LLM) from Mistral AI. This is state of the art machine learning model using a mixture 8 of experts (MoE) 7b models. During inference 2 expers are selected. This architecture allows large models to be fast and cheap at inference. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks."},"zai-org/GLM-4.7":{"maxTokens":202752,"contextWindow":202752,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.4,"outputPrice":1.75,"cacheReadsPrice":0.08000000000000002,"description":"GLM-4.7 is a state-of-the-art, multilingual Mixture-of-Experts (MoE) language model designed for complex reasoning, agentic coding, and tool use. Building on its predecessor GLM-4.6, it delivers significant improvements across key benchmarks, including multilingual SWE-bench, Terminal Bench, and reasoning-heavy evaluations like HLE. The model features advanced \"Interleaved Thinking\" and new \"Preserved Thinking\" modes, allowing it to reason before actions and maintain consistency across long, multi-turn tasks. With 358 billion parameters, GLM-4.7 excels in generating clean code, modern UI elements, and sophisticated reasoning outputs."},"microsoft/phi-4":{"maxTokens":16384,"contextWindow":16384,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.07,"outputPrice":0.14,"description":"Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning."},"Qwen/Qwen3-32B":{"maxTokens":40960,"contextWindow":40960,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.08,"outputPrice":0.28,"description":"Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support"},"nvidia/NVIDIA-Nemotron-Nano-9B-v2":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.04,"outputPrice":0.16,"description":"NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.  The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so."},"Sao10K/L3-8B-Lunaris-v1-Turbo":{"maxTokens":8192,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.04,"outputPrice":0.05,"description":""},"sentence-transformers/all-MiniLM-L6-v2":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"zai-org/GLM-4.6":{"maxTokens":202752,"contextWindow":202752,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.43,"outputPrice":1.74,"cacheReadsPrice":0.0799999993,"description":"Compared with GLM-4.5, GLM-4.6 brings several key improvements:  Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks. Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios."},"Qwen/Qwen-Image-Edit":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-Max-Thinking":{"maxTokens":256000,"contextWindow":256000,"supportsImages":false,"supportsPromptCache":true,"inputPrice":1.2,"outputPrice":5.999999999999999,"cacheReadsPrice":0.24,"description":"The latest flagship reasoning model in the Qwen3 family. Further enhanced by multiple innovations like adaptive tool-use and advanced test-time scaling techniques"},"Qwen/Qwen2.5-72B-Instruct":{"maxTokens":32768,"contextWindow":32768,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.12000000000000001,"outputPrice":0.38999999999999996,"description":"Qwen2.5 is a model pretrained on a large-scale dataset of up to 18 trillion tokens, offering significant improvements in knowledge, coding, mathematics, and instruction following compared to its predecessor Qwen2. The model also features enhanced capabilities in generating long texts, understanding structured data, and generating structured outputs, while supporting multilingual capabilities for over 29 languages."},"anthropic/claude-4-opus":{"maxTokens":200000,"contextWindow":200000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":16.5,"outputPrice":82.5,"description":"Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features."},"Bria/expand":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"google/gemini-1.5-flash":{"maxTokens":1000000,"contextWindow":1000000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.075,"outputPrice":0.3,"description":"Gemini 1.5 Flash is Google's foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.  Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. "},"anthropic/claude-4-sonnet":{"maxTokens":200000,"contextWindow":200000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":3.3000000000000003,"outputPrice":16.5,"description":"Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, & more."},"PrunaAI/p-image-Edit":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"deepseek-ai/DeepSeek-V3":{"maxTokens":163840,"contextWindow":163840,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.32,"outputPrice":0.8899999999999999,"description":"DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. "},"nvidia/Llama-3.3-Nemotron-Super-49B-v1.5":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.1,"outputPrice":0.4,"description":"Llama-3.3-Nemotron-Super-49B-v1.5 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements. "},"google/gemini-2.5-flash":{"maxTokens":1000000,"contextWindow":1000000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.3,"outputPrice":2.5,"description":"Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It's capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.  Gemini 2.5 Flash: best for balancing reasoning and speed."},"Qwen/Qwen3-Embedding-0.6B-batch":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"NousResearch/Hermes-3-Llama-3.1-405B":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":1,"outputPrice":1,"description":"Hermes 3 is a cutting-edge language model that offers advanced capabilities in roleplaying, reasoning, and conversation. It's a fine-tuned version of the Llama-3.1 405B foundation model, designed to align with user needs and provide powerful control. Key features include reliable function calling, structured output, generalist assistant capabilities, and improved code generation. Hermes 3 is competitive with Llama-3.1 Instruct models, with its own strengths and weaknesses."},"deepseek-ai/Janus-Pro-7B":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"anthropic/claude-3-7-sonnet-latest":{"maxTokens":200000,"contextWindow":200000,"supportsImages":true,"supportsPromptCache":true,"inputPrice":3.3000000000000003,"outputPrice":16.5,"cacheReadsPrice":0.33000000000000007,"description":""},"meta-llama/Llama-3.2-3B-Instruct":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.02,"outputPrice":0.02,"description":"The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out)"},"moonshotai/Kimi-K2-Thinking":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.47,"outputPrice":2,"cacheReadsPrice":0.141,"description":"Kimi K2 Thinking is the latest, most capable version of open-source thinking model developed by MoonshotAI"},"Qwen/Qwen3-235B-A22B-Thinking-2507":{"maxTokens":262144,"contextWindow":262144,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.22999999999999998,"outputPrice":2.3,"cacheReadsPrice":0.20000000059999998,"description":"Qwen3-235B-A22B-Thinking-2507 is the Qwen3's new model with scaling the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning. "},"black-forest-labs/FLUX-1-dev":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"mistralai/Mistral-Small-3.2-24B-Instruct-2506":{"maxTokens":128000,"contextWindow":128000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.075,"outputPrice":0.2,"description":"Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the infinite-generation errors, and a more robust function-calling interface—while otherwise matching or slightly improving on all previous text and vision benchmarks."},"zai-org/GLM-4.7-Flash":{"maxTokens":202752,"contextWindow":202752,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.060000000000000005,"outputPrice":0.4,"cacheReadsPrice":0.0100000002,"description":"GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency."},"meta-llama/Meta-Llama-3-8B-Instruct":{"maxTokens":8192,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.030000000000000002,"outputPrice":0.04,"description":"Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes."},"nvidia/Llama-3.1-Nemotron-70B-Instruct":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":1.2,"outputPrice":1.2,"description":"Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo.  As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet."},"BAAI/bge-large-en-v1.5":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"meta-llama/Llama-3.3-70B-Instruct-Turbo":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.1,"outputPrice":0.32,"description":"Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation."},"google/gemini-1.5-flash-8b":{"maxTokens":1000000,"contextWindow":1000000,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.0375,"outputPrice":0.15,"description":""},"deepseek-ai/DeepSeek-R1-0528-Turbo":{"maxTokens":32768,"contextWindow":32768,"supportsImages":false,"supportsPromptCache":false,"inputPrice":1,"outputPrice":2.9999999999999996,"description":"The DeepSeek R1 0528 turbo model is a state of the art reasoning model that can generate very quick responses"},"stabilityai/sdxl-turbo":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"ByteDance/Seedream-4":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"zai-org/GLM-5":{"maxTokens":202752,"contextWindow":202752,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.8,"outputPrice":2.56,"cacheReadsPrice":0.16000000000000003,"description":"GLM-5 is an advanced, open-source large language model designed for developers tackling the toughest challenges. It excels at long-context reasoning, multi-step tool orchestration, and complex systems engineering, making it the ideal choice for powering sophisticated agents and applications that require high-level cognitive tasks."},"sentence-transformers/paraphrase-MiniLM-L6-v2":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"intfloat/e5-large-v2":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"ClarityAI/crystal":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"openai/gpt-oss-120b-Turbo":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.15,"outputPrice":0.6,"description":""},"BAAI/bge-m3":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"MiniMaxAI/MiniMax-M2.1":{"maxTokens":196608,"contextWindow":196608,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.27,"outputPrice":0.95,"cacheReadsPrice":0.029999999700000002,"description":"MiniMax-M2.1 is a model optimized specifically for robustness in coding, tool use, instruction following, and long-horizon planning. From automating multilingual software development to executing complex, multi-step office workflows, MiniMax-M2.1 empowers developers to build the next generation of autonomous applications—all while being fully transparent, controllable, and accessible."},"deepseek-ai/DeepSeek-V3-0324":{"maxTokens":163840,"contextWindow":163840,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.2,"outputPrice":0.77,"cacheReadsPrice":0.135,"description":"DeepSeek-V3-0324, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token, an improved iteration over DeepSeek-V3."},"intfloat/multilingual-e5-large-instruct":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Bria/Bria-3.2":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"black-forest-labs/FLUX-2-pro":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Bria/enhance":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"NousResearch/Hermes-3-Llama-3.1-70B":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.3,"outputPrice":0.3,"description":"Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board."},"meta-llama/Llama-Guard-4-12B":{"maxTokens":163840,"contextWindow":163840,"supportsImages":true,"supportsPromptCache":false,"inputPrice":0.18,"outputPrice":0.18,"description":"Llama Guard 4 is a natively multimodal safety classifier with 12 billion parameters trained jointly on text and multiple images. Llama Guard 4 is a dense architecture pruned from the Llama 4 Scout pre-trained model and fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It itself acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated."},"deepseek-ai/DeepSeek-V3.1":{"maxTokens":163840,"contextWindow":163840,"supportsImages":false,"supportsPromptCache":true,"inputPrice":0.21,"outputPrice":0.7899999999999999,"cacheReadsPrice":0.1300000002,"description":"DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats."},"Sao10K/L3.1-70B-Euryale-v2.2":{"maxTokens":131072,"contextWindow":131072,"supportsImages":false,"supportsPromptCache":false,"inputPrice":0.85,"outputPrice":0.85,"description":"Euryale 3.1 - 70B v2.2 is a model focused on creative roleplay from Sao10k"},"BAAI/bge-en-icl":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"Qwen/Qwen3-Embedding-4B":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"google/embeddinggemma-300m":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false},"sentence-transformers/clip-ViT-B-32":{"maxTokens":1639,"contextWindow":8192,"supportsImages":false,"supportsPromptCache":false}}