Shunya AI

Shunya AI Model Card

Shunya Mini 1.0

Company: Shunya AI
Release date: August 2025
Model type: Agentic LLM built on a dense base model with specialized expert variants, auxiliary ML models, and an integrated router.
Target market: India (consumer and enterprise)

1. Summary

What it is

Shunya Mini 1.0 is an agentic language model designed for Indian users and organizations. It combines a dense base model with auxiliary ML models and specialized variants of that base model for domain-specific tasks (agent and tool calling, reasoning, code generation, retrieval‑augmented QA, structured output, etc.), plus an integrated router that dynamically selects the best experts for each request. This design aims to deliver fast responses on simple tasks and deeper, deliberative reasoning on harder problems.

Why it matters for India

The model prioritizes Indic‑language support (Eighth Schedule and widely used non‑scheduled varieties), code‑switched Indian English (e.g., Hinglish), and mixed‑script input. It is tuned for common Indian use cases—public services, BFSI, retail, SMB enablement—and engineered to interoperate with India's digital public infrastructure (e.g., India Stack components such as DigiLocker, UPI rails via partners, Account Aggregator consent flows, and ONDC‑style schemas), while conforming to the Digital Personal Data Protection (DPDP) Act, 2023 and MeitY's due‑diligence advisories.

What's new

Compared with conventional single‑model stacks, Shunya Mini's router:

  • (i) infers task type (retrieval, code, multilingual dialog, structured extraction)
  • (ii) allocates a “thinking” budget when needed (sampling/verification depth)
  • (iii) chooses among internal tools (retrieval, function calls) and specialized experts for the plan‑execute‑reflect loop
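The three routing stages above can be sketched as a minimal dispatcher. The keyword rules, expert names, and budget heuristic below are illustrative assumptions; the production router is a learned component, not a rule list.

```python
# Minimal routing sketch. Keyword rules and expert names are
# illustrative assumptions; the real router is a learned model.
from dataclasses import dataclass, field

@dataclass
class Route:
    task_type: str                               # (i) inferred task type
    thinking_budget: str                         # (ii) light / medium / complex
    experts: list = field(default_factory=list)  # (iii) tools and experts

def route(query: str) -> Route:
    q = query.lower()
    # (i) infer task type from surface features of the request
    if "sql" in q or "def " in q or "function" in q:
        task = "code"
    elif any("\u0900" <= c <= "\u097f" for c in query):  # Devanagari chars
        task = "multilingual"
    else:
        task = "retrieval"
    # (ii) allocate a thinking budget from a rough complexity proxy
    budget = "complex" if len(query.split()) > 40 else "light"
    # (iii) pick experts/tools for the plan-execute-reflect loop
    experts = {"code": ["code_expert", "static_analysis"],
               "multilingual": ["indic_expert"],
               "retrieval": ["rag_expert", "search_tool"]}[task]
    return Route(task, budget, experts)
```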

2. Model family and release

Current release: Shunya Mini 1.0

Access: API served from Indian datacenters with a zero-data-retention policy; the model is closed source (no weight access).

Cadence: Safety/capability updates ship as minor point releases (e.g., 1.0.x) with changelogs.

3. Architecture & agentic routing

3.1 Model Architecture

Core Model Design

Shunya Mini 1.0 is built on a dense Transformer architecture with the following specifications:

  • Dense Transformers — optimized for consistent accuracy across domains.
  • SiLU activation for smoother gradient flow and improved convergence.
  • 80 Transformer layers for deep contextual reasoning.
  • ~155K vocabulary size covering code, Indian languages, and global languages.
  • Context window: Up to 128K tokens (with 32K tokens as the optimal operational range for best performance).
  • Training sequence length: Pretraining on 4K and 8K sequences, extended to 128K during post-training via RoPE extrapolation techniques.
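One common way to extend RoPE beyond the pretraining sequence length is NTK-aware base-frequency scaling, sketched below. This illustrates the general technique (reference 3), not necessarily Shunya's exact recipe.

```python
import numpy as np

def rope_freqs(dim: int, base: float = 10000.0, scale: float = 1.0) -> np.ndarray:
    """Per-pair rotary frequencies. Raising the base (NTK-aware scaling)
    stretches the wavelengths so positions beyond the training length
    still fall inside the phase range seen during pretraining."""
    base = base * scale ** (dim / (dim - 2))  # NTK-aware base adjustment
    return 1.0 / base ** (np.arange(0, dim, 2) / dim)

def rotate(x: np.ndarray, pos: int, freqs: np.ndarray) -> np.ndarray:
    """Apply the rotary embedding to one head vector at position `pos`."""
    x1, x2 = x[0::2], x[1::2]
    cos, sin = np.cos(pos * freqs), np.sin(pos * freqs)
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out
```

Because the embedding is a pure rotation, it preserves vector norms while encoding relative position in attention dot products.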

Training Scale

  • Training corpus size: 18.5 trillion tokens total.
  • 4 trillion tokens — Code from 90+ programming languages.
  • 3+ trillion tokens — 9+ Indian languages across multiple scripts and dialects.
  • 11+ trillion tokens — English and other major global languages.
  • Model size: Up to 40 billion parameters.
  • Distilled versions: Optimized lightweight variants of larger models for low-resource deployment while retaining reasoning capability.

3.2 Datasets

Pretraining Datasets

  • Primary Source: Public, high-quality datasets collected from open repositories.
  • Filtering: Multi-stage filtering pipeline to retain only the highest-quality data.
  • Sensitive Data Removal: Implemented advanced PII detection and removal to ensure safety and compliance before training.

Domain-Specific Collections

  • Code: Curated datasets from multiple organizations and open-source projects across 90+ programming languages.
  • Mathematics: Specialized datasets with step-by-step solutions for numeric reasoning.
  • Reasoning: Logic, problem-solving, and multi-step reasoning datasets from diverse domains.

Synthetic Data Generation

Generated large volumes of synthetic data using targeted generation pipelines tuned for:

  • Indian language fluency and dialect diversity.
  • Complex reasoning workflows across multiple domains.

3.3 Post-Training

Instruction-Tuning

Curated highly complex, multi-turn, and high-quality datasets covering a wide range of tasks.

Over 50 million samples focusing on:

  • Conversational flow.
  • Indian language style, dialect, and vocabulary adaptation.
  • Agentic workflows and multi-tool calling.
  • Advanced reasoning and decision-making.

Reinforcement Learning

  • Stage 1 — GRPO (Group Relative Policy Optimization):
    • Combined rule-based reward functions and LLM-based reward models.
    • Categories included mathematics, code generation, reasoning, and agentic workflows.
  • Stage 2 — DPO (Direct Preference Optimization):
    • Trained on millions of preference pairs to better align with user conversational styles and cultural expectations.
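The Stage 2 objective is the standard DPO loss of Rafailov et al. (reference 12), shown here with scalar log-probabilities standing in for real sequence scores:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO pushes the policy's preference margin for the chosen response
    above the reference model's margin, scaled by temperature beta."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid
```

When policy and reference agree exactly, the loss is log 2; it falls as the policy widens its margin on the preferred response.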

3.4 Expert-Model Core (Dense Transformer with Integrated Router)

Shunya Mini 1.0 is an agentic language model designed specifically for Indian users and organizations. It uses a dense Transformer architecture combined with auxiliary ML models and specialized variants of the base model for domain-specific tasks—such as agent and tool calling, reasoning, code generation, retrieval-augmented QA, and structured output generation.

An integrated routing system dynamically selects the most relevant specialized variant (expert) for each request. This routing blends:

  • Request-level pre-routing to set defaults for tools, safety posture, and reasoning depth before generation.
  • Token- or span-level adaptation for fine-grained control over formatting, numeracy, code handling, and multilingual script diversity.

Expert specializations (illustrative):

  • Multilingual/Indic variant tuned for script diversity (Hindi, Hinglish, Bengali–Assamese, Gujarati, Odia, Telugu, Kannada, Malayalam, Tamil, etc.) and code-switched Indian English.
  • Retrieval/RAG variant optimized for grounding, citation formatting, and hallucination reduction.
  • Code/SQL variant with constrained decoding utilities and static-analysis hints.
  • Structure/formatting variant for JSON schema generation.
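The structure/formatting expert's output contract can be enforced downstream with a minimal type check. The invoice schema here is a made-up example; a production system would validate against a full JSON Schema document instead.

```python
import json

# Hypothetical contract for a structured-output call (illustrative only).
SCHEMA = {"invoice_no": str, "amount": float, "gstin": str}

def validate(raw: str, schema: dict) -> dict:
    """Parse model output and enforce required keys and value types,
    so a schema-validated call either returns clean data or raises."""
    data = json.loads(raw)
    for key, typ in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], typ):
            raise TypeError(f"{key} must be {typ.__name__}")
    return data
```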

3.5 Router design

  • Inputs: The router consumes the user query, system/policy state, the available domain-specific experts (GenAI and ML models), organizational context (RAG indexes, databases, files), task and conversation context, and the available tools and resources.
  • Decision Process: Based on the conversation and the task at hand, the router determines the optimal execution path to complete the request. It dynamically selects the most suitable expert model or tool for each subtask.
  • Multi-Expert Collaboration: If the task is complex and spans multiple domains, the router can activate multiple experts in parallel or sequence, enabling them to collaboratively solve the problem. This allows for both specialized precision and cross-domain reasoning, ensuring that each component of the solution is handled by the most capable resource.

3.6 “Thinking” Budget & Depth Control

In thinking mode, the system can define a budget and allocate computational resources for reasoning based on task complexity. By default, Shunya Mini 1.0 supports three reasoning depth profiles:

  • Light — Minimal deliberation for straightforward tasks, prioritizes speed and low resource usage.
  • Medium — Balanced reasoning for moderately complex queries; blends efficiency with accuracy.
  • Complex — Deep, multi-step reasoning for high-stakes or multi-domain problems; allocates maximum resources and extended context processing.

The router automatically selects the appropriate depth profile based on task complexity, user priority, and policy constraints, but it can also be overridden by explicit user or system instructions.
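The depth profiles and the override rule reduce to a small lookup. The token budgets and verification-pass counts below are illustrative assumptions, not published settings:

```python
# Illustrative depth profiles; the numbers are assumptions for the sketch.
PROFILES = {
    "light":   {"max_reasoning_tokens": 256,  "verification_passes": 0},
    "medium":  {"max_reasoning_tokens": 2048, "verification_passes": 1},
    "complex": {"max_reasoning_tokens": 8192, "verification_passes": 3},
}

def pick_profile(complexity: float, override: str = "") -> dict:
    """Map a 0-1 complexity score to a depth profile; an explicit
    user/system override always wins."""
    if override in PROFILES:
        return PROFILES[override]
    if complexity < 0.3:
        return PROFILES["light"]
    return PROFILES["medium"] if complexity < 0.7 else PROFILES["complex"]
```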

3.7 Agent Framework

The Shunya Mini 1.0 agent framework orchestrates sequential workflows with dynamic planning, agent/tool selection, and iterative evaluation for subsequent steps.

  • The framework and model can coordinate large numbers of agents within a single request without requiring custom rules or hand-crafted workflows for each scenario.
  • This design enables handling of diverse task types and tool integrations by default, leveraging the reasoning capabilities of the underlying models.
  • Agents can dynamically collaborate, share intermediate outputs, and adapt their plans mid-execution to achieve optimal results.

3.8 Performance & Efficiency

Shunya Mini 1.0 is optimized for fast, accurate responses while keeping compute usage minimal.

  • KV-Chaining: Reuses key–value states across multi-step reasoning and iterative calls, reducing recomputation and latency while preserving context fidelity.
  • Speculative Decoding: Produces preliminary outputs using lightweight speculative passes, verified within the same SLM to accelerate generation without compromising accuracy.
  • SLM-Only Integration Flow: All tasks are handled within a Small Language Model (SLM) framework, with optimized routing and internal specialization to maintain accuracy while minimizing cost and latency—no larger models are required.
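The accept/reject logic of draft-and-verify speculative decoding (references 9 and 10) looks roughly like this; a real implementation verifies all drafted tokens in a single batched forward pass rather than one call per token.

```python
def speculative_step(draft, verify, prefix, k=4):
    """One draft-and-verify step: a lightweight pass proposes k tokens;
    the verifier keeps the longest agreeing prefix plus one corrected
    token, so output matches the verifier's plain greedy decoding."""
    proposed, ctx = [], list(prefix)
    for _ in range(k):                 # cheap speculative pass
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    accepted, ctx = [], list(prefix)
    for t in proposed:                 # verification pass
        v = verify(ctx)
        if v == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(v)         # first disagreement: take verifier token
            break
    else:
        accepted.append(verify(ctx))   # bonus token when all k are accepted
    return accepted
```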

4. Intended uses

4.1 Enterprise scenarios (India‑specific)

BFSI (banks, NBFCs, insurers)

Multilingual customer‑support copilots; complaint triage to RBI Ombudsman categories; circular/policy summarization; KYC document parsing with redaction (no storage of full Aadhaar/PAN unless explicitly consented and masked); Account Aggregator consent‑flow explainers; MIS query drafting (text‑to‑SQL); fraud pattern FAQs with safe guidance (no investigative claims).

Telecom

Plan recommendation and bill explanations in regional languages; outage FAQ generation; ticket triage and summarization for field ops; DoT/TRAI‑aligned phrasing templates for public responses.

Retail / e‑commerce / ONDC participants

ONDC‑style catalog mapping, product attribute normalization, GST invoice extraction to JSON, bilingual product copy, returns‑policy Q&A; store‑ops copilots for inventory/status lookups via function calls.

Government & PSUs

Citizen helpdesks for schemes (e.g., benefits eligibility summaries with citations); multilingual form guidance; file‑note drafts and docket summaries in English + local language; DigiLocker retrieval connectors (read‑only, scoped) for user‑authorized documents.

Healthcare (non‑diagnostic)

Appointment routing, discharge‑summary simplification, insurance claim correspondence drafting, consented report explanation in local languages. No clinical diagnosis or treatment recommendations.

Manufacturing & logistics

SOP assistants (Hindi/Marathi/Tamil etc.), safety checklist generators, shipment/route ETA query bots; structured extraction from PoD/invoices.

HR, policy, and legal ops

Multilingual policy explainers, leave/attendance Q&A, contract clause highlights (non‑advisory), code‑of‑conduct training content, and templated compliance summaries.

Safeguards fit: domain‑restricted RAG, schema‑validated function calls, per‑tenant logging/retention controls, and automatic masking for Indian IDs by default.
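Default masking for Indian IDs can be illustrated with a regex pass. The patterns are rough approximations for the sketch; production masking would also validate formats (e.g., the Verhoeff check digit on Aadhaar numbers) and handle more layouts.

```python
import re

# Approximate patterns, for illustration only: Aadhaar is 12 digits,
# often grouped 4-4-4; PAN is five letters, four digits, one letter.
AADHAAR = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")
PAN = re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")

def mask_ids(text: str) -> str:
    """Mask Indian identifiers by default, keeping only the last
    four Aadhaar digits for reference."""
    text = AADHAAR.sub(lambda m: "XXXX-XXXX-" + m.group()[-4:], text)
    return PAN.sub("[PAN REDACTED]", text)
```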

4.2 Developers & startups

  • Build patterns. Indic‑language chat and voicebots; form‑filling/validation; workflow automations via function calling (webhooks, CRM/ERP APIs); spreadsheet/data transformations; text‑to‑SQL for internal analytics.
  • Dev ergonomics. JSON‑Schema contracts for tool calls; streaming tokens; structured JSON output; tracing for router/tool decisions; seed prompts and eval checklists for Hindi‑English code‑switch.
  • Testing & safety. Local eval harnesses (reasoning, multilingual, safety prompts), red‑team packs tuned to Indian misuse patterns, and environment configs for dry‑run vs. production.
  • Deployment options. Public API and managed/VPC deployments (regional processing subject to contract); no local weight access; connector registry for DPI/enterprise tools.

4.3 Public‑facing experiences

  • Citizen services. State/UT portal copilots that guide users through forms in local languages, summarize eligibility criteria from official circulars, and generate checklists; kiosk‑friendly short prompts/responses.
  • Commerce & support. Multilingual chat for order status, returns, and service appointments; code‑switch‑tolerant voice IVRs (with ASR/TTS connectors); accessibility‑aware phrasing.
  • Education & skilling. Syllabus/notice summarization, bilingual study aids, and glossary building for regional terms (non‑certifying, informational only).

5. Safety approach

Shunya Mini 1.0 is designed for consistent, transparent, and safe operation in production environments.

Post-Training Safety Alignment

Additional fine-tuning is performed on a wide range of safety-critical datasets and scenario conditions to strengthen adherence to safety guidelines.

Multilingual Safety Training

Safety alignment covers multiple Indian languages and code-switched text to mitigate prompt injection and other adversarial attempts in a multilingual context.

ML-Based Harm Detection

Machine learning layers monitor both incoming user queries and outgoing model responses to detect and filter harmful or disallowed content before it reaches the end user.
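Structurally, these screens wrap the model call on both sides. The keyword classifier below is a stand-in for the trained ML detectors described above:

```python
# Pre/post filter wrapper. The blocklist classifier is a stand-in;
# the deployed system uses trained ML detectors, not keyword lists.
BLOCKLIST = ("make a bomb", "steal credentials")

def flag(text: str) -> bool:
    t = text.lower()
    return any(term in t for term in BLOCKLIST)

def guarded_generate(model, prompt: str) -> str:
    if flag(prompt):                    # screen the incoming query
        return "Request declined by safety policy."
    reply = model(prompt)
    if flag(reply):                     # screen the outgoing response
        return "Response withheld by safety policy."
    return reply
```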

6. India‑specific compliance & privacy

Data protection. Engineered to support compliance with the DPDP Act, 2023 via DPA terms and technical controls: consent capture, minimization, deletion workflows, and child‑data safeguards. Regional processing options and enterprise toggles (e.g., disable training on customer prompts) are available by contract.

Operational due diligence. Deployment playbooks align with MeitY advisories under the IT Rules. Notably, the 15 March 2024 revised advisory superseded the 1 March 2024 version and removed mandatory prior government approval while expanding due‑diligence expectations for AI intermediaries.

Interoperability with digital public infrastructure. Connectors and schema mappings are designed to interoperate with India Stack components (e.g., DigiLocker for document retrieval, ONDC‑style schemas for commerce workflows, and Account Aggregator consent rails in BFSI) subject to customer authorization and partner agreements.

7. Evaluation methodology

Shunya Mini 1.0 is evaluated using a comprehensive mix of public benchmarks and custom test suites to ensure strong performance across reasoning, multilingual understanding, and agentic workflows.

General Reasoning & Knowledge

Public benchmarks such as MMLU and GSM8K for multiple-choice reasoning and step-by-step problem solving.

Multilingual/Indic

Reading comprehension, translation, and retrieval on Indic corpora, robustness testing across Eighth Schedule languages, code-switched inputs, and mixed-script queries. Evaluation also uses custom Indian benchmarks to measure translation quality and Indian language writing proficiency.
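Mixed-script robustness tests need to know which scripts a query mixes. A minimal detector can read script names out of Unicode character names (assumed sufficient for this sketch; a fuller version would use Unicode script properties):

```python
import unicodedata

def scripts(text: str) -> set:
    """Rough per-query script inventory via Unicode character names
    (e.g. 'DEVANAGARI LETTER KA' -> 'DEVANAGARI')."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            found.add(name.split()[0] if name else "UNKNOWN")
    return found

def is_code_switched(text: str) -> bool:
    """Flag queries that mix more than one script, e.g. Hinglish
    written partly in Latin and partly in Devanagari."""
    return len(scripts(text)) > 1
```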

Agentic Behavior

Tool-use accuracy, plan–execute–reflect quality, rollback frequency, and confirmation rates for sensitive actions. Includes custom benchmarks tailored for agentic workflows and enterprise use cases.

8. Known limitations & residual risks

Hallucinations & Overconfidence

While significantly reduced, the model may still produce inaccurate or fabricated information—especially for long-tail facts in low-resource Indic languages.

Bias & Fairness

The model can reflect societal biases, including those related to gender, caste, regional stereotypes, and under-represented dialects.

Safety Trade-offs

Output-focused safety measures may struggle with highly dual-use or adversarial queries. Layered policies and escalation mechanisms reduce, but do not fully eliminate, these risks.

References

  1. Vaswani, A. et al. (2017). Attention Is All You Need. https://arxiv.org/abs/1706.03762
  2. Elfwing, S., Uchibe, E., Doya, K. (2017). Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. https://arxiv.org/abs/1702.03118
  3. Su, J. et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. https://arxiv.org/abs/2104.09864
  4. Hoffmann, J. et al. (2022). Training Compute-Optimal Large Language Models (Chinchilla). https://arxiv.org/abs/2203.15556
  5. Sardana, N. et al. (2024). Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws. https://arxiv.org/abs/2401.00448
  6. Kwon, W. et al. (2023). Efficient Memory Management for LLM Serving with PagedAttention (vLLM). https://arxiv.org/abs/2309.06180
  7. Dao, T. et al. (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. https://arxiv.org/abs/2205.14135
  8. Shah, J. et al. (2024). FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. https://arxiv.org/abs/2407.08608
  9. Zhang, J. et al. (2023/2024). Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding. https://arxiv.org/abs/2309.08168
  10. Miao, X. et al. (2023/2024). SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Verification. https://arxiv.org/abs/2305.09781
  11. Ouyang, L. et al. (2022). Training Language Models to Follow Instructions with Human Feedback (InstructGPT). https://arxiv.org/abs/2203.02155
  12. Rafailov, R. et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. https://arxiv.org/abs/2305.18290
  13. Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. https://arxiv.org/abs/2210.03629
  14. Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. https://arxiv.org/abs/2303.11366
  15. Schick, T. et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. https://arxiv.org/abs/2302.04761
  16. Hendrycks, D. et al. (2020). Measuring Massive Multitask Language Understanding (MMLU). https://arxiv.org/abs/2009.03300
  17. Cobbe, K. et al. (2021). Training Verifiers to Solve Math Word Problems (GSM8K). https://arxiv.org/abs/2110.14168
  18. Government of India (2023). Digital Personal Data Protection Act, 2023 (Act No. 22 of 2023). https://www.meity.gov.in/static/uploads/2024/06/2bf1f0e9f04e6fb4f8fef35e82c42aa5.pdf