Show HN: Torque – A declarative, typesafe DSL for LLM training datasets (MIT)

github.com

3 points by michalwarda 14 hours ago

We were frustrated with dataset-generation DX: ad-hoc scripts and JSON templates break as flows branch and tool calls drift, and reproducibility is hard to maintain.

So we built Torque: a schema‑first, declarative, fully typesafe DSL to generate conversational datasets.

What it is:

• Declarative DSL - compose conversations like React components

• Fully typesafe - Zod schemas with complete type inference

• Provider agnostic - generate with any AI SDK provider (OpenAI, Anthropic, DeepSeek, vLLM, llama.cpp, etc.)

• AI-powered content - generate realistic, varied datasets automatically without complicated scripts

• Faker integration - built-in Faker.js with automatic seed synchronization for reproducible fake data

• Cache optimized - reuses context across generations to reduce costs

• Prompt optimized - concise structures, prompts, and generation workflow let you use smaller, cheaper models

• Concurrent generation - async CLI with real-time progress tracking while generating concurrently

Quick example:

import {
  generateDataset,
  generatedUser,
  generatedAssistant,
  oneOf,
  assistant,
} from "@qforge/torque";
import { openai } from "@ai-sdk/openai";

// Each generated conversation: a user greeting, then either a fixed
// reply or a generated one.
await generateDataset(
  () => [
    generatedUser({ prompt: "Friendly greeting" }),
    oneOf([
      assistant({ content: "Hello!" }),
      generatedAssistant({ prompt: "Respond to greeting" }),
    ]),
  ],
  { count: 2, model: openai("gpt-5-mini"), seed: 42 }
);
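To show how longer, branching flows compose (the pain point above), here's a hedged sketch of a multi-turn conversation built from the same primitives. It assumes the factory function simply accepts a longer message array; the prompts and the escalation string are made up for illustration, and it needs a configured OpenAI key to run.

```typescript
import {
  generateDataset,
  generatedUser,
  generatedAssistant,
  oneOf,
  assistant,
} from "@qforge/torque";
import { openai } from "@ai-sdk/openai";

// Sketch: a two-turn support conversation whose final assistant turn
// branches between a canned escalation and a generated resolution.
await generateDataset(
  () => [
    generatedUser({ prompt: "Ask about a billing issue" }),
    generatedAssistant({ prompt: "Ask one clarifying question" }),
    generatedUser({ prompt: "Answer the clarifying question" }),
    oneOf([
      assistant({ content: "I'll escalate this to our billing team." }),
      generatedAssistant({ prompt: "Resolve the billing issue politely" }),
    ]),
  ],
  // Fixed seed so the Faker data and branch choices reproduce.
  { count: 10, model: openai("gpt-5-mini"), seed: 7 }
);
```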

Links

• GitHub: https://github.com/qforge-dev/torque

• Try in browser: https://stackblitz.com/github/qforge-dev/torque/tree/main/st...

• npm: https://www.npmjs.com/package/@qforge/torque

What feedback would help most:

• What dataset would you like us to create or recreate?

• Do you like the API? Any suggestions on how to change it?

License: MIT.

Happy to answer questions in the thread!

dawidkielbasa 14 hours ago

Looks promising. I will try it out for sure.