
sdialog
SDialog is an MIT-licensed open-source toolkit for building, simulating, and evaluating LLM-based conversational agents end-to-end. It aims to bridge agent construction → dialog generation → evaluation → (optionally) interpretability in a single reproducible workflow, so you can generate reliable, controllable dialog systems or data at scale.
It standardizes a Dialog schema and offers persona‑driven multi‑agent simulation with LLMs, composable orchestration, built‑in metrics, and mechanistic interpretability.
Quick links: Website • GitHub • Docs • API • Demo (Colab) • Tutorials • Datasets (HF) • Issues
If you are building conversational systems, benchmarking dialog models, producing synthetic training corpora, simulating diverse users to test or probe conversational systems, or analyzing internal model behavior, SDialog provides an end‑to‑end workflow.
pip install sdialog
Alternatively, a ready-to-use Apptainer image (.sif) with SDialog and all dependencies is available for download on Hugging Face.
apptainer exec --nv sdialog.sif python3 -c "import sdialog; print(sdialog.__version__)"
[!NOTE] This Apptainer image also has the Ollama server preinstalled.
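For instance, you can start the bundled Ollama server inside the container and point SDialog at a local model instead of a hosted API (the model tag below is only an illustration; use any model you have pulled):
# In one shell inside the container: apptainer exec --nv sdialog.sif ollama serve
# Then, in Python:
import sdialog
sdialog.config.llm("ollama:qwen3:14b")  # illustrative model tag; any Ollama model works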
Here's a short, hands‑on example: a support agent helps a customer disputing a double charge. We add a small refund rule and two simple tools, generate three dialogs for evaluation, then serve the agent on port 1333 for Open WebUI or any OpenAI‑compatible client.
import sdialog
from sdialog import Context
from sdialog.agents import Agent
from sdialog.personas import SupportAgent, Customer
from sdialog.orchestrators import SimpleReflexOrchestrator
# First, let's set our preferred default backend:model and parameters
sdialog.config.llm("openai:gpt-4.1", temperature=1, api_key="YOUR_KEY") # or export OPENAI_API_KEY=YOUR_KEY
# sdialog.config.llm("ollama:qwen3:14b") # etc.
# Let's define our personas (use built-ins like in this example, or create your own!)
support_persona = SupportAgent(name="Ava", politeness="high", communication_style="friendly")
customer_persona = Customer(name="Riley", issue="double charge", desired_outcome="refund")
# (Optional) Let's define two mock tools (just plain Python functions) for our support agent
def account_verification(user_id):
    """Verify user account by user id."""
    return {"user_id": user_id, "verified": True}

def refund(amount):
    """Process a refund for the given amount."""
    return {"status": "refunded", "amount": amount}
# (Optional) Let's also include a small rule-based orchestrator for our support agent
react_refund = SimpleReflexOrchestrator(
    condition=lambda utt: "refund" in utt.lower(),
    instruction="Follow refund policy; verify account, apologize, refund.",
)
# Now, let's create the agents!
support_agent = Agent(
    persona=support_persona,
    think=True,  # Let's also enable thinking mode
    tools=[account_verification, refund],
    name="Support"
)
simulated_customer = Agent(
    persona=customer_persona,
    first_utterance="Hi!",
    name="Customer"
)
# Since we have one orchestrator, let's attach it to our target agent
support_agent = support_agent | react_refund
# Let's generate 3 dialogs between them! (we can evaluate them later)
# (Optional) Let's also define a concrete conversational context for the agents in these dialogs
web_chat = Context(location="chat", environment="web", circumstances="billing")
for ix in range(3):
    dialog = simulated_customer.dialog_with(support_agent, context=web_chat)  # Generate the dialog
    dialog.to_file(f"dialog_{ix}.json")  # Save it
    dialog.print(all=True)  # And pretty print it with all its events (thoughts, orchestration, etc.)
# Finally, let's serve our support agent to interact with real users (OpenAI-compatible API)
# Point Open WebUI or any OpenAI-compatible client to: http://localhost:1333
support_agent.serve(port=1333)
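Once served, any OpenAI-compatible client can talk to the agent. Here is a minimal client-side sketch, assuming the server exposes the standard /v1 routes; the model name is an assumption, so check what your server lists under /v1/models:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:1333/v1", api_key="EMPTY")  # placeholder key
reply = client.chat.completions.create(
    model="Support",  # assumed model name; confirm via /v1/models
    messages=[{"role": "user", "content": "Hi, I think I was charged twice this month."}],
)
print(reply.choices[0].message.content)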
[!TIP]
- Choose your LLMs and backends freely.
- Personas and context can be automatically generated (e.g. generate different customer profiles!).
[!NOTE]
- See "agents with tools and thoughts" tutorial for a more complete example.
- See Serving Agents via REST API for more details on server options.
You can also use SDialog as a controllable test harness for any OpenAI-compatible system (e.g., vLLM-based deployments) by role-playing realistic or adversarial users against your deployed system:
Core idea: wrap your system as an Agent using openai: as the prefix of your model name string, talk to it with simulated user Agents, and capture Dialogs you can save, diff, and score.
Below is a minimal example where our simulated customer interacts once with your hypothetical remote endpoint:
# Our remote system (your conversational backend exposing an OpenAI-compatible API)
system = Agent(
    model="openai:your/model",  # Model name exposed by your server
    openai_api_base="http://your-endpoint.com:8000/v1",  # Base URL of the service
    openai_api_key="EMPTY",  # Or a real key if required
    name="System"
)
# Let's make our simulated customer talk with the system
dialog = simulated_customer.dialog_with(system)
dialog.to_file("dialog_0.json")
Dialogs are rich objects with helper methods (filter, slice, transform, etc.) that can be easily exported and loaded using different methods:
from sdialog import Dialog
# Load from JSON (generated by SDialog using `to_file()`)
dialog = Dialog.from_file("dialog_0.json")
# Load from HuggingFace Hub datasets
dialogs = Dialog.from_huggingface("sdialog/Primock-57")
# Create from plain text files or strings - perfect for converting existing datasets!
dialog_from_txt = Dialog.from_str("""
Alice: Hello there! How are you today?
Bob: I'm doing great, thanks for asking.
Alice: That's wonderful to hear!
""")
# Or, equivalently if the content is in a txt file
dialog_from_txt = Dialog.from_file("conversation.txt")
# Load from CSV files with custom column names
dialog_from_csv = Dialog.from_file(
    "conversation.csv",
    csv_speaker_col="speaker",
    csv_text_col="value",
)
# All Dialog objects have rich manipulation methods
dialog.filter("Alice").rename_speaker("Alice", "Customer").upper().to_file("processed.json")
avg_words_turn = sum(len(turn) for turn in dialog) / len(dialog)  # average number of words per turn
See Dialog section in the documentation for more information.
Dialogs can be evaluated using the different components available inside the sdialog.evaluation module.
Use built‑in metrics (readability, flow, linguistic features, LLM judges) or easily create new ones, then aggregate and compare datasets (sets of dialogs) via DatasetComparator.
from sdialog.evaluation import LLMJudgeRealDialog, LinguisticFeatureScore
from sdialog.evaluation import FrequencyEvaluator, MeanEvaluator
from sdialog.evaluation import DatasetComparator
reference = [...] # list[Dialog]
candidate = [...] # list[Dialog]
judge = LLMJudgeRealDialog()
flesch = LinguisticFeatureScore(feature="flesch-reading-ease")
comparator = DatasetComparator([
    FrequencyEvaluator(judge, name="Realistic dialog rate"),
    MeanEvaluator(flesch, name="Mean Flesch Reading Ease"),
])
results = comparator({"reference": reference, "candidate": candidate})
# Plot results for each evaluator
comparator.plot()
[!TIP] See evaluation tutorial.
Attach Inspectors to capture per‑token activations and optionally steer (add/ablate directions) to analyze or intervene in model behavior.
import sdialog
from sdialog.interpretability import Inspector
from sdialog.agents import Agent
sdialog.config.llm("huggingface:meta-llama/Llama-3.2-3B-Instruct")
agent = Agent(name="Bob")
inspector = Inspector(target="model.layers.15")
agent = agent | inspector
agent("How are you?")
agent("Cool!")
# Let's get the last response's first token activation vector!
act = inspector[-1][0].act # [response index][token index]
Steering intervention (subtracting a direction):
import torch
anger_direction = torch.load("anger_direction.pt") # A direction vector (e.g., PCA / difference-in-mean vector)
agent_steered = agent | inspector - anger_direction # Ablate the anger direction from the target activations
agent_steered("You are an extremely upset assistant") # Agent "can't get angry anymore" :)
[!TIP] See the tutorial on using SDialog to remove the refusal capability from LLaMA 3.2.
SDialog can also turn text dialogs into audio conversations with a single call. The audio module supports text-to-speech synthesis, room-acoustics simulation, and voice databases (see the audio documentation for the full feature list).
Install the audio dependencies first (see the documentation for complete setup instructions):
apt-get install sox ffmpeg espeak-ng
pip install sdialog[audio]
Then, simply:
from sdialog import Dialog
dialog = Dialog.from_file("my_dialog.json")
# Convert to audio with default settings (HuggingFace TTS - single speaker)
audio_dialog = dialog.to_audio(perform_room_acoustics=True)
print(audio_dialog.display())
# Or customize the audio generation
audio_dialog = dialog.to_audio(
    perform_room_acoustics=True,
    audio_file_format="mp3",
    re_sampling_rate=16000,
)
print(audio_dialog.display())
[!TIP] See the Audio Generation documentation for more details. For usage examples including acoustic simulation, room generation, and voice databases, check out the audio tutorials.
The SDialog documentation is also available in an LLM-friendly format at https://sdialog.readthedocs.io/en/latest/llm.txt, following the llm.txt specification. In your Copilot chat, simply use:
#fetch https://sdialog.readthedocs.io/en/latest/llm.txt
Your prompt goes here... (e.g. Write a Python script using sdialog to build an agent for criminal investigation, define its persona, tools, orchestration...)
To accelerate open, rigorous, and reproducible conversational AI research, SDialog invites the community to collaborate and help shape the future of open dialog generation.
[!NOTE] Example: Check out Primock-57, a sample dataset already available in SDialog format on Hugging Face.
If you have a dialog dataset you'd like to convert to SDialog format, need help with the conversion process, or want to contribute in any other way, please open an issue or reach out to us. We're happy to help and collaborate!
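As a rough sketch of such a conversion (the folder layout and file names below are hypothetical), existing "Speaker: text" transcripts can be loaded with Dialog.from_file and re-exported as SDialog JSON:
from pathlib import Path
from sdialog import Dialog
Path("converted").mkdir(exist_ok=True)
for path in Path("my_dataset").glob("*.txt"):  # hypothetical folder of plain-text transcripts
    dialog = Dialog.from_file(str(path))  # parses "Speaker: text" lines, as shown above
    dialog.to_file(f"converted/{path.stem}.json")  # save in SDialog's JSON format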
See CONTRIBUTING.md. We welcome issues, feature requests, and pull requests; open an issue or submit a PR and help us make SDialog better 👍. If you find SDialog useful, please consider starring ⭐ the GitHub repository to support the project and increase its visibility 😄.
This project follows the all-contributors specification. Contributors:
Sergio Burdisso 💻 🤔 📖 ✅ | Labrak Yanis 💻 🤔 | Séverin 💻 🤔 ✅ | Ricard Marxer 💻 🤔 | Thomas Schaaf 🤔 💻 | David Liu 💻 | ahassoo1 🤔 💻 |
Pawel Cyrta 💻 🤔 | ABCDEFGHIJKL 💻 | Fernando Leon Franco 💻 🤔 | Esaú Villatoro-Tello, Ph. D. 🤔 📖 |
This work was supported by the European Union Horizon 2020 project ELOQUENCE and received a significant development boost during the Johns Hopkins University JSALT 2025 workshop, as part of the "Play your Part" research group. We thank all contributors and the open-source community for their valuable feedback and contributions.
MIT License
Copyright (c) 2025 Idiap Research Institute