# Experiment Metadata Leakage
How experiment metadata reveals sensitive information: hyperparameters exposing architecture secrets, loss curves revealing training data properties, run names and tags disclosing project intent, and techniques for extracting intelligence from ML experiment logs.
Every experiment logged to a tracking platform generates metadata that, individually, appears innocuous. Learning rates, loss values, and run names do not look like sensitive data. But metadata accumulates, and the aggregate reveals far more than any single data point. An attacker who can read experiment metadata -- through a leaked API key, an unprotected tracking instance, or a public project -- can reconstruct an organization's ML strategy, training data characteristics, and architectural decisions without ever accessing a single model weight file.
## Hyperparameters Revealing Architecture
Hyperparameters logged for reproducibility encode the complete blueprint of a model's architecture and training methodology. An attacker reading these parameters gains the equivalent of access to the model definition code.
### What Hyperparameters Reveal

| Parameter | What It Reveals |
|---|---|
| `num_layers`, `hidden_size`, `num_attention_heads` | Exact transformer architecture dimensions |
| `vocab_size`, `max_position_embeddings` | Tokenizer and context window specifications |
| `num_experts`, `num_experts_per_tok` | Mixture-of-Experts architecture and routing configuration |
| `rope_theta`, `rope_scaling` | Positional encoding strategy and context extension approach |
| `intermediate_size`, `num_key_value_heads` | MLP dimensions and grouped-query attention configuration |
| `lora_r`, `lora_alpha`, `lora_target_modules` | Fine-tuning methodology and which layers are adapted |
| `quantization_config` | Precision trade-offs and deployment constraints |
### Reconstructing Architecture from Hyperparameters

A systematic approach to architecture reconstruction:

1. **Collect hyperparameters across runs.** Gather all logged hyperparameters from multiple training runs. Architecture parameters are usually consistent across runs while training parameters vary.
2. **Identify the base architecture.** Match the hyperparameter set against known model architectures (Llama, Mistral, GPT-NeoX, etc.). The combination of `hidden_size`, `num_attention_heads`, and `num_layers` is often unique to a specific base model.
3. **Determine modifications.** Look for parameters that deviate from the base architecture: custom layer counts indicate knowledge distillation, unusual attention head configurations suggest architectural innovations, and the presence of LoRA parameters reveals fine-tuning targets.
4. **Map the training methodology.** Optimizer choice, learning rate schedule, warmup steps, and gradient accumulation reveal the training recipe. Combined with architecture knowledge, this enables reproduction of the model.
### Case Study: Architecture Inference

Consider these hyperparameters extracted from a tracking platform:

```yaml
hidden_size: 4096
num_attention_heads: 32
num_key_value_heads: 8
num_layers: 32
intermediate_size: 14336
vocab_size: 32000
max_position_embeddings: 32768
rope_theta: 500000
```
These values match Mistral 7B's published configuration: a Llama-style architecture with grouped-query attention (32 query heads, 8 KV heads), a 14336-dimension MLP, and a 32K context window. The `rope_theta: 500000` deviates from the common default of 10000, revealing that the team has raised the RoPE base frequency as a context extension technique.
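The inference above can be automated. A minimal sketch that scores leaked hyperparameters against a small catalog of architecture fingerprints; the catalog entries are illustrative, drawn from published configurations, not an authoritative database:

```python
# Score leaked hyperparameters against known architecture fingerprints.
# The catalog here is illustrative and deliberately tiny.
KNOWN_ARCHITECTURES = {
    "mistral-7b": {"hidden_size": 4096, "num_attention_heads": 32,
                   "num_key_value_heads": 8, "num_layers": 32,
                   "intermediate_size": 14336},
    "llama-2-7b": {"hidden_size": 4096, "num_attention_heads": 32,
                   "num_key_value_heads": 32, "num_layers": 32,
                   "intermediate_size": 11008},
}

def match_architecture(params: dict) -> list[tuple[str, float]]:
    """Rank known architectures by the fraction of matching fields."""
    scores = [
        (name, sum(params.get(k) == v for k, v in fp.items()) / len(fp))
        for name, fp in KNOWN_ARCHITECTURES.items()
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)

leaked = {"hidden_size": 4096, "num_attention_heads": 32,
          "num_key_value_heads": 8, "num_layers": 32,
          "intermediate_size": 14336, "rope_theta": 500000}
print(match_architecture(leaked))
```

Any leaked field not covered by the winning fingerprint (here `rope_theta`) becomes a "modification" to investigate in step 3.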
## Loss Curves Revealing Training Data Properties
Training loss curves are more than performance indicators. Their shape, magnitude, and behavior encode information about the training data that produced them.
### What Loss Curves Reveal

**Initial loss magnitude.** The starting loss value before any training indicates the model's prior knowledge of the training distribution. A low initial loss suggests fine-tuning from a model already familiar with similar data. A high initial loss indicates a significant distribution shift.

**Convergence rate.** How quickly loss decreases reveals dataset size and diversity. Rapid convergence suggests a small or homogeneous dataset. Slow, steady convergence indicates a large, diverse corpus.

**Loss plateaus and drops.** Sudden drops in loss often correspond to learning rate schedule changes, but unexpected plateaus can reveal data quality issues or distribution boundaries in the training data.

**Validation-training gap.** The gap between training and validation loss reveals overfitting, which in turn reveals effective dataset size. A large gap on a small model suggests limited training data -- valuable intelligence for data poisoning attacks.
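These signals can be computed directly from a logged loss history. A minimal sketch, assuming plain per-epoch loss lists; the thresholds are illustrative assumptions, not calibrated values:

```python
def analyze_loss_curve(train_loss: list[float], val_loss: list[float]) -> dict:
    """Extract coarse data-property signals from logged loss histories."""
    initial = train_loss[0]
    quarter = max(1, len(train_loss) // 4)
    total_drop = initial - train_loss[-1]
    # Fraction of total improvement reached in the first quarter of
    # training: high values suggest a small or homogeneous dataset.
    early_fraction = (initial - train_loss[quarter]) / total_drop if total_drop else 0.0
    return {
        "likely_finetune": initial < 3.0,   # illustrative low-prior-loss cutoff
        "early_convergence": early_fraction,
        "overfitting_gap": val_loss[-1] - train_loss[-1],
    }
```

Run against histories scraped from a tracking platform, the three fields map directly onto the initial-magnitude, convergence-rate, and validation-gap signals above.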
### Metric-Based Data Inference
| Metric Pattern | Inference |
|---|---|
| Low perplexity on domain-specific eval | Training data includes domain-specific content |
| Sudden accuracy improvement on specific benchmark | Dataset augmented with benchmark-relevant data |
| High variance in per-batch loss | Training data is heterogeneous or poorly shuffled |
| Evaluation metrics plateau while training loss decreases | Model memorizing training data -- small effective dataset |
| Different convergence rates across training runs | Dataset composition changed between runs |
### Detecting Dataset Changes
Comparing loss curves across experiment runs reveals when training data changed:
- A run with identical hyperparameters but different loss behavior indicates a dataset modification
- Improved metrics on specific evaluation sets after a data change reveal what data was added
- Degraded metrics on previously strong benchmarks reveal what data was removed
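The first signal is mechanical to check. A sketch of the comparison, assuming run records are plain dicts rather than any specific tracking platform's API:

```python
def detect_dataset_changes(runs: list[dict], tol: float = 0.05) -> list[tuple[str, str]]:
    """Flag run pairs with identical hyperparameters but divergent final
    loss -- a signal that the training data changed between them."""
    flagged = []
    for i, a in enumerate(runs):
        for b in runs[i + 1:]:
            if (a["hyperparams"] == b["hyperparams"]
                    and abs(a["final_loss"] - b["final_loss"]) > tol):
                flagged.append((a["name"], b["name"]))
    return flagged
```

The tolerance absorbs seed-to-seed noise; the `tol=0.05` default is an assumption, not a measured noise floor.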
## Run Names and Tags Revealing Project Intent

### Naming Convention Intelligence
ML teams use descriptive names for experiments, runs, and tags. These names reveal organizational priorities, research directions, and development timelines.
Common naming patterns and what they reveal:
| Pattern | Example | Intelligence |
|---|---|---|
| Feature names | `medical-qa-v3`, `code-gen-rust-support` | Active development areas |
| Customer names | `acme-corp-custom`, `client-x-finetune` | Business relationships |
| Deadline references | `launch-q2-candidate`, `demo-board-meeting` | Timeline and priority |
| Competition | `beat-gpt4-math`, `match-claude-coding` | Competitive positioning |
| Security research | `jailbreak-resistant-v2`, `safety-filter-bypass-test` | Security posture and concerns |
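Patterns like these are simple enough to scan for automatically. A sketch with illustrative regexes keyed to the table above; a real survey would tune them to the target's naming conventions:

```python
import re

# Illustrative deny-list of intelligence patterns; not exhaustive.
PATTERNS = {
    "customer": re.compile(r"(client|corp|custom)", re.I),
    "deadline": re.compile(r"(launch|demo|q[1-4]\b)", re.I),
    "competitive": re.compile(r"\b(beat|match)-", re.I),
    "security": re.compile(r"(jailbreak|safety|bypass)", re.I),
}

def classify_run_name(name: str) -> list[str]:
    """Return every intelligence category a run name matches."""
    return [label for label, pat in PATTERNS.items() if pat.search(name)]
```

Applied across an entire project, the category counts alone sketch the organization's priorities.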
### Tag-Based Intelligence

Tags provide categorical metadata with its own intelligence value:

- Team tags (`team:nlp`, `team:safety`) reveal organizational structure
- Priority tags (`priority:p0`, `critical`) reveal what the organization considers important
- Status tags (`status:blocked`, `needs-review`) reveal development bottlenecks
- Deployment tags (`env:staging`, `region:eu-west`) reveal infrastructure topology
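Structured key:value tags like these aggregate across runs into a topology map. A minimal sketch, assuming tags follow the colon-delimited convention shown above:

```python
from collections import defaultdict

def map_tag_topology(run_tags: list[list[str]]) -> dict[str, set[str]]:
    """Group key:value tags across runs to recover structure
    (teams, environments, regions)."""
    topology = defaultdict(set)
    for tags in run_tags:
        for tag in tags:
            if ":" in tag:
                key, value = tag.split(":", 1)
                topology[key].add(value)
    return dict(topology)
```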
### Experiment Organization
How experiments are organized into projects and grouped reveals the organization's ML strategy:
- Project names indicate product lines and research areas
- The number of runs per project indicates investment level
- Active vs. inactive projects reveal strategic pivots
- Cross-references between projects reveal dependencies
## Combining Metadata Signals
The real power of metadata analysis comes from combining signals across multiple dimensions. Individual data points are suggestive; correlations are conclusive.
### Cross-Signal Analysis Framework

1. **Timeline reconstruction.** Order all experiments chronologically. Map the progression of architecture changes, dataset modifications, and performance improvements. This reveals the development narrative.
2. **Resource allocation mapping.** Correlate GPU hours, run durations, and experiment frequency to identify which projects receive the most investment. High-resource projects are the organization's priorities.
3. **Dependency mapping.** Trace artifact lineage across experiments. Identify which datasets feed which models, which base models are used for fine-tuning, and how the pipeline connects.
4. **Personnel identification.** If runs are tagged with user IDs or author information, map which individuals work on which projects. This reveals the team structure and key personnel.
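The first two steps can be sketched over hypothetical run records; the field names (`started_at`, `project`, `gpu_hours`) are assumptions, not any platform's actual schema:

```python
from collections import Counter

def reconstruct(runs: list[dict]) -> tuple[list[str], list[str]]:
    """Return runs in chronological order and projects ranked by GPU-hours."""
    timeline = [r["name"] for r in sorted(runs, key=lambda r: r["started_at"])]
    gpu_hours = Counter()
    for r in runs:
        gpu_hours[r["project"]] += r["gpu_hours"]
    priorities = [project for project, _ in gpu_hours.most_common()]
    return timeline, priorities
```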
### Intelligence Report Structure
A comprehensive metadata analysis produces:
- Architecture inventory: All model architectures in development, their specifications, and evolution over time
- Data asset map: Training datasets used, their sources (inferred from paths and descriptions), and when they were modified
- Capability roadmap: Which capabilities the organization is developing, based on evaluation metrics and project names
- Security posture: How the organization approaches safety and security, based on safety-related experiments and evaluation benchmarks
- Timeline: Development velocity, upcoming milestones, and historical pivots
## Metadata Hygiene

### What to Sanitize
Not all metadata needs to be logged to a shared tracking system. Implement filtering to remove:
- Absolute file paths (reveal infrastructure layout)
- Environment variables (frequently contain credentials)
- Customer-identifying information in run names
- Internal project codenames in tags
- Competitive references in descriptions
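A sketch of such a filter, applied before metadata reaches the tracking client. The key and value patterns are illustrative and would need tuning per organization:

```python
import re

# Illustrative deny-patterns. Deliberately scoped so that legitimate
# hyperparameters such as num_key_value_heads are not caught.
SENSITIVE_KEYS = re.compile(r"(secret|token|password|api_key|_path|_dir|env_)", re.I)
ABS_PATH = re.compile(r"^(/|[A-Za-z]:\\)")

def sanitize_metadata(metadata: dict) -> dict:
    clean = {}
    for key, value in metadata.items():
        if SENSITIVE_KEYS.search(key):
            continue  # key looks like a credential or filesystem reference
        if isinstance(value, str) and ABS_PATH.match(value):
            continue  # value is an absolute path revealing infrastructure layout
        clean[key] = value
    return clean
```

A deny-list like this catches the obvious cases; pair it with review of run names and tags, which a key filter cannot see.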
### What to Keep
Reproducibility requires certain metadata. The key is to log what is needed and nothing more:
- Model architecture parameters (necessary for reproducibility, accept the risk)
- Training hyperparameters (learning rate, batch size, etc.)
- Anonymized evaluation metrics
- Artifact hashes (not paths)
- Version identifiers (not descriptive names)
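Logging a content hash in place of a path preserves reproducibility -- the artifact can still be verified later -- without exposing the filesystem layout. A minimal sketch:

```python
import hashlib

def artifact_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of an artifact's contents, logged instead of its path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```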
### Access Tiering
Implement different access levels for different metadata sensitivity:
| Tier | Content | Access |
|---|---|---|
| Public | Published model metrics, architecture summaries | Anyone |
| Team | Full hyperparameters, loss curves, run configs | Team members |
| Restricted | Dataset paths, customer-specific runs, security research | Project leads |
| Confidential | Credentials, infrastructure details, competitive analysis | Security team |
## References

- Carlini et al., "Extracting Training Data from Large Language Models" -- Training data extraction from model behavior
- Tramèr et al., "Stealing Machine Learning Models via Prediction APIs" -- Model extraction through query access
- MITRE ATLAS -- Adversarial ML threat taxonomy, including discovery of ML artifacts and model information
An attacker observes that a model's training loss drops sharply at epoch 15, then converges at a lower level. The same hyperparameters were used throughout training. What is the most likely explanation, and why does it matter?