
Inference

Two scripts for generating model responses and projecting activations onto trait vectors.

For pipeline modes and projection scores, see Inference Guide.


inference/run_inference_pipeline.py

Full inference pipeline: generate responses and project activations onto trait vectors in one pass (stream-through), capture raw activations for later use, or project from saved activations.

python inference/run_inference_pipeline.py \
    --experiment <experiment> --prompt-set <prompt_set>

Required

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--experiment` | str | required | Experiment name |
| `--prompt-set` | str | required | Prompt set path, e.g. `starter_prompts/general` |

Pipeline Mode

Mutually exclusive flags that control what the pipeline does. Default is stream-through (generate + project in one pass); a minimal sketch of that flow follows the table.

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| (default) | -- | -- | Stream-through: generate + project in one pass |
| `--capture` | flag | off | Save raw `.pt` activations instead of projecting |
| `--from-activations` | flag | off | Project from saved `.pt` files (run after `--capture`) |
| `--regenerate` | flag | off | Force re-generation of responses even if they exist |
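
In stream-through mode the activations are projected while the model generates, so nothing needs to be written to disk between the two steps. A minimal sketch of that idea, assuming a Hugging Face chat model whose decoder layers expose the residual stream and a precomputed trait vector for one layer; the model name, vector file, and hook details below are illustrative, not the pipeline's actual code:

```python
# Illustrative stream-through projection: capture one layer's residual activations
# with a forward hook during generation, then project onto a trait vector.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"        # assumption: any HF chat model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map="auto")

layer_idx = 25                                    # e.g. a layer selected via --layers
trait_vec = torch.load("sycophancy_vector_L25.pt").to(model.device)  # hypothetical trait vector

captured = []

def grab_residual(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output      # (batch, seq, hidden)
    captured.append(hidden[:, -1, :].detach())    # keep the newest token's residual activation

handle = model.model.layers[layer_idx].register_forward_hook(grab_residual)

input_ids = tok.apply_chat_template(
    [{"role": "user", "content": "Tell me about your day."}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    model.generate(input_ids, max_new_tokens=64, do_sample=False)
handle.remove()

acts = torch.cat(captured).float()                # one activation per forward pass
scores = acts @ trait_vec.float()                 # raw per-token projection scores
print(scores.mean().item())
```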

Projection

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--traits` | str | all | Comma-separated traits to project (default: all extracted) |
| `--layers` | str | `best,best+5` | Layers to project onto. Supports specific layers (`25,30`), ranges (`20-40`), the `best` keyword (from steering), and `best+N` (expansion sketched below) |
| `--component` | str | `residual` | Activation component: `residual`, `attn_contribution`, `mlp_contribution` |
| `--centered` | flag | off | Center activations before projection |
| `--force` | flag | off | Re-run even if projections exist (default: skip existing) |
| `--score-mode` | str | `normalized` | Score normalization: `raw` (no divisor), `normalized` (divide by mean ‖h‖ over the response), `cosine` (divide by per-token ‖h‖); arithmetic sketched after the examples |
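
The `--layers` value is a small comma-separated spec. A hypothetical helper showing how such a spec could expand into concrete layer indices; `parse_layer_spec` and the `best_layer` argument are illustrative, and the pipeline's own resolution of `best` comes from steering results:

```python
# Hypothetical expansion of a --layers spec into layer indices (not the pipeline's code).
def parse_layer_spec(spec: str, best_layer: int) -> list[int]:
    layers = []
    for part in spec.split(","):
        part = part.strip()
        if part.startswith("best"):
            offset = int(part[len("best"):] or 0)       # "best" -> +0, "best+5" -> +5
            layers.append(best_layer + offset)
        elif "-" in part:
            lo, hi = part.split("-")
            layers.extend(range(int(lo), int(hi) + 1))  # inclusive range, e.g. "20-40"
        else:
            layers.append(int(part))                    # specific layer, e.g. "25"
    return sorted(set(layers))

# Assuming the best steering layer is 22:
parse_layer_spec("best,best+5", 22)   # -> [22, 27]
parse_layer_spec("25,30", 22)         # -> [25, 30]
parse_layer_spec("20-40", 22)         # -> [20, 21, ..., 40]
```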

Generation

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--max-new-tokens` | int | 512 | Max tokens to generate |
| `--temperature` | float | 0.0 | Sampling temperature |
| `--model-variant` | str | config default | Model variant (default: `config.defaults.application`) |

Model

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--load-in-4bit` | flag | off | 4-bit quantization |
| `--backend` | str | `local` | Model backend: `local` (HF in-process, default), `auto` (try server, then fall back to local), `vllm` (generation only, no hooks) |

Examples

# Stream-through: generate responses + project onto all traits
python inference/run_inference_pipeline.py \
    --experiment my_exp --prompt-set starter_prompts/general

# Capture raw activations for later re-projection
python inference/run_inference_pipeline.py \
    --experiment my_exp --prompt-set starter_prompts/general --capture

# Project from saved activations at specific layers
python inference/run_inference_pipeline.py \
    --experiment my_exp --prompt-set starter_prompts/general \
    --from-activations --layers 25,30,35

# Project specific traits using best steering layer
python inference/run_inference_pipeline.py \
    --experiment my_exp --prompt-set starter_prompts/general \
    --traits starter_traits/sycophancy --layers best

# Cosine similarity scores instead of normalized projection
python inference/run_inference_pipeline.py \
    --experiment my_exp --prompt-set starter_prompts/general \
    --score-mode cosine
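
The three `--score-mode` values differ only in the divisor applied to the per-token dot product between the response activation and the trait vector. A minimal sketch of that arithmetic; the names are illustrative, and whether the trait vector itself is unit-normalized is not specified here:

```python
import torch

def projection_scores(acts: torch.Tensor, trait_vec: torch.Tensor, mode: str = "normalized") -> torch.Tensor:
    """acts: (num_response_tokens, hidden); trait_vec: (hidden,). Illustrative only."""
    dots = acts @ trait_vec                        # per-token projections onto the trait direction
    if mode == "raw":
        return dots                                # no divisor
    if mode == "normalized":
        return dots / acts.norm(dim=-1).mean()     # divide by mean ||h|| over the response
    if mode == "cosine":
        return dots / acts.norm(dim=-1)            # divide by per-token ||h||
    raise ValueError(f"unknown --score-mode: {mode}")
```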

inference/generate_responses.py

Generate model responses for inference prompts. Called automatically by the pipeline, but can also be run standalone.

Two modes: Mode A generates responses from the model; Mode B imports externally generated response text (tokenizer only, no GPU).

python inference/generate_responses.py \
    --experiment <experiment> --prompt-set <prompt_set>

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--experiment` | str | required | Experiment name |
| `--prompt-set` | str | required | Prompt set from `datasets/inference/{name}.json` |
| `--model-variant` | str | config default | Model variant (default: `config.defaults.application`) |

Generation (Mode A)

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--max-new-tokens` | int | 512 | Max tokens to generate |
| `--temperature` | float | 0.0 | Sampling temperature |
| `--prefill` | str | none | String prepended to the model response before generation (see the sketch below) |
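
`--prefill` seeds the assistant turn so that generation continues from the given string. A minimal sketch of one way to do that with a Hugging Face chat template; the model name and mechanics here are assumptions, not the script's exact behaviour:

```python
# Sketch: seed the assistant turn with prefill text before generation (illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"        # assumption: any HF chat model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map="auto")

prefill = "I think"                               # --prefill "I think"
messages = [{"role": "user", "content": "What do you think of my essay?"}]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) + prefill

inputs = tok(text, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)   # temperature 0.0 -> greedy
response = prefill + tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```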

External Responses (Mode B)

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--from-responses` | str | none | Path to `{id: response_text}` JSON. Tokenizer only, no GPU |

Common

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--force` | flag | off | Re-run even if responses exist (default: skip existing) |
| `--limit` | int | none | Max prompts to process |
| `--output-suffix` | str | none | Suffix appended to the output directory name |
| `--load-in-4bit` | flag | off | 4-bit quantization |
| `--backend` | str | `local` | Model backend: `local` (default), `auto`, `vllm` |

Examples

# Generate responses for a prompt set
python inference/generate_responses.py \
    --experiment my_exp --prompt-set starter_prompts/general

# Generate with prefill text
python inference/generate_responses.py \
    --experiment my_exp --prompt-set starter_prompts/general \
    --prefill "I think"

# Import external responses (no GPU needed)
python inference/generate_responses.py \
    --experiment my_exp --prompt-set starter_prompts/general \
    --from-responses external_responses.json

# Generate first 10 prompts with a specific variant
python inference/generate_responses.py \
    --experiment my_exp --prompt-set starter_prompts/general \
    --model-variant rm_lora --limit 10

Tip

Use --from-responses to import responses from external models (e.g., API-based models) without loading any weights. Only a tokenizer is needed.
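
A sketch of producing the `{id: response_text}` JSON that `--from-responses` expects, for example from responses collected via an external API. The prompt ids and the `get_api_response` helper are placeholders; the ids are assumed to match the ids in the prompt set:

```python
# Build the {id: response_text} JSON expected by --from-responses (illustrative).
import json

def get_api_response(prompt: str) -> str:
    # Placeholder: replace with a call to whatever external model produced the responses.
    return "stub response for: " + prompt

prompts = {
    "general_0001": "Tell me about your day.",          # hypothetical ids
    "general_0002": "What do you think of my essay?",
}

responses = {pid: get_api_response(text) for pid, text in prompts.items()}

with open("external_responses.json", "w") as f:
    json.dump(responses, f, indent=2)
```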