Configuring Autograders

Pawtograder’s autograder is a GitHub Action, pawtograder/assignment-action, that runs inside each student repository on every push. It overlays the student’s submission onto your grader repo, runs the linter, build, and instructor test suite (and optionally the student’s own tests with mutation/coverage analysis), then reports per-test results, scores, and artifacts back to Pawtograder. This page is the reference for the pawtograder.yml config that drives that flow.

The pawtograder.yml schema is published at https://raw.githubusercontent.com/pawtograder/assignment-action/refs/tags/v3/pawtograder.schema.json. Reference it from the top of your YAML to get IDE autocomplete:

# yaml-language-server: $schema=https://raw.githubusercontent.com/pawtograder/assignment-action/refs/tags/v3/pawtograder.schema.json

grade.yml Workflow

The workflow file students run, plus action inputs and outputs.

pawtograder.yml Configuration

Top-level reference for build, gradedParts, submissionFiles, and friends.

Dependencies

Gate parts or units on prior results.

Feedbot

LLM-generated hints attached to failing tests, including per-test hints from custom graders.

Examples

Working pawtograder.yml files for Java, Python, and mutation testing.

Empty Submission Detection

How Pawtograder flags submissions that haven’t been changed from the starter.

Submission Viewer

How files and grader artifacts render in the UI.

Rerunning the Autograder

Regrade existing submissions against a chosen grader version.

Test Insights & Bulk Regrading

Find systemic test failures and regrade affected submissions in bulk.

Running the Grader Locally

Iterate on your grader outside GitHub Actions.

Architecture Overview

Advanced: the three repos involved and the action’s step-by-step flow.

Running a Forked Action

Advanced: point students at your fork of the action.

The `grade.yml` Workflow

The handout repository ships with a .github/workflows/grade.yml that is cloned into each student repository. You must edit this file to install any language toolchains or dependencies your build needs before the action runs. The action itself only does grading — it does not install Java, Python, Node, etc. A minimal Java workflow looks like this:

name: Submit Assignment and Run Grader
permissions:
  id-token: write
  contents: read
on:
  workflow_dispatch:
  push:
    branches:
      - main

jobs:
  grade:
    name: Submit and Grade Assignment
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          path: submission
      - name: Install Java
        uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '21'
      - name: Collect Submission and Run Grader
        uses: pawtograder/assignment-action@v3
        with:
          grading_server: 'https://api.pawtograder.com'
          action_ref: '${{ github.action_ref }}'
          action_repository: '${{ github.action_repository }}'

The id-token: write permission is required so the action can request an OIDC token. Without it, the action will fail with an “Unable to get OIDC token” error.

The student’s code is checked out into submission/. The action downloads the grader into a sibling grader/ directory. Both you and the student can view the run output (including the action’s job summary table) under the Actions tab of the student repo.

Action Inputs

Input	Required	Description
`grading_server`	yes	URL of the Pawtograder API, typically `https://api.pawtograder.com`.
`action_ref`	yes	Pass `${{ github.action_ref }}` — used by the server to record which grader version ran.
`action_repository`	yes	Pass `${{ github.action_repository }}` — used for the same reason.
`regression_test_job`	no	Numeric ID of a regression-test job. When set, the action swaps the roles of “submission” and “grader” so that a known grader version can be run against a snapshot of a student submission. Set by the Pawtograder backend when launching regression tests, not by hand.
`handout_repo`	no	Deprecated. Ignored as of v3, will be removed in v4. Handout detection is now performed server-side.

Action Outputs

Output	Description
`score`	The numeric score reported by the grader.
`status`	A human-readable status message.

The `pawtograder.yml` Configuration

pawtograder.yml lives at the root of the grader/solution repo. There is currently exactly one grader type (grader: overlay) and it has three required top-level sections:

build — how to build, lint, and test the project.
gradedParts — what tests are worth what points, organized into parts.
submissionFiles — which files from the student repo are collected and overlaid onto the grader.

Optional top-level fields:

feedbot, llm, mutantAdvice — LLM-based features (see Feedbot and Mutation Test Units).
maxImplementationHints: N — across all regular units, show full output for at most N failing tests. Once the limit is reached, additional failing tests still count against the score but are summarized as “N additional failing tests not shown.” This is a running total across the entire submission, so put the most important parts first in gradedParts if you care which hints “win.” For per-unit suppression, use hide_output: true on a regular unit (see below).
maxMutantHints: N — caps mutantAdvice hints shown to students; covered with the mutation example in Mutation Test Units.
fallbackFiles — provides defaults for files the student didn’t submit (see below).
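The running-total behavior of maxImplementationHints can be pictured with a short Python sketch. It illustrates the rule described above and is not the action's code. The function name and the input shape (failing test names already flattened in gradedParts order) are assumptions.

```python
def apply_hint_limit(failing_tests, max_hints=None):
    """Split failing tests into those shown in full and an optional summary line.

    failing_tests: failing test names across all regular units, listed in
    gradedParts order (a hypothetical input shape for this sketch).
    """
    if max_hints is None:  # no limit configured: every failing test gets full output
        return list(failing_tests), None
    shown = list(failing_tests[:max_hints])
    remaining = len(failing_tests) - len(shown)
    summary = f"{remaining} additional failing tests not shown" if remaining else None
    return shown, summary


# Failures from Part 1 come first, so they take the limited hint slots.
failing = ["Part1.testA", "Part1.testB", "Part2.testC", "Part2.testD"]
print(apply_hint_limit(failing, max_hints=2))
# → (['Part1.testA', 'Part1.testB'], '2 additional failing tests not shown')
```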

`build`

The only required field is preset. The remaining fields are optional or preset-specific. script_info and venv apply to the python-script preset. linter configures linting, student_tests controls mutation/coverage features, timeouts_seconds overrides the built-in timeouts, and artifacts lists extra files to upload.

Presets

java-gradle — Builds with ./gradlew test, uses Surefire XML for test results, JaCoCo for coverage, Checkstyle for linting, and Pitest for mutation testing. The grader repo must contain a working build.gradle.
python-script — Runs the shell commands you provide in script_info (see below). Use this when you want full control over how tests, coverage, and mutation are produced.
none — Disables building, linting, and testing entirely. The action still records the submission so it can be hand-graded. Useful for write-only / artifact-only assignments.

Linter

build:
  linter:
    preset: checkstyle # currently the only option
    policy: fail       # or: ignore

policy: ignore — lint errors are reported in the grading summary but tests still run.
policy: fail — if the linter finds errors, the rest of grading is skipped and the student receives a zero. The submission does not count against any per-assignment submission cap if you have configured one.

`student_tests`

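Controls what to do with the student's own test suite. Student tests can run in two contexts: against the instructor's reference implementation (instructor_impl) and against the student's own implementation (student_impl):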

build:
  student_tests:
    instructor_impl:
      run_tests: true              # run student tests against the instructor's solution
      run_mutation: true           # run student tests against mutants of the instructor solution
      report_mutation_coverage: true
    student_impl:
      run_tests: true              # run student tests against the student's own implementation
      report_branch_coverage: true # emit a coverage report artifact
      run_mutation: true           # run mutation against the student's implementation
      report_mutation_coverage: true

Mutation analysis under instructor_impl only runs if the student’s tests first pass against the instructor’s reference solution. The rationale is that if a student’s tests fail against a known-correct implementation, they’re asserting wrong behavior, so their mutation score isn’t meaningful. The action surfaces those failing tests in a dedicated “your test suite contains incorrect tests” message.
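The gating rule above can be sketched in a few lines of Python. This is an illustration under assumptions, not the action's implementation. The function name and the input shape (a map from each student test name to whether it passed against the reference solution) are hypothetical.

```python
def plan_instructor_mutation(run_mutation, student_tests_vs_solution):
    """Decide whether instructor_impl mutation analysis should run.

    student_tests_vs_solution maps each student test name to whether it
    passed against the instructor's reference solution (hypothetical shape).
    Returns (run_mutation_now, incorrect_tests).
    """
    incorrect = sorted(
        name for name, passed in student_tests_vs_solution.items() if not passed
    )
    if incorrect:
        # A test that fails on a known-correct implementation asserts wrong
        # behavior, so a mutation score built on it would not be meaningful.
        return False, incorrect
    return run_mutation, []


print(plan_instructor_mutation(True, {"testAdd": True, "testRemove": False}))
# → (False, ['testRemove'])
```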

`timeouts_seconds`

All sub-fields are optional; the defaults are:

Phase	Default (seconds)
`build`	600
`instructor_tests`	300
`student_tests`	300
`mutants`	1800

build:
  timeouts_seconds:
    build: 900
    mutants: 3600
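Any phase you leave out keeps its default from the table. The merge can be sketched in Python as follows. The names DEFAULT_TIMEOUTS_SECONDS and effective_timeouts are illustrative and do not come from the action.

```python
# Defaults from the table above, in seconds.
DEFAULT_TIMEOUTS_SECONDS = {
    "build": 600,
    "instructor_tests": 300,
    "student_tests": 300,
    "mutants": 1800,
}


def effective_timeouts(overrides=None):
    """Return the timeout for every phase, applying any timeouts_seconds overrides."""
    # A new dict is built, so the defaults are never mutated.
    return {**DEFAULT_TIMEOUTS_SECONDS, **(overrides or {})}


print(effective_timeouts({"build": 900, "mutants": 3600}))
# → {'build': 900, 'instructor_tests': 300, 'student_tests': 300, 'mutants': 3600}
```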

`venv` and `script_info` (Python preset)

For the python-script preset, you supply the shell commands the builder should run for each phase:

build:
  preset: python-script
  venv:
    dir_name: '.venv'
    cache_key: 'sp26-cs2100-lab0' # used to cache the venv across runs of the same assignment
  script_info:
    install_deps: 'pip install -r requirements.txt'
    setup_venv: 'python3 -m venv .venv'
    activate_venv: '. .venv/bin/activate'
    linting_report: './generate_linting_reports.sh'
    html_coverage_reports: './generate_coverage_reports.sh'
    textual_coverage_reports: './generate_textual_coverage_reports.sh'
    test_runner: 'python3 test_runner.py'
    mutation_test_runner: 'python3 mutation_test_runner.py'

All script_info fields are required even if a given phase isn't used; provide a no-op command if you don't need one. cache_key identifies the cached venv across runs of the assignment. Bump it when requirements.txt changes so the venv is rebuilt.

`artifacts`

A list of files or directories the grader will produce and upload to the submission view. Each entry has a name (shown in the UI), a path (relative to the grading workspace, or absolute), and optional data (a free-form object — for example, { "format": "zip", "display": "html_site" } tells the UI to render a directory as a navigable HTML site).

build:
  artifacts:
    - name: 'Coverage HTML'
      path: 'build/reports/jacoco/test/html'
      data:
        format: zip
        display: html_site

Missing artifacts are logged but not fatal. The mutation/coverage reports that the action generates automatically (when report_mutation_coverage or report_branch_coverage is enabled) are added to this list at runtime — you don’t need to declare them yourself.
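The following minimal Python sketch shows how one artifacts entry might be resolved and packaged before upload. A relative path resolves against the workspace, a missing path is logged and skipped, and a directory marked format: zip is zipped. The function name, the out_dir parameter, and the use of shutil.make_archive are assumptions. The action's own upload code may differ.

```python
import logging
import shutil
import tempfile
import zipfile
from pathlib import Path

log = logging.getLogger("artifacts")


def prepare_artifact(artifact, workspace, out_dir):
    """Return the file to upload for one artifacts entry, or None if it is missing."""
    path = Path(artifact["path"])
    if not path.is_absolute():
        path = Path(workspace) / path  # relative paths resolve against the workspace
    if not path.exists():
        # Missing artifacts are logged, not fatal.
        log.warning("artifact %r not found at %s; skipping", artifact["name"], path)
        return None
    data = artifact.get("data") or {}
    if path.is_dir() and data.get("format") == "zip":
        # Zip the directory contents so the UI can render them (e.g. as an html_site).
        archive = shutil.make_archive(str(Path(out_dir) / path.name), "zip", root_dir=path)
        return Path(archive)
    return path


with tempfile.TemporaryDirectory() as ws, tempfile.TemporaryDirectory() as out:
    site = Path(ws, "build/reports/jacoco/test/html")
    site.mkdir(parents=True)
    (site / "index.html").write_text("<h1>Coverage</h1>")
    entry = {"name": "Coverage HTML", "path": "build/reports/jacoco/test/html",
             "data": {"format": "zip", "display": "html_site"}}
    archive = prepare_artifact(entry, ws, out)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    missing = prepare_artifact({"name": "Missing", "path": "nope.txt"}, ws, out)

print(names)
print(missing)  # → None
```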

`gradedParts`

gradedParts:
  - name: 'Part 1: Basics'
    hide_until_released: false # default
    gradedUnits:
      - ...

Each part has a name and an array of gradedUnits. Optional fields:

hide_until_released: true — students cannot see this part’s score or test output until the submission is released for grading.
dependencies — see Dependencies below.
hideFeedbot: true — Feedbot will not generate hints for any failing test in this part.

There are two kinds of unit, and they can be mixed freely within a part.

Regular Test Units

- name: 'Valid Construction'
  tests:
    - CreditCardPublicTest.testValidConstruction
  points: 1
  testCount: 1
  allow_partial_credit: true
  hide_output: false

tests may be a single string or an array of strings. Each string is matched as a prefix against the fully qualified test names emitted by the test runner (for JUnit, package.ClassName.testMethod). A prefix like CreditCardPublicTest. matches every method on that class.
testCount is the number of tests you expect to match. Setting this explicitly is intentional: it prevents a typo in a prefix from silently awarding full marks for zero tests.
points is the unit’s max score.
allow_partial_credit — defaults to false. When false, the student earns points only if all matched tests pass and the number of passing tests equals testCount. When true, the student earns points * (passing / testCount).
hide_output: true — replaces student-visible test output with “Output for this test is intentionally hidden.” The full output is still recorded as hidden_output and is visible to staff.
hideFeedbot: true — suppresses Feedbot hints for this unit only.

The old documentation claimed partial credit was enabled by default. It is not — allow_partial_credit defaults to false, meaning by default a unit is all-or-nothing.
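The scoring rules for a regular unit can be sketched in Python. This follows the behavior described above: prefix matching, the testCount guard, and the partial-credit formula. It is an illustration, not the action's code. The TestResult type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str     # fully qualified test name reported by the runner
    passed: bool


def score_regular_unit(tests, results, points, test_count, allow_partial_credit=False):
    """Score one regular unit following the rules described above."""
    prefixes = [tests] if isinstance(tests, str) else list(tests)
    matched = [r for r in results if any(r.name.startswith(p) for p in prefixes)]
    passing = sum(r.passed for r in matched)
    if allow_partial_credit:
        return points * passing / test_count
    # All-or-nothing: every matched test passes AND exactly testCount of them pass.
    if passing == len(matched) and passing == test_count:
        return points
    return 0


results = [
    TestResult("CreditCardPublicTest.testInvalidCreditLimitOnConstruction", True),
    TestResult("CreditCardPublicTest.testInvalidAprOnConstruction", True),
    TestResult("CreditCardPublicTest.testInvalidLateFeeOnConstruction", False),
]
print(score_regular_unit("CreditCardPublicTest.", results, 3, 3))  # → 0
print(score_regular_unit("CreditCardPublicTest.", results, 3, 3, allow_partial_credit=True))  # → 2.0
# A typo in the prefix matches nothing; testCount keeps that from scoring full marks.
print(score_regular_unit("CreditCardPublcTest.", results, 3, 3))  # → 0
```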

Mutation Test Units

- name: 'Detect bugs in BoxSet'
  locations:
    - 'box.SimpleBoxSet'        # whole class
    - 'box.SimpleBoxSet:10-50'  # line range within a class
    - 'MathMutator'              # name of a Pitest mutator
  breakPoints:
    - minimumMutantsDetected: 10
      pointsToAward: 10
    - minimumMutantsDetected: 5
      pointsToAward: 5

locations is an array of strings. Each entry can be a class name (the unit counts mutants whose location starts with that class), a class with a line range (ClassName:startLine:endLine or, accepted equivalently, ClassName-startLine-endLine), or the name of a Pitest mutator (matched against the mutator field of each mutant).
Scoring uses either breakPoints or linearScoring, not both:
- breakPoints — array of { minimumMutantsDetected, pointsToAward }. The unit awards the points of the first breakpoint whose threshold the student met, so order the entries by descending minimumMutantsDetected. The unit's max score is taken from the first entry.
- linearScoring: { total_faults, points } — awards (detected / total_faults) * points.
hideFeedbot: true — same meaning as on regular units.
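Both scoring modes can be sketched in Python as follows. The sketch assumes breakPoints are already in descending order, as described above. The function names are illustrative and are not the action's API.

```python
def score_break_points(detected, break_points):
    """Return (score, max_score) for a breakPoints mutation unit.

    break_points is assumed to be ordered by descending minimumMutantsDetected;
    the first threshold the student meets determines the score.
    """
    max_score = break_points[0]["pointsToAward"]
    for bp in break_points:
        if detected >= bp["minimumMutantsDetected"]:
            return bp["pointsToAward"], max_score
    return 0, max_score


def score_linear(detected, total_faults, points):
    """Return the score for a linearScoring mutation unit."""
    return (detected / total_faults) * points


tiers = [
    {"minimumMutantsDetected": 10, "pointsToAward": 10},
    {"minimumMutantsDetected": 5, "pointsToAward": 5},
]
print(score_break_points(12, tiers))  # → (10, 10)
print(score_break_points(7, tiers))   # → (5, 10)
print(score_break_points(3, tiers))   # → (0, 10)
print(score_linear(6, total_faults=8, points=4))  # → 3.0
```

If the same tiers were listed in ascending order, this rule would stop at the 5-mutant tier for any student who met it, and would also report a max score of 5.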

If mutantAdvice is configured at the top level, mutants the student didn’t detect can show a personalized hint:

mutantAdvice:
  - name: 'Off-by-one'
    sourceClass: 'box.SimpleBoxSet'
    targetClass: 'box.SimpleBoxSet_ROR_1'
    prompt: 'What happens at the boundary when the set is exactly at capacity?'
maxMutantHints: 5

maxMutantHints caps the total number of mutantAdvice hints shown across all mutation units in a single submission. Like maxImplementationHints, it’s a running total — order gradedParts so the most important parts come first if you care which hints “win.” Omit it to show all available hints.

`submissionFiles`

submissionFiles:
  files:
    - 'src/main/java/**/*.java'
  testFiles:
    - 'src/test/java/**/*.java'

files are the source/implementation files that get overlaid onto the grader for the instructor test runs.
testFiles are student-written tests; they are kept separate so they can be overlaid only when the action wants to grade the student’s own tests, and so that mutation/coverage analysis has a clean target.
Patterns are GitHub Actions globs — ** for “any subdirectories”, * for “any name in this directory”. You can list a literal file alongside a glob to make that file required.

If none of the student’s files match any pattern in submissionFiles, the submission is rejected immediately with an error that lists the unmatched patterns and identifies the grader repository and commit SHA where the config lives. The error is surfaced on the submission page so both the student and instructor can see it. This is the most common cause of a “submission has no files” error — usually a glob in the grader repo that doesn’t match the project layout in the handout repo.
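The rejection rule can be approximated in Python. This sketch uses pathlib's Path.glob, which only approximates GitHub Actions glob semantics. The function name and error text are hypothetical. The real error also names the grader repository and commit SHA, which this sketch omits.

```python
import tempfile
from pathlib import Path


def collect_submission(repo_root, patterns):
    """Map each submissionFiles pattern to the student files it matches.

    Raises ValueError when no pattern matches any file, mirroring the
    rejection described above. Path.glob approximates GitHub Actions globs.
    """
    root = Path(repo_root)
    matches = {
        p: sorted(f.relative_to(root).as_posix() for f in root.glob(p) if f.is_file())
        for p in patterns
    }
    if not any(matches.values()):
        raise ValueError("submission has no files; unmatched patterns: " + ", ".join(patterns))
    return matches


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "src/main/java/box").mkdir(parents=True)
    Path(repo, "src/main/java/box/SimpleBoxSet.java").write_text("class SimpleBoxSet {}")
    found = collect_submission(repo, ["src/main/java/**/*.java"])
    try:
        collect_submission(repo, ["src/*.py"])  # wrong layout for this repo
        error = None
    except ValueError as exc:
        error = str(exc)

print(found)  # → {'src/main/java/**/*.java': ['src/main/java/box/SimpleBoxSet.java']}
print(error)  # → submission has no files; unmatched patterns: src/*.py
```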

`fallbackFiles`

Optional. The path (relative to the grader repo) of a directory whose contents should be copied into the grading workspace for any file the student did not submit. Useful when students may delete files that your test harness expects to exist.

fallbackFiles: 'fallback'

Files already present from the student’s submission are never overwritten by fallbacks.
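The copy-if-absent rule can be sketched in Python. This is an illustration, not the action's code. The function name and the sample file names (Helper.java, Main.java) are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def apply_fallbacks(fallback_dir, workspace):
    """Copy files from fallback_dir into workspace unless they already exist there."""
    fallback_dir, workspace = Path(fallback_dir), Path(workspace)
    copied = []
    for src in sorted(fallback_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(fallback_dir)
        dest = workspace / rel
        if dest.exists():
            continue  # the student's file is never overwritten
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied


with tempfile.TemporaryDirectory() as fb, tempfile.TemporaryDirectory() as ws:
    Path(fb, "src").mkdir()
    Path(fb, "src/Helper.java").write_text("// starter helper")
    Path(fb, "src/Main.java").write_text("// starter main")
    Path(ws, "src").mkdir()
    Path(ws, "src/Main.java").write_text("// student main")
    copied = apply_fallbacks(fb, ws)
    main_text = Path(ws, "src/Main.java").read_text()

print(copied)     # → ['src/Helper.java']
print(main_text)  # → // student main
```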

Dependencies

Both gradedParts and gradedUnits accept a dependencies array. If any dependency is not met, that part (or unit) is replaced in the feedback with a message explaining which dependency failed instead of the actual grading output. A dependency may be written in any of three forms:

dependencies:
  # 1. String shorthand: requires full marks on the named part
  - 'Part 1: Basics'

  # 2. Part reference with a raw-score threshold
  - part: 'Part 1: Basics'
    minScore: 15

  # 3. Unit reference with a raw-score threshold
  - unit: 'Unit 1.1: Setup'
    minScore: 8

If minScore is omitted, the dependency requires the maximum score for the referenced part or unit.
minScore is a raw score, not a percentage.
When a part’s dependencies fail, the entire part is replaced with one feedback entry.
When a unit’s dependencies fail (but the part’s are satisfied), only that unit is replaced.

gradedParts:
  - name: 'Part 1: Basics'
    gradedUnits:
      - name: 'Unit 1.1: Setup'
        tests: '[T1.1'
        points: 10
        testCount: 5
      - name: 'Unit 1.2: Core'
        dependencies:
          - unit: 'Unit 1.1: Setup' # require 100% on 1.1
        tests: '[T1.2'
        points: 15
        testCount: 8

  - name: 'Part 2: Advanced'
    dependencies:
      - 'Part 1: Basics'
    gradedUnits:
      - name: 'Unit 2.1: Advanced Ops'
        dependencies:
          - part: 'Part 1: Basics'
            minScore: 20
          - unit: 'Unit 1.2: Core'
        tests: '[T2.1'
        points: 20
        testCount: 10
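The dependency rules can be sketched in Python and applied to the example above. Suppose a student scores 10/10 on Unit 1.1 and 12/15 on Unit 1.2. This is an illustration only. The function names and the (kind, name) keying of scores are assumptions.

```python
def normalize(dep):
    """Turn any of the three dependency forms into (kind, name, min_score)."""
    if isinstance(dep, str):
        return "part", dep, None  # shorthand: full marks on the named part
    if "part" in dep:
        return "part", dep["part"], dep.get("minScore")
    return "unit", dep["unit"], dep.get("minScore")


def first_unmet_dependency(deps, scores, max_scores):
    """Return (kind, name, threshold) for the first unmet dependency, else None.

    scores and max_scores are keyed by (kind, name), e.g. ("part", "Part 1: Basics").
    """
    for dep in deps or []:
        kind, name, min_score = normalize(dep)
        # Omitted minScore means the maximum score; minScore is raw, not a percentage.
        threshold = max_scores[(kind, name)] if min_score is None else min_score
        if scores[(kind, name)] < threshold:
            return kind, name, threshold
    return None


scores = {("unit", "Unit 1.1: Setup"): 10, ("unit", "Unit 1.2: Core"): 12,
          ("part", "Part 1: Basics"): 22}
max_scores = {("unit", "Unit 1.1: Setup"): 10, ("unit", "Unit 1.2: Core"): 15,
              ("part", "Part 1: Basics"): 25}

# Unit 1.2 requires 100% on Unit 1.1: satisfied.
print(first_unmet_dependency([{"unit": "Unit 1.1: Setup"}], scores, max_scores))  # → None
# Part 2 requires full marks on Part 1 (22 < 25), so the whole part is replaced.
print(first_unmet_dependency(["Part 1: Basics"], scores, max_scores))
# → ('part', 'Part 1: Basics', 25)
```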

Feedbot

Feedbot is optional, LLM-generated feedback that the grading server can attach to failing tests. When enabled in pawtograder.yml, the action includes an llm block on each failing test result so the grading server knows which model and account to use.

feedbot:
  enabled: true
  spec_url: 'https://example.com/path/to/assignment-spec.md'
  provider: openrouter
  model: openai/gpt-4o-mini
  account: default
  prompt: chain_of_thought    # or: checklist, or any free-form string
  rate_limit:
    cooldown: 5               # seconds between requests for the same student
    assignment_total: 100     # cap per assignment (default: server-side)
    class_total: 5000         # cap per class (default: server-side)

enabled is required for Feedbot to run at all.
provider, model, account, and spec_url are all required when enabled: true. If any are missing, Feedbot is disabled for the run and a warning is written to the visible output.
spec_url should point to a markdown file with the assignment spec. The action fetches it at grading time with a 10-second timeout; if the fetch fails, Feedbot is disabled for that run and the failure is logged.
prompt selects the response strategy. The two built-ins are chain_of_thought (default) and checklist. Any other string is used as a free-form custom strategy instruction; the embedded assignment spec and the underlying role/rules are not changed.
account selects which set of provider credentials the server uses — for example, account: cs2100 will look up OPENROUTER_API_KEY_cs2100 (falling back to OPENROUTER_API_KEY) when Feedbot dispatches the call.
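The validation and prompt rules above can be summarized in a short Python sketch. The function name, the returned settings shape, and the warning wording are all hypothetical. The sketch only shows the documented logic: required fields when enabled, built-in strategies, and free-form prompts.

```python
BUILTIN_STRATEGIES = {"chain_of_thought", "checklist"}
REQUIRED_WHEN_ENABLED = ("provider", "model", "account", "spec_url")


def resolve_feedbot(cfg):
    """Return (settings or None, warnings) for a feedbot block."""
    if not cfg or not cfg.get("enabled"):
        return None, []
    missing = [key for key in REQUIRED_WHEN_ENABLED if not cfg.get(key)]
    if missing:
        # Feedbot is disabled for this run and a warning goes to the visible output.
        return None, [f"Feedbot disabled: missing {', '.join(missing)}"]
    settings = {key: cfg[key] for key in REQUIRED_WHEN_ENABLED}
    prompt = cfg.get("prompt", "chain_of_thought")  # documented default
    if prompt in BUILTIN_STRATEGIES:
        settings["strategy"] = prompt
    else:
        # Any other string is a free-form strategy instruction.
        settings["strategy"] = "custom"
        settings["custom_instruction"] = prompt
    return settings, []


cfg = {"enabled": True, "provider": "openrouter", "model": "openai/gpt-4o-mini",
       "account": "default", "prompt": "checklist"}
print(resolve_feedbot(cfg))  # → (None, ['Feedbot disabled: missing spec_url'])
```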

Supported providers and the environment variables the server uses to look up credentials (where {account} is the value of the account field):

openai — OPENAI_API_KEY or OPENAI_API_KEY_{account}.
azure — AZURE_OPENAI_ENDPOINT plus AZURE_OPENAI_KEY (or AZURE_OPENAI_KEY_{account}).
anthropic — ANTHROPIC_API_KEY or ANTHROPIC_API_KEY_{account}.
openrouter — OPENROUTER_API_KEY or OPENROUTER_API_KEY_{account}. Use models like openai/gpt-4o-mini, anthropic/claude-3-haiku, google/gemini-pro.
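The account-then-shared lookup is sketched below in Python. This runs on the grading server, so the sketch only illustrates the documented fallback order. KEY_VARS and lookup_api_key are hypothetical names. The sketch does not handle Azure's additional AZURE_OPENAI_ENDPOINT variable.

```python
import os

# Base environment variable holding each provider's API key
# (azure also needs AZURE_OPENAI_ENDPOINT, which this sketch ignores).
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def lookup_api_key(provider, account, env=None):
    """Prefer the account-specific key, then fall back to the shared one."""
    env = os.environ if env is None else env
    base = KEY_VARS[provider]
    return env.get(f"{base}_{account}") or env.get(base)


env = {"OPENROUTER_API_KEY": "shared-key", "OPENROUTER_API_KEY_cs2100": "course-key"}
print(lookup_api_key("openrouter", "cs2100", env))  # → course-key
print(lookup_api_key("openrouter", "cs3500", env))  # → shared-key
```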

You can suppress Feedbot per-part or per-unit by setting hideFeedbot: true on the part or unit.

Per-Test Hints from Custom Graders

When Feedbot is enabled, the action automatically emits an extra_data.llm block on each failing test so the server knows which model/account to invoke. If you are writing a custom python-script grader and want to provide per-test hint configuration directly (rather than going through the feedbot block), you can author the llm block in your test output yourself:

{
  "llm": {
    "type": "v1",
    "prompt": "You are a helpful CS tutor. Guide the student to fix their code without giving away the solution.",
    "provider": "openrouter",
    "model": "openai/gpt-4o-mini",
    "account": "default",
    "temperature": 0.85,
    "max_tokens": 500
  }
}

The provider, model, and account fields use the same key lookups documented above.

Examples

Java with Gradle and JUnit

# yaml-language-server: $schema=https://raw.githubusercontent.com/pawtograder/assignment-action/refs/tags/v3/pawtograder.schema.json
grader: 'overlay'
build:
  preset: 'java-gradle'
  cmd: './gradlew test'
  linter:
    preset: 'checkstyle'
    policy: 'ignore'
  student_tests:
    student_impl:
      report_branch_coverage: true
gradedParts:
  - name: Public Tests
    gradedUnits:
      - name: Valid Construction
        points: 1
        testCount: 1
        allow_partial_credit: true
        tests:
          - CreditCardPublicTest.testValidConstruction
      - name: Invalid Construction
        points: 3
        testCount: 3
        allow_partial_credit: true
        tests:
          - CreditCardPublicTest.testInvalidCreditLimitOnConstruction
          - CreditCardPublicTest.testInvalidAprOnConstruction
          - CreditCardPublicTest.testInvalidLateFeeOnConstruction
submissionFiles:
  files:
    - 'src/main/java/**/*.java'
  testFiles:
    - 'src/test/java/**/*.java'

Python with Custom Scripts

# yaml-language-server: $schema=https://raw.githubusercontent.com/pawtograder/assignment-action/refs/tags/v3/pawtograder.schema.json
grader: 'overlay'
build:
  preset: 'python-script'
  linter:
    preset: 'checkstyle'
    policy: 'ignore'
  student_tests:
    student_impl:
      run_tests: true
      report_branch_coverage: false
    instructor_impl:
      run_tests: true
      run_mutation: false
      report_mutation_coverage: false
  venv:
    dir_name: '.venv'
    cache_key: 'sp26-cs2100-lab0'
  script_info:
    install_deps: 'pip install -r requirements.txt'
    setup_venv: 'python3 -m venv .venv'
    activate_venv: '. .venv/bin/activate'
    linting_report: './generate_linting_reports.sh'
    html_coverage_reports: './generate_coverage_reports.sh'
    textual_coverage_reports: './generate_textual_coverage_reports.sh'
    test_runner: 'python3 test_runner.py'
    mutation_test_runner: 'python3 mutation_test_runner.py'
gradedParts:
  - name: Instructor tests on Student Implementation
    gradedUnits:
      - name: Is a palindrome
        tests:
          - test_q1.TestIsPalindromeTrue
        points: 20
        testCount: 1
      - name: Is not a palindrome
        tests:
          - test_q1.TestIsPalindromeFalse
        points: 20
        testCount: 1
      - name: Style (Pylint and Mypy)
        tests:
          - test_style.TestStyleReports
        points: 40
        testCount: 1
submissionFiles:
  files:
    - 'src/*.py'
  testFiles:
    - 'tests/test_*.py'

Java with Mutation Testing (Pitest)

This example also grades the student’s own tests for fault-detection strength. The Gradle plugin used is info.solidsoft.pitest.

# yaml-language-server: $schema=https://raw.githubusercontent.com/pawtograder/assignment-action/refs/tags/v3/pawtograder.schema.json
grader: 'overlay'
build:
  preset: 'java-gradle'
  cmd: './gradlew test'
  linter:
    preset: 'checkstyle'
    policy: 'fail'
  student_tests:
    instructor_impl:
      run_tests: true
      run_mutation: true
      report_mutation_coverage: true
    student_impl:
      run_tests: true
      run_mutation: true
      report_mutation_coverage: true
      report_branch_coverage: true

gradedParts:
  - name: Student-Visible Test Results
    gradedUnits:
      - name: Simple BoxSet Visible
        points: 35
        testCount: 7
        allow_partial_credit: true
        tests:
          - SimpleBoxSetVisibleTest.
  - name: Hidden Test Results
    hide_until_released: true
    gradedUnits:
      - name: Simple BoxSet Hidden
        points: 25
        testCount: 5
        allow_partial_credit: true
        tests:
          - SimpleBoxSetHiddenTest.
  - name: Fault Detection
    gradedUnits:
      - name: Detect bugs in BoxSet
        locations:
          - 'box.SimpleBoxSet'
        breakPoints:
          - minimumMutantsDetected: 10
            pointsToAward: 10
          - minimumMutantsDetected: 5
            pointsToAward: 5
submissionFiles:
  files:
    - 'src/main/java/box/BoxSet.java'
    - 'src/main/java/box/SimpleBoxSet.java'
    - 'src/main/java/**/*.java'
  testFiles:
    - 'src/test/java/SimpleBoxSetTest.java'
    - 'src/test/java/**/*.java'

A matching build.gradle enables the Pitest plugin:

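plugins {
    id 'java'
    id 'application'
    id 'checkstyle'
    id 'jacoco'
    id 'info.solidsoft.pitest' version '1.15.0'
}

// Checkstyle, JaCoCo, Pitest, and JUnit artifacts are resolved from Maven Central.
repositories {
    mavenCentral()
}

dependencies {
    // test { useJUnit() } below runs JUnit 4 tests.
    testImplementation 'junit:junit:4.13.2'
}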

pitest {
    targetClasses = ['box.*']
    targetTests = ['*']
    pitestVersion = '1.15.8'
    threads = 4
    outputFormats = ['XML', 'HTML']
    timestampedReports = false
    testPlugin = 'junit'
    exportLineCoverage = true
    failWhenNoMutations = false
}

jacocoTestReport {
    dependsOn test
    reports {
        html.required = true
        xml.required = false
        csv.required = true
    }
}

test {
    useJUnit()
    finalizedBy tasks.jacocoTestReport
    ignoreFailures = true
}

checkstyle {
    toolVersion = '10.23.1'
    configFile = file("${rootDir}/config/checkstyle/checkstyle.xml")
    maxWarnings = 0
    ignoreFailures = false
}

tasks.withType(Checkstyle) {
    reports {
        xml.required = true
        html.required = true
    }
}

Empty Submission Detection

Pawtograder automatically flags submissions whose collected files are identical to (or essentially unchanged from) the starter code. These show up in the grading interface so you can quickly find students who pushed without making any actual changes — for example, students who set up the repository but never started the assignment. Empty submission detection looks only at the files that match submissionFiles, so it respects whatever scope you defined for the assignment.
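The idea can be sketched in Python by fingerprinting the files that match submissionFiles and comparing the result against the starter. Pawtograder's actual similarity check is not documented here. The normalization step (ignoring line endings and trailing whitespace) is a hypothetical stand-in for "essentially unchanged", and the function names are illustrative.

```python
import hashlib


def fingerprint(files, normalize=True):
    """Hash each file's text, keyed by path.

    files maps paths (already filtered by submissionFiles) to file contents.
    normalize=True ignores line endings and trailing whitespace (an assumption).
    """
    digests = {}
    for path, text in files.items():
        if normalize:
            text = "\n".join(line.rstrip() for line in text.splitlines()).strip()
        digests[path] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digests


def looks_empty(submitted, starter):
    """True when every collected file matches the starter after normalization."""
    return fingerprint(submitted) == fingerprint(starter)


starter = {"src/Main.java": "class Main {\n  // TODO\n}\n"}
print(looks_empty({"src/Main.java": "class Main {\r\n  // TODO  \r\n}"}, starter))  # → True
print(looks_empty({"src/Main.java": "class Main { int x; }"}, starter))  # → False
```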

Submission Viewer

Submission files and grader-generated artifacts are displayed side by side in the submission viewer. Submitted files:

Text files render with syntax highlighting.
Markdown files (.md, .markdown) render as formatted HTML with code-block highlighting, images, tables, and links.
Binary files (images, PDFs, executables) are stored with the submission and exposed as a download button alongside file metadata.

Grader artifacts appear next to the submitted files. The viewer recognizes a few rendering hints via the data object on each artifact:

Plain-text artifacts (.txt, .log) render with line numbers and syntax highlighting.
Markdown artifacts render as formatted HTML.
Directory artifacts with data: { format: zip, display: html_site } are uploaded as a zip and rendered as a navigable HTML site (this is how JaCoCo/Pitest HTML reports show up).
Other binary artifacts are exposed as downloads.

Graders can attach rubric checks directly to an artifact by setting annotation_target: artifact on the rubric check and naming the artifact in the artifact field. See the Rubrics documentation for details.

Rerunning the Autograder

You can rerun the autograder on an existing submission from the assignment page, the test-insights page, or an individual submission. Reruns keep the original submission record (same timestamp, same submission count) and replace the autograder result. Each rerun lets you choose which grader version to use:

The current grader (latest commit on the grader repo’s default branch).
A specific commit from the recent history list.
A manual SHA, for precise version control.

Optionally, enable Auto-promote to make the chosen grader version the new default for all future submissions if the rerun completes successfully. This is useful after fixing a bug in the grader or amending tests: the rerun and the change of default happen in one step. It also lets you roll back without pushing a new commit. To do so, rerun against an older, known-good SHA and promote it.

Rerunning replaces the existing autograder result for the selected submissions. If you have already released grades, rerun against a single test submission first to confirm the new behavior.

Test Insights and Bulk Regrading

The Test Insights view groups identical test failures across the whole class so you can quickly find systemic problems (a flaky test, an ambiguous spec, an off-by-one in your reference solution). From any error group you can:

See the number of affected submissions and their average score.
View and copy the email addresses of affected students.
Pin globally important issues so they remain visible across assignments.
Launch a regrade with those submissions preselected on the rerun-autograder dialog.

The regrade flow accepts the same grader-version options as the single-submission rerun, including Auto-promote.

Running the Grader Locally

You can run the grader against a local solution and a local submission without involving GitHub Actions or the Pawtograder server. From a clone of pawtograder/assignment-action:

npx tsimp src/grading/main.ts \
  -s /full/path/to/solution/repo \
  -u /full/path/to/submission/repo

Use absolute paths. The grader produces output in a pawtograder-grading/ directory in your current working directory; delete it between runs (or you may hit EACCES errors copying files).

Architecture Overview

When the action runs in the student repo, it:

Authenticates with the grading server

GitHub issues an OIDC token to the workflow. The action sends that token to the autograder-create-submission edge function. The grading server verifies the token (so it knows which repo and commit the request came from), runs security checks, registers a new submission, and returns a one-time download URL for the matching grader repository tarball.

Downloads the grader and reads pawtograder.yml

The action extracts the grader tarball alongside the student’s checkout and reads pawtograder.yml from the grader repo. The config selects an “overlay” grader and a build preset (java-gradle, python-script, or none).

Overlays student files onto the grader

For each glob in submissionFiles.files and submissionFiles.testFiles, the action deletes the matching files in the grader checkout and copies the student’s files in. This is the “overlay”: the grader repo provides the harness, the student’s files are layered on top.
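The overlay step can be sketched in Python using pathlib's Path.glob, which approximates GitHub Actions glob semantics. The function name and the sample files are hypothetical. Note that a grader file matched by a glob is deleted even if the student has no counterpart. That gap is what fallbackFiles exists to fill.

```python
import shutil
import tempfile
from pathlib import Path


def overlay(submission_dir, grader_dir, patterns):
    """Replace grader files matched by each glob with the student's versions."""
    submission, grader = Path(submission_dir), Path(grader_dir)
    copied = set()
    for pattern in patterns:
        # 1. Delete whatever the grader repo has at these paths (e.g. the reference solution).
        for old in list(grader.glob(pattern)):
            if old.is_file():
                old.unlink()
        # 2. Layer the student's matching files on top at the same relative paths.
        for new in submission.glob(pattern):
            if new.is_file():
                rel = new.relative_to(submission)
                (grader / rel).parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(new, grader / rel)
                copied.add(rel.as_posix())
    return sorted(copied)


with tempfile.TemporaryDirectory() as sub, tempfile.TemporaryDirectory() as grd:
    Path(grd, "src/main/java/box").mkdir(parents=True)
    Path(grd, "src/main/java/box/SimpleBoxSet.java").write_text("// reference solution")
    Path(grd, "src/test/java").mkdir(parents=True)
    Path(grd, "src/test/java/SimpleBoxSetHiddenTest.java").write_text("// instructor tests")
    Path(sub, "src/main/java/box").mkdir(parents=True)
    Path(sub, "src/main/java/box/SimpleBoxSet.java").write_text("// student code")
    overlaid = overlay(sub, grd, ["src/main/java/**/*.java"])
    impl = Path(grd, "src/main/java/box/SimpleBoxSet.java").read_text()
    tests_kept = Path(grd, "src/test/java/SimpleBoxSetHiddenTest.java").exists()

print(overlaid)    # → ['src/main/java/box/SimpleBoxSet.java']
print(impl)        # → // student code
print(tests_kept)  # → True
```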

Lints, builds, and runs instructor tests

The selected builder runs the linter (if configured), then a clean build, then the instructor test suite. Results are parsed into per-test pass/fail records. If linter.policy: fail is set and the linter fails, or if the build fails, grading stops and a zero is recorded.
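The early-exit behavior of this step can be sketched in Python. The phases are modeled as callables supplied by a builder. The function name, the callable signatures, and the note strings are all hypothetical. Only the control flow mirrors the description above: lint (if configured), then build, then the instructor tests, with a recorded zero when lint fails under policy: fail or when the build fails.

```python
from typing import Callable, Dict, Optional


def run_pipeline(lint: Optional[Callable[[], bool]],
                 build: Callable[[], bool],
                 run_tests: Callable[[], Dict[str, bool]],
                 lint_policy: str = "ignore"):
    """Return (score_override, test_results, notes).

    score_override is 0 when grading stops early and None otherwise.
    lint is None when no linter is configured.
    """
    notes = []
    if lint is not None and not lint():
        if lint_policy == "fail":
            return 0, {}, ["Linter failed; grading stopped."]
        notes.append("Linter reported errors (policy: ignore).")
    if not build():
        return 0, {}, notes + ["Build failed; grading stopped."]
    return None, run_tests(), notes


print(run_pipeline(lambda: False, lambda: True, lambda: {"t1": True}, lint_policy="fail"))
# → (0, {}, ['Linter failed; grading stopped.'])
print(run_pipeline(lambda: False, lambda: True, lambda: {"t1": True}))
# → (None, {'t1': True}, ['Linter reported errors (policy: ignore).'])
```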

Optionally runs student tests and mutation analysis

If student_tests is configured, the action resets the grader’s solution files, layers in only the student test files, and runs them against the instructor implementation (and optionally mutation testing). It can also run the student’s tests against the student’s own implementation to report branch and mutation coverage.

Scores parts and units, resolves dependencies

Scores are computed for every gradedUnit, summed into gradedPart scores, and then dependency rules are applied — units or parts whose dependencies aren’t satisfied are replaced with a message instead of their actual results.

Submits feedback and uploads artifacts

The action calls autograder-submit-feedback with the tests, lint output, and logs. If the grader emitted any artifacts, they are uploaded to Supabase storage via the signed URLs returned by the server. A summary table is also written to the GitHub Actions job summary.

If the server determines that the push came from a handout/template repo rather than a student repo, it returns a handout_notice and the action exits successfully without grading.

Running a Forked Action

The grading action is fully open source at pawtograder/assignment-action, so if the pawtograder.yml schema documented above isn’t expressive enough for what your assignment needs, you can fork the action and point your assignment’s grading workflow at your fork instead. Common reasons to do this include adding a new build preset, changing how scores are computed, or wiring up custom artifact handling. To use a fork, change the uses: line in grade.yml to point at your fork and the ref (tag, branch, or commit SHA) you want students to run:

- name: Collect Submission and Run Grader
  uses: your-org/assignment-action@your-tag

Forks work because the grading server identifies submissions by the student’s OIDC token, not by which copy of the action is running. The action_ref and action_repository inputs are still passed through so the server can record exactly which build of the action graded each submission.

Forking means you own the grader from that point forward. Upstream improvements and bug fixes won’t reach your students until you merge them into your fork and bump the ref in grade.yml. Only fork when the configurable surface of pawtograder.yml (documented above) genuinely isn’t enough — most customization needs can be expressed via build, gradedParts, and the python-script preset without touching the action itself.
