Functory
6 min read

Computing Student Quiz Drop-off Points from LMS CSV Exports in Python for Accreditation Reporting

This article shows how to write a small, single-file Python routine that computes quiz "drop-off points" from Learning Management System (LMS) CSV exports, and how to publish it as a hosted API using Functory. The goal is to give operations managers and analytics teams a repeatable function that turns raw quiz rows into accreditation-ready metrics without managing servers or cron jobs.

We focus on concrete inputs (CSV schema and sample rows), exact processing steps (participation rates, relative/absolute drops, per-quiz and per-course aggregation), and an implementation pattern that is production-ready for agencies managing multiple clients.

What this function expects and produces

Input: a CSV file exported from an LMS (Brightspace/Moodle/Canvas) with rows representing student answers or interactions, with these columns (exact names expected):

  • student_id (string)
  • course_id (string)
  • quiz_id (string)
  • question_id (string)
  • question_order (integer, 1-based)
  • attempt_id (string)
  • attempt_timestamp (ISO 8601 string)
  • answered (0/1 integer) — 1 if a student submitted an answer to the question, 0 if skipped

Processing steps: normalize types, filter to completed attempts if requested, pivot counts per question order to compute participation curves, and detect the first question where participation drops by more than an absolute threshold (e.g., 0.15, meaning 15 percentage points of starters) or falls below a relative threshold (e.g., 0.4, meaning 40% or less of the previous question's participants).

Output: a CSV (or DataFrame) with one row per (course_id, quiz_id) containing:

  • course_id, quiz_id
  • start_participants (int) — unique attempt_ids that reached question_order 1
  • drop_question_order (int or null) — question_order where drop first exceeded threshold
  • drop_question_id (string or null)
  • drop_absolute_pct (float) — absolute drop as a fraction of starters (e.g., 0.2 = 20%)
  • drop_relative (float) — participants_at_q / participants_at_prev_q
  • participation_curve (JSON string) — list like [1.0,0.98,0.94,...]

Real-world scenario

Accreditation requires per-module engagement detail showing where learners abandon formative quizzes. Suppose an agency receives a 1.2MB CSV named exports/course-ENG101-quiz-results.csv with 52,300 rows representing 2,500 attempts across 20 quizzes. Each quiz has 8–15 questions. The stakeholder asks: "For each quiz, which question most students stop at or start skipping?"

Concrete example input row (CSV):

  student_id,course_id,quiz_id,question_id,question_order,attempt_id,attempt_timestamp,answered
  S12345,ENG101,Q3,Q3_05,5,A998,2025-03-02T10:12:00Z,0

Concrete example output (row): course_id=ENG101, quiz_id=Q3, start_participants=250, drop_question_order=7, drop_question_id=Q3_07, drop_absolute_pct=0.21, drop_relative=0.63

Example dataset / use-case

Fabricated but realistic dataset description: 500 students, 20 quizzes across 5 courses, an average of 10 questions per quiz, 2,500 unique attempts, ~25,000 rows. Problem solved: automatically detect where participation drops by more than 15 percentage points (absolute) or falls to 40% or less of the previous question's participants (relative), and produce a CSV prioritized for accreditation officers showing the worst drop points per course.
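To experiment before a real export is available, a fabricated file matching the schema above can be generated. This is a minimal sketch: the helper name, file name, and the distribution of stopping points are illustrative assumptions, not part of any LMS export.

```python
import csv
import random

def make_sample_csv(path, n_students=50, n_questions=10):
    """Write a tiny fabricated LMS export matching the expected columns."""
    random.seed(0)  # deterministic sample for repeatable tests
    with open(path, 'w', newline='') as f:
        w = csv.writer(f)
        w.writerow(['student_id', 'course_id', 'quiz_id', 'question_id',
                    'question_order', 'attempt_id', 'attempt_timestamp', 'answered'])
        for s in range(n_students):
            # each student makes one attempt and stops answering after `stop`
            stop = random.randint(4, n_questions)
            for q in range(1, n_questions + 1):
                w.writerow([f'S{s:04d}', 'ENG101', 'Q3', f'Q3_{q:02d}', q,
                            f'A{s:04d}', '2025-03-02T10:12:00Z',
                            1 if q <= stop else 0])
```

The resulting file has one row per (attempt, question), with answered flipping to 0 at each student's fabricated stopping point, which is exactly the shape the drop-off routine expects.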

When to use this approach

  • You need repeatable, auditable metrics for accreditation or compliance reporting.
  • You receive regular CSV exports and want an API-driven transformation without maintaining cron jobs.
  • You need to integrate this metric into dashboards or automated emails.

Mini workflow (end-to-end)

  1. Export LMS CSV: exports/course-ENG101-quiz-results.csv
  2. Call the function (local or via Functory API) with that file and thresholds
  3. Function returns a CSV path or JSON with detected drop points per quiz
  4. Ingest output into Excel/Google Sheets or BI tool, or attach to accreditation report
  5. Optionally schedule periodic runs by letting Functory be triggered by your backend

Processing algorithm

  1. Filter to rows with answered = 1, so a question that was shown but skipped does not count toward participation.
  2. Aggregate attempt counts per question_order (count unique attempt_ids that answered this question) within each (course_id, quiz_id).
  3. Compute participation curve: participants_at_q / participants_at_q1.
  4. Scan curve sequentially; detect first index q where either (prev - cur) >= absolute_threshold OR cur/prev <= relative_threshold.
  5. Emit drop point details and summary stats for the quiz.
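The scan in steps 3–5 can be sketched as a small pure-Python helper before involving pandas. The function name and the example numbers below are illustrative, not part of the article's schema:

```python
def find_drop_point(participants, abs_threshold=0.15, rel_threshold=0.4):
    """Scan per-question participant counts; return (1-based question index,
    absolute drop as a fraction of starters, ratio vs previous question)
    for the first drop exceeding a threshold, or None if no drop is found."""
    if not participants:
        return None
    start = participants[0]
    for i in range(1, len(participants)):
        prev, cur = participants[i - 1], participants[i]
        abs_drop = (prev - cur) / start          # drop relative to starters
        rel = cur / prev if prev > 0 else 0.0    # ratio vs previous question
        if abs_drop >= abs_threshold or rel <= rel_threshold:
            return i + 1, round(abs_drop, 3), round(rel, 3)
    return None

# 200 starters with a sharp drop at question 5: (188 - 150) / 200 = 0.19 >= 0.15
print(find_drop_point([200, 196, 192, 188, 150]))  # → (5, 0.19, 0.798)
```

Note that the absolute drop is measured against the number of starters, not the previous question, which keeps the threshold comparable across questions late in a quiz.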

Python implementation example

import json
from pathlib import Path

import pandas as pd

# Minimal single-file function that could be used as Functory main(...)
def main(csv_path: str, abs_threshold: float = 0.15, rel_threshold: float = 0.4) -> str:
    df = pd.read_csv(csv_path, dtype={
        'student_id': str, 'course_id': str, 'quiz_id': str, 'question_id': str,
        'question_order': int, 'attempt_id': str, 'answered': int
    }, parse_dates=['attempt_timestamp'])

    # Keep only answered rows: a row with answered=0 means the question was
    # shown but skipped, so it should not count toward participation.
    answered = df[df['answered'] == 1]

    # Count unique attempts that answered each question_order
    reach = (answered.groupby(['course_id', 'quiz_id', 'question_order'])['attempt_id']
             .nunique()
             .reset_index(name='participants'))

    # Attach question_id per (course, quiz, question_order) for reporting
    qid_map = df.drop_duplicates(['course_id', 'quiz_id', 'question_order'])[
        ['course_id', 'quiz_id', 'question_order', 'question_id']]
    reach = reach.merge(qid_map, on=['course_id', 'quiz_id', 'question_order'], how='left')

    results = []
    for (course, quiz), g in reach.groupby(['course_id', 'quiz_id']):
        g = g.sort_values('question_order')
        participants = g['participants'].to_list()
        if not participants:
            continue
        start = participants[0]
        curve = [round(p / start, 3) for p in participants]

        drop_idx = drop_abs = drop_rel = None
        for i in range(1, len(participants)):
            prev, cur = participants[i - 1], participants[i]
            abs_drop = (prev - cur) / start
            rel = cur / prev if prev > 0 else 0.0
            if abs_drop >= abs_threshold or rel <= rel_threshold:
                drop_idx = int(g['question_order'].iloc[i])
                drop_abs = round(abs_drop, 3)
                drop_rel = round(rel, 3)
                break

        drop_qid = (g.loc[g['question_order'] == drop_idx, 'question_id'].iloc[0]
                    if drop_idx is not None else None)
        results.append({
            'course_id': course,
            'quiz_id': quiz,
            'start_participants': int(start),
            'drop_question_order': drop_idx,
            'drop_question_id': drop_qid,
            'drop_absolute_pct': drop_abs,
            'drop_relative': drop_rel,
            'participation_curve': json.dumps(curve)
        })

    out = pd.DataFrame(results)
    out_path = Path(csv_path).with_name(Path(csv_path).stem + '-drop-points.csv')
    out.to_csv(out_path, index=False)
    return str(out_path)

# Example local call
if __name__ == '__main__':
    print(main('exports/course-ENG101-quiz-results.csv'))

How Functory Makes It Easy

On Functory this core logic is wrapped in a single Python main(...) function whose parameters (strings, numbers, or file paths) become inputs in the web UI and JSON API. To publish this as a Functory function you would:

  • Choose an exact Python version (for example, 3.11.11) and set it in the Functory configuration.
  • Declare a requirements.txt where each dependency is pinned to an exact version (e.g., pandas==2.1.1).
  • Place the function in a single file with def main(csv_path: str, abs_threshold: float, rel_threshold: float) -> str, so Functory can call it directly.

Inputs/outputs via the API: the CSV can be uploaded as a File field or addressed by a URL string. The function returns a path-like string that Functory exposes as a downloadable file. You can trigger the function from the web UI, programmatically from another backend, or from an LLM agent that posts the JSON payload.

Benefits: no servers to manage, automatic cloud execution on CPU tiers, autoscaling, built-in logging via print(), and pay-per-use billing handled by Functory. You can chain this function with another Functory function for visualization (pre-processing → drop-point detection → automated PDF report).
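As a sketch of a programmatic trigger, the payload below mirrors main(...)'s parameter names. The endpoint URL, header names, and overall request shape are assumptions for illustration, not Functory's documented interface:

```python
import json

# Hypothetical JSON payload for invoking the published function from a backend.
# Keys mirror main(csv_path, abs_threshold, rel_threshold); the URLs are illustrative.
payload = {
    "csv_path": "https://example.com/exports/course-ENG101-quiz-results.csv",
    "abs_threshold": 0.15,
    "rel_threshold": 0.4,
}
body = json.dumps(payload)

# A real call would then POST `body` to the function's endpoint, e.g. with requests:
# requests.post("https://functory.example/api/run/drop-points", data=body,
#               headers={"Content-Type": "application/json"})
```

Because the keys match the function signature, threshold tuning per client becomes a matter of changing two numbers in the payload rather than editing code.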

Alternatives and why this approach is better

Teams often use manual spreadsheets, LMS dashboards, or ad-hoc Jupyter notebooks to find drop-offs. Spreadsheets are error-prone and hard to reproduce; dashboards often lack per-attempt detail; notebooks are not easily callable from other systems. Wrapping the logic in a function that accepts CSVs and returns standardized CSVs or JSON provides reproducibility, API access, and easy integration into existing automation pipelines without running your own servers.

Business impact

Converting a manual Excel process into a repeatable function reduces manual processing time by roughly 60% for an agency that processes weekly exports (for example, saving 4–6 staff-hours per week) and reduces human error in accreditation reports. It also strengthens the audit trail by producing a consistent CSV per run, which can speed auditors' verification by an estimated 30%.

According to a 2024 EDUCAUSE-style institutional analytics report, roughly 68% of higher-education institutions require module-level interaction metrics for accreditation reviews.

Conclusion: A compact function that computes drop-off points from LMS exports provides a reproducible, auditable metric for accreditation and operations reporting. Next steps: try the example on a sample export, tune absolute/relative thresholds for your courses (e.g., 0.15/0.4), and publish the main(...) to Functory with pinned dependencies (e.g., Python 3.11.11 and pandas==2.1.1). Experiment with chaining this function to an auto-generated PDF reporter or a BI ingestion step and publish your findings for stakeholders.

Thanks for reading.