Compute Lead Time to Merge from GitHub CSV in Python (single-file Functory API)
This article shows how to build a focused, single-file Python utility that reads a GitHub issues CSV export and computes per-ticket lead time to merge plus aggregated metrics for engineering dashboards. You will get concrete input and output schemas, a small pandas-based implementation you can run locally, and guidance for publishing the function as a Functory API so founders and product managers get automated weekly reports.
Long-tail search phrases covered: python compute lead time to merge, lead time to merge from GitHub CSV, single-file Python script Functory.
What this function expects and produces
Input: a CSV file exported from GitHub issues or a combined issues+pull requests export. The script expects the following columns (exact names matter):
- issue_id (string or int)
- title (string)
- created_at (ISO 8601 string, e.g. 2024-02-10T13:45:00Z)
- closed_at (nullable ISO 8601 string)
- pull_request_merged_at (nullable ISO 8601 string; if multiple PRs exist, use the merge timestamp of the PR that closed the issue)
- labels (semicolon-separated, optional)
- assignee (optional)
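Before running the script, it can help to verify the export actually carries the required columns. This is a minimal sketch; the validate_columns helper is illustrative and not part of the main script:

```python
import pandas as pd

# Columns the lead-time computation cannot work without (names as listed above)
REQUIRED = ['issue_id', 'created_at', 'pull_request_merged_at']

def validate_columns(df: pd.DataFrame) -> list:
    """Return any required columns missing from the export."""
    return [c for c in REQUIRED if c not in df.columns]

# A deliberately incomplete frame to show the check firing
df = pd.DataFrame({'issue_id': ['4532'], 'created_at': ['2024-02-10T13:45:00Z']})
print(validate_columns(df))  # ['pull_request_merged_at']
```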
Transformations performed:
- Parse timestamps to timezone-aware datetimes.
- Compute per-issue lead time to merge = (pull_request_merged_at - created_at) in hours or days.
- Filter tickets without a merge timestamp (optionally include closed-without-merge statistics).
- Aggregate metrics by week, assignee, or label: mean, median, 90th percentile, and count.
- Output a per-issue CSV and a JSON summary with aggregate numbers.
Output examples:
- CSV: lead-time-per-issue.csv with columns: issue_id, created_at, merged_at, lead_time_hours, labels, assignee
- JSON summary: {"week_start":"2024-02-05","count":42,"median_hours":18.4,"p90_hours":72.6,"mean_hours":34.7}
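The aggregate fields in that JSON summary can be reproduced directly with numpy. The lead-time values below are made-up sample data, not from a real export:

```python
import numpy as np

# Made-up lead times (hours) for one week of merged issues
hours = np.array([4.2, 18.4, 30.0, 52.8, 96.5])
week_summary = {
    'count': int(hours.size),
    'median_hours': float(np.median(hours)),
    'p90_hours': round(float(np.percentile(hours, 90)), 1),
    'mean_hours': round(float(hours.mean()), 1),
}
print(week_summary)  # {'count': 5, 'median_hours': 30.0, 'p90_hours': 79.0, 'mean_hours': 40.4}
```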
Real-world scenario: weekly dashboard for a startup CTO
Imagine a startup with 1,000 tickets in the last quarter. The CSV export filename is github_issues_q1_2025.csv (1000 rows). Columns include: issue_id, title, created_at, closed_at, pull_request_merged_at, labels, assignee. The CTO wants: weekly median lead time to merge, p90 lead time, and top 3 slowest labels for the last 12 weeks so they can spot process bottlenecks.
Concrete input example row:
- issue_id: 4532
- title: 'Fix race condition in event loop'
- created_at: '2025-02-03T09:12:35Z'
- pull_request_merged_at: '2025-02-05T14:02:12Z'
- labels: 'bug;backend'
- assignee: 'alice'
Computed output row:
- lead_time_hours: 52.83
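That figure can be checked by hand with the standard library (the delta is 2 days, 4:49:37, which is about 52.83 hours):

```python
from datetime import datetime, timezone

created = datetime(2025, 2, 3, 9, 12, 35, tzinfo=timezone.utc)
merged = datetime(2025, 2, 5, 14, 2, 12, tzinfo=timezone.utc)
lead_time_hours = (merged - created).total_seconds() / 3600
print(round(lead_time_hours, 2))  # 52.83
```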
Example dataset and the exact problem solved
Dataset: 1000 rows of GitHub issue/PR linkage with the schema above. Size: ~120 KB compressed, typical for mid-size projects. Problem: product and engineering stakeholders manually copy-paste timestamps into spreadsheets and compute week-level medians by hand. This function automates parsing, merges multi-format dates, calculates per-issue lead times robustly (handles missing merged times), and emits weekly aggregates suitable for dashboards (e.g., Metabase or a JSON API).
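The "top slowest labels" view from the scenario can be sketched by splitting the semicolon-separated labels column and ranking per-label medians. The data below is a toy frame (column names match the schema above), not part of the main script:

```python
import pandas as pd

df = pd.DataFrame({
    'labels': ['bug;backend', 'frontend', 'bug', 'backend;infra'],
    'lead_time_hours': [52.8, 12.0, 70.0, 30.5],
})
# One row per (issue, label) pair, then the median lead time per label
exploded = df.assign(label=df['labels'].str.split(';')).explode('label')
slowest = (exploded.groupby('label')['lead_time_hours']
           .median()
           .sort_values(ascending=False)
           .head(3))
print(slowest)
```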
Step-by-step mini workflow
- Export issues from GitHub into github_issues_export.csv.
- Run the single-file script to produce lead-time-per-issue.csv and summary.json.
- Push summary.json into a dashboard backend or schedule the Functory function to run weekly.
- Alert the CTO if median or p90 exceed thresholds (e.g., median > 48 hours).
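The alerting step in the last bullet can be sketched as a plain threshold check over the weekly records. The field names follow the JSON summary shown earlier; the p90 threshold value here is an illustrative addition:

```python
MEDIAN_THRESHOLD_HOURS = 48.0
P90_THRESHOLD_HOURS = 120.0  # illustrative p90 threshold, not from the article

def weeks_over_threshold(weekly):
    """Return the weeks whose median or p90 lead time breaches a threshold."""
    return [
        w for w in weekly
        if w['median'] > MEDIAN_THRESHOLD_HOURS or w['p90'] > P90_THRESHOLD_HOURS
    ]

# Made-up weekly records in the shape produced by the aggregation step
weekly = [
    {'week_start': '2025-02-03', 'count': 42, 'median': 18.4, 'p90': 72.6},
    {'week_start': '2025-02-10', 'count': 37, 'median': 53.1, 'p90': 140.2},
]
print(weeks_over_threshold(weekly))  # only the week of 2025-02-10 breaches
```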
Algorithm overview
- Read CSV with pandas, parse dates using pandas.to_datetime with utc=True.
- For each row, if pull_request_merged_at is present, compute delta = merged_at - created_at in hours.
- Drop rows where delta is negative or missing, but keep counts of excluded rows.
- Group by week_start = created_at.dt.to_period('W').dt.start_time and compute count, mean, median, and 90th percentile.
- Serialize per-issue CSV and a JSON with weekly aggregates and metadata.
Python example
The following code is a minimal, runnable example that computes lead times and writes outputs. It uses pandas and numpy.
import json

import numpy as np
import pandas as pd


def compute_lead_time(csv_path: str,
                      output_csv: str = 'lead-time-per-issue.csv',
                      summary_json: str = 'summary.json') -> dict:
    df = pd.read_csv(csv_path, dtype={'issue_id': str})

    # Parse timestamps robustly; unparseable values become NaT
    for col in ['created_at', 'pull_request_merged_at', 'closed_at']:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], utc=True, errors='coerce')

    # Compute lead time in hours
    df['lead_time_hours'] = (
        df['pull_request_merged_at'] - df['created_at']
    ).dt.total_seconds() / 3600

    # Keep only valid lead times (present and non-negative)
    valid_mask = df['lead_time_hours'].notna() & (df['lead_time_hours'] >= 0)

    # Per-issue CSV
    out_cols = ['issue_id', 'title', 'created_at', 'pull_request_merged_at',
                'lead_time_hours', 'labels', 'assignee']
    df_out = df.loc[valid_mask, [c for c in out_cols if c in df.columns]]
    df_out.to_csv(output_csv, index=False)

    # Weekly aggregates, keyed by the start of each calendar week
    df['week_start'] = df['created_at'].dt.to_period('W').dt.start_time
    agg = df.loc[valid_mask].groupby('week_start')['lead_time_hours'].agg(
        count='count', mean='mean', median='median',
        p90=lambda x: np.percentile(x, 90),
    ).reset_index()
    agg['week_start'] = agg['week_start'].dt.strftime('%Y-%m-%d')

    summary = {
        'total_issues': int(df.shape[0]),
        'issues_with_merges': int(valid_mask.sum()),
        'excluded_missing_or_negative': int(df.shape[0] - valid_mask.sum()),
        'weekly': agg.to_dict(orient='records'),
    }
    with open(summary_json, 'w') as f:
        json.dump(summary, f, indent=2)
    return summary


# Example call
if __name__ == '__main__':
    summary = compute_lead_time('github_issues_export.csv')
    print(summary)
How Functory Makes It Easy
To publish this as a Functory function you wrap the core logic above so that Functory calls a single entrypoint: main(csv_path: FilePath, output_csv: str = 'lead-time-per-issue.csv'). On Functory you must pick an exact Python version, for example 3.11.11, and list pinned dependencies in requirements.txt like:
pandas==2.2.0
numpy==1.26.0
The platform exposes each parameter as an input in the web UI and as JSON fields on the API. If main returns a path-like string (e.g., the output CSV path), Functory exposes that file as a downloadable result. You get automatic cloud execution, autoscaling, and print() logs captured in the Functory UI, and you pay per execution. The function can be triggered from the Functory web UI, a scheduled job, or programmatically from a backend or an LLM agent that calls the HTTP API. You can chain functions by having this step produce summary.json, then call another Functory function to generate a Slack alert or update a dashboard.
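One practical detail when chaining through summary.json: pandas Timestamp values (such as an unformatted week_start) are not JSON-serializable by default, so either format them as strings before serializing or pass a default=str fallback, as in this sketch:

```python
import json

import pandas as pd

# A weekly record as it would look before week_start is formatted as a string
weekly = [{'week_start': pd.Timestamp('2025-02-03'), 'count': 42, 'median_hours': 18.4}]
payload = json.dumps({'weekly': weekly}, default=str)  # Timestamp -> "2025-02-03 00:00:00"
print(payload)
```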
Alternative approaches and why a function is better
Teams commonly compute lead time manually in spreadsheets, run ad-hoc Jupyter notebooks, or use heavyweight analytics platforms. Manual spreadsheets are error-prone (copy/paste mistakes) and hard to automate. Notebooks are great for exploration but poorly suited to scheduled, permissioned APIs. A single-file function that runs on Functory combines the reproducibility and auditability of a small script with the operational benefits of a hosted API: scheduled runs, per-execution logs, and a file artifact endpoint. Compared to a spreadsheet, this reduces human error and enforces a consistent schema; compared to a full ETL pipeline, it is cheaper to operate and faster to iterate.
Business impact
Concrete benefit: automating CSV parsing and weekly aggregation can reduce manual processing time by ~70% for an engineering manager who used to spend 4–6 hours per week generating reports; the function runs in seconds on CPU and can be scheduled. It also provides earlier detection of regressions in process (e.g., median lead time rising above a 48-hour alert threshold), which can reduce late-feature rollouts by an estimated 15% in a typical six-month product cycle.
Industry stat: According to the 2023 State of DevOps report, lead time for changes is one of four key DORA metrics and top-performing teams have median lead times under one day (source: State of DevOps 2023 summary).
Comparison to other tools
Dedicated analytics tools (Looker, Amplitude) can compute similar metrics but require schema mapping and ongoing costs. GitHub Insights and third-party SaaS give quick dashboards but often hide details about data transformations. A scripted approach keeps full control over the logic, column mapping, and thresholds. Publishing as a Functory function gives the immediate operational advantage of an API without building and maintaining a server or CI, and with explicit dependency pinning for reproducibility.
Conclusion: Computing lead time to merge from GitHub CSVs is a small transformation with high business value: it turns raw timestamps into actionable engineering metrics. Next steps: try the example locally on a small export, then wrap the core logic in a main(...) for Functory with a pinned requirements.txt (for example Python 3.11.11 and pandas==2.2.0). After that, schedule weekly runs and connect the JSON output to your dashboard or alerting pipeline. Publish the function for teammates so they can call it without touching code or servers.
Thanks for reading.
