GitQuick Metrics Documentation

GitQuick analyzes GitHub Pull Request metadata to surface code review performance metrics. This site documents every metric that appears in the GitQuick UI and the exact calculation method used to produce it.

How data is collected

For each analyzed repository, GitQuick fetches Pull Requests and their associated reviews, commits, and timestamps via the GitHub REST and GraphQL APIs. The raw records are persisted in Postgres, and metrics are computed on demand from that stored dataset when an org run's results are viewed.

Scope of an "org run"

A metric is always computed over the set of PRs included in a single org run. Each PR contributes at most once to each metric. PRs created or merged outside the configured time window of the run are excluded.

Percentiles

Wherever "p50" (median) or "p90" is reported, GitQuick uses discrete percentile semantics (percentile_disc). For a sorted sample, the p‑th percentile is the smallest value such that at least p% of the sample is at or below it. A null percentile indicates that the sample was empty or that every value was invalid.
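As an illustration of the discrete semantics described above, here is a minimal sketch. The function name and the fraction-based p argument (0.5 for p50, 0.9 for p90) are choices made for this example, not GitQuick's actual code.

```python
import math
from typing import Optional, Sequence


def percentile_disc(values: Sequence[float], p: float) -> Optional[float]:
    """Return the smallest sample value with at least a fraction p of the sample at or below it."""
    ordered = sorted(values)
    if not ordered:
        return None  # empty sample: the percentile is reported as null
    # 1-based rank of the first element whose cumulative share reaches p.
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


hours = [2.0, 3.0, 5.0, 8.0, 40.0]
median = percentile_disc(hours, 0.5)  # 5.0, an actual value from the sample
p90 = percentile_disc(hours, 0.9)     # 40.0
```

Because the result is always an element of the sample, no interpolation between neighboring values takes place.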

Sample sizes

Every time‑based and size‑based metric is accompanied by a sample_size, which is the number of PRs that contributed a valid value to that metric (after excluding null, negative, or otherwise invalid records). A metric with a low sample size should be interpreted with care.


Signals Guide

The Signals tab is the fastest way to understand what is going wrong in your code review process. Instead of reading every chart and number on the Metrics tab, GitQuick runs your data through six automated checks and surfaces only the problems that matter — with severity, context, and suggested next steps.

This guide explains what Signals are, how to use them, and exactly how each one is calculated.


What are Signals?

Signals are interpreted findings, not raw numbers.

GitQuick first computes metrics from your GitHub pull request data (review times, throughput, PR size, and so on). The Signals engine then compares those metrics against healthy thresholds and flags patterns that usually indicate a real process problem.

Think of it this way:

Tab | What you get
Signals | "Here is what looks unhealthy, how serious it is, and what to try next."
Metrics | "Here are the raw numbers that back it up."

Signals are deterministic: the same run data always produces the same signals. There is no AI or randomness involved in detection — only fixed rules and thresholds.


How to use the Signals tab

1. Open a completed org run

Signals appear after GitQuick finishes analyzing your repositories. Select a run from your history, then click the Signals tab (it is the first tab by default).

2. Read the cards top to bottom

Only triggered signals are shown. They are sorted by:

  1. Severity — Critical first, then High, Medium, Low
  2. Confidence — Higher-confidence signals appear first within the same severity

If nothing is wrong, you will see:

✓ No signals detected in this run — all headline metrics are within healthy thresholds.

That is a good outcome. It means none of the six checks found a pattern above its warning threshold.

3. Expand a card for the full story

Each signal card has a collapsed header and an expandable body.

Collapsed header shows:

Expanded body shows:

4. Drill into the Metrics tab

Use Supporting metrics links (e.g. "Time to First Review (median, p90) ↓") to see the underlying data. GitQuick switches to the Metrics tab and scrolls to the matching chart, briefly highlighting it.

5. Change scope when needed

Use the scope selector (top right) to view signals for:

Signals recalculate for the selected scope. A problem visible at org level may disappear when you zoom into one repo — or the opposite.

6. Compare runs over time

Signals reflect a single run. To see whether a problem is new or getting worse, use the Trend tab alongside Signals. A signal that appears in run after run is worth prioritizing.


How Signals are calculated (overview)

For each org run, GitQuick:

  1. Computes all metrics from stored PR data (see Metrics Reference)
  2. Runs six signal definitions against those metrics
  3. Evaluates multiple rules per signal (each rule checks one condition)
  4. Assigns severity and confidence
  5. Returns only signals whose status is triggered

A signal is triggered when at least one of its rules fires. If every rule passes (or cannot be evaluated because data is missing), the signal is considered healthy and is not shown.

Severity

Each triggered rule contributes a severity level. The signal's overall severity is the highest severity among its triggered rules.

Some signals have compound rules — conditions that only fire when multiple problems happen together (e.g. both review latency and approval latency are elevated). When a compound rule triggers alongside other rules, severity is escalated by one level (Low → Medium → High → Critical, capped at Critical).
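The aggregation rule above (take the highest severity, then escalate by one level when a compound rule fires) could be sketched as follows. The function and argument names are hypothetical, and the sketch assumes the compound rule itself carries no severity of its own.

```python
LEVELS = ["Low", "Medium", "High", "Critical"]


def signal_severity(rule_severities, compound_fired):
    """Combine the severities of triggered rules into one signal severity.

    rule_severities: severities of the triggered non-compound rules.
    compound_fired:  True when the signal's compound rule also triggered.
    """
    if not rule_severities:
        return None  # nothing triggered: the signal is healthy and not shown
    index = max(LEVELS.index(level) for level in rule_severities)
    if compound_fired:
        index = min(index + 1, len(LEVELS) - 1)  # escalate, capped at Critical
    return LEVELS[index]
```

For example, triggered rules at Low and Medium yield Medium on their own and High when the compound rule also fires.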

Within a single rule, severity scales with how far the observed value exceeds the threshold:

How far above the warning threshold | Severity
Just above threshold | Low
Roughly one-third of the way to critical | Medium
Approaching critical | High
At or above critical threshold | Critical

Confidence

Confidence (shown as 0–100%) reflects how much you should trust the finding. It combines three factors:

Factor | Weight | Meaning
Data completeness | 40% | Were all rules for this signal evaluable, or were some metrics missing?
Corroboration | 35% | How many rules triggered vs. how many were checked? More corroborating rules = higher confidence.
Sample size | 25% | Is there enough PR data? Signals based on very few PRs get lower confidence.

Sample size is compared against a minimum of 10 PRs. Below that, confidence is penalized proportionally.
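A rough sketch of how the three weighted factors might combine is shown below. The weights and the 10-PR minimum come from this guide; how each factor is normalized to a 0–1 value (evaluable ÷ total rules, triggered ÷ evaluable rules, sample size ÷ 10 capped at 1) is an assumption made for illustration only.

```python
MIN_SAMPLE = 10  # minimum PR count stated in this guide


def confidence_percent(total_rules, evaluable_rules, triggered_rules, sample_size):
    """Weighted confidence in percent; the factor normalizations are assumptions."""
    completeness = evaluable_rules / total_rules if total_rules else 0.0
    corroboration = triggered_rules / evaluable_rules if evaluable_rules else 0.0
    sample = min(1.0, sample_size / MIN_SAMPLE)  # proportional penalty below 10 PRs
    score = 0.40 * completeness + 0.35 * corroboration + 0.25 * sample
    return round(100 * score)
```

Under these assumptions, a signal with all three rules evaluable, one rule triggered, and only 5 PRs would score about 64%.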

What you will not see


The six signals

GitQuick evaluates exactly six signals on every run. Each one maps to a common class of engineering delivery problem.


1. Review Latency Problem

What it detects: Pull requests are waiting too long for someone to review them or approve them.

Why it matters: Slow reviews stall delivery, let context go stale, and pull reviewers back to old work.

Rules and thresholds

Rule | Condition | Warning threshold | Critical threshold
First review median | Median time to first review | > 8 hours | ≥ 24 hours
Approval median | Median time to approval | > 24 hours | ≥ 72 hours
First review tail ratio | p90 ÷ median for time to first review | > 3.0× | ≥ 6.0×
Both review and approval elevated (compound) | Both medians above their thresholds | | Escalates severity +1

Tail ratio explained: A high p90-to-median ratio means a few PRs are stuck for much longer than typical. The median can look fine while a subset of PRs waits days.
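To make the warning-level checks concrete, here is a small sketch that evaluates the Review Latency rules from the table. The function and rule labels are hypothetical; only the warning thresholds (8 hours, 24 hours, 3.0×) are taken from this guide.

```python
def review_latency_rules(first_review_median_h, first_review_p90_h, approval_median_h):
    """Return the labels of Review Latency rules that fire at warning level."""
    fired = []
    if first_review_median_h > 8:
        fired.append("first_review_median")
    if approval_median_h > 24:
        fired.append("approval_median")
    # Tail ratio: p90 divided by median; skipped when the median is not positive.
    if first_review_median_h > 0 and first_review_p90_h / first_review_median_h > 3.0:
        fired.append("first_review_tail_ratio")
    if "first_review_median" in fired and "approval_median" in fired:
        fired.append("compound_both_elevated")
    return fired
```

With a 5-hour median and a 30-hour p90, only the tail-ratio rule fires (30 ÷ 5 = 6.0×), which is the "median looks fine, tail does not" case described above.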

Evidence shown: First review median/p90, approval median/p90 (in hours).

Sample size used for confidence: Number of PRs with valid time-to-first-review data.


2. Merge Pipeline Friction

What it detects: PRs are approved but not merging quickly. The bottleneck is after review — not during it.

Why it matters: Approved work sitting idle increases merge conflicts, stale CI, and deployment delays even when review throughput looks healthy.

Rules and thresholds

Rule | Condition | Warning threshold | Critical threshold
Approval-to-merge median | Median time from last approval to merge | > 4 hours | ≥ 16 hours
Dominates cycle (compound) | Approval-to-merge ÷ total time-to-merge | > 40% of cycle | Escalates severity +1
Approval-to-merge tail ratio | p90 ÷ median for approval-to-merge | > 3.0× | ≥ 6.0×

Dominance fraction explained: If 60% of a PR's total lifetime is spent waiting after approval, review is not the problem — merge gating, CI, or release process likely is.
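The dominance check can be illustrated with a short sketch. It assumes the fraction is taken between the two medians (approval-to-merge median ÷ time-to-merge median); this guide does not state which statistics are divided, so treat that choice and the function names as hypothetical.

```python
def dominance_fraction(approval_to_merge_median_h, time_to_merge_median_h):
    """Share of the merge cycle spent waiting after approval (assumes medians are used)."""
    if time_to_merge_median_h <= 0:
        return None  # not evaluable without a positive cycle time
    return approval_to_merge_median_h / time_to_merge_median_h


def dominates_cycle(approval_to_merge_median_h, time_to_merge_median_h):
    """True when post-approval waiting exceeds 40% of the cycle."""
    fraction = dominance_fraction(approval_to_merge_median_h, time_to_merge_median_h)
    return fraction is not None and fraction > 0.40
```

For instance, 6 hours of post-approval waiting in a 10-hour cycle gives a fraction of 0.6, which exceeds the 40% threshold.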

Evidence shown: Approval-to-merge median/p90, time-to-merge median, approval-to-merge as a fraction of total cycle.

Sample size used for confidence: Number of merged PRs with valid approval-to-merge data.


3. Backlog / Throughput Imbalance

What it detects: More PRs are being opened each week than are being merged. The review queue is growing.

Why it matters: Even if individual PR cycle times look acceptable, a growing backlog means older PRs go stale, conflicts mount, and reviewer attention fragments.

Rules and thresholds

The core metric is merge efficiency:

merge efficiency = avg PRs opened per week ÷ avg PRs merged per week

Value | Meaning
1.0 | Intake matches output; backlog is stable
1.2 | 20% more opened than merged; backlog is growing
2.0+ | Intake is double output; backlog is growing fast

Rule | Condition | Warning threshold | Critical threshold
Merge efficiency above threshold | merge efficiency | > 1.2 | ≥ 2.0
Merge efficiency critical | merge efficiency | > 2.0 | (always Critical)
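A minimal sketch of the merge efficiency ratio and its threshold bands follows. The function names are hypothetical, and returning None when nothing was merged is an assumption; this guide does not say how a zero denominator is handled.

```python
def merge_efficiency(avg_opened_per_week, avg_merged_per_week):
    """Opened ÷ merged; values above 1.0 mean the backlog is growing."""
    if avg_merged_per_week <= 0:
        return None  # assumption: not evaluable when nothing was merged
    return avg_opened_per_week / avg_merged_per_week


def backlog_band(efficiency):
    """Map the ratio onto the warning (> 1.2) and critical (>= 2.0) thresholds."""
    if efficiency is None:
        return "not evaluable"
    if efficiency >= 2.0:
        return "critical"
    if efficiency > 1.2:
        return "warning"
    return "healthy"
```

A team opening 15 PRs and merging 10 per week has a ratio of 1.5, which lands in the warning band.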

Evidence shown: Avg opened per week, avg merged per week, merge efficiency ratio.

Sample size used for confidence: Total PRs opened in the run.


4. Review Quality Risk

What it detects: Human review may be bypassed or diluted — the safety net is weaker than raw review counts suggest.

Why it matters: Code reaching main without human eyes increases regression risk and weakens knowledge sharing.

Rules and thresholds

Rule | Condition | Warning threshold | Critical threshold
Merged without review | % of merged PRs with zero reviews | > 10% | ≥ 30%
Bot review share | % of all reviews from bot accounts | > 50% | ≥ 80%
Both bypass and bot elevated (compound) | Both conditions triggered | | Escalates severity +1

Bot classification: GitQuick identifies bots by login patterns (e.g. [bot] suffix, dependabot-style names). See Metrics Reference — Bot vs Human Reviews.

Evidence shown: Merged-without-review percentage, bot review percentage.

Sample size used for confidence: Total merged PRs in the run.


5. PR Size / Complexity Risk

What it detects: Pull requests are too large for effective human review.

Why it matters: Research suggests reviewers struggle to give thorough feedback above ~200–400 changed lines. Oversized PRs tend to get rubber-stamped, hide bugs, and slow the whole pipeline.

PR size uses total lines (additions + deletions) per PR.

Rules and thresholds

Rule | Condition | Warning threshold | Critical threshold
Median size | Median total lines per PR | > 400 lines | ≥ 1,000 lines
p90 size | 90th percentile total lines | > 1,000 lines | ≥ 2,500 lines
Median and p90 both elevated (compound) | Both size rules triggered | | Escalates severity +1

When both median and p90 are elevated, oversized PRs are the norm — not just a few outliers.

Evidence shown: PR size median and p90 (total lines).

Sample size used for confidence: Number of PRs with valid size data.


6. Development Cycle Delay

What it detects: Engineers are keeping work on local branches too long before opening a PR for review.

Why it matters: Long local incubation correlates with larger PRs, harder reviews, more rebasing, and delayed feedback from CI and teammates.

Rules and thresholds

This signal uses the average (not the median) number of hours from first commit to PR open.

Rule | Condition | Warning threshold | Critical threshold
First commit to open | Avg hours from first commit to PR open | > 24 hours | ≥ 72 hours
First commit to open critical | Same metric | > 72 hours | (always Critical)

Evidence shown: Avg first-commit-to-open hours, sample size for first-commit metrics.

Sample size used for confidence: Number of PRs with first-commit data.


Threshold reference (quick lookup)

All default thresholds in one place:

Signal | Metric | Warning | Critical
Review Latency | First review median (hours) | > 8 | ≥ 24
Review Latency | Approval median (hours) | > 24 | ≥ 72
Review Latency | First review p90/median ratio | > 3.0 | ≥ 6.0
Merge Friction | Approval-to-merge median (hours) | > 4 | ≥ 16
Merge Friction | Approval-to-merge share of cycle | > 40% |
Merge Friction | Approval-to-merge p90/median ratio | > 3.0 | ≥ 6.0
Backlog Imbalance | Opened ÷ merged per week | > 1.2 | ≥ 2.0
Review Quality | Merged without review (%) | > 10 | ≥ 30
Review Quality | Bot review share (%) | > 50 | ≥ 80
PR Size | Total lines median | > 400 | ≥ 1,000
PR Size | Total lines p90 | > 1,000 | ≥ 2,500
Dev Cycle Delay | Avg first commit → open (hours) | > 24 | ≥ 72

These are industry-reasonable starting points, not calibrated to your specific team. A mature team with strict review SLAs may want stricter thresholds; a small team shipping infrequently may legitimately trigger fewer signals.


Practical tips for new users

Start with severity, then read confidence. A Critical signal with 90% confidence deserves immediate attention. A Low signal with 40% confidence may reflect a small sample — verify on the Metrics tab before changing process.

One signal often explains another. PR Size Risk frequently drives Review Latency. Dev Cycle Delay often precedes PR Size Risk. When multiple signals fire, look for a root cause rather than treating each in isolation.

Use scope to find where the problem lives. Org-level Review Latency with no signal at repo level means the issue is concentrated in specific repositories — check Repos Performance or narrow scope.

Empty Signals ≠ perfect engineering. Thresholds are conservative. You can have meaningful improvement opportunities that do not yet trigger a signal. The Metrics and Trend tabs still add value.

Signals complement AI reports. When AI Insights are enabled, executive reports incorporate triggered signals into their narrative. Signals give you the structured detection; AI reports add broader context and wording for stakeholders.


Related documentation


Metrics Reference

All metrics below are computed from the set of Pull Requests included in a given org run. "PR" means a single GitHub Pull Request record.

Percentiles. Median (p50) and p90 are discrete percentiles: they always correspond to an actual PR in the sample. GitQuick computes some rollups in one pipeline and the primary-language breakdown in another; rounding and how durations are grouped can make median/p90 differ slightly between the main run view and the per-language view for the same underlying PRs.


1. Time to First Review

What it measures: how long a PR waits before any human or bot posts its first review.

Source fields: pr_created_at, first_review_at.

Per-PR value:

time_to_first_review_ms = first_review_at − pr_created_at

A PR contributes to the sample only when both timestamps exist and the difference is non‑negative.
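The per-PR rule above can be sketched as a small helper. The function name is hypothetical, and the sketch assumes both timestamps are timezone-aware datetime values; the same pattern applies to the other duration metrics in this reference.

```python
from datetime import datetime, timezone
from typing import Optional


def time_to_first_review_ms(pr_created_at: Optional[datetime],
                            first_review_at: Optional[datetime]) -> Optional[int]:
    """Return the duration in milliseconds, or None when the PR must be excluded."""
    if pr_created_at is None or first_review_at is None:
        return None  # a timestamp is missing
    delta_ms = int((first_review_at - pr_created_at).total_seconds() * 1000)
    return delta_ms if delta_ms >= 0 else None  # negative differences are dropped


created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
reviewed = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
duration = time_to_first_review_ms(created, reviewed)  # two hours, in milliseconds
```

Only PRs for which this helper returns a value would count toward sample_size.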

Reported statistics: sample_size, median_hours, p90_hours, spread.


2. Time to Approval

What it measures: how long from PR open until the first approving review.

Source fields: pr_created_at, first_approval_at.

Per-PR value:

time_to_approval_ms = first_approval_at − pr_created_at

A PR contributes only when both timestamps exist and the difference is non‑negative.

Reported statistics: sample_size, median_hours, p90_hours, spread.


3. Approval‑to‑Merge Time

What it measures: how long a PR sits between its last approval and actually being merged. This isolates "waiting to merge" latency from review latency.

Source fields: last_approval_at, merged_at.

Per-PR value:

approval_to_merge_ms = merged_at − last_approval_at

Only merged PRs with a recorded approval contribute. PRs where merged_at < last_approval_at (force-merges or data anomalies) are excluded.

Reported statistics: sample_size, median_hours, p90_hours, spread.


4. Time to Merge

What it measures: total wall-clock time from PR open to merge.

Source fields: pr_created_at, merged_at.

Per-PR value:

time_to_merge_ms = merged_at − pr_created_at

Only merged PRs with a non-negative difference contribute.

Reported statistics: sample_size, median_hours, p90_hours, spread.


5. PR Throughput

What it measures: opened and merged PR volume per week, over the run window.

Main run view:

  1. For each PR, assign an opened week from pr_created_at and a merged week from merged_at when merged. Weeks are Sunday-aligned calendar buckets (local date normalization, then a single date label per week).
  2. Count PRs per week for opened and merged.
  3. Take the set of weeks that have at least one opened or merged PR; weeks_analyzed is the number of weeks in that set.
  4. Compute simple averages:
avg_opened_per_week = sum(weekly_opened) / weeks_analyzed
avg_merged_per_week = sum(weekly_merged) / weeks_analyzed

A week with zero PRs in one series still counts toward the denominator as long as it has at least one PR in the other series. Weeks with no activity in either series are not counted.
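The main-run steps above can be sketched as follows. This is an illustration under simplifying assumptions: inputs are already local calendar dates (the local date normalization step is skipped), and the function names and the (opened, merged) tuple shape are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_start_sunday(d: date) -> date:
    """Label a date with the Sunday that starts its week (date.weekday(): Monday=0, Sunday=6)."""
    return d - timedelta(days=(d.weekday() + 1) % 7)


def throughput(prs):
    """prs: iterable of (opened_date, merged_date_or_None) pairs."""
    opened, merged = Counter(), Counter()
    for opened_on, merged_on in prs:
        opened[week_start_sunday(opened_on)] += 1
        if merged_on is not None:
            merged[week_start_sunday(merged_on)] += 1
    weeks = set(opened) | set(merged)  # weeks with activity in either series
    if not weeks:
        return None
    n = len(weeks)
    return {
        "weeks_analyzed": n,
        "avg_opened_per_week": sum(opened.values()) / n,
        "avg_merged_per_week": sum(merged.values()) / n,
    }
```

Switching the helper to a Monday-aligned boundary, as the primary-language breakdown does, can move a PR into a different bucket and therefore change both averages.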

Primary-language breakdown assigns weeks using a Monday-aligned week boundary. Throughput averages by language therefore do not use the same week buckets as the main run totals for identical timestamps.


6. Merged Without Review

What it measures: fraction of merged PRs that were merged without any recorded review.

Calculation:

merged_without_review.count      = count(pr where merged_at IS NOT NULL AND review_count = 0)
merged_without_review.percentage = 100 × count / total_merged

total_merged is the number of PRs in the run that have a merged_at.
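As a sketch of the calculation above, assuming each PR is a dictionary with merged_at and review_count keys (a hypothetical record shape) and that the percentage is null when no PRs were merged:

```python
def merged_without_review(prs):
    """Count and percentage of merged PRs that have zero recorded reviews."""
    merged = [pr for pr in prs if pr["merged_at"] is not None]
    if not merged:
        return {"count": 0, "percentage": None}  # assumption: null when nothing merged
    count = sum(1 for pr in merged if pr["review_count"] == 0)
    return {"count": count, "percentage": 100 * count / len(merged)}
```

Unmerged PRs with zero reviews do not affect the result, because both the numerator and the denominator are restricted to merged PRs.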


7. Reviewer Participation

What it measures: how many distinct reviewers typically engage on each PR.

Calculation:

avg_reviewers_per_pr = mean(distinct_reviewers where distinct_reviewers > 0)

The primary-language breakdown also reports median_reviewers_per_pr (p50) and prs_with_reviewers (count where distinct_reviewers > 0).


8. Reviewer Load

What it measures: how review work is distributed across reviewers.

Calculation: for each distinct reviewer login in the run, count the number of reviews they authored. Reviewers are shown as reviewer login plus count, sorted descending by count. Bot accounts are included as-is (see Bot vs Human Reviews to split them).


9. Review Rounds

What it measures: breakdown of review activity by review state.

Source fields (per PR): approved_count, changes_requested_count, commented_count.

Calculation:

approved          = Σ approved_count
changes_requested = Σ changes_requested_count
commented         = Σ commented_count
total             = approved + changes_requested + commented
avg_per_pr        = total / pr_count

pr_count here is the total number of PRs in the run (not only PRs with reviews).


10. Re-review Rate

What it measures: among PRs that received at least one changes-requested review, how often there is some approving review after the first changes-requested review (any approver; not necessarily the same person who requested changes).

Calculation:

prs_with_changes_requested = count(pr where changes_requested_count > 0)
prs_with_rereview_approval = count(pr where changes_requested_count > 0 AND has_rereview_approval = true)
rereview_rate              = 100 × prs_with_rereview_approval / prs_with_changes_requested

has_rereview_approval is true when an APPROVED review exists with submitted_at strictly after the earliest CHANGES_REQUESTED review on that PR. null is returned if no PRs had changes requested.
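The definition of has_rereview_approval and the resulting rate could be sketched like this. Representing each review as a (state, submitted_at) pair with comparable timestamps is an assumption made for the example.

```python
def has_rereview_approval(reviews):
    """reviews: list of (state, submitted_at) pairs for one PR."""
    changes_requested = [t for state, t in reviews if state == "CHANGES_REQUESTED"]
    if not changes_requested:
        return False
    first_request = min(changes_requested)
    # Any approval strictly after the earliest changes-requested review counts.
    return any(state == "APPROVED" and t > first_request for state, t in reviews)


def rereview_rate(prs_reviews):
    """prs_reviews: one review list per PR. Returns a percentage or None."""
    with_changes = [r for r in prs_reviews
                    if any(state == "CHANGES_REQUESTED" for state, _ in r)]
    if not with_changes:
        return None  # no PRs had changes requested
    hits = sum(1 for r in with_changes if has_rereview_approval(r))
    return 100 * hits / len(with_changes)
```

An approval submitted before the first changes-requested review does not count, since the comparison is strict and anchored to the earliest request.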


11. PR Size

What it measures: code churn characteristics of PRs in the run.

Source fields (per PR): additions, deletions, changed_files.

Calculation: for each of the three fields independently, compute p50 and p90 over non-null values, plus:

total_churn = sum of all non-null additions + sum of all non-null deletions
sample_size = count(pr where additions IS NOT NULL)

12. First-Commit Metrics

What it measures: how long code lives locally before being opened as a PR, and the relationship between "coding time" and "review/merge time".

Source fields (per PR): time_from_first_commit_to_open_ms, time_from_first_commit_to_merge_ms, commit_count (from GitHub commit metadata when the run is collected).

When data is first stored, durations are computed as:

time_from_first_commit_to_open_ms  = max(0, pr_created_at − first_commit_at)   when first_commit_at exists
time_from_first_commit_to_merge_ms = max(0, merged_at − first_commit_at)      when both exist

Downstream metrics use these stored values. Negative intervals are clamped to zero when stored, rather than dropped from aggregates.

Per-PR ratio:

ratio_open_over_merge = time_from_first_commit_to_open_ms / time_from_first_commit_to_merge_ms

ratio_open_over_merge is only included when both values exist and time_from_first_commit_to_merge_ms > 0. It expresses the share of a PR's total lifetime that was spent before opening the PR — a value near 1.0 means most of the time was coding, near 0.0 means most of the time was review.
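The clamping and ratio rules in this section can be combined into one illustrative helper. The function name and the use of epoch-millisecond integers as inputs are assumptions for the sketch.

```python
def first_commit_metrics(first_commit_at, pr_created_at, merged_at=None):
    """Return (to_open_ms, to_merge_ms, ratio_open_over_merge) from epoch-millisecond inputs."""
    if first_commit_at is None:
        return None, None, None
    # Negative intervals are clamped to zero rather than dropped.
    to_open = max(0, pr_created_at - first_commit_at)
    to_merge = max(0, merged_at - first_commit_at) if merged_at is not None else None
    # The ratio is only defined when the merge interval exists and is positive.
    ratio = to_open / to_merge if to_merge else None
    return to_open, to_merge, ratio
```

A PR opened one hour after its first commit and merged one hour later yields a ratio of 0.5: half of its lifetime was spent before the PR was opened.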

Reported statistics:

Note: these are means (not medians). Outliers will influence them.


13. Bot vs Human Reviews

What it measures: share of review activity attributed to bot-like reviewer logins versus the rest, based on login patterns (not GitHub account-type metadata from the API).

Calculation: reviewer logins are classified using heuristics such as a [bot] suffix, dependabot-style names, and patterns like -bot at the end or -bot- in the string. Review counts from the reviewer load breakdown are summed into bot vs human totals and shown as counts and percentages.
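A rough sketch of login-pattern classification is shown below. The exact patterns GitQuick uses are not listed here, so the regular expression (including treating any login that starts with "dependabot" as a bot) is an assumption based on the examples above.

```python
import re

# Assumed patterns: "[bot]" suffix, "dependabot" prefix, "-bot" suffix, "-bot-" infix.
BOT_PATTERN = re.compile(r"\[bot\]$|^dependabot|-bot$|-bot-", re.IGNORECASE)


def is_bot(login: str) -> bool:
    return bool(BOT_PATTERN.search(login))


def bot_vs_human(reviewer_load):
    """reviewer_load: mapping of reviewer login to number of reviews authored."""
    bot = sum(n for login, n in reviewer_load.items() if is_bot(login))
    total = sum(reviewer_load.values())
    return {
        "bot": bot,
        "human": total - bot,
        "bot_percentage": 100 * bot / total if total else None,
    }
```

Because classification relies on naming alone, a human account with a bot-like login (or a bot with an ordinary one) would be misclassified.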


14. Merge predictability (tail risk)

What it measures: how heavy the slow tail of merge times is relative to the median, combined with how wide the p90–p50 gap is. The product surfaces this as merge predictability / long-tail risk (not a simple ratio of p90 to median).

Calculation (using p50 and p90 of time-to-merge, in hours):

gap_component   = 1 − exp(−(p90 − p50) / τ)
shape_component = 1 − p50 / p90
tail_score      = clamp(gap_component × shape_component, 0, ~1)

Default τ = 168 hours (seven days), in the same units as p90 − p50. If data are missing or p50 ≤ 0, p90 ≤ 0, or p90 < p50, no score is shown.
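The formula can be written out as a short sketch. The function name is hypothetical, and clamping the upper bound to exactly 1.0 is an assumption, since the formula above only gives it as approximately 1.

```python
import math


def tail_score(p50_hours, p90_hours, tau=168.0):
    """Merge predictability score from time-to-merge p50 and p90, both in hours."""
    if p50_hours is None or p90_hours is None:
        return None
    if p50_hours <= 0 or p90_hours <= 0 or p90_hours < p50_hours:
        return None  # no score is shown for missing or inconsistent inputs
    gap_component = 1 - math.exp(-(p90_hours - p50_hours) / tau)
    shape_component = 1 - p50_hours / p90_hours
    return min(1.0, max(0.0, gap_component * shape_component))
```

With p50 = 24 hours and p90 = 192 hours, the gap is exactly one τ, so gap_component = 1 − e⁻¹ ≈ 0.632, shape_component = 0.875, and the score is about 0.55.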

The score is mapped to semantic bands (for example healthy through systemic long-tail risk) for display alongside throughput.


15. Primary-Language Breakdown

Repositories are grouped by primary language from each repository’s language metadata (dominant language on the default branch as reported by GitHub).

Per-language metrics use the same definitions as above, but because week boundaries and percentile pipelines differ from the main run view (see the Percentiles note at the top of this reference and §5), figures are not guaranteed to match a manual recomputation from the org-wide numbers.


Caveats and known limitations


If something here disagrees with what you see in the product, trust the app and let us know so this reference can be corrected.