Unlocking Percentiles with MariaDB's PERCENTILE_CONT Function


What it does

  • Unlike PERCENTILE_DISC, it interpolates between data points if necessary to achieve a continuous percentile value.
  • It identifies the value that corresponds to the given percentile within that sorted data.
  • It analyzes a set of data ordered based on a specific column.
  • It takes a fraction (between 0 and 1) as input.

How it works

  1. Number of Rows (N)
    It first determines the total number of rows in the partition (group) being analyzed.
  2. Relative Position (RN)
    It calculates a relative position (RN) based on the provided fraction (p) and the total number of rows (N) using the formula: RN = p * (N-1).
  3. Floor and Ceiling Row Numbers (FRN, CRN)
    It calculates the floor (FRN) and ceiling (CRN) row numbers based on the relative position (RN).
    • FRN represents the integer part of RN (rounded down).
    • CRN represents the integer part of RN rounded up (essentially, FRN + 1).


Example 1: Finding Quartiles in Sales Data

This code calculates the 1st quartile (25th percentile), median (50th percentile), and 3rd quartile (75th percentile) for sales figures grouped by product.

SELECT
  product_name,
  percentile_cont(0.25) WITHIN GROUP (ORDER BY sales_amount) AS quartile_1,
  percentile_cont(0.50) WITHIN GROUP (ORDER BY sales_amount) AS median,
  percentile_cont(0.75) WITHIN GROUP (ORDER BY sales_amount) AS quartile_3
FROM sales_data
GROUP BY product_name;
  • The results show the product name, 1st quartile, median, and 3rd quartile sales amount for each product.
  • Within each group, it uses PERCENTILE_CONT with different fractions (0.25, 0.5, 0.75) to calculate the quartiles based on the sorted sales_amount.
  • It groups the data by product_name.

Example 2: Finding the 90th Percentile of Exam Scores

This code finds the score that represents the 90th percentile in a table of exam results.

SELECT
  student_name,
  exam_score,
  percentile_cont(0.9) OVER (ORDER BY exam_score) AS percentile_90
FROM exam_results;
  • The result shows the student name, their exam score, and the corresponding 90th percentile score.
  • PERCENTILE_CONT(0.9) finds the score that falls at the 90th percentile position.
  • It uses a window clause with OVER (ORDER BY exam_score) to calculate the percentile for each row based on the entire dataset sorted by exam_score.


PERCENTILE_DISC

  • It's simpler than PERCENTILE_CONT but doesn't interpolate for continuous values.
  • It returns the first value in the ordered set that falls at or above the specified percentile.
  • This is another window function that calculates the discrete percentile.

Example (Finding 70th Percentile)

SELECT
  student_name,
  exam_score,
  percentile_disc(0.7) OVER (ORDER BY exam_score) AS percentile_70_disc
FROM exam_results;

User-defined function (UDF)

  • This would offer more flexibility but requires additional development effort.
  • You can write a custom function in a language like Python or C to implement your own percentile calculation logic.

App Logic

  • This might involve sorting the data and then finding the appropriate position based on the percentile value.
  • For older versions of MariaDB that lack window functions, you can calculate percentiles in your application code.
  • For more complex calculations or older MariaDB versions, consider UDFs or app logic, but be aware of the development overhead.
  • If continuous interpolation for percentiles is crucial, using PERCENTILE_CONT (if available) is preferred.
  • If you need the exact value at a specific percentile (discrete), PERCENTILE_DISC is a good option.