Unlocking Percentiles with MariaDB's PERCENTILE_CONT Function
What it does
- Unlike
PERCENTILE_DISC
, it interpolates between data points if necessary to achieve a continuous percentile value. - It identifies the value that corresponds to the given percentile within that sorted data.
- It analyzes a set of data ordered based on a specific column.
- It takes a fraction (between 0 and 1) as input.
How it works
- Number of Rows (N)
It first determines the total number of rows in the partition (group) being analyzed. - Relative Position (RN)
It calculates a relative position (RN) based on the provided fraction (p) and the total number of rows (N) using the formula:RN = p * (N-1)
. - Floor and Ceiling Row Numbers (FRN, CRN)
It calculates the floor (FRN) and ceiling (CRN) row numbers based on the relative position (RN).- FRN represents the integer part of RN (rounded down).
- CRN represents the integer part of RN rounded up (essentially, FRN + 1).
Example 1: Finding Quartiles in Sales Data
This code calculates the 1st quartile (25th percentile), median (50th percentile), and 3rd quartile (75th percentile) for sales figures grouped by product.
SELECT
product_name,
percentile_cont(0.25) WITHIN GROUP (ORDER BY sales_amount) AS quartile_1,
percentile_cont(0.50) WITHIN GROUP (ORDER BY sales_amount) AS median,
percentile_cont(0.75) WITHIN GROUP (ORDER BY sales_amount) AS quartile_3
FROM sales_data
GROUP BY product_name;
- The results show the product name, 1st quartile, median, and 3rd quartile sales amount for each product.
- Within each group, it uses
PERCENTILE_CONT
with different fractions (0.25, 0.5, 0.75) to calculate the quartiles based on the sortedsales_amount
. - It groups the data by
product_name
.
Example 2: Finding the 90th Percentile of Exam Scores
This code finds the score that represents the 90th percentile in a table of exam results.
SELECT
student_name,
exam_score,
percentile_cont(0.9) OVER (ORDER BY exam_score) AS percentile_90
FROM exam_results;
- The result shows the student name, their exam score, and the corresponding 90th percentile score.
PERCENTILE_CONT(0.9)
finds the score that falls at the 90th percentile position.- It uses a window clause with
OVER (ORDER BY exam_score)
to calculate the percentile for each row based on the entire dataset sorted byexam_score
.
PERCENTILE_DISC
- It's simpler than
PERCENTILE_CONT
but doesn't interpolate for continuous values. - It returns the first value in the ordered set that falls at or above the specified percentile.
- This is another window function that calculates the discrete percentile.
Example (Finding 70th Percentile)
SELECT
student_name,
exam_score,
percentile_disc(0.7) OVER (ORDER BY exam_score) AS percentile_70_disc
FROM exam_results;
User-defined function (UDF)
- This would offer more flexibility but requires additional development effort.
- You can write a custom function in a language like Python or C to implement your own percentile calculation logic.
App Logic
- This might involve sorting the data and then finding the appropriate position based on the percentile value.
- For older versions of MariaDB that lack window functions, you can calculate percentiles in your application code.
- For more complex calculations or older MariaDB versions, consider UDFs or app logic, but be aware of the development overhead.
- If continuous interpolation for percentiles is crucial, using
PERCENTILE_CONT
(if available) is preferred. - If you need the exact value at a specific percentile (discrete),
PERCENTILE_DISC
is a good option.