This article mainly introduces a commonly used evaluation metric in Search and Recommend: NDCG.
NDCG (Normalized Discounted Cumulative Gain) can be translated as normalized discounted cumulative gain, which is usually used to measure and evaluate the accuracy of ranking. For example, in an e-commerce system, a user's query returns a list of products. Assuming the list length is N, NDCG@N can be used to evaluate the difference between the ranking of this list and the actual interaction list.
Cumulative Gain#
CG (Cumulative Gain) is the sum of the graded relevance scores of all documents in the search result list. CG only considers the relevance of the results in the search result list and does not take into account the positional factors of these documents in the result list. Given the ranking position of a result list, it can be defined as:
where represents the relevance at position .
For example, if a user searches for "mobile phone" on an e-commerce platform, the top 6 results are: iPhone, Xiaomi, Huawei, OPPO, VIVO, Samsung. If the user prefers Huawei, then regardless of which position Huawei occupies in the top 6, it is equivalent for Cumulative Gain. However, for search and recommendation systems, the best result is for the Huawei phone to be in the first position, leading to the next metric, Discounted Cumulative Gain.
Discounted Cumulative Gain#
DCG (Discounted Cumulative Gain) proposes that when highly relevant results appear in the later positions of the search result list, a penalty should be applied to the evaluation score. The penalty ratio is related to the logarithmic value of the document's position. Given the ranking position of a result list, DCG can be defined as:
:::div{style="text-align: center}
:::
There is another commonly used formula that increases the weight of relevance impact:
:::div{style="text-align: center}
:::
The latter formula is more commonly used in most search companies and e-commerce platforms.
When the relevance score , meaning it only takes values of 0 and 1, the two formulas above are equivalent.
Normalized Discounted Cumulative Gain#
NDCG (Normalized Discounted Cumulative Gain), returning to the NDCG mentioned at the beginning of the article. NDCG believes that the length of search results varies with the query, and using only DCG to compare the performance of different queries is inconsistent. Therefore, for the chosen value of , the CG at each position should be normalized based on the query, first sorting all relevant results in the database by relevance, and then generating the maximum possible DCG through position , usually referred to as IDCG (Ideal DCG). For a query, NDCG can be calculated as follows:
:::div{style="text-align: center}
:::
where IDCG is the DCG in an ideal state, calculated as:
:::div{style="text-align: center}
:::
represents the list of the top results with the highest relevance in the database (sorted by relevance).
The average NDCG value for all queries is taken to assess the average performance of the search engine ranking algorithm. The larger the NDCG value, the better the performance of the ranking algorithm. In a perfect ranking algorithm, and should be equal, at which point the NDCG value is 1. The minimum NDCG value is 0, so the NDCG values are distributed between 0 and 1. It is a relative value, allowing for comparisons across queries.
Calculation Example#
Taking the previously mentioned user search query "mobile phone" and returning 6 results as an example, with results: iPhone, Xiaomi, Huawei, OPPO, VIVO, Samsung, corresponding to relevance scores of 3, 2, 3, 0, 1, 2.
Then, the corresponding CG is:
:::div{style="text-align: center}
:::
Changing the order of any two results will not affect the final result of the CG calculation. DCG is used to emphasize highly relevant results that appear earlier in the result list. Using logarithmic reduction, the DCG for each result is as follows:
1 | 3 | 7 | 1 | 3 | 7 |
2 | 2 | 3 | 1.585 | 1.262 | 1.893 |
3 | 3 | 7 | 2 | 1.5 | 3.5 |
4 | 0 | 0 | 2.322 | 0 | 0 |
5 | 1 | 1 | 2.585 | 0.387 | 2.585 |
6 | 2 | 3 | 2.807 | 0.712 | 1.069 |
Using the first DCG formula, the calculation is as follows:
:::div{style="text-align: center}
:::
Using the second DCG formula, the calculation is as follows:
:::div{style="text-align: center}
:::
Now, swapping the positions of the 3rd and 4th results will lead to a decrease in DCG. This is because a less relevant result is ranked higher in the order, meaning a more relevant result is placed in a lower position and suffers more discount.
The performance of this query cannot be compared in this case, as another query might have more results, leading to a higher DCG, but it does not necessarily mean that the performance of this query is better. To make comparisons, the DCG values must be normalized.
To normalize the DCG value, an ideal ranking must be performed for the given query. In this example, the ranking will be a monotonically decreasing order of all known relevance judgments. Besides the 6 results in this example, suppose we also know that result 7 has a relevance score of 3, and result 8 has a relevance score of 2. Then the ideal ranking is:
:::div{style="text-align: center}
:::
When there are no results 7 and 8, the ideal ranking is
:::div{style="text-align: center}
:::
The DCG of the ideal ranking, also known as IDCG, is calculated for the top 6 positions:
:::div{style="text-align: center}
:::
Thus, the NDCG for this query can be calculated as:
:::div{style="text-align: center}
:::
Python Code to Calculate NDCG#
import numpy as np
def get_dcg(rec_list):
log_2 = np.log2(np.arange(2, len(rec_list) + 2))
mi = np.power(2, rec_list) - 1
dcg = np.array(mi / log_2).sum()
return dcg
if __name__ == "__main__":
rec_list = [3, 2, 3, 0, 1, 2]
sort_rec_list = [3, 3, 2, 2, 1, 0]
dcg = get_dcg(rec_list)
idcg = get_dcg(sort_rec_list)
print("ndcg = dcg / idcg = ", dcg/idcg)