Prepare for the Society of Actuaries (SOA) PA Exam with our comprehensive quiz. Study with flashcards and multiple choice questions with explanations. Master key concepts and boost your confidence!

Each practice test/flash card set has 50 randomly selected questions from a bank of over 500. You'll get a new set of questions each time!



What is a major disadvantage of hierarchical clustering?

  1. It provides clear, pre-defined cluster numbers

  2. It can handle different types of data efficiently

  3. It requires a distance matrix which can be computationally expensive

  4. It is easy to interpret results directly

The correct answer is: It requires a distance matrix which can be computationally expensive

Hierarchical clustering groups data points by their distance from one another and produces a tree-like structure known as a dendrogram. Its major disadvantage is that it requires a distance matrix holding the distance between every pair of data points. For n data points, computing and storing this matrix takes O(n^2) time and memory, so doubling the number of points roughly quadruples the cost. On larger datasets this becomes resource-intensive in both processing power and memory usage.

The other options do not describe disadvantages of hierarchical clustering. It does not provide a clear, pre-defined number of clusters: the dendrogram can be cut at any height, so choosing the number of clusters is left to the analyst and can be subjective. Its ability to handle different types of data, to the extent it applies, is a strength rather than a drawback. Ease of interpretation would likewise be an advantage, and in practice a dendrogram can be hard to read when the dataset is large or its structure is complex.
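To make the quadratic growth concrete, here is a minimal sketch in plain Python. The helper names (`pairwise_distances`, `condensed_size`, `approx_memory_mb`) are hypothetical, and the memory estimate assumes each distance is stored once as an 8-byte float; real libraries may store more or less than this.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return the Euclidean distance for every unordered pair of points."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def condensed_size(n):
    """Number of distinct pairs among n points: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def approx_memory_mb(n, bytes_per_value=8):
    """Rough storage in MB, assuming one 8-byte float per pair (an assumption)."""
    return condensed_size(n) * bytes_per_value / 1e6


# Four illustrative points give 4 * 3 / 2 = 6 pairwise distances.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
distances = pairwise_distances(points)

# Show how the number of stored distances grows with n.
for n in (1_000, 10_000, 100_000):
    print(n, condensed_size(n), round(approx_memory_mb(n), 1))
```

Under these assumptions, 1,000 points need about 4 MB for the distances alone, while 100,000 points need roughly 40 GB, which is why the method is usually applied to small or sampled datasets.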