# Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm

@article{Amorim2015FeatureRI, title={Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm}, author={Renato Cordeiro de Amorim}, journal={Journal of Classification}, year={2015}, volume={32}, pages={46-62} }

In this paper we introduce a new hierarchical clustering algorithm called Ward_p. Unlike the original Ward, Ward_p generates feature weights, which can be seen as feature rescaling factors thanks to the use of the Lp norm. The feature weights are cluster dependent, allowing a feature to have different degrees of relevance at different clusters. We validate our method by performing experiments on a total of 75 real-world and synthetic datasets, with and without added features made of uniformly…
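The abstract's idea of cluster-dependent weights acting as rescaling factors can be illustrated with a rough sketch. The first function below is the classic Ward merge cost (the increase in within-cluster sum of squares). The other two are assumptions made for illustration only: `dispersion_weights` and `weighted_lp_cost` are hypothetical names, the cluster mean is used as the centre for simplicity, and the weight formula follows the general Minkowski-weighting pattern of giving larger weights to features with smaller within-cluster dispersion. It should not be read as the paper's exact formulation.

```python
import numpy as np


def ward_cost(a, b):
    """Classic Ward merge cost between clusters a and b (rows are points)."""
    na, nb = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)


def dispersion_weights(cluster, p, eps=1e-12):
    """Hypothetical cluster-specific feature weights (illustrative only).

    Features with a smaller Lp dispersion around the cluster centre get a
    larger weight; the weights sum to 1. Requires p > 1.
    Assumption: the mean is used as the centre for simplicity.
    """
    center = cluster.mean(axis=0)
    disp = (np.abs(cluster - center) ** p).sum(axis=0) + eps
    inv = disp ** (-1.0 / (p - 1.0))
    return inv / inv.sum()


def weighted_lp_cost(a, b, p):
    """Illustrative weighted Lp analogue of the Ward merge cost.

    The averaged weights multiply the centre differences before the p-th
    power is taken, so they act as feature rescaling factors.
    """
    na, nb = len(a), len(b)
    w = (dispersion_weights(a, p) + dispersion_weights(b, p)) / 2.0
    diff = np.abs(a.mean(axis=0) - b.mean(axis=0))
    return na * nb / (na + nb) * float(((w * diff) ** p).sum())
```

In an agglomerative loop one would repeatedly merge the pair of clusters with the smallest cost. Because each cluster carries its own weights in this sketch, a feature that is noisy inside one cluster contributes little to merge costs involving that cluster while remaining influential elsewhere.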

#### 44 Citations

A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation

- Computer Science, Mathematics
- Inf. Sci.
- 2016

An anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge, and a variant of Ward more capable of dealing with noise in data sets, are introduced.

A Clustering-Based Approach to Reduce Feature Redundancy

- Computer Science
- KICSS
- 2013

This paper introduces an unsupervised feature selection method that can be used in the data pre-processing step to reduce the number of redundant features in a data set and finds that this method selects features that produce better cluster recovery, without the need for an extra user-defined parameter.

Feature weighting methods: A review

- Computer Science
- Expert Syst. Appl.
- 2021

A global taxonomy for Feature Weighting methods is proposed by focusing on: the learning approach (supervised or unsupervised), the methodology used to calculate the weights, and the feedback obtained from the ML algorithm when estimating the weights.

A Hybrid Clustering Approach for Bag-of-Words Image Categorization

- Computer Science
- Mathematical Problems in Engineering
- 2019

A hybrid clustering approach that combines improved hierarchical clustering with a K-means algorithm outperforms the conventional BoW model in terms of categorization, demonstrating the feasibility and effectiveness of the approach.

Ultrametric Fitting by Gradient Descent

- Computer Science, Mathematics
- NeurIPS
- 2019

The proposed framework sheds new light on the way to design a new generation of hierarchical clustering methods by leveraging the simple, yet effective, idea of replacing the ultrametric constraint with a min-max operation injected directly into the cost function.

rCOSA: A Software Package for Clustering Objects on Subsets of Attributes

- Computer Science, Mathematics
- J. Classif.
- 2017

rCOSA is a software package interfaced to the R language that extends the original COSA software by adding functions for hierarchical clustering methods, least squares multidimensional scaling, partitional clustering, and data visualization.

2D–EM clustering approach for high-dimensional data through folding feature vectors

- Medicine, Computer Science
- BMC Bioinformatics
- 2017

The design of the 2D–EM algorithm enables it to handle a diverse set of challenging biomedical datasets and to cluster with higher accuracy than established methods, building confidence in the method's ability to uncover novel disease subtypes in new datasets.

A novel heuristic algorithm to solve penalized regression-based clustering model

- Computer Science, Mathematics
- Soft Comput.
- 2020

A novel heuristic algorithm is proposed to solve the reformulated model of PRClust. It needs only n(n − 1)/2 scalar slack variables, far fewer than DC-CD and DC-ADMM require, and updates them using a simple equation in each iteration of the algorithm.

An improved frequency based agglomerative clustering algorithm for detecting distinct clusters on two dimensional dataset

- Computer Science
- 2017

Experimental results show that DAAC is suitable for instinctively identifying the K distinct clusters across different two-dimensional datasets, with higher intra thickness and lesser intra separation than existing techniques.

A brief survey of unsupervised agglomerative hierarchical clustering schemes

- 2019

An unsupervised hierarchical clustering process is a mathematical model or exploratory tool that aims to provide the easiest way to categorize distinct groups over a large volume of real time…

#### References

SHOWING 1-10 OF 52 REFERENCES

Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering

- Mathematics, Computer Science
- Pattern Recognit.
- 2012

The Minkowski metric based method is experimentally validated on datasets from the UCI Machine Learning Repository and generated sets of Gaussian clusters, and appears to be competitive in comparison with other K-Means based feature weighting algorithms.

Feature Selection as a Preprocessing Step for Hierarchical Clustering

- Computer Science
- ICML
- 1999

Analysis of the particular benefits that feature selection may provide in hierarchical clustering tasks, and of the power of feature selection methods applied as a preprocessing step under the proposed dimensions, suggests that feature selection as preprocessing provides only limited improvements in the performance task.

Optimal Variable Weighting for Ultrametric and Additive Trees and K-means Partitioning: Methods and Software

- Mathematics, Computer Science
- J. Classif.
- 2001

A new computer program, OVW, which is available to researchers as freeware, implements improved algorithms for optimal variable weighting for ultrametric and additive tree clustering, and includes a new algorithm for optimal variable weighting for K-means partitioning.

Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method

- Mathematics, Computer Science
- J. Classif.
- 2005

A hierarchical clustering method that minimizes a joint between-within measure of distance between clusters, by defining a cluster distance and objective function in terms of Euclidean distance, or any power of Euclidean distance in the interval (0,2).

A preliminary study of optimal variable weighting in k-means clustering

- Mathematics
- 1990

Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of…

Unsupervised Feature Selection Using Feature Similarity

- Mathematics, Computer Science
- IEEE Trans. Pattern Anal. Mach. Intell.
- 2002

An unsupervised feature selection algorithm suitable for data sets, large in both dimension and size, based on measuring similarity between features whereby redundancy therein is removed, which does not need any search and is fast.

Automated variable weighting in k-means type clustering

- Computer Science, Medicine
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2005

A new step is introduced to the k-means clustering process to iteratively update variable weights based on the current partition of data, a formula for weight calculation is proposed, and the convergence theorem of the new clustering process is given.

Weighting Features for Partition around Medoids Using the Minkowski Metric

- Mathematics, Computer Science
- IDA
- 2012

This paper shows that MW-PAM, particularly when initialized with the Build algorithm (also using the Minkowski metric), is superior to other medoid-based algorithms in terms of both accuracy and identification of irrelevant features.

Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?

- Computer Science
- J. Classif.
- 2014

The survey work and case studies will be useful for all those involved in developing software for data analysis using Ward’s hierarchical clustering method.

Some methods for classification and analysis of multivariate observations

- Mathematics
- 1967

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…