Michele Scardi
2010-07-10 08:15:47 UTC
A student of mine recently showed me an NMDS ordination of fish
assemblages, which was based on Bray-Curtis dissimilarity computed on
log-transformed data.
I told her that log-transforming data before computing BC did not make
sense to me, because the original interpretation of the BC dissimilarity
(the ratio between the sum of the differences between two samples and
the overall sum of the specimens found) would be lost.
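
To make that interpretation concrete, here is a minimal Python sketch of
the BC dissimilarity as described above. The function name and the two
abundance vectors are made up for illustration only; they are not the
student's data.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two samples of abundances.

    Sum of the absolute differences between the two samples, divided by
    the overall sum of the specimens found in both samples.
    """
    if len(x) != len(y):
        raise ValueError("samples must list the same taxa")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / total


# Hypothetical counts for three taxa in two samples.
sample_a = [10, 0, 5]
sample_b = [5, 5, 0]

# Differences: 5 + 5 + 5 = 15; overall sum: 15 + 10 = 25.
print(bray_curtis(sample_a, sample_b))  # 15 / 25 = 0.6
```

With raw counts the result reads directly as "15 of the 25 specimens
found are not shared between the two samples".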
She replied that I was probably right, but that she had read many papers
based on this approach. As for me, I had never noticed that many papers
combining log-transformed data and BC, but I ran a quick bibliographic
search and was surprised by the number of papers using this approach.
I cannot see why one should log-, sqrt- or sqrt(sqrt)-transform the
data before computing BC, which is meant to measure relative
differences, not quantitative differences. I am afraid that most people
just want to normalize data distributions even when normalization is
not really necessary. And the result of unnecessary normalization is
that the interpretation of distances/dissimilarities can be much less
straightforward than with raw data.
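
For anyone who wants to check how much the transformations change the
value, here is a rough Python sketch. The counts are invented, and
log(x + 1) is my assumption for the log transformation (plain log is
undefined for zero counts); the papers in question may use something
else.

```python
import math


def bray_curtis(x, y):
    """Sum of absolute differences divided by the overall sum."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


# Hypothetical counts: one dominant taxon and two rare ones.
sample_a = [1000, 10, 1]
sample_b = [500, 20, 2]

# Transformations mentioned above; log(x + 1) is an assumption.
transforms = {
    "raw": lambda v: v,
    "sqrt": math.sqrt,
    "sqrt(sqrt)": lambda v: math.sqrt(math.sqrt(v)),
    "log(x+1)": math.log1p,
}

results = {}
for name, f in transforms.items():
    ta = [f(v) for v in sample_a]
    tb = [f(v) for v in sample_b]
    results[name] = bray_curtis(ta, tb)
    print(f"{name:>10}: {results[name]:.4f}")
```

Only the raw value keeps the "share of specimens not in common"
reading; after a transformation the numerator and denominator are sums
of transformed counts, not of specimens.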
However, I'd really like to read other opinions about this!
All the best,
Michele
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Michele Scardi
Associate Professor of Ecology
Department of Biology
Tor Vergata University
Rome, Italy
http://www.michele.scardi.name
http://www.mare-net.com/mscardi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~