## Archive for January, 2009

### Hierarchical clustering using C++

Our aim is to implement hierarchical clustering algorithm with O(N*N*log(N)) complexity using STL.

The idea of hierarchical clustering is merging two nearest clusters on every step starting from N one-element clusters. To find two nearest clusters on step K we need (N-K+1)x(N-K+1) matrix of distances between current clusters. We need, in worst case, N times of updating and finding minimum in sequence of NxN, (N-1)x(N-1), (N-2)x(N-2), …, 1×1 matrix. So, in usual realization of HC algorithm it should cost you O(N*N*N) operations.

To improve algorithm complexity up to O(N*N*log(N)) we can use vector V of similar to priority queues structures. V[i] should consist of sorted by distance points P(i,j) where distance is distance between clusters i and j. Our similar to priority queues data structure should support: ability to extract minimal element with complexity O(1), ability to delete element with complexity O(log(n)), and ability to add and element with complexity O(log(n)).

Standard STL priority_queue allows us to extract minimum and add element with required time, but it has no ability to find quickly an element, so we can’t delete elements with complexity O(log(n)). I propose to use set or multiset STL structure instead. Set and multiset uses “red-black tree” data structure with O(log(n)) complexity of deletion and insertion. Also, in STL sets all elements stored in sorted order and we can easily access to minimum (maximum) element [using standard begin() or end() methods].

In my realization I used following structure to store distances and cluster’s indexes:

```struct distances { double dist; int index; };```

Also I defined appropriate comparing operator so that elements in my set always sorted in right order.

```struct Cmp{ bool operator()(const distances d1, const distances d2) const { if(d1.dist == d2.dist) { return d1.index &lt; d2.index; } return d1.dist &lt; d2.dist; } };```

Then declaration of our vector of sets should look like this one:

`std::vector&gt; P;`

Here is source codes with sample input file and some comments:

sources.zip

Friday, January 23rd, 2009

### Problems with SSE2 in OpenCV

When I tried to use some methods from OpenCV library (for example CvResize()) in my MS Visual Studio 2005 project I encountered with errors in cxtypes.h header file near

```#if CV_SSE2 __m128d t = _mm_load_sd( &value ); int i = _mm_cvtsd_si32(t); return i - _mm_movemask_pd(_mm_cmplt_sd(t,_mm_cvtsi32_sd(t,i))); #else ....```

These errors were related with SSE2. The problem is that my AMD Sempron 2200+ family 6 (x86 family 6 model 8 ) processor do not support SSE2 instructions. SSE – Streaming SIMD Extensions 2 are extends on MMX instructions to operate on special registers to increase computational speed in some cases.

To fix these problems you need to set CV_SSE2 variable to 0 in cxcore’s cxtypes.h (near 65 line).
For example like here:

```// #if defined WIN32 &amp;&amp; (!defined WIN64 || defined EM64T) && // (_MSC_VER >= 1400 || defined CV_ICC) // || (defined __SSE2__ && defined __GNUC__ && __GNUC__ >= 4) // #include // #define CV_SSE2 1 // #else #define CV_SSE2 0 // #endif```

Tuesday, January 6th, 2009

• ## Info

Need programming help or consulting? Hire us