<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OXT blog &#187; STL</title>
	<atom:link href="http://blog.o-x-t.com/tag/stl/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.o-x-t.com</link>
	<description>"People say nothing is impossible, but I do nothing every day." - Winnie the Pooh</description>
	<lastBuildDate>Wed, 25 Aug 2010 19:18:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Hierarchical clustering using C++</title>
		<link>http://blog.o-x-t.com/2009/01/23/hierarchical_clustering/</link>
		<comments>http://blog.o-x-t.com/2009/01/23/hierarchical_clustering/#comments</comments>
		<pubDate>Fri, 23 Jan 2009 12:46:19 +0000</pubDate>
		<dc:creator>Atuk</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Hierarchical clustering]]></category>
		<category><![CDATA[STL]]></category>

		<guid isPermaLink="false">http://blog.o-x-t.com/?p=155</guid>
		<description><![CDATA[Our aim is to implement a hierarchical clustering algorithm with O(N*N*log(N)) complexity using the STL. The idea of hierarchical clustering is to start from N one-element clusters and merge the two nearest clusters on every step. To find the two nearest clusters on step K we need the (N-K+1)x(N-K+1) matrix of distances between the current clusters. We need, in the worst case, N times [...]]]></description>
			<content:encoded><![CDATA[<p>Our aim is to implement a hierarchical clustering algorithm with O(N*N*log(N)) complexity using the STL.</p>
<p>The idea of hierarchical clustering is to start from N one-element clusters and, on every step, merge the two nearest clusters. To find the two nearest clusters on step K we need the (N-K+1)&#215;(N-K+1) matrix of distances between the current clusters. Over the whole run we update this matrix and search it for its minimum up to N times, on a sequence of N&#215;N, (N-1)&#215;(N-1), (N-2)&#215;(N-2), &#8230;, 1&#215;1 matrices. Each search of a k&#215;k matrix costs O(k*k), and 1 + 4 + &#8230; + N*N = N(N+1)(2N+1)/6, so a straightforward implementation of the HC algorithm costs O(N*N*N) operations.</p>
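<p>To improve the complexity to O(N*N*log(N)) we can keep a vector V of priority-queue-like structures, one per cluster. V[i] holds the entries P(i,j) sorted by distance, where the distance is the one between clusters i and j. This structure must support three operations: reading the minimal element in O(1), deleting an arbitrary element in O(log(n)), and adding an element in O(log(n)). With it, each merge step costs O(N) to find the closest pair (the smallest of the N per-cluster minima) plus O(N*log(N)) to delete the two stale entries and insert one new entry in the structure of every other cluster, assuming the distance to the merged cluster can be computed in O(1) from the old ones, as with single or complete linkage. Over at most N steps this gives O(N*N*log(N)).</p>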
<p>The standard STL priority_queue lets us read the minimum and add an element in the required time, but it provides no way to find, let alone remove, an arbitrary element, so we cannot delete elements in O(log(n)). I propose to use the STL set or multiset instead. Both guarantee O(log(n)) insertion, lookup and deletion (common implementations build them on a &#8220;red-black tree&#8221;), and they keep their elements in sorted order, so the minimum is available in O(1) via begin() and the maximum via rbegin() (not end(), which points one past the last element).</p>
<p>In my implementation I used the following structure to store a distance together with the index of the cluster it refers to:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> distances
<span style="color: #008000;">&#123;</span>
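	<span style="color: #0000ff;">double</span> dist<span style="color: #008080;">;</span>  <span style="color: #666666;">// distance from the cluster owning this set to cluster index</span>
	<span style="color: #0000ff;">int</span> index<span style="color: #008080;">;</span>    <span style="color: #666666;">// index of the other cluster</span>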
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></div></div>

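<p>I also defined a comparison functor so that the elements of every set are always sorted by distance. Ties are broken by cluster index: std::set treats two elements as equivalent when neither compares less than the other, so without the tie-break an entry with the same distance as an existing one would be silently rejected.</p>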

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">struct</span> Cmp<span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">bool</span> operator<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> distances<span style="color: #000040;">&amp;</span> d1, <span style="color: #0000ff;">const</span> distances<span style="color: #000040;">&amp;</span> d2<span style="color: #008000;">&#41;</span> <span style="color: #0000ff;">const</span>
	<span style="color: #008000;">&#123;</span>
		<span style="color: #666666;">// equal distances: order by cluster index so that both entries stay in the set</span>
		<span style="color: #0000ff;">if</span><span style="color: #008000;">&#40;</span>d1.<span style="color: #007788;">dist</span> <span style="color: #000080;">==</span> d2.<span style="color: #007788;">dist</span><span style="color: #008000;">&#41;</span>
		<span style="color: #008000;">&#123;</span>
			<span style="color: #0000ff;">return</span> d1.<span style="color: #007788;">index</span> <span style="color: #000080;">&lt;</span> d2.<span style="color: #007788;">index</span><span style="color: #008080;">;</span>
		<span style="color: #008000;">&#125;</span>
		<span style="color: #666666;">// otherwise the nearer cluster comes first</span>
		<span style="color: #0000ff;">return</span> d1.<span style="color: #007788;">dist</span> <span style="color: #000080;">&lt;</span> d2.<span style="color: #007788;">dist</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></div></div>

<p>The declaration of our vector of sets then looks like this:</p>

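<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">std<span style="color: #008080;">::</span><span style="color: #007788;">vector</span><span style="color: #000080;">&lt;</span>std<span style="color: #008080;">::</span><span style="color: #007788;">set</span><span style="color: #000080;">&lt;</span>distances, Cmp<span style="color: #000080;">&gt;</span> <span style="color: #000080;">&gt;</span> P<span style="color: #008080;">;</span>  <span style="color: #666666;">// P[i] holds the entries of cluster i, nearest first</span></pre></div></div>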

<p>The full source code, with a sample input file and some comments, is available here:</p>
<p><a href="http://blog.o-x-t.com/wp-content/uploads/2009/01/sources.zip">sources.zip</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.o-x-t.com/2009/01/23/hierarchical_clustering/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
