Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings
This repository was archived by the owner on Feb 26, 2019. It is now read-only.

reddavis/K-Means

Repository files navigation

Attempting to build a fast, memory efficient K-Means program.

gem sources -a http://rubygems.org
sudo gem install k_means
require 'rubygems'
require 'k_means'
data = [[1,1], [1,2], [1,1], [1000, 1000], [500, 500]]
kmeans = KMeans.new(data, :centroids => 2)
kmeans.inspect # Use kmeans.view to get hold of the un-inspected array
=> [[3, 4], [0, 1, 2]]
require 'rubygems'
require 'k_means'
# Your custom centroid needs to have #position and #reposition methods
class CustomCentroid
 attr_accessor :position
 def initialize(position); @position = position; end
 def reposition(nodes, centroid_positions); end
end
custom_centroids = []
2.times { custom_centroids << CustomCentroid.new([1,1]) }
data = [[1,1], [1,2], [1,1], [1000, 1000], [500, 500]]
kmeans = KMeans.new(data, :custom_centroids => custom_centroids)

KMeans uses the Distance Measures Gem (github.com/reddavis/Distance-Measures) so we get quite a range of distance measurements.

The measurements currently available are:

euclidean_distance
cosine_similarity
jaccard_index
jaccard_distance
binary_jaccard_index
binary_jaccard_distance
tanimoto_coefficient

To specify a particular one to use in the KMeans algorithm, just provide it as an option:

KMeans.new(@data, :distance_measure => :jaccard_index)
KMeans.new(@data, :distance_measure => :cosine_similarity)
KMeans.new(@data, :distance_measure => :tanimoto_coefficient)

You get the idea...

# 1000 records with 50 dimensions
data = Array.new(1000) {Array.new(50) {rand(10)}}
ai4r_data = Ai4r::Data::DataSet.new(:data_items=> data)
# Clustering can happen in magical ways
# so lets do it over multiple times
n = 5
Benchmark.bm do |x|
 x.report('KMeans') do
 n.times { KMeans.new(data) }
 end
 x.report("Ai4R") do
 n.times do
 b = Ai4r::Clusterers::KMeans.new
 b.build(ai4r_data, 4)
 end
 end
end
    user   system   total    real
KMeans 15.960000  0.030000 15.990000 ( 16.062639)
Ai4R 70.230000  0.180000 70.410000 ( 70.704843)

Copyright © 2009 Red Davis. See LICENSE for details.

About

K Means

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

Languages

AltStyle によって変換されたページ (->オリジナル) /