
Implementing the K-means Algorithm in Python


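The k-means algorithm takes a parameter k and partitions n input data objects into k clusters, so that objects in the same cluster are highly similar while objects in different clusters are dissimilar. Similarity is measured against each cluster's "center object" (its center of gravity), which is the mean of the objects in that cluster. The procedure picks k cluster centers at random, computes the distance from every point to every center, and assigns each point to its nearest center. Each center is then replaced by the mean of the points assigned to it, and the two steps repeat until the clusters stop changing.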
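The assign-then-update cycle described above can be sketched on a toy data set. The points, the starting centers, and the helper name `one_iteration` are illustrative assumptions rather than part of the article's code, and the sketch uses only the standard library (Python 3.8 or later for `math.dist`).

```python
import math

def one_iteration(points, centers):
    """Run one k-means pass: assign each point to its nearest center,
    then move each center to the mean of the points assigned to it."""
    clusters = {idx: [] for idx in range(len(centers))}
    for p in points:
        # index of the closest center by Euclidean distance
        nearest = min(range(len(centers)),
                      key=lambda idx: math.dist(p, centers[idx]))
        clusters[nearest].append(p)
    new_centers = []
    for idx, members in clusters.items():
        if members:
            # coordinate-wise mean of the cluster members
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            # an empty cluster keeps its center where it was
            new_centers.append(tuple(centers[idx]))
    return clusters, new_centers

# Hypothetical toy data: two obvious groups in the plane
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
clusters, new_centers = one_iteration(points, centers)
print(new_centers)  # [(0.0, 1.0), (10.0, 11.0)]
```

Calling `one_iteration` repeatedly with the returned centers until they stop moving gives the full algorithm.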


Steps

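(01) Step 1: define the Euclidean distance and sample the initial centroids. Here k is the total number of clusters.

    import random
    import numpy as np

    # calculate the Euclidean distance between two vectors
    def calculate_distance(vector1, vector2):
        return np.sqrt(np.sum(np.square(vector1 - vector2)))

    # initialize centroids by sampling k distinct data points
    def initialize_centroids(data, k):
        # random.sample expects a sequence, so convert the array to a list of rows
        return random.sample(list(data), k)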
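As a quick sanity check on this step, the distance formula can be tried on a 3-4-5 triangle, and the initial centroids can also be drawn by sampling distinct row indices with NumPy's own random generator. This is a minimal sketch assuming NumPy is installed; the names `euclidean` and `sample_centroids`, the `seed` parameter, and the toy data are illustrative and not part of the article's code.

```python
import numpy as np

def euclidean(a, b):
    # square root of the sum of squared coordinate differences
    return np.sqrt(np.sum(np.square(a - b)))

def sample_centroids(data, k, seed=None):
    # pick k distinct row indices, then take those rows as starting centroids
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx]

d = euclidean(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
print(d)  # 5.0

data = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
centroids = sample_centroids(data, k=2, seed=0)
print(centroids.shape)  # (2, 2)
```

Sampling indices without replacement guarantees that the k starting centroids are k different rows of the data.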


(02) Step 2: assign every point to its nearest centroid to form the new clusters, then recompute the centroids.

    # find the minimum distance from each point to the centroids
    def minimun_distance(data, centroidlist):
        clusterdictionary = dict()
        for i in data:
            vector1 = i
            marker = 0
            min_dist = float('inf')
            for j in range(len(centroidlist)):
                vector2 = centroidlist[j]
                distance = calculate_distance(vector1, vector2)
                if distance < min_dist:
                    min_dist = distance
                    marker = j
            # marker is the index of the nearest centroid
            if marker not in clusterdictionary.keys():
                clusterdictionary[marker] = list()
            clusterdictionary[marker].append(i)
        return clusterdictionary

    # get centroids: the mean of the points in each cluster
    def getcentroids(clusterdictionary):
        centroidlist = list()
        for key in clusterdictionary.keys():
            centroid = np.mean(np.array(clusterdictionary[key]), axis=0)
            centroidlist.append(centroid)
        return np.array(centroidlist)
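The nested loops in this step can also be written with NumPy broadcasting, which computes all point-to-centroid distances at once. The following is a sketch of that alternative, not the article's method; the function name `assign_and_update` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def assign_and_update(data, centroids):
    # distances[i, j] = Euclidean distance from point i to centroid j
    diffs = data[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    # label of each point = index of its nearest centroid
    labels = distances.argmin(axis=1)
    # mean of each non-empty cluster; an empty cluster keeps its old centroid
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = data[labels == j]
        if len(members) > 0:
            new_centroids[j] = members.mean(axis=0)
    return labels, new_centroids

# Hypothetical toy data with three starting centroids
data = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [10.0, 10.0], [10.0, 12.0]])
centroids = np.array([[1.0, 1.0], [5.0, 4.0], [9.0, 9.0]])
labels, new_centroids = assign_and_update(data, centroids)
print(labels)  # [0 0 1 2 2]
print(new_centroids)
```

Returning an array of labels instead of a dictionary keeps the centroid order fixed, so centroid j always belongs to label j.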


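(03) Step 3: load the data and iterate. The loop exits once the clustering stops changing, measured here as the change in the total point-to-centroid distance falling below a threshold.

    # total distance from every point to its cluster centroid
    # (the source labels this the mean squared deviation)
    def getmsd(clusterdictionary, centroidlist):
        sum = 0.0
        for key in clusterdictionary.keys():
            vector1 = centroidlist[key]
            distance = 0.0
            for i in clusterdictionary[key]:
                vector2 = i
                distance += calculate_distance(vector1, vector2)
            sum += distance
        return sum

    # show result: points as circles, centroids as diamonds, one color per cluster
    def showresult(clusterdictionary, centroidlist):
        import matplotlib.pyplot as plt
        colormark = ['or', 'ob', 'og', 'ok']
        centroidmark = ['dr', 'db', 'dg', 'dk']
        for key in clusterdictionary.keys():
            plt.plot(centroidlist[key][0], centroidlist[key][1], centroidmark[key], markersize=12)
            for i in clusterdictionary[key]:
                plt.plot(i[0], i[1], colormark[key])
        plt.show()  # added so the figure is displayed when run as a script

    # path exactly as printed in the source; its separators and file name are
    # missing, so point it at your own tab-separated data file
    path = 'C:UsersjyjhDesktop'
    with open(path, 'r') as f:
        data = f.readlines()
    temp = list()
    for i in data:
        numlist = list()
        # assumes one point per line with tab-separated coordinates
        for j in i.strip().split('\t'):
            num = float(j)
            numlist.append(num)
        temp.append(numlist)
    data = np.array(temp)

    centroidlist = initialize_centroids(data, 4)
    clusterdictionary = minimun_distance(data, centroidlist)
    new_msd = getmsd(clusterdictionary, centroidlist)
    old_msd = -0.000001
    k = 2  # iteration counter, not the number of clusters
    while abs(new_msd - old_msd) >= 0.00001:
        centroidlist = getcentroids(clusterdictionary)
        clusterdictionary = minimun_distance(data, centroidlist)
        old_msd = new_msd
        new_msd = getmsd(clusterdictionary, centroidlist)
        k += 1
        print(new_msd - old_msd)
    showresult(clusterdictionary, centroidlist)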
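A different stopping rule is to watch how far the centroids themselves move between iterations. The sketch below is a self-contained variant that stops when that movement drops below a tolerance; it runs on synthetic data, since the article's data file is not available. The function name `kmeans`, the parameters `tol`, `max_iter`, and `seed`, and the two generated blobs are all assumptions made for illustration.

```python
import numpy as np

def kmeans(data, k, tol=1e-5, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # start from k distinct data points
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for iteration in range(1, max_iter + 1):
        # assign every point to its nearest centroid
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centroids; an empty cluster keeps its previous centroid
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # total movement of the centroids in this iteration
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels, iteration

# Hypothetical synthetic data: two well-separated blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
centroids, labels, n_iter = kmeans(data, k=2)
print(centroids)
print(n_iter)
```

The `max_iter` cap is a safeguard so the loop always terminates, even if the tolerance is set too tightly for the data.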


Tips

A basic understanding of K-means is assumed.

MATLAB has a built-in kmeans function.