CLUSTER

What if people could be empowered to fight against excessive recommendation algorithms?

KEYWORDS:

K-means

Data mining

Data visualization

personal project

2021

Introduction

I fundamentally believe in the right to acquire information freely. For a long time, however, I have felt that I am gradually losing control over what I see in the age of big data. To counter the excessive recommendations that big-data algorithms push on average users, and the manipulation that results, I wanted to let users be exposed to a wider range of information when they search.

'Cluster' is a design for an entity mining system. The system maps data into spatial vectors. With this vector representation, I then use the k-means algorithm to cluster the entities again and again.

Building on this algorithmic system, I also designed a web-based interactive interface that lets the user control the value of k in k-means, zoom in on any specific part, and perform other operations.

In this design, users can partly control the data mining process. As a result, they obtain results with a degree of randomness and broaden the scope of the information they see. To a certain extent, this helps push back against excessive recommendation algorithms.

I hope that people can gain greater ownership in the era of big data.

github

Inspiration

When I was trying to find a topic for my graduation project, I felt that traditional search systems, including Google and literature databases, could only give me a list of information matching the specific label I had typed in. However, the search for inspiration is rarely linear or directed, and excessive recommendation algorithms kept me from reaching broader, more random information. I concluded that this is an obstacle to finding inspiration and acquiring knowledge, and also a restriction on my right to a broader world.

About k-means & why k-means

To reach my design solution, I use k-means as the fundamental tool for processing the data and as the core logic of the visualization.

K-means clustering is a method of vector quantization that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean (the cluster centroid), which serves as a prototype of the cluster.
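The definition above can be sketched in a few lines of plain Python. This is only an illustration of the standard assign-then-update loop (Lloyd's algorithm), not the project's actual code; the function names, the toy 2-D points standing in for entity vectors, and the `seed` parameter are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, max_iter=100, seed=0):
    """Partition `points` into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means would too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical toy "embeddings": two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

Calling `kmeans` again with a different `k` or `seed` re-partitions the same points, which is the knob the interface exposes to the user.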

With k-means, I hoped to find a way to visualize a huge, diverse dataset and to show users a broad body of knowledge not in a merely linear way but as a panoramic view. Still, I needed this visualization to keep a certain regularity: the classification.

When the user changes k, the number of clusters, the dataset is presented in different, partly random ways. I think this randomness in the results is meaningful in some search processes.

k

user control

classification numbers

Dataset

I selected FB15K237 as the experimental dataset for training and building the design.

The FB15K dataset contains 592,213 triplets with 14,951 entities and 1,345 relationships. FB15K237 is a variant of the original dataset in which inverse relations are removed.
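To make the "triplets, entities, relationships" counts concrete, here is a small sketch of how such a dataset could be read and summarized. It assumes each line holds a head entity, a relation, and a tail entity separated by tabs, which is how the public FB15K-style files are commonly distributed; the sample rows and the function names are invented for illustration and are not taken from the dataset.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse 'head<TAB>relation<TAB>tail' lines, skipping blank ones."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, relation, tail = line.split("\t")
        triples.append((head, relation, tail))
    return triples


def summarize(triples: List[Triple]) -> Dict[str, int]:
    """Count triples, distinct entities, and distinct relations."""
    entities = set()
    relations = set()
    for head, relation, tail in triples:
        entities.update((head, tail))
        relations.add(relation)
    return {
        "triples": len(triples),
        "entities": len(entities),
        "relations": len(relations),
    }


# Made-up sample rows (hypothetical IDs, not real dataset content).
sample_lines = [
    "e1\tborn_in\te2",
    "e3\tborn_in\te2",
    "e2\tlocated_in\te4",
    "",
]
stats = summarize(parse_triples(sample_lines))
```

With a real file, the same functions could be fed an open file handle instead of `sample_lines`.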

Operation flow

user intervention

background process

embedding

main code

Cluster results & visualization rules

cluster number

when k = 2

(examples)

when k = 12

visualization model

(each cluster view)

score of each top

(the degree of centrality)

top 10 entities in each cluster
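One way to read the captions above is that each entity gets a degree-centrality score and the ten highest-scoring entities of every cluster are shown as its "tops". The sketch below follows that reading under stated assumptions: it uses the textbook definition of degree centrality (distinct neighbours divided by n - 1), and the function names, toy triples, and cluster labels are hypothetical. How the project itself computes or normalizes the score is not stated here.

```python
from collections import defaultdict


def degree_centrality(triples):
    """Fraction of the other entities each entity is directly linked to."""
    neighbours = defaultdict(set)
    for head, _relation, tail in triples:
        neighbours[head].add(tail)
        neighbours[tail].add(head)
    for entity, linked in neighbours.items():
        linked.discard(entity)  # a self-loop does not count as a neighbour
    n = len(neighbours)
    if n < 2:
        return {entity: 0.0 for entity in neighbours}
    return {entity: len(linked) / (n - 1)
            for entity, linked in neighbours.items()}


def top_entities_per_cluster(scores, cluster_of, top_n=10):
    """Group entities by cluster label and keep the top_n highest scores."""
    by_cluster = defaultdict(list)
    for entity, label in cluster_of.items():
        by_cluster[label].append((scores.get(entity, 0.0), entity))
    result = {}
    for label, members in by_cluster.items():
        # Highest score first; ties are broken alphabetically by entity name.
        ranked = sorted(members, key=lambda m: (-m[0], m[1]))
        result[label] = [entity for _score, entity in ranked[:top_n]]
    return result


# Hypothetical toy graph and cluster labels (e.g. produced by k-means).
triples = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d"),
           ("b", "r", "c"), ("d", "r", "e")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

scores = degree_centrality(triples)
tops = top_entities_per_cluster(scores, cluster_of, top_n=2)
```

Here `top_n=2` is used only to keep the toy output short; the page's visualization corresponds to `top_n=10`.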

Prototype

alter k = 3

alter k = 5

zoom in a cluster

open database menu & accurate search

zoom in a top

list form performance

slide to view

I tried to build an experience that makes the user really feel like they are 'mining' the data. This experience involves gesture interaction design as well as an appropriate visualization of the different levels of entities.

watch the video here

This project is still in progress, as I am now experimenting with other practical forms of interaction that will be more suitable and vivid.

Please bookmark this page to find new updates.

My thanks to Lyric Zhao, who has given me a lot of valuable advice.