Topic Modeling on BBC News

A project about topic modeling: discovering the abstract “topics” that occur in a collection of documents.

Description

This project uses Python to implement clustering algorithms, including Gaussian mixture models (GMM) and K-means, to analyze word frequencies in the BBC News data.
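A minimal sketch of this kind of pipeline follows. It assumes scikit-learn, since the project's actual libraries and preprocessing are not specified here. `CountVectorizer` builds a word-frequency matrix (one row per document, one column per word), and then `KMeans` and `GaussianMixture` each assign a cluster to every document. The four toy documents are hypothetical stand-ins for BBC News articles.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.mixture import GaussianMixture

# Hypothetical toy corpus standing in for BBC News articles.
docs = [
    "the striker scored a late goal in the football match",
    "the team won the match after a penalty goal",
    "shares fell as the bank reported lower profits",
    "the market rallied and bank shares rose on profits",
]

# Word-frequency (bag-of-words) matrix with common English stop words removed.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# K-means on the sparse frequency matrix; "k-means++" is scikit-learn's default seeding.
kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
km_labels = kmeans.fit_predict(X)

# GaussianMixture needs a dense array; diagonal covariances keep it tractable
# when every vocabulary word is a separate dimension.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm_labels = gmm.fit_predict(X.toarray())

print("K-means:", km_labels)
print("GMM:    ", gmm_labels)
```

K-means gives each document exactly one cluster. A GMM also exposes soft memberships through `gmm.predict_proba`, which can help when an article mixes topics.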

Features

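  • Analyzes data stored in .pkl files and outputs images that group similar topics together.

  • Uses WordCloud to visualize the output.

  • Uses unsupervised machine learning, so no labeled training data is required.

  • Can switch between K-means and K-means++ initialization for clustering.

  • Handles high-dimensional data, since each vocabulary word is a separate feature.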
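K-means++ differs from plain K-means only in how the initial centroids are chosen. Plain K-means samples them uniformly at random. K-means++ draws each new centroid with probability proportional to its squared distance from the nearest centroid already picked. The NumPy sketch below makes that switch explicit through a hypothetical `init` parameter. It illustrates the technique under these assumptions and is not the project's own code.

```python
import numpy as np


def init_centroids(X, k, method, rng):
    """Choose k starting centroids: 'random' (uniform) or 'k-means++' (D^2 seeding)."""
    n = X.shape[0]
    if method == "random":
        return X[rng.choice(n, size=k, replace=False)].copy()
    if method != "k-means++":
        raise ValueError(f"unknown init method: {method}")
    centroids = [X[rng.integers(n)]]  # first centroid: uniform at random
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen centroid.
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        if total == 0:  # all points coincide with chosen centroids
            idx = rng.integers(n)
        else:  # far-away points are proportionally more likely to be picked
            idx = rng.choice(n, p=d2 / total)
        centroids.append(X[idx])
    return np.array(centroids, dtype=float)


def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, init="k-means++", max_iter=100, seed=0):
    """Lloyd's algorithm with a switchable initialization strategy."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centroids = init_centroids(X, k, init, rng)
    for _ in range(max_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(X, centroids), centroids  # labels consistent with final centroids


X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
for method in ("random", "k-means++"):
    labels, centroids = kmeans(X, k=2, init=method)
    print(method, labels)
```

On well-separated data like this, both strategies usually reach the same clusters. The practical benefit of K-means++ seeding is a lower chance of settling in a poor local optimum on harder, high-dimensional data.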

Examples
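The project's exact .pkl layout is not described, so this sketch assumes a hypothetical pickled list of `(cluster_label, text)` pairs. It counts word frequencies per cluster with `collections.Counter`. If the `wordcloud` package is installed, it also renders one image per cluster with `WordCloud.generate_from_frequencies`. Only unpickle files from trusted sources, because `pickle.load` can execute arbitrary code.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical input format: a pickled list of (cluster_label, document_text) pairs.
sample = [
    (0, "goal match striker goal"),
    (0, "match penalty goal"),
    (1, "bank shares profits"),
    (1, "bank market shares"),
]
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "clusters.pkl")
with open(path, "wb") as f:
    pickle.dump(sample, f)

# Load the pickled records back (only do this for trusted files).
with open(path, "rb") as f:
    records = pickle.load(f)

# Word frequencies per cluster: {label: Counter({word: count, ...})}.
freqs = {}
for label, text in records:
    freqs.setdefault(label, Counter()).update(text.split())

try:
    from wordcloud import WordCloud  # optional dependency: pip install wordcloud
except ImportError:
    WordCloud = None

if WordCloud is not None:
    for label, counts in freqs.items():
        # More frequent words are drawn larger in the cluster's image.
        cloud = WordCloud(width=400, height=300).generate_from_frequencies(counts)
        cloud.to_file(os.path.join(out_dir, f"cluster_{label}.png"))

for label, counts in sorted(freqs.items()):
    print(label, counts.most_common(3))
```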

