Schematic overview of DycomDetector

DycomDetector is a novel approach to topic modeling in text corpora. Our algorithm extracts and classifies keywords, calculates relationships based on keyword co-occurrences, constructs networks at different time points, and applies a graph partitioning algorithm to extract latent communities. The intuitive interface of our system supports various interactive features, such as lensing and filtering by sudden increase in term frequency, vertex degree, betweenness centrality, etc. It also allows users to search for a topic of interest and visualize its temporal relations with other detected communities.

Figure: DycomDetector schema

To enable users to explore vast temporal text corpora efficiently, DycomDetector adopts the following steps (as depicted in the above figure):
  1. Extract and classify terms: The text documents are preprocessed into entities, ranked by frequencies, and further classified into different categories: people, places, organizations, and miscellaneous.
  2. Construct networks of collocated terms: This step constructs relations between terms/phrases based on their co-occurrences in the same political blogs. At each time point, we obtain a network snapshot of important terms.
  3. Refine the network snapshots: In the above example, we filter vertices with a frequency greater than 6. The Louvain method is then applied automatically to the refined networks to detect political events (detected communities are highlighted with a gray background in the last panel).
  4. Generate visualization and interactions: Each sub-network (at each time point) is represented as a thumbnail that summarizes its network structure. Consecutive thumbnails can be expanded on mouseover. Modularity histograms, text clouds, arc diagrams, and small multiples provide supplementary views of these sub-networks.
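
Steps 2 and 3 can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not DycomDetector's actual implementation. The function names (`build_snapshot`, `local_moving`, `modularity`) and the `freq_threshold` parameter are hypothetical, term frequency is assumed to mean the number of documents containing a term, and "filtering" is assumed to mean keeping terms above the threshold. Note also that `local_moving` implements only the first (local moving) phase of the Louvain method, without the subsequent aggregation phase.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_snapshot(docs, freq_threshold=0):
    """Build a co-occurrence network for the documents of one time point.

    docs: list of term lists, one list per document.
    Keeps only terms whose document frequency is greater than freq_threshold.
    Returns (freq, adj); adj[u][v] counts the documents containing both u and v.
    """
    freq = Counter(term for doc in docs for term in set(doc))
    keep = {term for term, n in freq.items() if n > freq_threshold}
    adj = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for u, v in combinations(sorted(set(doc) & keep), 2):
            adj[u][v] += 1
            adj[v][u] += 1
    return freq, adj


def local_moving(adj):
    """Local moving phase of Louvain (assumption: no aggregation phase).

    Each vertex starts in its own community and is repeatedly moved to the
    neighbouring community with the largest modularity gain.
    """
    two_m = sum(w for u in adj for w in adj[u].values())
    degree = {u: sum(adj[u].values()) for u in adj}
    community = {u: u for u in adj}
    total = dict(degree)  # summed vertex degree per community
    moved = True
    while moved:
        moved = False
        for u in sorted(adj):
            old = community[u]
            total[old] -= degree[u]  # take u out of its community
            links = defaultdict(float)  # edge weight from u to each community
            for v, w in adj[u].items():
                links[community[v]] += w
            best = old
            best_gain = links[old] - total[old] * degree[u] / two_m
            for c in sorted(links):
                gain = links[c] - total[c] * degree[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            community[u] = best
            total[best] += degree[u]
            moved = moved or best != old
    return community


def modularity(adj, community):
    """Newman modularity Q of a partition of a weighted, undirected network."""
    two_m = sum(w for u in adj for w in adj[u].values())
    if two_m == 0:
        return 0.0
    inside = defaultdict(float)  # internal edge weight, each edge counted twice
    total = defaultdict(float)   # summed vertex degree per community
    for u in adj:
        for v, w in adj[u].items():
            total[community[u]] += w
            if community[u] == community[v]:
                inside[community[u]] += w
    return sum(inside[c] / two_m - (total[c] / two_m) ** 2 for c in total)


# Toy input: three documents whose terms are single letters.
docs = [["a", "b", "c"], ["d", "e", "f"], ["c", "d"]]
freq, adj = build_snapshot(docs)
community = local_moving(adj)
groups = defaultdict(set)
for term, label in community.items():
    groups[label].add(term)
print(sorted(sorted(g) for g in groups.values()))
print(round(modularity(adj, community), 3))
```

Running this once per time point would yield one network snapshot and one set of term communities per time step. The modularity value is the kind of quantity a per-snapshot modularity histogram could display.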
Here are some important political events (term communities) detected in the above example: