Social Network Analysis

Performed K-core decomposition on social network graphs. K-core value of a network is a subgraph such that all the vertices in the subgraph have atleast K neighbours. The core values for each node were computed in two ways top-down and bottom-up. Bottom-up decomposition is easy to implement but is slow. While top-down decomposition is optimized by calculating the upper bound estimation.

Dataset

The dataset used for this project are the available at Stanford Large Dataset Collection. Specifically the datasets used were

Amazon (334,863 vertices, 925,872 edges)

DBLP (317,080 vertices, 1,049,866 edges)

YouTube (1,134,890 vertices, 2,987,624 edges)

LiveJournal (3,997,962 vertices, 34,681,189 edges)

Applications

The core values computed were used to identify anomalies in social networks. If there is a huge difference between the core value and degree of a node then it can be considered as an anomaly.

The other application is to detect top spreaders. Given a social network graph we can identify the users which can maximize the information spread. These users can then be targeted for running marketing campaigns, recommendations, etc. These are identified by calculating the degenercy core of each node and are evaluated using the SIR model. In the SIR model each vertex can be in one of the three states Susceptible, Infected and Recovered. Initially all nodes are susceptible and one node is infected. In the next step the infected node recovers and all the neighbours of the previously infected node are infected. This continues until no nodes are infected (like BFS). Once a node recovers it cannot be infected again.