Detect Inflated Follower Numbers in OSN Using Star Sampling

Hao Wang and Jianguo Lu

Weibo is a public platform where people make friends, communities spread messages, and companies promote themselves. In less than three years, Weibo has attracted 250 million users, and it is still growing.

The properties of online social networks (OSNs) are of great interest to the general public as well as IT professionals. Often the raw data are not available, and the summaries released by the service providers are sketchy. Thus sampling is needed to reveal the hidden structure of the underlying data. We introduce an efficient sampling method, called star sampling, for online social networks, and report its results on two online networks, Twitter and Weibo. The properties revealed include a macroscopic view of the OSN, such as the total number of accounts, the degree distributions, and zombies. In addition, the method can also discover the top bloggers, their followers, and their topological structure. The contribution of the paper lies in both its method and the content revealed by the method. More specifically, we provide 1) a novel and efficient sampling method that is derived from the unique access interface provided by OSNs; 2) the properties of the fast-growing OSN Weibo, which have not been reported yet; 3) detection of inflated follower numbers of top users.

Data

Due to the privacy policy of Weibo, this data is for scientific use only. Some data that include sensitive information may not be available for download; details are available upon request (wang115o@uwindsor.ca).
The raw data was crawled using the API provided by Weibo.
It is in JSON format, and I arranged each record as follows:
ID \n
Information(Profile) \n
Friends List \n
\n
You can parse it using MATLAB or the program provided on my home page, or simply download the files from the following hyperlinks.
Note: the programs we post require .NET 4.0, and the data may look garbled because some of it consists of Chinese characters; you
need to switch your system to UTF-8 display mode.
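
For readers without MATLAB or .NET, the record layout described above (an ID line, a profile line, a friends-list line, then a blank line) can also be read with a few lines of Python. This is only a minimal sketch under stated assumptions, not the program we provide: it assumes the profile line is a single JSON object, it keeps the friends-list line as raw text because its delimiter is not specified here, and it assumes a user with no friends has an empty friends line. The names parse_records and SAMPLE, the sample IDs, and the profile fields shown are hypothetical and used for illustration only.

```python
import io
import json


def _to_record(block):
    """Turn the non-blank lines of one record into a tuple."""
    # Assumption: a record has 3 lines, or 2 if the friends line is empty.
    if len(block) not in (2, 3):
        raise ValueError("unexpected record with %d lines" % len(block))
    user_id = block[0].strip()
    profile = json.loads(block[1])  # assumption: profile is one JSON object
    friends_raw = block[2].strip() if len(block) == 3 else ""
    return user_id, profile, friends_raw


def parse_records(stream):
    """Yield (user_id, profile, friends_raw) from blank-line-separated records."""
    block = []
    for line in stream:
        line = line.rstrip("\r\n")
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the current record.
            yield _to_record(block)
            block = []
    if block:  # last record without a trailing blank line
        yield _to_record(block)


# Hypothetical sample data in the layout described above.
SAMPLE = (
    '1001\n{"screen_name": "alice", "followers_count": 12}\n1002 1003\n\n'
    '1002\n{"screen_name": "bob", "followers_count": 3}\n1001\n\n'
)

for user_id, profile, friends_raw in parse_records(io.StringIO(SAMPLE)):
    print(user_id, profile["screen_name"], friends_raw)
```

To read a downloaded file instead of the in-memory sample, open it with an explicit encoding, for example open(path, encoding="utf-8"), so that Chinese characters in the profiles are decoded correctly regardless of the system's display settings.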

Social graph

Mapping table from numeric ID to screen name

Graph Structure Sample with 1.18 million nodes

Indegree, Outdegree and Message Information

Top 100K celebrities mapping information using STAR SAMPLING

Locations & Creation Time

Gender Distribution

Follower Estimation