Title: Content Propagation in Online Social Networks
Author: Blenn, N.
Contributors: Van Mieghem, P.F.A. (promotor); Doerr, C. (promotor)
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Intelligent Systems
Date: 2014-06-13

Abstract
This thesis presents methods and techniques for analyzing content propagation within online social networks (OSNs) using a graph-theoretical approach. It describes important factors and different techniques for analyzing and describing content propagation, starting from the smallest entity in a network, a single user account, up to complete friendship graphs and traces of content. Individuals and their attributes form the basic elements for a statistical analysis of user behavior and of individuals' interests. To identify the opinion of a country's population, for example, a random sample or data from everyone within the population is needed. This is not trivial, because activity patterns differ and because individuals may either not provide information about themselves or obscure their data by supplying bogus information. This thesis shows that a random sample of the population of the Netherlands can be obtained with respect to certain parameters, such as the location and the family and first names of users. Such a sample is, however, likely not to be "random" with respect to the age of inhabitants, so using the gathered data to predict election outcomes may be questioned. An individual's view of an OSN is represented by an ego-centric network: the sub-graph containing all friends of an ego and the relations between those friends. Within such graphs the influence between friends can be estimated, which can improve recommendation systems but also raises concerns about users' privacy.
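As a concrete illustration of the ego-centric network defined above, the following minimal Python sketch extracts it from an adjacency map. The representation (a dict from user IDs to sets of friends) and the name `ego_network` are hypothetical choices for illustration; they are not taken from the thesis.

```python
from itertools import combinations


def ego_network(adj, ego):
    """Return (nodes, edges) of the ego-centric network of `ego`.

    `adj` maps each user to the set of his or her friends (undirected).
    The ego network holds the ego, all friends of the ego, and every
    friendship among those friends; friends-of-friends are excluded.
    """
    friends = adj.get(ego, set())
    nodes = {ego} | friends
    # Links between the ego and each friend.
    edges = {frozenset((ego, f)) for f in friends}
    # Links among the friends themselves.
    for a, b in combinations(sorted(friends), 2):
        if b in adj.get(a, set()):
            edges.add(frozenset((a, b)))
    return nodes, edges


# Usage with a hypothetical toy graph.
adj = {
    "ego": {"ann", "bob", "cid"},
    "ann": {"ego", "bob", "dan"},
    "bob": {"ego", "ann"},
    "cid": {"ego"},
    "dan": {"ann"},
}
nodes, edges = ego_network(adj, "ego")
print(sorted(nodes))  # ['ann', 'bob', 'cid', 'ego']
print(len(edges))     # 4: three ego links plus ann-bob
```

Note that "dan" is a friend of a friend and therefore falls outside the ego network.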
This thesis also shows that a user's private information can be reconstructed even if only a few of his or her friends share their data publicly, because most friendships are formed between people with similar interests. The current way of dealing with privacy concerns, enabling each user to protect his or her own data, is therefore not sufficient. The structure of ego-centric networks also reveals an ego's ability to spread information and to control its spread: a person completely embedded in one group has less control over disseminating content than a person who connects multiple groups. A snapshot of the whole network of an OSN includes all user accounts (nodes) and friendships (links) at a certain point in time. Because OSNs may contain millions of nodes, however, data obtained by crawling is likely to be skewed, depending on the crawling method and its duration. Therefore a new way of traversing the graph, called "Mutual Friend Crawling", is proposed, in which certain network metrics converge faster to their final values and communities of users are detected while the graph is traversed. An analysis of content diffusion in multiple OSNs shows that only a limited fraction of a user's neighbors (i.e., friends) are "useful" for spreading content to their peers. Commonly used network metrics that reflect the centrality of a node are shown to have no correlation with a node's ability to repeatedly pass messages on to a large number of users. The reason is that the whole friendship network contains inactive or abandoned user accounts, and that spreading depends critically on the time at which a message was sent: friends who forward a message have to be available, or online, at the time they are "needed" to pass content on. On the other hand, influential groups might exist whose members act together to spread content with each other's help.
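The privacy risk described above can be illustrated with a deliberately simple baseline: guess a user's hidden attribute as the value most common among the friends who publish it. This majority vote is an assumed, hypothetical stand-in used only to show why friends' public data leaks information; it is not the reconstruction method of the thesis, and the names `infer_attribute`, `adj` and `public` are illustrative.

```python
from collections import Counter


def infer_attribute(adj, public, user):
    """Guess a hidden attribute of `user` from friends' public values.

    `adj` maps users to sets of friends; `public` maps users who disclose
    the attribute to its value. Returns the most common value among the
    user's friends, or None if no friend discloses it. Ties are broken
    arbitrarily, because set iteration order is not fixed.
    """
    values = [public[f] for f in adj.get(user, ()) if f in public]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


# Usage with hypothetical data: "eve" hides her city, as does friend "d".
adj = {"eve": {"a", "b", "c", "d"}}
public = {"a": "Delft", "b": "Delft", "c": "Leiden"}
print(infer_attribute(adj, public, "eve"))  # Delft
```

Because friendships tend to form between similar people, even such a crude vote can be right more often than chance, which is why protecting only one's own profile is not enough.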
Such groups might organize themselves via external communication channels, as the example of a well-known group, the "Digg Patriots", shows; members of this group cannot be found through purely topological measures. A similar time dependency exists in the evolution of OSNs, because users can only forward information or befriend others when they are online. The durations of these interactions are shown to follow a log-normal-like distribution rather than the exponential or power-law distributions assumed in several previous publications. The argument for the log-normal model is that power-law and exponential distributions would imply that most interaction durations are very short, whereas individuals always need some time to complete a task. However, the time scale of the observations is shown to be crucial, because a log-normal distribution and a power law with a small exponent may look the same in a log-log plot if the chosen bin size is too large. Another process involved in the structural evolution of a friendship network is given by markets that sell friendship relations in OSNs. These markets account for a considerable number of friendship relations, although their use usually has a negative connotation. In terms of content propagation, however, they might be beneficial: politicians who "buy" followers, for example, are able to reach users who would otherwise not connect to them. The term "viral spreading" is often used in connection with content propagation within the network of an OSN. Therefore certain epidemiological parameters are compared with "viral spreading" on Twitter. Most messages were found to have a low basic reproductive ratio, below 1 (the ratio that expresses how infectious a virus is, i.e., the average number of new infections caused by one infected individual), whereas a few messages were highly infectious because many users forwarded them. Interestingly, even these popular messages were not able to spread to a large fraction of all Twitter users.
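Since a log-log histogram with coarse bins can make a log-normal and a power law look alike, one way to compare candidate distributions without any binning is to fit each by maximum likelihood and compare log-likelihoods. The sketch below does this on synthetic log-normal durations, not on data from the thesis. The function names and parameters are hypothetical, and fixing the power-law lower cutoff `x_min` at the sample minimum is a simplifying assumption; a careful analysis would estimate `x_min` and use a proper likelihood-ratio test.

```python
import math
import random


def fit_lognormal(xs):
    """MLE log-normal fit: mean and population std of ln(x), plus log-likelihood."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    ll = sum(-y - math.log(sigma) - 0.5 * math.log(2 * math.pi)
             - (y - mu) ** 2 / (2 * sigma ** 2) for y in logs)
    return mu, sigma, ll


def fit_exponential(xs):
    """MLE exponential fit: rate = 1 / sample mean, plus log-likelihood."""
    rate = len(xs) / sum(xs)
    ll = len(xs) * math.log(rate) - rate * sum(xs)
    return rate, ll


def fit_power_law(xs):
    """Continuous power-law MLE, with x_min fixed to the sample minimum (assumption)."""
    xmin = min(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1 + len(xs) / s
    # Density: (alpha - 1) / x_min * (x / x_min) ** -alpha
    ll = len(xs) * (math.log(alpha - 1) - math.log(xmin)) - alpha * s
    return alpha, ll


# Synthetic "interaction durations" with hypothetical parameters mu=3, sigma=1.5.
random.seed(1)
durations = [random.lognormvariate(3.0, 1.5) for _ in range(5000)]
mu, sigma, ll_ln = fit_lognormal(durations)
_, ll_exp = fit_exponential(durations)
alpha, ll_pl = fit_power_law(durations)
print(f"log-normal fit: mu={mu:.2f}, sigma={sigma:.2f}")
print(f"log-likelihoods: log-normal={ll_ln:.0f}, "
      f"exponential={ll_exp:.0f}, power law={ll_pl:.0f}")
```

On such data the log-normal fit should obtain the highest log-likelihood. The raw values ignore that the log-normal model has one more free parameter than the other two.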
Among epidemiological models, the "Susceptible-Exposed-Infected-Removed" (SEIR) model seems applicable to content propagation, with the complication that the durations for which a user is "exposed" and "infected" appear to be log-normally distributed. Because these durations, also called the observation and reaction durations, are not exponentially distributed, the process is not memoryless, and Markov theory cannot be applied to model the "epidemics". A more general approach is therefore given by a Bellman-Harris branching process, which allows arbitrary lifetime distributions. The content propagating through a network can also be analyzed using graph theory in order to gain insights into population statistics. Mobility patterns were chosen as an example to illustrate the "meaning" of community detection within graphs created from the locations of Twitter users. The detected patterns, which show in which areas of the Netherlands people travel most frequently on working days and at weekends, allow better planning of transportation services. Analyzing the most common type of content, short colloquial text, with a new unsupervised method for estimating the sentiment of messages enables the analysis of graphs in which words are nodes and links describe the co-occurrence of words. These graphs reflect which words are related to which concepts, and with which sentiment, allowing the perception of products and concepts within the population of OSN users to be inferred.

Subject: Online Social Network Analysis
To reference this document use: https://doi.org/10.4233/uuid:bf5e9273-c503-448d-8358-62cb2d168f28
Embargo date: 2014-06-05
ISBN: 9789461863249
Part of collection: Institutional Repository
Document type: doctoral thesis
Rights: (c) 2014 Blenn, N.
Files: dissertation-NorbertBlenn.pdf (PDF, 23.72 MB)
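Because the observation and reaction durations discussed in the abstract are not exponential, a cascade can be simulated as an age-dependent (Bellman-Harris) branching process instead of a Markov chain. The sketch below rests on several hypothetical assumptions: a single log-normal lifetime merges the observation and reaction durations, every user has a fixed number of followers, and each follower forwards independently with a fixed probability. With mean offspring m = 10 x 0.08 = 0.8 < 1 the cascade is subcritical, mirroring the finding that most messages have a basic reproductive ratio below 1, and its expected total size is 1/(1 - m) = 5. The function names are illustrative only.

```python
import heapq
import random
from bisect import bisect_right


def simulate_cascade(followers=10, p_forward=0.08, mu=1.0, sigma=1.0,
                     max_size=10_000, rng=random):
    """Simulate one age-dependent (Bellman-Harris) branching cascade.

    Each infected user stays active for a log-normal lifetime; when it
    ends, each of `followers` contacts is infected independently with
    probability `p_forward`. Returns sorted infection times (root at 0).
    """
    times = [0.0]
    # Min-heap of the times at which active users' lifetimes end.
    pending = [rng.lognormvariate(mu, sigma)]
    while pending and len(times) < max_size:
        t = heapq.heappop(pending)
        children = sum(rng.random() < p_forward for _ in range(followers))
        for _ in range(children):
            times.append(t)  # the follower is infected at time t
            heapq.heappush(pending, t + rng.lognormvariate(mu, sigma))
    return sorted(times)


def infected_by(times, horizon):
    """Number of infections that occurred at or before `horizon`."""
    return bisect_right(times, horizon)


random.seed(7)
sizes = [len(simulate_cascade()) for _ in range(10_000)]
mean_size = sum(sizes) / len(sizes)
print(f"mean cascade size: {mean_size:.2f}")  # close to 1 / (1 - 0.8) = 5
```

The lifetime distribution does not change the total number of infections, only when they happen; it matters for quantities such as `infected_by(times, horizon)`, where replacing the log-normal lifetimes with exponential ones would change the result.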