In the trend of social media, graph is frequently mentioned as a common description of social network/friend circle/work connection in real-life. But graph can be really large and hard to store in one server. Therefore, distributed clusters/systems are here to help.
Typical graph processing
Each node is assigned with a value, in each iteration, it gather values from its immediate neighbors and does some computation using its own value and neighbor value to generate new value. After that, update its new value and send it out to neighbors.
Distributed graph processing
Assign each vertex to one server. Then, each server thus gets a subset of vertices since a server might get several assigned vertices.
Then, they do typical Gather-Apply-Scatter for all assigned vertices.
For assigning process, it can based on different options, such as hash-based or locality-based (assign related neighbor to the same server).
Pregel System by Google
Master Node: | Worker Node: |
---|---|
1.Maintain list of worker nodes | 1.Processes its vertices |
2.Detect failure | 2.Communicate with other workers |
In execution, it can be described as:
In case of failure, each server save the state of their vertices periodically to generate “checkpoint” in permanent storage, so that they can retrieve after recovery.
Small World Network
High cluster coefficient (CC), short path (path length of shortest path).
Distribution of degree
the probability of a given node has k edges can follow different distribution, such as:
Type of Distribution | Formula |
---|---|
Regular graph | 1, since all nodes same degree |
Gaussian graph | Same as normal distribution |
Exponential graph | $e^{-k*c}$, k: # of edges, c : constant |
Power Law graph | $k^{-\alpha}$, k: # of edges, $\alpha$: constant |
A lot of small world networks are power law graphs, but not all.
- 本文作者: Yu Wan
- 本文链接: https://cyanh1ll.github.io/2021/01/15/graph-processing/
- 版权声明: CYANH1LL