#summary Data model at the core of the framework.
#labels Phase-Design

= Introduction =

A mathematical model of a social network is presented, followed by an EJB 3.0 programming model.

= Vertex-Edge Model =

== Mathematical model ==

We try to create a mathematical model of the real human social structure with the following key elements. Every person is a vertex (or node), and every relation between any two persons is represented by a pair of directed edges. Each such edge can have parameters attached to it, which may signify, for example, the level of _friendship_ or a _trust score_. For a relation to exist, both edges must exist, even though their parameter values may differ. However, a one-directional relation may be an interesting idea to explore.

Thus, a social network graph is a (bi-)directed, not necessarily fully connected, cyclic graph. For any two vertices V,,i,, and V,,j,,, a relation R exists when there are two edges E,,ij,, and E,,ji,,, one in each direction. Each such edge E can have parameters attached to it. For the sake of simplicity in the basic framework, we attach a single weight w in the range (0, 1], where values close to 0 indicate almost no _friendship_ and 1 indicates full _friendship_. The value w = 0 is excluded because it is conceptually the same as breaking that edge, and breaking a relation requires the removal of both edges.

Thus, a bi-directed link in this context is not equivalent to an undirected edge. The presence of the two weights w,,ij,, and w,,ji,, makes a notion of direction necessary. This is visually represented in the following diagram.

http://esn.googlecode.com/svn/wiki/images/graph-relation.png

It may be interesting to explore situations where w,,ij,, (corresponding to E,,ij,,) is very low compared to w,,ji,, (corresponding to E,,ji,,). Should there be some correlation function between the two?
Such an asymmetry is similar to a real-world scenario in which person _i_ hardly considers person _j_ a friend, while person _j_ does consider _i_ a friend. Should _j_ be unaware of this situation? Is it not obvious through social interactions? If social interactions are not possible during a certain period, should the values of w decay over time towards a neutral value? If so, how do we measure _interactions_ in the framework to simulate such a time decay?

== Programming model ==

In terms of EJB 3.0 entity beans, the above model can be represented by two entities, _Vertex_ and _Edge_, with a bi-directional entity relationship between them. Each _Vertex_ has a collection of egress _Edge_ objects and a collection of ingress _Edge_ objects (equivalent to E,,ij,, and E,,ji,, respectively, from the perspective of V,,i,,). Each _Edge_, in turn, holds a source _Vertex_ and a destination _Vertex_ (equivalent to V,,i,, and V,,j,, respectively). A further member variable, _trust_, represents the w parameter of each _Edge_. Code fragments from both classes would look like:

*Vertex*
{{{
import java.util.ArrayList;
import java.util.Collection;
import javax.persistence.Entity;
import javax.persistence.OneToMany;

@Entity
public class Vertex implements java.io.Serializable {
    . . .
    // Typed collections let the persistence provider infer the target entity.
    private Collection<Edge> egressEdges = new ArrayList<Edge>();
    private Collection<Edge> ingressEdges = new ArrayList<Edge>();
    . . .
    // Edges leaving this vertex (E_ij); owned by Edge.source.
    @OneToMany(mappedBy="source")
    public Collection<Edge> getEgressEdges() {
        return egressEdges;
    }
    . . .
    // Edges arriving at this vertex (E_ji); owned by Edge.destination.
    @OneToMany(mappedBy="destination")
    public Collection<Edge> getIngressEdges() {
        return ingressEdges;
    }
    . . .
}
}}}

*Edge*
{{{
import javax.persistence.Entity;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;

@Entity
public class Edge implements java.io.Serializable {
    . . .
    private Vertex source;
    private Vertex destination;
    private double trust = 1.0D; // range = (0, 1], with 1 being the default
    . . .
    // Vertex this edge starts from (V_i).
    @ManyToOne
    @JoinColumn(name="sourceVertex")
    public Vertex getSource() {
        return source;
    }
    . . .
    // Vertex this edge points to (V_j).
    @ManyToOne
    @JoinColumn(name="destinationVertex")
    public Vertex getDestination() {
        return destination;
    }
    . . .
    // The weight w of this edge.
    public double getTrust() {
        return trust;
    }
    . . .
}
}}}

However, inheritance in EJB 3.0 has pros and cons depending on the mapping strategy chosen. For example, a single-table strategy performs better, but the resulting table is not normalised. Properly normalised tables, on the other hand, force the persistence layer to issue more complex queries, which can be a significant performance hit.
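
Returning to the vertex-edge model itself, the rule that a relation exists only as a pair of directed edges, each carrying its own weight in (0, 1], can be illustrated without any persistence machinery. The following is a minimal in-memory sketch in plain Java, not the framework's entity code. The class _SocialGraph_ and its methods _relate_, _unrelate_, _related_ and _trust_ are hypothetical names chosen for illustration, and identifying vertices by plain strings is a simplifying assumption.

{{{
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the bi-directed weighted graph.
// Vertices are plain string ids (an assumption made for brevity).
final class SocialGraph {
    // weights.get(i).get(j) holds w_ij, the weight of the directed edge E_ij.
    private final Map<String, Map<String, Double>> weights =
            new HashMap<String, Map<String, Double>>();

    // Creates (or updates) the relation between i and j by storing both
    // directed edges; the two weights may differ.
    void relate(String i, String j, double wij, double wji) {
        checkWeight(wij);
        checkWeight(wji);
        putEdge(i, j, wij);
        putEdge(j, i, wji);
    }

    // Breaking a relation removes both directed edges.
    void unrelate(String i, String j) {
        removeEdge(i, j);
        removeEdge(j, i);
    }

    // A relation exists only when both directed edges are present.
    boolean related(String i, String j) {
        return hasEdge(i, j) && hasEdge(j, i);
    }

    // Returns w_ij; throws if the edge E_ij does not exist.
    double trust(String i, String j) {
        if (!hasEdge(i, j)) {
            throw new IllegalStateException("no edge " + i + " -> " + j);
        }
        return weights.get(i).get(j);
    }

    private boolean hasEdge(String from, String to) {
        Map<String, Double> out = weights.get(from);
        return out != null && out.containsKey(to);
    }

    private void putEdge(String from, String to, double w) {
        Map<String, Double> out = weights.get(from);
        if (out == null) {
            out = new HashMap<String, Double>();
            weights.put(from, out);
        }
        out.put(to, w);
    }

    private void removeEdge(String from, String to) {
        Map<String, Double> out = weights.get(from);
        if (out != null) {
            out.remove(to);
        }
    }

    // w = 0 is rejected: it would be equivalent to having no edge at all.
    private static void checkWeight(double w) {
        if (!(w > 0.0 && w <= 1.0)) {
            throw new IllegalArgumentException("weight must lie in (0, 1], got " + w);
        }
    }
}
}}}

In this sketch, a call such as _relate("i", "j", 0.1, 0.9)_ models the asymmetric case discussed above, where _i_ hardly considers _j_ a friend while _j_ does consider _i_ one. A weight of 0 is rejected rather than stored, in line with the rule that w = 0 amounts to breaking the edge.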