Data portability and data access

There's a growing trend for the web to become a (the) platform to be consumed by web applications themselves. Users can use one login for different sites, sync information between different web applications, and import or export information to and from a site. All this is related to identity management and data portability, and on a second level to digital communities and social networks.

Skype proposes that community-building applications must have a defined set of areas/scenarios in which they can interact, to facilitate the desired interoperability between them. Here they are (Skype Journal):

Social Stack's Six Zones of Interoperability

* ID (Account lifecycles, Login)
* Sync (Profile, Contacts, Objects)
* Permission (Policy, Licensing)
* Find (People Search, Discovery, Gatekeepers)
* Action (Group Actions, Relationship Actions)
* Now (Alerting, Presence)

The idea is that there can be a single sign-in (OpenID) and that there must be a standard way to sync information between applications. For example, if I export my contacts/friends from Facebook to hi5 and then remove a friend in Facebook, does that friend get deleted in hi5? How about the other way round? The same goes for finding people across apps: if I have a friend in Facebook and I also have a MySpace account, could MySpace alert me that my friend is in its network as well? Should MySpace do it? Maybe the friend wouldn't like it, because he keeps his work colleagues in Facebook and his closer friends in MySpace. These are questions that must be solved for dataportability to become a reality. Also check Robert Scoble's post on this topic.

Dataportability also poses the question of the unnecessary duplication of content around the web. If I have a blog in WordPress, is it really necessary for MySpace to store my blog posts and sync them with WordPress? Isn't that just making things difficult? Two servers now hold the same data and must keep it in sync. What if a user changes a post in the WordPress blog and another user changes the same post in MySpace: when syncing, which version wins? Ouch... version control management. These are real roadblocks to dataportability.
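To make the "which version wins?" problem concrete, here is a minimal sketch in Python. It assumes each service remembers the revision it last synced and whether its copy was edited since then. The `PostCopy` structure and the `reconcile` function are hypothetical names made up for this illustration; neither WordPress nor MySpace is known to work this way.

```python
from dataclasses import dataclass


@dataclass
class PostCopy:
    """One service's copy of a blog post (hypothetical structure)."""
    body: str
    base_revision: int  # revision both sides agreed on at the last sync
    edited: bool        # True if this copy changed since that sync


def reconcile(wordpress: PostCopy, myspace: PostCopy) -> str:
    """Decide what a sync between the two copies would have to do."""
    if wordpress.base_revision != myspace.base_revision:
        # The copies do not share a common starting point.
        return "resync-needed"
    if wordpress.edited and myspace.edited:
        # Both sides changed the same post: nobody can pick a winner
        # automatically unless the edits happen to be identical.
        return "in-sync" if wordpress.body == myspace.body else "conflict"
    if wordpress.edited:
        return "copy-to-myspace"
    if myspace.edited:
        return "copy-to-wordpress"
    return "in-sync"


wp = PostCopy("Hello world, edited on WordPress", base_revision=3, edited=True)
ms = PostCopy("Hello world, edited on MySpace", base_revision=3, edited=True)
print(reconcile(wp, ms))  # conflict
```

The sketch only detects the conflict. Resolving it still needs a policy (last writer wins, ask the user, merge the text), and that is exactly the version control headache.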

In some cases it may be simpler to just allow for data access. In the example above, why not just let MySpace access the WordPress blog through an RSS feed, so that every change made in WordPress gets reflected in MySpace as soon as the feed is read again? So instead of dataportability (taking my data from one place, exporting, to another place, importing), why not just simple data access?
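As a rough sketch of what "just data access" looks like, the Python below reads the titles and links out of an RSS 2.0 feed using only the standard library. The feed content and the `read_posts` function are invented for the example; a real consumer would fetch the feed over HTTP from the blog's feed URL instead of using a hard-coded string.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed standing in for the blog's real one.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>My blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second-post</link>
    </item>
  </channel>
</rss>"""


def read_posts(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        posts.append((title, link))
    return posts


for title, link in read_posts(SAMPLE_FEED):
    print(title, link)
```

Nothing is copied or stored on the consuming side here: the blog stays the single source of truth and the other site just re-reads the feed whenever it wants fresh data.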

Or maybe we need both.

Maybe dataportability and data access have different use cases? You want your data to be portable between competing/equivalent services, and you also want to share it with different services. E.g. I want to export my photos from Flickr and import them to ImageShack, and I also want to show my photos in my WordPress blog whether they're stored in Flickr or ImageShack.

Ultimately it will be up to the users to decide if they want to take their data with them or just share it. Developers just need to work out a way to make this possible.

are you here?

everybody knows; found somewhere in ffffound.com

maybe not now. but what about in ten years?

maybe you've posted your photo on one of the services you use on the web, maybe a blog, twitter, flickr, youtube, anything that supports users uploading photos. but this is just your photo.

what about the separate things that make up your identity? your interests, working history, your creative side, your thoughts, your real and/or digital friends, your role models, anything and everything you can think of, including the small and seemingly uninteresting things you've done in the web. all this data could already be out there, even if under different aliases/usernames.

but think about it in today's web:

  • your friends: hi5.com, facebook.com, myspace.com (there are standards emerging for machine-readable descriptions of relations between people: xfn for example)
  • your interests: same as above, lastfm
  • employment history: linkedin
  • your creative side: maybe a blog, a blog in myspace, a vlog, a photoblog, flickr, youtube
  • your contact info: linkedin, plaxo, every other app out there
  • your click and search history: google's web history, hooeey
  • a lifestream is a collection in one centralized place of everything you do on the web, a centralized me of sorts on the web; think of having the photos you post on flickr, the songs you listen to in lastfm, the videos you post on youtube, the posts on your blog, your messages on twitter and the items you mark in google reader or digg, all in your lifestream. Several sites enable you to build your own lifestream, like friendfeed and lifestream.fm (see an example here)
  • A lifestream can be used to generate an attention profile. APML is becoming a standard for saving a user's interests so that more relevant information can be displayed to that user in the future. It can use browser history, IM messages, email, documents, favorite artists (favorite anything really) and potentially everything you do on the web. From the source: "APML is a proposed standard that gives you greater control over your own attention data, and in principle will allow you to selectively record your attention profile - the sites you visit, the search terms that interest you most, the content you most commonly link to - and share it with your favorite websites and services". Several sites already allow the import and export of your APML profile (eg. digg, clutzr) and even more use the concept (eg. lastfm and amazon are live examples) to display information relevant to you.
    In a few years most sites will do this due to the information overload we are living in. This information about you can then be used to help and/or manipulate you more efficiently.
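To illustrate the last two points, here is a small Python sketch that merges activity from several services into one chronological lifestream and then derives a naive attention profile from it. Everything in it is an assumption made for the example: the events, the tags, the function names and the scoring (the share of events per tag). It does not produce the actual APML file format.

```python
from collections import Counter
from heapq import merge

# Made-up activity per service: (timestamp, service, tag), sorted by timestamp.
flickr = [(1, "flickr", "photography"), (4, "flickr", "travel")]
lastfm = [(2, "lastfm", "jazz"), (5, "lastfm", "jazz")]
twitter = [(3, "twitter", "travel")]


def build_lifestream(*feeds):
    """Merge already-sorted per-service feeds into one chronological stream."""
    return list(merge(*feeds))


def attention_profile(lifestream):
    """Naive profile: the share of all events that carry each tag."""
    counts = Counter(tag for _, _, tag in lifestream)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


stream = build_lifestream(flickr, lastfm, twitter)
profile = attention_profile(stream)
print(stream)
print(profile)
```

Even this toy version shows the point: three unrelated accounts are enough to say that this person cares about jazz and travel more than about photography.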

Now, this information may or may not be put in machine-readable formats like microformats. Be that as it may, the exponential growth in computer processing power may allow for the collection and analysis of your breadcrumbs from all over the web in one single unified place. Anyone who bothers to investigate will have a pretty complete picture of who you are on the web.
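When the information is machine-readable, collecting it takes very little effort. XFN, mentioned in the list above, stores relationships as values of the `rel` attribute on ordinary links. The Python sketch below pulls those values out of a page with the standard library. The sample page and the `extract_relations` function are made up for the example.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collects (href, [xfn values]) for every link that carries XFN data."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel") or ""
        values = [v for v in rel.split() if v in XFN_VALUES]
        if href and values:
            self.relations.append((href, values))


def extract_relations(html: str):
    parser = XFNParser()
    parser.feed(html)
    return parser.relations


# Invented page: two XFN links and one ordinary link.
PAGE = (
    '<p><a href="http://example.com/ana" rel="friend met">Ana</a>, '
    '<a href="http://example.com/rui" rel="colleague nofollow">Rui</a> and '
    '<a href="http://example.com/about">about me</a></p>'
)

print(extract_relations(PAGE))
```

A crawler running something like this over every public profile page could rebuild a good part of your social graph without ever asking you.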

It can be a marketer's delight, yes. At the same time it means that the information you receive will be more relevant to you. Either way, the potential is enormous. The more machines know about us, the more useful they can be.

These unified/disparate breadcrumbs around the web form a digital identity. Something you may or may not think much about but it is You, much like what you do, say, write, in everyday life forms the core of your social identity.

No other medium in history has put the concept of social identity so squarely at center stage.

So what is social identity?

Social identity according to Henri Tajfel and John Turner (1979) is composed of four elements:
  • Categorization: We often put others (and ourselves) into categories. Labeling someone a Muslim, a Turk, or a soccer player are ways of saying other things about these people.
  • Identification: We also associate with certain groups (our ingroups), which serves to bolster our self-esteem.
  • Comparison: We compare our groups with other groups, seeing a favorable bias toward the group to which we belong.
  • Psychological Distinctiveness: We desire our identity to be both distinct from and positively compared with other groups.
(source)

In the Social Identity Theory, a person has not one, “personal self”, but rather several selves that correspond to widening circles of group membership. Different social contexts may trigger an individual to think, feel and act on basis of his personal, family or national “level of self” (Turner et al, 1987). Apart from the “level of self”, an individual has multiple “social identities”. Social identity is the individual’s self-concept derived from perceived membership of social groups (Hogg & Vaughan, 2002). In other words, it is an individual-based perception of what defines the “us” associated with any internalized group membership. This can be distinguished from the notion of personal identity which refers to self-knowledge that derives from the individual’s unique attributes.
(source)

What about identity itself?

[...] rests upon a distinction among the psychological sense of continuity, known as the ego identity (sometimes identified simply as "the self"); the personal idiosyncrasies that separate one person from the next, known as the personal identity; and the collection of social roles that a person might play, known as either the social identity or the cultural identity.
[...] This paradigm focuses upon the twin concepts of exploration and commitment. The central idea is that any individual's sense of identity is determined in large part by the explorations and commitments that he or she makes regarding certain personal and social traits. It follows that the core of the research in this paradigm investigates the degrees to which a person has made certain explorations, and the degree to which he or she displays a commitment to those explorations.
(source)

I'll repeat:

  • "Social identity is the individual’s self-concept derived from perceived membership of social groups"
  • "any individual's sense of identity is determined in large part by the explorations and commitments that he or she makes regarding certain personal and social traits"

The web allows access to a wider range of people and so of possible Categorizations (see above). This means that I can form an attraction and/or repulsion to a wider range of social groups, enriching the sense of who I am. It also allows us to play, to augment certain parts of our identity in one digital identity and other parts in another. This can be done with fewer risks than was previously possible in history. It also gives us the means to learn the knowledge required by these new roles, i.e. I can learn to code, I can investigate philosophy, psychology, etc. Every other concept is available to me too, and will be in ever richer detail in the coming years.

These explorations on the web will of course leave a trace: your digital identity, made up of your web history, potentially accessible at any moment in time to anyone. Potentially, because from the moment you sign up to an application you implicitly trust that it won't misuse the information you provide while using it. It is an act of trust because in the end you really have little control over it (do things really get deleted or just hidden from the user for later analysis? etc).

This is important because the presence of one or more digital identities will become an everyday experience for everyone. It will affect the way you see yourself (the experiences you have online can have an impact), your hiring potential (this happens already in some areas of expertise), the people you meet, the people you identify with and the people you don't identify with. It will affect our lives in ways we cannot now imagine. We might have universal access to everyone's identity (past and present) in the future. By this I mean that if I want to find out who x is, I google him and look him up in hi5, facebook, etc. Each breadcrumb I find will help me form an image of who that person is now, even if he wrote it 5 years ago. And it stands to reason that the information available about x will be richer in a few years than it is today. The impact this will have on every one of us individually and on the social relations we form is unknown.

To sum up

  • The web industry is moving more and more towards sharing and relating previously dispersed information. This is a good thing but it poses questions (such as whether or not this is being done with attention to privacy, which it is)
  • We have the ability to try out more identities than was previously possible
  • We have access to information that supports the construction of those identities
  • We have access to people that support the construction of those identities and so the system is of infinite richness
  • We have to deal with the impact those identities have on us and on the way others perceive us
  • We have to deal with the lasting impact those identities have on us and on the way others perceive us, ie. what you do on the web is potentially forever there meaning you may not be able to delete it or prevent others from seeing it

Food for thought :)