To curate content, each piece needs to be valued. But how can the value of content be measured?
In the traditional curation model, editors decide for their readers which content is valuable enough to make it into a publication, and the editors’ conference decides on the placement and size of each piece. As a consequence, the value of a piece of content rests on the collective perception of a group of interdependent individuals, shaped by experience, power, money, influence, relationships, etc.
This model is broken:
The people formerly known as the audience are those who were on the receiving end of a media system that ran one way, in a broadcasting pattern, with high entry fees and a few firms competing to speak very loudly while the rest of the population listened in isolation from one another— and who today are not in a situation like that at all.
(Jay Rosen: The people formerly known as the audience)
So if it’s no longer the editors who value content for the audience, it has to be the audience itself: the individual reader.
A not-so-new way to let the reader describe what constitutes valuable content to him is to let him define the categories he is interested in. Publications commonly offer theme-oriented RSS feeds (the NYT, for example) to narrow the content stream. Tools like Google Alerts use keywords, and social media like Twitter use crowdsourcing. While these aggregate and filter the growing content stream, they cannot curate based on an individual reader’s relevance criteria.
Curation requires assigning either an absolute value (a score) or at least a relative value (a rank) to a piece of content. A score can be based on a combination of several dimensions:
For breaking news, the older a piece of content is, the less relevant it becomes, while for scientific or legal content the publication date might not influence relevance at all.
Hyperlocal news is more relevant the closer it is to a reader’s location or to a location he has expressed interest in.
For technical content, the fit into a category might determine relevance. Naval transportation is relevant to those interested in offshore wind energy but certainly much less so to those looking into onshore wind energy.
The number of times an entity (person, organization, company, thing) is mentioned in an article can influence the score.
How closely a person sharing content is related to a reader in social media plays a role (Facebook’s EdgeRank is an example), as does how many times a piece of content has been liked/read/watched/tweeted/retweeted.
The correlation between two pieces of content, or between their metadata, can be used for scoring. For the former, the semantic profile similarity of the two pieces needs to be measured, resulting in a content similarity score. For the latter, the matching of metadata elements (author, source, topic, tags, etc.) is measured.
A reader’s profile can be defined in specific terms (age, gender, profession…). Matching these terms can serve as a basis for ranking. More complex is using the reader’s behavior to determine his relevance criteria and matching content against these.
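A few of the dimensions above can be sketched as small functions that each return a value between 0 and 1. This is only an illustration under stated assumptions: the function names, the half-life decay, the 10 km distance scale and the bag-of-words cosine (a crude stand-in for a real semantic profile) are choices made for this example, not something a particular curation system prescribes.

```python
import math
from collections import Counter
from datetime import datetime, timedelta, timezone


def recency_score(published, now, half_life_hours):
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life.

    Pass half_life_hours=None for content whose age should not matter
    (e.g. scientific or legal texts); the score is then always 1.0.
    """
    if half_life_hours is None:
        return 1.0
    age_hours = max((now - published).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def proximity_score(reader, item, scale_km=10.0):
    """Map the great-circle distance between two (lat, lon) pairs to (0, 1]."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*reader, *item))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    distance_km = 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))  # haversine
    return 1.0 / (1.0 + distance_km / scale_km)  # scale_km is an assumed tuning knob


def content_similarity(text_a, text_b):
    """Cosine similarity of word counts; a rough proxy for semantic similarity."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def metadata_match_score(a, b):
    """Jaccard overlap of two metadata sets (author, source, topic, tags)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Usage: the same six-hour-old item scored as breaking news and as a legal text.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
six_hours_ago = now - timedelta(hours=6)
news_score = recency_score(six_hours_ago, now, half_life_hours=6.0)
legal_score = recency_score(six_hours_ago, now, half_life_hours=None)
```

Keeping every dimension on the same 0 to 1 scale is a design choice that makes it easier to weigh the dimensions against each other afterwards.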
The more dimensions are taken into consideration, the more complex the scoring mechanism becomes. For instance, if the impact of location proximity is increased, the impact of, let’s say, the social graph is diluted as a consequence. In other words: local content moves up the list, social media content moves down the list. The more parameters determine the score, the better the reader’s relevance criteria can be met, but the more complex the behavior of the curation algorithm becomes.
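The dilution effect can be made concrete with a weighted average, which is one simple way (assumed here, not the only one) to combine dimension scores. The item names, dimension names and numbers below are invented for illustration.

```python
def combined_score(dimension_scores, weights):
    """Weighted average of per-dimension scores, each assumed to lie in [0, 1]."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(w * dimension_scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total


def rank(items, weights):
    """Return item ids ordered by descending combined score."""
    return sorted(items, key=lambda item_id: combined_score(items[item_id], weights),
                  reverse=True)


# Hypothetical items with already computed dimension scores.
items = {
    "local_story": {"location": 0.9, "social": 0.2},
    "viral_post": {"location": 0.2, "social": 1.0},
}

balanced = {"location": 1.0, "social": 1.0}     # social holds 1/2 of the total weight
local_heavy = {"location": 3.0, "social": 1.0}  # social is diluted to 1/4

balanced_order = rank(items, balanced)
local_heavy_order = rank(items, local_heavy)
```

With equal weights the viral post ranks first (0.6 versus 0.55). Tripling the location weight leaves the social weight untouched in absolute terms, yet its share of the total drops from one half to one quarter, and the local story moves to the top (0.725 versus 0.4).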
There are two ways to manage this dilemma:
- Based on an in-depth understanding of his target group, its expectations, decision criteria and interactions, a curator adjusts the scoring parameters to deliver the optimal result for his readers. This puts him in a position similar to that of the guys in charge of Google’s PageRank or Facebook’s EdgeRank. With all the consequences.
- The individual reader himself is offered the chance to adjust the scoring parameters in order to optimize his personal relevance scoring. So he becomes the master of his own filter bubble.
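The two approaches can live side by side: the curator supplies a preset per target group, and the reader may override individual parameters. The sketch below assumes parameters are plain non-negative weights; the preset name, the dimension names and the function `effective_weights` are hypothetical.

```python
# Curator-defined presets, one per target group (values are invented).
CURATOR_PRESETS = {
    "sports_fans": {"recency": 3.0, "social": 2.0, "category": 1.0},
}


def effective_weights(preset, reader_overrides=None):
    """Start from the curator's preset, then apply the reader's own adjustments."""
    weights = dict(preset)  # copy, so the shared preset is never modified
    for name, value in (reader_overrides or {}).items():
        if name not in weights:
            raise KeyError(f"unknown scoring dimension: {name}")
        if value < 0:
            raise ValueError(f"weight for {name} must not be negative")
        weights[name] = float(value)
    return weights


# First approach: the reader simply receives the curator's settings.
curated = effective_weights(CURATOR_PRESETS["sports_fans"])

# Second approach: the reader switches off the social dimension for himself.
personal = effective_weights(CURATOR_PRESETS["sports_fans"], {"social": 0.0})
```

Rejecting unknown dimensions keeps the reader within the set of parameters the curator has exposed, which is itself a curatorial decision.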
For sizable target groups with homogeneous interests (sport, entertainment, lifestyle…), the first approach is suitable, while for knowledge workers with a very narrow and very deep relevance definition (physicians, researchers, analysts, scientists…), the second approach might be preferable.