Aggregation
To aggregate content, we first need to establish access to its source. Several parameters affect content access:
Accessibility
- Public Content: Access does not require any form of authorization. However, a website may restrict how robots and crawlers acquire its content, for example through a robots.txt file.
- Private Content: Access requires authorization via login, VPN, physical address, encryption or similar.
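The robot and crawler limitations mentioned for public content are commonly published in a robots.txt file. The following is a minimal sketch of how an aggregator might honor such rules, using Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URLs and the crawler name `example-aggregator` are hypothetical and only serve as illustration; this is not a description of how Relevancer's engine works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real aggregator would download it
# from the source's /robots.txt URL instead of using an inline string.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

USER_AGENT = "example-aggregator"  # hypothetical crawler name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether our crawler may fetch a given URL under the stated rules.
allowed_public = parser.can_fetch(USER_AGENT, "https://example.com/blog/post-1")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/report")

# Crawl-delay is a widely used but non-standard directive; it is None if absent.
delay = parser.crawl_delay(USER_AGENT)

print(allowed_public, allowed_private, delay)
```

Checking the rules before every request keeps the aggregator within the limits the website has stated, even though no authorization is technically required.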
Usage
- Public Domain: Content can be used without restrictions.
- Creative Commons: Content can be used under the conditions of the applicable license, e.g. attribution or non-commercial use only.
- Copyright: Content can only be used within fair use boundaries.
- Contract: Content can be used as specified by an individual contract with the copyright owner.
Publication Frequency
- High-frequency: Content changes several times within a given period, e.g. tech blogs.
- Low-frequency: Content changes rarely, e.g. Wikipedia.
- Static: Content does not change after publication.
Data Volume
- Low-volume: Primarily text
- Mid-volume: Multimedia content with text, pictures, graphics
- High-volume: Audio, video
Communication Performance
- Low-performance: Source delivers content with limited bandwidth only, e.g. a blog on a home server.
- High-performance: Source delivers content with high bandwidth, e.g. YouTube.
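The parameters above can be captured in a small data model that describes each source. The following is a minimal sketch, assuming one enum per parameter. All class and function names, as well as the polling intervals, are hypothetical and chosen for illustration only; they are not taken from Relevancer's Aggregation Engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Accessibility(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


class Usage(Enum):
    PUBLIC_DOMAIN = "public domain"
    CREATIVE_COMMONS = "creative commons"
    COPYRIGHT = "copyright"
    CONTRACT = "contract"


class Frequency(Enum):
    HIGH = "high"
    LOW = "low"
    STATIC = "static"


class Volume(Enum):
    LOW = "low"
    MID = "mid"
    HIGH = "high"


class Performance(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class SourceProfile:
    """Describes one content source along the five parameters."""
    name: str
    accessibility: Accessibility
    usage: Usage
    frequency: Frequency
    volume: Volume
    performance: Performance


def polling_interval_minutes(profile: SourceProfile) -> Optional[int]:
    """Return an illustrative polling interval, or None to fetch only once.

    The numbers are assumptions for this sketch, not recommended values.
    """
    if profile.frequency is Frequency.STATIC:
        return None  # static content needs no repeated polling
    base = 15 if profile.frequency is Frequency.HIGH else 24 * 60
    # Poll low-performance sources less often to avoid overloading them.
    if profile.performance is Performance.LOW:
        base *= 4
    return base


tech_blog = SourceProfile("tech blog", Accessibility.PUBLIC, Usage.COPYRIGHT,
                          Frequency.HIGH, Volume.MID, Performance.HIGH)
home_blog = SourceProfile("home server blog", Accessibility.PUBLIC, Usage.COPYRIGHT,
                          Frequency.HIGH, Volume.LOW, Performance.LOW)
archive = SourceProfile("archive", Accessibility.PUBLIC, Usage.PUBLIC_DOMAIN,
                        Frequency.STATIC, Volume.LOW, Performance.HIGH)

for source in (tech_blog, home_blog, archive):
    print(source.name, polling_interval_minutes(source))
```

Keeping the profile separate from the scheduling rule means the same description of a source can also drive other decisions, such as whether authorization is needed or how the content may be used.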
Relevancer’s Aggregation Engine is designed for fully automated content aggregation around the clock (24x7). It has been used successfully in projects with up to 5,000 sources, 26,000 feeds and a data volume of 1 TB per month.
