High-volume, high-throughput data streams are common in many industries, including financial services (transaction streams), communications (instant messaging, SMS, micro-blogging), web and gaming (action and event streams) and production-line environments (machine-generated data). The ability to analyse this type of data and gain insights from it as events happen, in real time, can be hugely beneficial.
Traditionally, data analytics is performed as an offline batch process, with results available hours or even days after the data was produced. Any action taken on these insights therefore comes long after the original events occurred. In many scenarios, being able to analyse the live data stream, and so reduce this response delay, is of critical importance.
Clustering is a core data analytics technique whereby similar entities are automatically identified and grouped together. It drives many common applications of data analytics, such as detecting anomalous or fraudulent activity, identifying market segments and user behaviours, and reporting spam, emerging topics and patterns. CeADAR has developed a high-throughput, scalable clustering solution for data streams that brings real-time, ‘live data’ capabilities to these advanced data analytics tasks.
Locality Sensitive Hashing
Stream Analysis
Continuous Clustering
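As a rough illustration of how locality sensitive hashing, stream analysis and continuous clustering can fit together, the sketch below assigns each incoming text item to a cluster in a single pass. It is not CeADAR's implementation, whose details are not described here. It assumes a generic MinHash scheme with banding, and the names StreamClusterer, NUM_HASHES and BANDS, along with their values, are hypothetical choices made for the example.

```python
import hashlib

# Assumed parameters for illustration only; a real system would tune these.
NUM_HASHES = 32              # length of each MinHash signature
BANDS = 16                   # number of LSH bands
ROWS = NUM_HASHES // BANDS   # signature values per band


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token, varied by seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(tokens: set) -> tuple:
    """MinHash signature: the minimum hash value under each seeded hash."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES))


class StreamClusterer:
    """Hypothetical one-pass clusterer: items sharing any LSH band share a cluster."""

    def __init__(self) -> None:
        self.buckets = {}   # (band index, band values) -> cluster id
        self.next_id = 0

    def add(self, text: str) -> int:
        tokens = set(text.lower().split())
        if not tokens:
            raise ValueError("cannot cluster an empty item")
        sig = minhash(tokens)
        keys = [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]
        # Join the cluster of the first band that is already known.
        cluster = next((self.buckets[k] for k in keys if k in self.buckets), None)
        if cluster is None:
            cluster = self.next_id
            self.next_id += 1
        # Register this item's unseen bands so later items can match it.
        for k in keys:
            self.buckets.setdefault(k, cluster)
        return cluster


if __name__ == "__main__":
    clusterer = StreamClusterer()
    for event in [
        "payment declined card 1234",
        "card 1234 declined payment",
        "weather sunny tomorrow dublin",
    ]:
        print(clusterer.add(event), event)
```

Each item is processed once, when it arrives, and only the bucket table is kept in memory, which is what makes this style of clustering a natural fit for live streams. Items with identical token sets always land in the same cluster, while similar but non-identical items are grouped with a probability that rises with their overlap; the number of bands and rows controls that trade-off.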