Open : Scalable : Big Data : Analytics : Platform
At its core, PNDA is an open-source big data analytics platform designed to scale up and out.
OK - tell me more...
Data aggregators merge different types of data (log data, metric data, network telemetry data) and publish them to a data distribution layer. We aim to do minimal processing on ingress (in practice, pushing the data into a schema), not only for simplicity but also so that the source data remains available in its original form if your analysis needs to change or adapt later. The data distribution layer uses a publish/subscribe (pub/sub) model built on Apache Kafka, which makes it performant and scalable. Data is published as a set of topics that distinguish between different forms and feeds of data.
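As a rough illustration of "minimal processing on ingress", the sketch below wraps each raw record in a thin envelope and leaves the payload untouched, so later analysis can still reach the original bytes. The `Envelope` fields and the `wrap_for_ingress` helper are hypothetical names chosen for this example; they are not PNDA's actual schema or API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Hypothetical ingress schema: a little metadata plus the untouched payload."""
    timestamp_ms: int  # when the aggregator received the record
    source: str        # which feed produced it, e.g. "syslog"
    raw: bytes         # original bytes, never parsed or modified


def wrap_for_ingress(raw: bytes, source: str, now_ms: Optional[int] = None) -> Envelope:
    """Attach metadata to a raw record without interpreting its contents."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return Envelope(timestamp_ms=now_ms, source=source, raw=raw)


# Example: a syslog line is wrapped as-is; no parsing happens on ingress.
line = b"<34>Oct 11 22:14:15 host su: 'su root' failed"
event = wrap_for_ingress(line, source="syslog", now_ms=1_700_000_000_000)
```

Because the payload is stored verbatim, a new parser or a different analysis can be applied later without going back to the original source.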
Your analytics applications then subscribe to topics to obtain the data they need. We don't make any assumptions about how you'll build your applications, although the platform does provide some fundamental capabilities that you can build on.
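To make the topic-based model concrete, here is a toy in-memory sketch of publishers and subscribers decoupled by topic name. It is not a Kafka client and does not reflect how Kafka stores or delivers messages; the `InMemoryBus` class and the topic names (`metrics`, `logs`, `telemetry`) are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[bytes], None]


class InMemoryBus:
    """Toy stand-in for a pub/sub distribution layer (not a Kafka client)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that will receive every message on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: bytes) -> None:
        """Deliver a message only to handlers registered for this topic."""
        for handler in self._subscribers.get(topic, []):
            handler(message)


bus = InMemoryBus()
metrics_seen: List[bytes] = []
logs_seen: List[bytes] = []

# Each application subscribes only to the feeds it needs.
bus.subscribe("metrics", metrics_seen.append)
bus.subscribe("logs", logs_seen.append)

# Publishers know topic names, not which applications are listening.
bus.publish("metrics", b"cpu=0.42")
bus.publish("logs", b"link down on eth0")
bus.publish("telemetry", b"ifInOctets=1234")  # no subscriber in this toy example
```

Unlike this sketch, which pushes messages to handlers immediately and drops anything without a subscriber, Kafka retains published messages and lets consumers read them at their own pace.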
What analytics applications can I write and run on the platform?
- Streaming applications, which take a direct feed from Kafka rather than taking the data directly from the source
- Low-latency query applications, which run queries across the OSS dataset in near real time
- Batch processing applications, which leverage the master dataset in HDFS
- Predictive analysis on time-series data
- Deep-learning applications using high-dimensional data
What is the elevator pitch?
PNDA brings together a number of horizontally scalable functional capabilities with tools that simplify their use and operation. Some of the main capabilities offered include:
- Processing of high-velocity streams of data on ingress
- Processing and storing of high volumes of data after ingress
- Data format and source agnosticism
- Simplified integration via decoupling of data sources from consuming clients
- Efficient mechanisms for handling large quantities of time-series data
- Tools for data exploration using the same parallel processing techniques that are used for operationalized functions at scale
The platform is based on many open source technologies, principally centered on Apache Hadoop and Apache Kafka. The main processing workhorses are Apache Spark and Spark Streaming.
PNDA can be launched on any OpenStack datacenter and is also available to try out in the DevNet sandbox. If you’d like to evaluate PNDA, just email info@pndaproject.io.
PNDA is being designed with security in mind, making use of the latest open source advances in secure big data. Please see our "Security Blueprint" on http://pndaproject.io for more details.