Apache Spark

Open-source unified analytics engine for large-scale data processing. Provides an interface for programming clusters with implicit data parallelism and fault tolerance.

Has its architectural foundation in the resilient distributed dataset (RDD): a read-only multiset of data items, distributed over a cluster of machines and maintained in a fault-tolerant way.
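
Toy sketch of the RDD idea in plain Python, to make "read-only", "distributed" and "fault-tolerant" concrete. This is not Spark's API: `ToyRDD` and its `compute` field are hypothetical names, partitions stand in for machines, and "losing a worker" is simulated by dropping a dictionary entry. It relies on the fact that an RDD remembers how it was derived (its lineage), so a lost partition can be recomputed rather than restored from a replica.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)  # frozen: a ToyRDD cannot be modified after creation
class ToyRDD:
    """Hypothetical stand-in for an RDD: immutable, partitioned, lineage-based."""
    num_partitions: int
    # Recipe that produces the items of one partition, given its index.
    compute: Callable[[int], Tuple]

    @staticmethod
    def parallelize(items, num_partitions):
        data = tuple(items)
        # Round-robin split; each slice plays the role of one machine's share.
        return ToyRDD(num_partitions, lambda i: data[i::num_partitions])

    def map(self, fn):
        # A transformation returns a new ToyRDD whose partitions are derived
        # from the parent's partitions. The parent is left untouched.
        parent = self.compute
        return ToyRDD(self.num_partitions,
                      lambda i: tuple(fn(x) for x in parent(i)))

    def collect(self):
        # Gather every partition; element order is not part of the contract,
        # since the data is a multiset.
        return [x for i in range(self.num_partitions) for x in self.compute(i)]


squares = ToyRDD.parallelize(range(10), 3).map(lambda x: x * x)

# Materialize each partition as if it lived on a different worker.
materialized = {i: squares.compute(i) for i in range(squares.num_partitions)}

# Simulate losing worker 1, then rebuild only its partition from the lineage.
lost = materialized.pop(1)
materialized[1] = squares.compute(1)
```

The caller only writes `map(...)`; the per-partition split is implicit, which is the sense of "implicit data parallelism" above. Only the lost partition is recomputed, and the others are not touched.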
