- Training by NCOI Learning
- Antwerp & Port of Antwerp, Leuven, Ghent
Learn to work with Spark and Databricks, the ideal framework for data analytics in the cloud, during this two-day ABIS training!
Nowadays everybody seems to be working with AI, Data Science and "big data". No doubt you too would like to interrogate your voluminous data sources (click streams, social media, relational data, cloud data, sensor data, ...) and are experiencing the shortcomings of traditional data analytics tools. Maybe you want the processing power of a cluster, with its parallel processing capabilities, to interrogate your distributed data stores.
If fast prototyping and processing speed are a priority, Spark will most likely be the platform of your choice. Apache Spark is an open-source processing engine focusing on low latency, ease of use, flexibility and analytics. It is an alternative to the MapReduce approach delivered by Hadoop with Hive (cf. our course Big data in practice using Hadoop). Spark has complemented, and in practice superseded, Hadoop's MapReduce, thanks to the higher abstraction of Spark's APIs and its faster, in-memory processing.
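To give a first taste of that higher abstraction, the sketch below shows a complete PySpark job; the file name sales.csv and the columns region and amount are hypothetical examples, not course material.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # start a local Spark session (development mode)
    spark = SparkSession.builder.appName("demo").master("local[*]").getOrCreate()

    # read a CSV file into a DataFrame; sales.csv is a hypothetical example file
    df = spark.read.csv("sales.csv", header=True, inferSchema=True)

    # one line of DataFrame code replaces a hand-written map and reduce phase
    df.groupBy("region").agg(F.sum("amount").alias("total")).show()

    spark.stop()

The equivalent classic MapReduce job would require separate mapper and reducer classes plus job configuration code: exactly the abstraction gap mentioned above.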
More specifically, Spark allows you to easily interrogate data sources on HDFS, in a NoSQL database (e.g. Cassandra or HBase), in a relational database, in the cloud (e.g. AWS S3) or in local files. Independently of this, a Spark job can run unchanged on your local machine (i.e. in development mode), on a Hadoop cluster (with YARN), in a Mesos environment, on Kubernetes, or in the cloud. And all this through a simple Spark script, through a more complex (Java or Python) program, or through a web-based notebook (e.g. Zeppelin or the Databricks cloud platform).
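The sketch below illustrates this independence of data source and deployment target; every URI, host and table name is a hypothetical placeholder, and the S3 and JDBC reads additionally assume that the hadoop-aws package and a suitable JDBC driver are available on the classpath.

    from pyspark.sql import SparkSession

    # the application code stays the same; only the source URI changes
    spark = SparkSession.builder.appName("sources-demo").getOrCreate()

    df_local = spark.read.json("file:///data/events.json")     # local file
    df_hdfs = spark.read.parquet("hdfs:///warehouse/events")   # HDFS
    df_s3 = spark.read.parquet("s3a://my-bucket/events/")      # AWS S3
    df_jdbc = (spark.read.format("jdbc")                       # relational database
               .option("url", "jdbc:postgresql://dbhost/sales")
               .option("dbtable", "public.orders")
               .load())

    # the same script is submitted to different environments by
    # switching the --master option of spark-submit:
    #   spark-submit --master local[*] job.py                  (development mode)
    #   spark-submit --master yarn job.py                      (Hadoop cluster)
    #   spark-submit --master k8s://https://host:443 job.py    (Kubernetes)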
This course builds on the context set forth in the Big data architecture and infrastructure overview course.
Classroom instruction, supported by practical examples and extensive hands-on exercises.
Delivered as a live, interactive training: in person, online, or in a hybrid format. The course can be given in English, Dutch, or French.
Familiarity with the concepts of data clusters and distributed processing is necessary; see our course Big data architecture and infrastructure. Additionally, minimal knowledge of SQL and Linux is useful. Minimal experience with at least one programming language (e.g. Java, Python, Scala, Perl, JavaScript, PHP, C++, C#, ...) is a must.
Anyone who wants to start practising Spark: developers, data architects, and all other profiles that need to work with data science technology.