About
Seif Haridi is one of Sweden's most influential computer scientists, with a career spanning over 40 years in programming languages and distributed systems. His impact ranges from foundational work on logic programming (SICStus Prolog) through multiparadigm programming (Mozart/Oz) to modern big data systems (Apache Flink, Hops).
Fun fact: Seif Haridi was a member of the Egyptian national junior table tennis team.
Current Positions
- Professor Emeritus at KTH Royal Institute of Technology
- Chair-Professor of Computer Systems (parallel and distributed computing)
- Head of Distributed Computing group (DC@KTH)
- Senior Advisor at RISE Research Institutes of Sweden
Education
- PhD in Computer Systems, KTH Royal Institute of Technology, Sweden
- Engineering Degree in Electronics and Communication Systems, Cairo University, Egypt
Research Areas
- Parallel and Distributed Computing
- Programming Languages and Systems
- Cloud Computing and Big Data
- Stream Processing
- Distributed Algorithms
- Fault Tolerance
Systems
Seif Haridi has co-designed and contributed to numerous influential systems spanning programming languages, distributed computing, and big data processing.
Big Data and Cloud Systems
Apache Flink (SIGMOD 2023)
Apache Flink is a platform for efficient, distributed, general-purpose data processing. It features powerful programming abstractions in Java and Scala, a high-performance runtime, and automatic program optimization. It has native support for iterations, incremental iterations, and programs consisting of large DAGs of operations.
Flink Streaming is an extension of the core Flink API for high-throughput, low-latency data stream processing. The system can connect to and process data streams from many sources, such as RabbitMQ, Flume, Twitter, and ZeroMQ, as well as from any user-defined data source.
HOPS / HopsFS (IEEE 2017)
Hops is a next-generation distribution of Apache Hadoop with a heavily adapted implementation of HDFS called HopsFS. HopsFS is a new implementation of the Hadoop Distributed File System (HDFS), based on Apache Hadoop 2.8, that supports multiple stateless NameNodes. The metadata is stored in an in-memory distributed database (NDB).
HopsFS enables NameNode metadata to be both customized and analyzed, because it can be easily accessed via SQL or the native API (NDB API).
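As a rough illustration of what analyzing file system metadata through SQL can look like, the sketch below uses an in-memory SQLite database as a stand-in for NDB. The `inodes` table, its columns, and the helper names are hypothetical and chosen only for this example; they are not the HopsFS schema or API.

```python
import sqlite3

# In-memory SQLite stands in for the NDB metadata store. The `inodes`
# table and its columns are hypothetical, not the real HopsFS schema.
def build_demo_metadata():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inodes (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "name TEXT, is_dir INTEGER, size_bytes INTEGER)"
    )
    rows = [
        (1, None, "/", 1, 0),
        (2, 1, "logs", 1, 0),
        (3, 1, "data", 1, 0),
        (4, 2, "app.log", 0, 1200),
        (5, 3, "part-0", 0, 5000),
        (6, 3, "part-1", 0, 7000),
    ]
    conn.executemany("INSERT INTO inodes VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def bytes_per_directory(conn):
    """Total size of the files directly inside each directory, largest first."""
    query = (
        "SELECT d.name, SUM(f.size_bytes) AS total "
        "FROM inodes f JOIN inodes d ON f.parent_id = d.id "
        "WHERE f.is_dir = 0 GROUP BY d.id ORDER BY total DESC"
    )
    return conn.execute(query).fetchall()

conn = build_demo_metadata()
usage = bytes_per_directory(conn)
print(usage)
```

Because the metadata lives in ordinary tables in this sketch, an analysis such as per-directory usage is a single self-join rather than a walk over the namespace.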
Scalaris (IEEE 2010)
Scalaris is a scalable, transactional, distributed key-value store. It was the first NoSQL database that supported the ACID properties for multi-key transactions. It can be used for building scalable Web 2.0 services.
Scalaris uses a structured overlay with a non-blocking Paxos commit protocol for transaction processing with strong consistency over replicas. Scalaris is implemented in Erlang.
CATS
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing.
CATS is a distributed key-value store that uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing, which are key properties for modern cloud storage middleware.
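The two ideas named above, consistent hashing and consistent quorums, can be sketched in a few lines. This is a simplified illustration and not the CATS implementation: the function names, the ring size, and the replication degree are assumptions, and a consistent quorum is modelled here simply as a majority of a replication group whose members all report the same view of that group.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

def ring_position(key: str) -> int:
    """Map a string to a position on a 2**32 identifier ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def replication_group(key, nodes, replication_degree=3):
    """Return the nodes that follow the key clockwise on the ring."""
    ring = sorted(nodes, key=ring_position)
    positions = [ring_position(n) for n in ring]
    start = bisect_right(positions, ring_position(key)) % len(ring)
    count = min(replication_degree, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]

def is_consistent_quorum(replies, group_size):
    """replies maps node -> the view (tuple of nodes) that node reported.

    True if a majority of the group reported one identical view.
    """
    if not replies:
        return False
    _, votes = Counter(replies.values()).most_common(1)[0]
    return votes >= group_size // 2 + 1

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
group = replication_group("user:42", nodes)
view = tuple(group)

# Two of three replicas agree on the view: a consistent quorum.
agreeing = {group[0]: view, group[1]: view}
# Two replies with different views: a majority replied, but not consistently.
diverging = {group[0]: view, group[1]: (group[1],)}

print(is_consistent_quorum(agreeing, len(group)))
print(is_consistent_quorum(diverging, len(group)))
```

The point of the second check is that a plain majority is not enough in this model: replies only count together when the replicas agree on who the replication group is.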
Kompics
Kompics is a message-passing component model for building distributed systems by putting together protocols programmed as event-driven components. Systems built with Kompics leverage multi-core machines out of the box and can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic debugging and reproducible performance evaluation of unmodified Kompics distributed systems.
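To give a feel for the event-driven component style described above, here is a toy sketch in Python. It is not the Kompics API (Kompics itself targets the JVM); the `Scheduler`, `Component`, `Ping`, and `Pong` names are invented for illustration, and the single-threaded scheduler is only a stand-in for a real runtime.

```python
from collections import defaultdict, deque

class Scheduler:
    """Runs queued handler invocations one at a time (single-threaded toy)."""
    def __init__(self):
        self.queue = deque()

    def run(self):
        while self.queue:
            handler, event = self.queue.popleft()
            handler(event)

class Component:
    """Holds event handlers and publishes events to connected components."""
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.handlers = defaultdict(list)  # event type -> list of handlers
        self.peers = []                    # components this one is connected to

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def connect(self, other):
        self.peers.append(other)

    def trigger(self, event):
        """Queue the event for every connected component that handles its type."""
        for peer in self.peers:
            for handler in peer.handlers[type(event)]:
                self.scheduler.queue.append((handler, event))

class Ping:
    def __init__(self, n):
        self.n = n

class Pong:
    def __init__(self, n):
        self.n = n

scheduler = Scheduler()
pinger, ponger = Component(scheduler), Component(scheduler)
pinger.connect(ponger)
ponger.connect(pinger)
log = []

def on_ping(event):
    log.append(f"ping {event.n}")
    ponger.trigger(Pong(event.n))  # reply with a Pong carrying the same number

def on_pong(event):
    log.append(f"pong {event.n}")

ponger.subscribe(Ping, on_ping)
pinger.subscribe(Pong, on_pong)

pinger.trigger(Ping(1))
scheduler.run()
print(log)
```

Because every handler invocation passes through one queue, the toy run is deterministic, which loosely mirrors why a simulation scheduler makes debugging reproducible.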
Programming Languages and Systems
SICStus Prolog
SICStus is an ISO-standard-compliant Prolog development system. SICStus is built around a high-performance Prolog engine that can use the full virtual memory space on 32-bit and 64-bit architectures alike. SICStus is efficient and robust for large amounts of data and large applications.
SICStus is one of the best-known logic programming systems in the world and was developed at the Swedish Institute of Computer Science (SICS).
Mozart Programming System
The Mozart Programming System combines ongoing research in programming language design and implementation, constraint logic programming, distributed computing, and human-computer interfaces. Mozart implements the Oz language and provides both expressive power and advanced functionality.
AKL (Agents Kernel Language)
AKL is a concurrent constraint programming language developed at the Swedish Institute of Computer Science (SICS). In AKL, computation is performed by agents interacting through stores of constraints. This notion accommodates multiple programming paradigms; in appropriate contexts, AKL agents may be thought of as processes, objects, functions, relations, or constraints.
Research Projects
Over 25 years of leading and participating in major European and Swedish research projects on distributed systems, cloud computing, and big data.
Recent Projects (2012-2022)
Continuous Deep Analytics (CDA)
Modern end-to-end data pipelines are highly complex and unoptimized. They combine code from different frontends (e.g., SQL, Beam, Keras), declared in different programming languages (e.g., Python, Scala), and execute across many backend runtimes (e.g., Spark, Flink, TensorFlow). Data and intermediate results take a long and slow path through excessive materialization and conversions, down to different, partially supported hardware accelerators. End-to-end guarantees are typically hard to reason about because of the mismatch of processing semantics across runtimes.
The Continuous Deep Analytics (CDA) project aims to shape the next-generation software for scalable, data-driven applications and pipelines. Our work binds state-of-the-art mechanisms in compiler and database technology together with hardware-accelerated machine learning and distributed stream processing.
ExtremeEarth
ExtremeEarth concentrates on developing techniques and software that will enable the extraction of information and knowledge from big Copernicus data using deep learning techniques and extreme geospatial analytics. It also develops two use cases based on this information and knowledge together with other relevant non-Earth-observation (non-EO) data sets.
ExtremeEarth will impact developments in the Integrated Ground Segment of Copernicus and the Sentinel Collaborative Ground Segment. ExtremeEarth tools and techniques can be used for extracting information and knowledge from big Copernicus data and making this information and knowledge available as linked data, allowing easy development of applications by developers with minimal or no knowledge of EO techniques, file formats, data access protocols etc.
A Big Data Analytics Framework for a Smart Society (BIDAF)
The overall aim of the BIDAF project is to significantly further the research within massive data analysis, by means of machine learning, in response to the increasing demand of retrieving value from data in all of society. This will be done by creating a strong distributed research environment for big data analytics.
There are challenges on several levels that must be addressed: (i) platforms to store and process the data, (ii) machine learning algorithms to analyze the data, and (iii) high level tools to access the results.
Streamline
Streamline is funded by the European Union's Horizon 2020 research and innovation program to enhance the European data platform Apache Flink to handle both stream data and batch data in a unified way. The project includes both research and use cases to validate the results.
The project has the following objectives: (i) to research, design, and develop a massively scalable, robust, and efficient processing platform for data at rest and data in motion in a single system, (ii) to develop a high accuracy, massively scalable data stream-oriented machine learning library based on new algorithms and approximate data structures, (iii) to provide a unified interactive programming environment that is user-friendly and easy to deploy in the cloud, (iv) to implement a real-time contextualization engine, enabling analytical and predictive models to take real world context into account, and (v) to develop multi-faceted, effective dissemination of Streamline results to the research, academic, and international community.
E2E-Clouds
E2E-Clouds was a five-year research project financed by the Swedish Foundation for Strategic Research. The goal of the project was to develop an End-to-End information-centric Cloud (E2E-Cloud) for data-intensive services and applications.
E2E-Clouds is a distributed and federated cloud infrastructure that meets the challenge of scale by aggregating, provisioning, and managing computational, storage, and networking resources from multiple centers and providers. Like some current data-center clouds, it manages computation and storage in an integrated fashion for efficiency, but it adds wide-scale distribution.
iSocial
The rapid proliferation of Online Social Networking (OSN) sites is expected to reshape the Internet's structure, design, and utility. We believe that OSNs create a potentially transformational change in consumer behavior and will bring a far-reaching impact on traditional industries of content, media, and communications.
The iSocial ITN aspires to bring a transformational change in OSN provision, pushing the state-of-the-art from centralized services towards totally decentralized systems that will pervade our environment and seamlessly integrate with future Internet and media services. OSN decentralization can address privacy considerations and improve service scalability, performance and fault-tolerance in the presence of an expanding base of users and applications.
The main objective of iSocial is to provide world-class training for the next generation of researchers, computer scientists, and Web engineers, emphasizing a strong combination of advanced understanding of the theoretical and experimental approaches, methodologies, and tools required to develop decentralized OSN (DOSN) platforms.
A Community Networking Cloud in a Box (CLOMMUNITY)
Community networking is an emerging model for the Future Internet across Europe and beyond where communities of citizens can build, operate and own open IP-based networks, a key infrastructure for individual and collective digital participation.
The CLOMMUNITY project aims at addressing the obstacles for communities of citizens in bootstrapping, running and expanding community-owned networks that provide community services organised as community clouds. That requires solving specific research challenges: self-managing and scalable (decentralized) infrastructure services for the management and aggregation of a large number of widespread low-cost unreliable networking, storage and home computing resources; distributed platform services to support and facilitate the design and operation of elastic, resilient and scalable service overlays.
Historical Projects (2000-2010)
Peer-to-Peer Live Streaming (PeerTV)
The PeerTV project developed, deployed and validated peer-to-peer media streaming platforms addressing three key requirements: (i) efficient utilization of upload bandwidth at peers to reduce centrally provisioned bandwidth, (ii) reducing playback latency and increasing playback continuity through novel topologies, and (iii) minimizing network traffic cost for ISPs through AS-aware infrastructure.
SELFMAN
The goal of SELFMAN is to make large-scale distributed applications self-managing by combining component models and structured overlay networks. Self-management is addressed along four axes: self-configuration, self-healing, self-tuning, and self-protection.
PEPITO
Peer-To-Peer-Implementation-and-TheOry. Traditional centralised system architectures are ever more inadequate. The PEPITO project investigated completely decentralised models of P2P computing, both how to build them robustly and what can be built.
Information Cities (ICities)
The Information Cities project models aggregation and segregation patterns in a virtual world of infohabitants (humans, virtual firms, on-line communities and software agents). The objective is to capture aggregate patterns of virtual organisation emerging from interaction over information infrastructure.
EVERGROW
The goal of EVERGROW is to build the science-based foundations for the global information networks of the future. The demands on the future Internet will be high, and a number of today's highly manual processes must be automated, such as network management, network provisioning and network repair on all levels.
CoreGRID
The CoreGRID Network of Excellence aims at strengthening and advancing scientific and technological excellence in Grid and Peer-to-Peer technologies. The Network brings together 161 permanent researchers and 164 PhD students from forty-one institutions in six complementary research areas to develop next generation Grid middleware.
Awards and Honors
| Year | Award | Organization | For |
|---|---|---|---|
| 2023 | ACM SIGMOD Systems Award | ACM SIGMOD | Apache Flink - expanding stream data-processing |
| 2019 | European Data Science Technology Innovation | EIT Digital | LogicalClocks and Hopsworks platform |
| 2017 | IEEE Scale Prize | IEEE | HopsFS - most scalable HDFS filesystem |
| 2010 | IEEE Scalability Prize | IEEE | Scalaris - transactional key-value store |
| 1991 | Xerox Chester Carlson Research Prize | Royal Swedish Academy of Engineering Sciences (IVA) | Logic programming and parallel processors |
Book
Concepts, Techniques, and Models of Computer Programming
Authors: Peter Van Roy and Seif Haridi
Publisher: MIT Press, 2004
Pages: 936
ISBN: 978-0262220699
This comprehensive textbook presents computer programming as a unified discipline. It teaches all major programming paradigms in a uniform framework that shows their deep relationships.
Paradigms Covered
- Declarative programming
- Message-passing concurrency
- Object-oriented programming
- Shared-state concurrency
- Constraint programming
- Distributed programming
"In almost 20 years since Abelson and Sussman revolutionized the teaching of computer science with their Structure and Interpretation of Computer Programs, this is the first book I've seen that focuses on big ideas and multiple paradigms, as SICP does, but chooses a very different core model (declarative programming)."
- Brian Harvey, UC Berkeley
The book uses the Mozart Programming System (implementing the Oz language) as its main vehicle for teaching and experimentation.
Companies Founded
LogicalClocks / Hopsworks
Role: Co-founder, Chief Scientist
Founded: 2016
Enterprise data platform for scale-out data science and AI. Won European Data Science Technology Innovation 2019.
HiveStreaming
Role: Co-founder
Enterprise Content Delivery Network (eCDN) for internal live video events using P2P-based video distribution.
Peerialism
Role: Advisor
P2P data transfer system. Named one of Sweden's 33 hottest companies (2009). Acquired for 100M SEK.
PhD Students
Notable Alumni
Ali Ghodsi
PhD 2006: Distributed k-ary System (DHT Algorithms)
Current: CEO of Databricks (valued at $100B+)
Joe Armstrong
PhD 2003: Reliable Distributed Systems
Legacy: Creator of Erlang (powers WhatsApp, Ericsson)
Paris Carbone
PhD 2018: Data Stream Processing
Current: Apache Flink core contributor
Complete List (25+ PhD Students)
| Name | Year | Thesis Topic |
|---|---|---|
| Paris Carbone | 2018 | Scalable and Reliable Data Stream Processing |
| Fatemeh Rahimian | 2014 | Gossip-Based Algorithms for Information Dissemination |
| Raul Jimenez | 2013 | Distributed Peer Discovery in P2P Systems |
| Roberto Roverso | 2013 | Adaptive HTTP-live Streaming on P2P Overlays |
| Amir H. Payberah | 2013 | Live Streaming in P2P and Hybrid Environments |
| Cosmin Arad | 2013 | Reconfigurable Distributed Systems |
| Tallat Shafaat | 2013 | Partition Tolerance in Overlay Networks |
| John Ardelius | 2013 | On the Performance Analysis of Large Scale, Dynamic, Distributed and Parallel Systems |
| Ali Ghodsi | 2006 | Distributed k-ary System (DHT Algorithms) |
| Erik Klintskog | 2005 | Distribution Support for Programming Systems |
| Sameh El-Ansary | 2005 | Structured Peer-To-Peer Systems |
| Per Brand | 2005 | Distributed Programming Systems (Mozart) |
| Joe Armstrong | 2003 | Reliable Distributed Systems (Erlang) |
| Ashley Saulsbury | 1999 | Latency in Distributed Memory Systems |
| Johan Montelius | 1997 | Fine-grain Parallelism |
| Björn Carlsson | 1995 | Compiling and Executing Finite Domain Constraints |
| Sverker Janson | 1994 | AKL Multiparadigm Programming Language |
| Torbjörn Keisu | 1994 | Tree Constraints |
| Erik Hagersten | 1992 | Cache-Only Memory Architectures |
| Roland Karlsson | 1992 | A High Performance OR-parallel Prolog System |
| Dan Sahlin | 1991 | Partial Evaluator for Prolog |
| Mats Carlsson | 1990 | OR-parallel Prolog Engine |
| Nabiel El Shiewy | 1990 | Robust Coordinated Reactive Computing in SANDRA |
| Bogumil Hausman | 1990 | Pruning and Speculative Work in OR-Parallel Prolog |
Selected Publications
Seminal Works
- "Concepts, Techniques, and Models of Computer Programming" (2004) - MIT Press, 936 pages
- "A history of the Oz multiparadigm language" (2020) - HOPL IV, ACM
- "Apache Flink: Stream and Batch Processing in a Single Engine" (2015) - IEEE Data Engineering Bulletin
- "State Management in Apache Flink" (2017) - VLDB Endowment
- "HopsFS: Scaling Hierarchical File System Metadata" (2017) - FAST
- "Efficient Logic Variables for Distributed Computing" (1999) - ACM TOPLAS
- "Mobile Objects in Distributed Oz" (1997) - ACM TOPLAS
- "DDM - A Cache-Only Memory Architecture" (1992) - IEEE Computer
Invited Talks
- HOPL IV 2021: "The History of Oz: A Multiparadigm Programming Language"
- Boston University 2023: Distinguished CS Colloquium
- University of Chicago: "Research in Continuous Deep Analytics"
Teaching
For more than ten years, Seif Haridi has been teaching popular courses on Distributed Algorithms and Peer-to-Peer Computing at KTH.
Online Courses (edX)
YouTube
KTH Courses
- ID2220: Advanced Topics in Distributed Systems
- ID2203: Distributed Systems, Advanced Course
- ID2210: Distributed Computing, Peer-to-Peer and GRIDS
- 2G1126: Distributed Computer Systems (historical)
- 2G1512: Computer Science II (historical)
Contact
haridi@kth.se
seif.haridi@ri.se
+46 8 790 41 22
Electrum 229, Kista, Sweden