Seif Haridi

Chair-Professor of Computer Systems

KTH Royal Institute of Technology, Stockholm, Sweden

Senior Advisor, RISE Research Institutes of Sweden

About

Seif Haridi is one of Sweden's most influential computer scientists, with a career spanning over 40 years in programming languages and distributed systems. His impact ranges from foundational work on logic programming (SICStus Prolog) through multiparadigm programming (Mozart/Oz) to modern big data systems (Apache Flink, Hops).

Fun fact: Seif Haridi was a member of the Egyptian national junior table tennis team.

Current Positions

  • Professor Emeritus at KTH Royal Institute of Technology
  • Chair-Professor of Computer Systems (parallel and distributed computing)
  • Head of Distributed Computing group (DC@KTH)
  • Senior Advisor at RISE Research Institutes of Sweden

Education

1981

PhD in Computer Systems
KTH Royal Institute of Technology, Sweden

1974

Engineering Degree
Cairo University, Egypt
Electronics and Communication Systems

Research Areas

  • Parallel and Distributed Computing
  • Programming Languages and Systems
  • Cloud Computing and Big Data
  • Stream Processing
  • Distributed Algorithms
  • Fault Tolerance
  • 250+ publications
  • 12,800+ citations
  • h-index: 42
  • 25+ PhD students supervised
  • 40+ years of research

Systems

Seif Haridi has co-designed and contributed to numerous influential systems spanning programming languages, distributed computing, and big data processing.

Big Data and Cloud Systems

Apache Flink

Apache Flink is a platform for efficient, distributed, general-purpose data processing. It features powerful programming abstractions in Java and Scala, a high-performance runtime, and automatic program optimization. It has native support for iterations, incremental iterations, and programs consisting of large DAGs of operations.

Flink Streaming is an extension of the core Flink API for high-throughput, low-latency data stream processing. The system can connect to and process data streams from many sources, such as RabbitMQ, Flume, Twitter, and ZeroMQ, as well as from any user-defined data source.
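The core idea of windowed stream aggregation can be sketched in a few lines. The following is an illustrative Python toy, not the Flink API (which is Java/Scala): it groups timestamped events into fixed-size tumbling windows and counts keys per window, the kind of computation Flink runs continuously, in parallel, and with fault-tolerant state.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Count keys per fixed-size (tumbling) window.

    events: iterable of (timestamp, key) pairs.
    Toy model only: a real stream processor such as Flink does this
    incrementally over unbounded streams, with event-time watermarks,
    parallel operators, and checkpointed state.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        windows[ts // window_size][key] += 1
    return {w: dict(counts) for w, counts in windows.items()}

clicks = [(1, "a"), (2, "b"), (3, "a"), (11, "a")]
result = tumbling_window_counts(clicks, window_size=10)
# window 0 (timestamps 0..9) holds counts for "a" and "b";
# window 1 holds the late "a" click
```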

Recognition: 2023 ACM SIGMOD Systems Award

HOPS / HopsFS

Hops is a next-generation distribution of Apache Hadoop with a heavily adapted implementation of HDFS, called HopsFS. HopsFS is a new implementation of the Hadoop Filesystem (HDFS), based on Apache Hadoop 2.8, which supports multiple stateless NameNodes and stores the metadata in an in-memory distributed database (NDB).

HopsFS enables NameNode metadata to be both customized and analyzed, because it can easily be accessed via SQL or the native NDB API.
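The payoff of database-backed metadata can be illustrated with a toy example. The sketch below uses SQLite purely for illustration (HopsFS stores its metadata in NDB, not SQLite; the table layout here is invented): once inodes live in database rows, ad hoc analysis of the namespace becomes a plain SQL query.

```python
import sqlite3

# Toy model: file-system metadata stored as database rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inodes "
           "(id INTEGER PRIMARY KEY, parent INTEGER, name TEXT, size INTEGER)")
db.executemany("INSERT INTO inodes VALUES (?, ?, ?, ?)",
               [(1, 0, "/", 0),
                (2, 1, "logs", 0),
                (3, 2, "app.log", 4096),
                (4, 2, "gc.log", 1024)])

# Ad hoc namespace analytics as plain SQL: total bytes under /logs.
(total,) = db.execute(
    "SELECT SUM(size) FROM inodes WHERE parent = 2").fetchone()
print(total)  # 5120
```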

Recognition: IEEE Scale Prize 2017 - most scalable HDFS filesystem

Scalaris

Scalaris is a scalable, transactional, distributed key-value store. It was the first NoSQL database that supported the ACID properties for multi-key transactions. It can be used for building scalable Web 2.0 services.

Scalaris uses a structured overlay with a non-blocking Paxos commit protocol for transaction processing with strong consistency over replicas. Scalaris is implemented in Erlang.
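The decision rule at the heart of such a quorum-based commit can be sketched as follows. This is an illustrative Python toy, not Scalaris's Erlang implementation: a transaction commits only if, for every item it touches, a majority of that item's replicas voted "prepared", so a minority of failed or partitioned replicas cannot block progress.

```python
def decide(prepared_votes, replicas_per_item=3):
    """Toy quorum-commit decision rule.

    prepared_votes: for each item in the transaction, a list of
    booleans, one per replica, True if that replica voted 'prepared'.
    Commit requires a majority of replicas per item; a real Paxos
    commit additionally handles leader failure and concurrent
    proposals, which this sketch omits.
    """
    majority = replicas_per_item // 2 + 1
    if all(sum(votes) >= majority for votes in prepared_votes):
        return "commit"
    return "abort"

# Two items, three replicas each; one slow replica cannot block commit.
print(decide([[True, True, False], [True, True, True]]))  # commit
```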

Recognition: IEEE Scalability Prize 2010

CATS

Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, where data is distributed and replicated automatically by consistent hashing.

CATS is a distributed key-value store that uses consistent quorums to guarantee linearizability and partition tolerance under such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing: key properties for modern cloud storage middleware.
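The placement side of this design can be sketched briefly. The Python toy below (names and structure are illustrative, not CATS's actual code) places each key on the first r nodes clockwise from its hash on a consistent-hashing ring; a quorum operation then requires a majority of those r replicas to respond before a read or write completes.

```python
import hashlib
from bisect import bisect_right

def ring_pos(s):
    """Position on a 2**32-slot ring via SHA-1 (an illustrative choice)."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % 2**32

class Ring:
    """Toy consistent-hashing ring with successor-list replication."""
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((ring_pos(n), n) for n in nodes)

    def replica_set(self, key):
        """The first `replicas` nodes clockwise from the key's position."""
        positions = [p for p, _ in self.ring]
        i = bisect_right(positions, ring_pos(key)) % len(self.ring)
        return [self.ring[(i + j) % len(self.ring)][1]
                for j in range(self.replicas)]

    def quorum(self):
        """Majority of the replica set."""
        return self.replicas // 2 + 1

r = Ring(["node-%d" % i for i in range(5)])
owners = r.replica_set("user:42")  # 3 consecutive nodes on the ring
```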

Kompics

Kompics is a message-passing component model for building distributed systems by putting together protocols programmed as event-driven components. Systems built with Kompics leverage multi-core machines out of the box and can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic debugging and reproducible performance evaluation of unmodified Kompics distributed systems.
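A minimal sketch of the component style (illustrative Python, not the actual Kompics Java/Scala API): components keep no shared state and interact only through events published on channels of an event bus; swapping the bus for a simulated one is what makes deterministic replay of a whole system possible.

```python
class Component:
    """Toy event-driven component: handlers subscribe to channels on a
    shared bus and communicate only by triggering events (no shared
    state between components)."""
    def __init__(self, bus):
        self.bus = bus

    def subscribe(self, channel, handler):
        self.bus.setdefault(channel, []).append(handler)

    def trigger(self, channel, event):
        # Synchronous delivery keeps the toy deterministic; a real
        # runtime schedules handlers across cores.
        for handler in list(self.bus.get(channel, [])):
            handler(event)

# A Ponger replies to pings; a Pinger records the replies.
bus, replies = {}, []
ponger = Component(bus)
ponger.subscribe("ping", lambda n: ponger.trigger("pong", n + 1))
pinger = Component(bus)
pinger.subscribe("pong", replies.append)
pinger.trigger("ping", 41)
# replies now holds the single reply to the ping
```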

Programming Languages and Systems

SICStus Prolog

SICStus is an ISO-standard-compliant Prolog development system. It is built around a high-performance Prolog engine that can use the full virtual memory space on both 32- and 64-bit architectures, and it is efficient and robust for large amounts of data and large applications.

One of the best-known logic programming systems in the world, developed at the Swedish Institute of Computer Science (SICS).

sicstus.sics.se

Mozart Programming System

The Mozart Programming System combines ongoing research in programming language design and implementation, constraint logic programming, distributed computing, and human-computer interfaces. Mozart implements the Oz language and provides both expressive power and advanced functionality.

Started in 1991 as a collaboration between DFKI (Germany) and SICS (Sweden).

AKL (Agents Kernel Language)

AKL is a concurrent constraint programming language developed at the Swedish Institute of Computer Science (SICS). In AKL, computation is performed by agents interacting through stores of constraints. This notion accommodates multiple programming paradigms; in appropriate contexts, AKL agents may be thought of as processes, objects, functions, relations, or constraints.

Research Projects

Over 25 years of leading and participating in major European and Swedish research projects on distributed systems, cloud computing, and big data.

Recent Projects (2012-2022)

Continuous Deep Analytics (CDA)

SSF, 2018-2022

Modern end-to-end data pipelines are highly complex and unoptimized. They combine code from different frontends (e.g., SQL, Beam, Keras), declared in different programming languages (e.g., Python, Scala), and execute across many backend runtimes (e.g., Spark, Flink, TensorFlow). Data and intermediate results take a long, slow path through excessive materialization and conversions on their way down to partially supported hardware accelerators. End-to-end guarantees are typically hard to reason about because of the mismatch of processing semantics across runtimes.

The Continuous Deep Analytics (CDA) project aims to shape the next generation of software for scalable, data-driven applications and pipelines. Our work combines state-of-the-art mechanisms from compiler and database technology with hardware-accelerated machine learning and distributed stream processing.

ExtremeEarth

H2020, 2019-2021

ExtremeEarth concentrates on developing techniques and software for extracting information and knowledge from big Copernicus data using deep learning and extreme geospatial analytics, and on developing two use cases based on this information and knowledge together with other relevant non-EO data sets.

ExtremeEarth will impact developments in the Integrated Ground Segment of Copernicus and the Sentinel Collaborative Ground Segment. ExtremeEarth tools and techniques can be used to extract information and knowledge from big Copernicus data and to make it available as linked data, allowing easy development of applications by developers with minimal or no knowledge of EO techniques, file formats, data access protocols, etc.

A Big Data Analytics Framework for a Smart Society (BIDAF)

KK-stiftelsen (KKS), 2014-2019

The overall aim of the BIDAF project is to significantly advance research in massive data analysis by means of machine learning, in response to the increasing demand for retrieving value from data across society. This will be done by creating a strong distributed research environment for big data analytics.

There are challenges on several levels that must be addressed: (i) platforms to store and process the data, (ii) machine learning algorithms to analyze the data, and (iii) high-level tools to access the results.

StreamLine

H2020, 2016-2018

Streamline is funded by the European Union's Horizon 2020 research and innovation programme to enhance the European data platform Apache Flink to handle both stream and batch data in a unified way. The project includes both research and use cases to validate the results.

The project has the following objectives: (i) to research, design, and develop a massively scalable, robust, and efficient processing platform for data at rest and data in motion in a single system, (ii) to develop a high accuracy, massively scalable data stream-oriented machine learning library based on new algorithms and approximate data structures, (iii) to provide a unified interactive programming environment that is user-friendly and easy to deploy in the cloud, (iv) to implement a real-time contextualization engine, enabling analytical and predictive models to take real world context into account, and (v) to develop multi-faceted, effective dissemination of Streamline results to the research, academic, and international community.

E2E-Clouds

SSF (Swedish Foundation for Strategic Research), 2012-2017

E2E-Clouds was a five-year research project financed by the Swedish Foundation for Strategic Research. Its goal was to develop an End-to-End information-centric Cloud (E2E-Cloud) for data-intensive services and applications.

The E2E-Cloud is a distributed and federated cloud infrastructure that meets the challenge of scale by aggregating, provisioning, and managing computational, storage, and networking resources from multiple centers and providers. Like some current data-center clouds, it manages computation and storage in an integrated fashion for efficiency, but it adds wide-scale distribution.

iSocial

FP7 Marie Curie ITN, 2013-2017

The rapid proliferation of Online Social Networking (OSN) sites is expected to reshape the Internet's structure, design, and utility. We believe that OSNs create a potentially transformational change in consumer behavior and will have a far-reaching impact on the traditional content, media, and communications industries.

The iSocial ITN aspires to bring a transformational change in OSN provision, pushing the state-of-the-art from centralized services towards totally decentralized systems that will pervade our environment and seamlessly integrate with future Internet and media services. OSN decentralization can address privacy considerations and improve service scalability, performance and fault-tolerance in the presence of an expanding base of users and applications.

The main objective of iSocial is to provide world-class training for a next generation of researchers, computer scientists, and Web engineers, emphasizing a strong combination of advanced understanding of both the theoretical and experimental approaches, methodologies, and tools required to develop decentralized OSN (DOSN) platforms.

A Community Networking Cloud in a Box (CLOMMUNITY)

FP7 EU-project, 2013-2015

Community networking is an emerging model for the Future Internet across Europe and beyond, in which communities of citizens build, operate, and own open IP-based networks, a key infrastructure for individual and collective digital participation.

The CLOMMUNITY project aims to address the obstacles communities of citizens face in bootstrapping, running, and expanding community-owned networks that provide community services organised as community clouds. This requires solving specific research challenges: self-managing and scalable (decentralized) infrastructure services for managing and aggregating a large number of widespread, low-cost, unreliable networking, storage, and home computing resources, and distributed platform services to support the design and operation of elastic, resilient, and scalable service overlays.

Historical Projects (2000-2010)

Peer-to-Peer Live Streaming (PeerTV)

Vinnova, 2007-2010

The PeerTV project developed, deployed and validated peer-to-peer media streaming platforms addressing three key requirements: (i) efficient utilization of upload bandwidth at peers to reduce centrally provisioned bandwidth, (ii) reducing playback latency and increasing playback continuity through novel topologies, and (iii) minimizing network traffic cost for ISPs through AS-aware infrastructure.

SELFMAN

FP6 EU-project, 2006-2009

The goal of SELFMAN is to make large-scale distributed applications self-managing by combining component models with structured overlay networks. Self-management is pursued along four axes: self-configuration, self-healing, self-tuning, and self-protection.

PEPITO

FP5 EU-project, 2002-2004

PEPITO stands for PEer-to-Peer: Implementation and TheOry. Traditional centralised system architectures are increasingly inadequate. The PEPITO project investigated completely decentralised models of P2P computing: both how to build them robustly and what can be built with them.

Information Cities (ICities)

EU-project, 2000-2003

The Information Cities project models aggregation and segregation patterns in a virtual world of infohabitants (humans, virtual firms, online communities, and software agents). The objective is to capture the aggregate patterns of virtual organisation that emerge from interaction over the information infrastructure.

EVERGROW

European Research Project on the Future Internet

The goal of EVERGROW is to build the science-based foundations for the global information networks of the future. The demands on the future Internet will be high, and a number of today's highly manual processes must be automated, such as network management, network provisioning, and network repair at all levels.

CoreGRID

EU Network of Excellence

The CoreGRID Network of Excellence aims at strengthening and advancing scientific and technological excellence in Grid and Peer-to-Peer technologies. The Network brings together 161 permanent researchers and 164 PhD students from forty-one institutions in six complementary research areas to develop next generation Grid middleware.

Awards and Honors

Year | Award | Organization | For
2023 | ACM SIGMOD Systems Award | ACM SIGMOD | Apache Flink, expanding stream data processing
2019 | European Data Science Technology Innovation | EIT Digital | LogicalClocks and the Hopsworks platform
2017 | IEEE Scale Prize | IEEE | HopsFS, most scalable HDFS filesystem
2010 | IEEE Scalability Prize | IEEE | Scalaris, transactional key-value store
1991 | Xerox Chester Carlson Research Prize | Royal Swedish Academy of Engineering Sciences (IVA) | Logic programming and parallel processors

Book

Concepts, Techniques, and Models of Computer Programming

Authors: Peter Van Roy and Seif Haridi
Publisher: MIT Press, 2004
Pages: 936
ISBN: 978-0262220699

This comprehensive textbook presents computer programming as a unified discipline. It teaches all major programming paradigms in a uniform framework that shows their deep relationships.

Paradigms Covered

  • Declarative programming
  • Message-passing concurrency
  • Object-oriented programming
  • Shared-state concurrency
  • Constraint programming
  • Distributed programming

"In almost 20 years since Abelson and Sussman revolutionized the teaching of computer science with their Structure and Interpretation of Computer Programs, this is the first book I've seen that focuses on big ideas and multiple paradigms, as SICP does, but chooses a very different core model (declarative programming)."

- Brian Harvey, UC Berkeley

The book uses the Mozart Programming System (implementing the Oz language) as its main vehicle for teaching and experimentation.

Companies Founded

LogicalClocks / Hopsworks

Role: Co-founder, Chief Scientist
Founded: 2016

Enterprise data platform for scale-out data science and AI. Won European Data Science Technology Innovation 2019.

hopsworks.ai

HiveStreaming

Role: Co-founder

Enterprise Content Delivery Network (eCDN) for internal live video events using P2P-based video distribution.

hivestreaming.com

Peerialism

Role: Advisor

P2P data transfer system. Named one of Sweden's 33 hottest companies (2009). Acquired for 100M SEK.

Co-founder: Ali Ghodsi (now Databricks CEO)

PhD Students

Notable Alumni

Ali Ghodsi

PhD 2006: Distributed k-ary Systems

Current: CEO of Databricks (valued at $100B+)

Joe Armstrong

PhD 2003: Reliable Distributed Systems

Legacy: Creator of Erlang (powers WhatsApp, Ericsson)

Paris Carbone

PhD 2018: Data Stream Processing

Current: Apache Flink core contributor

Complete List (25+ PhD Students)

Name | Year | Thesis Topic
Paris Carbone | 2018 | Scalable and Reliable Data Stream Processing
Fatemeh Rahimian | 2014 | Gossip-Based Algorithms for Information Dissemination
Raul Jimenez | 2013 | Distributed Peer Discovery in P2P Systems
Roberto Roverso | 2013 | Adaptive HTTP-Live Streaming on P2P Overlays
Amir H. Payberah | 2013 | Live Streaming in P2P and Hybrid Environments
Cosmin Arad | 2013 | Reconfigurable Distributed Systems
Tallat Shafaat | 2013 | Partition Tolerance in Overlay Networks
John Ardelius | 2013 | On the Performance Analysis of Large Scale, Dynamic, Distributed and Parallel Systems
Ali Ghodsi | 2006 | Distributed k-ary System (DHT Algorithms)
Erik Klintskog | 2005 | Distribution Support for Programming Systems
Sameh El-Ansary | 2005 | Structured Peer-to-Peer Systems
Per Brand | 2005 | Distributed Programming Systems (Mozart)
Joe Armstrong | 2003 | Reliable Distributed Systems (Erlang)
Ashley Saulsbury | 1999 | Latency in Distributed Memory Systems
Johan Montelius | 1997 | Fine-grain Parallelism
Björn Carlsson | 1995 | Compiling and Executing Finite Domain Constraints
Sverker Janson | 1994 | AKL Multiparadigm Programming Language
Torbjörn Keisu | 1994 | Tree Constraints
Erik Hagersten | 1992 | Cache-Only Memory Architectures
Roland Karlsson | 1992 | A High Performance OR-parallel Prolog System
Dan Sahlin | 1991 | Partial Evaluator for Prolog
Mats Carlsson | 1990 | OR-parallel Prolog Engine
Nabiel El Shiewy | 1990 | Robust Coordinated Reactive Computing in SANDRA
Bogumil Hausman | 1990 | Pruning and Speculative Work in OR-Parallel Prolog

Selected Publications

Seminal Works

  • "Concepts, Techniques, and Models of Computer Programming" (2004) - MIT Press, 936 pages
  • "A history of the Oz multiparadigm language" (2020) - HOPL IV, ACM
  • "Apache Flink: Stream and Batch Processing in a Single Engine" (2015) - IEEE Data Engineering Bulletin
  • "State Management in Apache Flink" (2017) - VLDB Endowment
  • "HopsFS: Scaling Hierarchical File System Metadata" (2017) - FAST
  • "Efficient Logic Variables for Distributed Computing" (1999) - ACM TOPLAS
  • "Mobile Objects in Distributed Oz" (1997) - ACM TOPLAS
  • "DDM - A Cache-Only Memory Architecture" (1992) - IEEE Computer

Invited Talks

  • HOPL IV 2021: "The History of Oz: A Multiparadigm Programming Language"
  • Boston University 2023: Distinguished CS Colloquium
  • University of Chicago: "Research in Continuous Deep Analytics"

Teaching

For more than ten years, Seif Haridi has been teaching popular courses on Distributed Algorithms and Peer-to-Peer Computing at KTH.

Online Courses (edX)

KTH Courses

  • ID2220: Advanced Topics in Distributed Systems
  • ID2203: Distributed Systems, Advanced Course
  • ID2210: Distributed Computing, Peer-to-Peer and GRIDS
  • 2G1126: Distributed Computer Systems (historical)
  • 2G1512: Computer Science II (historical)

Contact

Email: haridi@kth.se
RISE Email: seif.haridi@ri.se
Phone: +46 8 790 41 22
Office: Electrum 229, Kista, Sweden
