Daniela Grigori has been a Professor at University Paris-Dauphine since September 2011. Previously, she was an associate professor at the University of Versailles (2002-2011).
Her current research interests include web services, process management, business process intelligence, and graph databases. She has published numerous papers in international conferences and journals and has served as an organizer and program committee member for many conferences. She is co-author of a book on process analytics (Springer, 2016).
Business process analytics
Business processes are inseparable from data: process execution data, process documentation and descriptions, process models and versions, and artifacts and data generated or exchanged during process execution. These data can take various forms: structured, semi-structured, or unstructured. The variety of tools for capturing and collecting data, together with the implementation of processes in different systems, has amplified the amount of data available around processes.
To improve the quality of the services they offer and stay competitive, a central problem for companies is identifying, measuring, analyzing, and improving their processes. This tutorial is an introduction to the concepts, methods, and techniques for analyzing process data.
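To make the subject concrete, here is a minimal sketch (an illustration, not material from the tutorial) of one basic process-analytics primitive: extracting the directly-follows relation from an event log. The toy log is an assumption for illustration.

```python
from collections import Counter

# Toy event log as (case_id, activity) records, in chronological order per
# case; real logs come from workflow and BPM systems.
log = [
    ("case1", "receive order"), ("case1", "check stock"), ("case1", "ship"),
    ("case2", "receive order"), ("case2", "check stock"),
    ("case2", "reorder stock"), ("case2", "ship"),
]

# Group events by case, preserving order of appearance.
traces = {}
for case, activity in log:
    traces.setdefault(case, []).append(activity)

# Count how often activity a is directly followed by activity b.
directly_follows = Counter(
    (trace[i], trace[i + 1])
    for trace in traces.values()
    for i in range(len(trace) - 1)
)

for (a, b), n in directly_follows.most_common():
    print(f"{a} -> {b}: {n}")
```

Counts like these are the starting point of many process discovery algorithms, which turn the directly-follows relation into a process model.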
Fast data analytics for time series and other ordered data
The relational model is based on a single data type and a few operations: unordered tables, which can be selected, projected, joined, and aggregated. This simplicity is in fact unnecessary and needlessly limits expressive power, making it difficult to express queries on ordered data such as time series and other sequence data.
This talk presents a language for expressing ordered queries, along with optimization techniques and performance results. The talk goes on to present experiments comparing the system against other popular data analytics systems, including Sybase IQ, Python's popular Pandas library, and MonetDB, using a variety of benchmarks, including the ones those systems use themselves. On the same hardware, our system is faster.
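The talk's language is not reproduced here, but for a sense of what an ordered query looks like, here is a moving average over a time series expressed in Pandas, one of the baseline systems mentioned above; the data is made up for illustration.

```python
import pandas as pd

# An inherently ordered query: a 3-point moving average over a time series.
# In the unordered relational model this requires awkward self-joins or
# window extensions; in an order-aware system it is direct.
ts = pd.DataFrame(
    {"price": [10.0, 10.5, 10.2, 10.8, 11.1, 10.9]},
    index=pd.date_range("2016-01-01", periods=6, freq="D"),
)
ts["moving_avg"] = ts["price"].rolling(window=3).mean()
print(ts)
```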
Approximation and randomized algorithms
Numerous data management tasks are intractable: it is NP-hard, or worse, to compute their exact answer. This need not be the end of the story: in many cases, it is actually possible to obtain an
approximation of the answer, with certain guarantees. Deterministic
approximation algorithms provide a way to efficiently approximate, within
a certain factor, the answer to some intractable problems. Randomized
approximation algorithms provide a probabilistic approximation guarantee.
In this lecture, we will review some basic approximation algorithms, see
cases where hardness of approximation itself can be shown, and illustrate
how randomized approximation algorithms (from naive Monte Carlo sampling
to more elaborate polynomial-time approximation schemes) can be used in
data management applications.
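As a hint of what naive Monte Carlo sampling looks like in this setting, here is a minimal sketch (an illustrative assumption, not the lecture's material) that estimates the fraction of satisfying assignments of a small Boolean formula, a quantity that is #P-hard to compute exactly in general.

```python
import random

# Small made-up formula; its true satisfying fraction is 4/8 = 0.5.
def phi(x1, x2, x3):
    return (x1 or x2) and (not x2 or x3)

def monte_carlo_estimate(n_samples=100_000):
    # Sample uniform random assignments and count how many satisfy phi.
    hits = sum(
        phi(*(random.random() < 0.5 for _ in range(3)))
        for _ in range(n_samples)
    )
    return hits / n_samples

print(monte_carlo_estimate())  # converges to 0.5 as n_samples grows
```

By Chernoff-style bounds, the number of samples needed for a given additive error and confidence is independent of the formula's size, which is what makes such estimators attractive.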
Pierre Senellart
Pierre Senellart is a Professor in the DBWeb team at Télécom ParisTech,
and a Senior Research Fellow in the Department of Computer Science of the
National University of Singapore, within the IPAL laboratory. He is an
alumnus of the École normale supérieure and obtained his M.Sc. (2003) and
his Ph.D. (2007) in computer science from Université Paris-Sud, studying
under the supervision of Serge Abiteboul. He was awarded an Habilitation
à diriger les recherches in 2012 from Université Pierre et Marie Curie.
Pierre Senellart has published numerous articles in internationally renowned conferences and journals (PODS, AAAI, VLDB Journal, Journal of the ACM, etc.). He has been a member of the program committees and participated in the organization of various international conferences and workshops (including PODS, WWW, VLDB, SIGMOD, ICDE). His research interests revolve around practical and theoretical aspects of Web data management, including Web crawling and archiving, Web information extraction, uncertainty management, Web mining, and querying under access limitations.
Nicolas Anciaux
Nicolas Anciaux is a researcher at INRIA, in the SMIS project, which focuses on secured and mobile information systems. He received his PhD in 2004 and his French Habilitation in 2014, both from the University of Versailles. Before joining Inria, Nicolas was a post-doc researcher at the University of Twente, where he studied database implementations of the "right-to-be-forgotten" in ambient intelligent environments. Since he joined INRIA in 2006, his research interests have been in the area of data management on specific hardware architectures, and more precisely on secure chips and embedded systems. He proposes architectures using secure hardware, along with data structures and algorithms to manage personal data with strong privacy guarantees. Nicolas has co-authored around 30 research articles. He is a co-designer of PlugDB (https://project.inria.fr/plugdb/), a secure and personal database device. Since 2015, Nicolas has co-led the research activities of the privacy cluster of the Digital Society Institute, which brings together economists, jurists, and computer scientists.
This tutorial will discuss existing cloud-based architectures for personal data management and will propose some alternatives, based on decentralization and secure devices.
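As a rough illustration of the decentralized idea (not the tutorial's actual architecture), the sketch below encrypts a personal record on the client side, so that a cloud store only ever sees ciphertext; it assumes the third-party Python cryptography package.

```python
from cryptography.fernet import Fernet

# The key stays on the user's secure device; only ciphertext reaches the cloud.
key = Fernet.generate_key()          # would live inside the secure hardware
cipher = Fernet(key)

record = b"name=Alice; diagnosis=..."
ciphertext = cipher.encrypt(record)  # what the cloud provider stores

# Only the device holding the key can decrypt.
assert cipher.decrypt(ciphertext) == record
```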
Parallel pattern mining
Pattern mining is a data mining task focused on extracting regularities from data. It is extremely computationally intensive, making it a good candidate for exploiting large parallel platforms. However, the computational structure of pattern mining algorithms is mostly irregular, so parallelizing them is non-trivial. We will present several successful approaches for parallelizing pattern mining algorithms so that they benefit from parallel platforms, whether multicore processors or distributed platforms. We will focus on flexible pattern mining algorithms that allow users to tailor the definition of patterns to their needs.
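As a minimal illustration (not one of the lecture's algorithms), the sketch below parallelizes the support-counting step of frequent itemset mining by partitioning the transaction database across worker processes and merging partial counts; the toy database and threshold are assumptions.

```python
from collections import Counter
from itertools import combinations
from multiprocessing import Pool

def count_pairs(transactions):
    # Count the support of every size-2 itemset in one database partition.
    c = Counter()
    for t in transactions:
        c.update(combinations(sorted(t), 2))
    return c

if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    chunks = [db[0::2], db[1::2]]          # partition the database
    with Pool(2) as pool:
        partials = pool.map(count_pairs, chunks)
    support = sum(partials, Counter())     # merge partial supports
    min_support = 2
    frequent = {p, n for p, n in support.items() if n >= min_support} if False else \
               {p: n for p, n in support.items() if n >= min_support}
    print(frequent)                        # e.g. {('a', 'c'): 3, ('b', 'c'): 3, ('a', 'b'): 2}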
Alexandre Termier
Alexandre Termier has been Professor of Computer Science at the University of Rennes 1 since 2014. Before that, he was at the University of Grenoble-Alpes (2007-2014). He is also the head of the LACODAM group at IRISA / INRIA.
His research focuses on pattern mining, especially on defining new, more useful, and/or more efficient pattern mining methods.
His work is validated through industrial collaborations in various domains: embedded systems, retail, and energy consumption.
Massive data analysis
The statistical analysis of massive data should make it possible to understand complex phenomena and to make justified decisions.
This tutorial aims to provide an overview of descriptive and decision-oriented modeling from data.
The main part will be devoted to the different families of modeling problems and to some commonly used methods.
We will then examine how these methods scale, in particular their execution on distributed platforms.
In the afternoon, a hands-on session will involve carrying out a simple analysis of a dataset using local installations of Apache Spark.
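As a preview of the hands-on session, here is a minimal sketch of a simple analysis on a local Apache Spark installation; the file name and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session using all available cores.
spark = SparkSession.builder.master("local[*]").appName("tp-bda").getOrCreate()

# Load a (hypothetical) CSV file and compute a simple grouped aggregate.
df = spark.read.csv("measurements.csv", header=True, inferSchema=True)
(df.groupBy("sensor")
   .agg(F.avg("value").alias("avg_value"), F.count("*").alias("n"))
   .orderBy(F.desc("n"))
   .show())

spark.stop()
```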
Michel Crucianu
Michel Crucianu has been a professor of computer science at the Conservatoire National des Arts et Métiers (Paris) since 2005. His research work notably concerns the mining of large multimedia databases. In this context, he is interested in semi-supervised and active learning, the construction of representations, and multidimensional and metric indexes.
Practical session on massive data analysis
- Simple execution mechanisms (MapReduce-style) on distributed architectures
- Spark
- Data analysis
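As a minimal illustration of a MapReduce-style execution mechanism expressed on Spark, here is the classic word count on RDDs; the input file name is a hypothetical placeholder.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")

counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())   # map: emit words
            .map(lambda word: (word, 1))          # map: key-value pairs
            .reduceByKey(lambda a, b: a + b))     # reduce: sum per key

for word, n in counts.take(10):
    print(word, n)
sc.stop()
```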
Luc Bouganim
Luc Bouganim is a Director of Research at Inria Saclay-Île-de-France and the vice-head of the SMIS (Secured and Mobile Information Systems) research team. He obtained his PhD (1996) and his Habilitation à Diriger des Recherches (2006), both from the University of Versailles. He worked as an assistant professor from 1997 to 2002, when he joined Inria. Since 2000, Luc has been strongly engaged in research on database management on chip and on the protection of data confidentiality using cryptographic techniques. More recently, he has focused on flash memory, and more precisely on its impact on DBMSs.
Luc has co-authored more than 90 conference and journal papers and an international patent, and he has received 5 international awards.
Nicolas Terpolilli
Passionate about Open Data for five years now, I joined OpenDataSoft as Chief Data Officer.
OpenDataSoft offers a tool that allows a growing number of clients to leverage their data simply and effectively. My role is to ensure that more and more data is available on the platform, that it circulates more and more easily, and, above all, that it is increasingly reused.
I am a graduate of the Ecole Centrale de Lille. After working on Open Data in Manchester, I moved back and forth between entrepreneurial and freelance work before joining OpenDataSoft in the summer of 2015.
Open Data initiatives and technological challenges
In a world where value creation is increasingly distributed and where intermediation is the main threat to most established organizations, Open Data stands out as a technically and economically effective approach.
The purpose of this course is therefore to put the data economy and its transformations since the advent of digital technology into a fairly general perspective; to present how Open Data fits into this dynamic; and then to explain, very concretely, OpenDataSoft's business model as a practical example. Finally, it will be an opportunity to discuss technical issues as well as the more legal questions of licensing.
Tristan Allard
Tristan Allard currently holds an assistant professor position (MCF) at the University of Rennes 1. He received his PhD from the University of Versailles, where he worked on the sanitization of personal data distributed over secure personal data servers. Before joining the University of Rennes 1, he was a postdoc at Inria Montpellier, where he focused on the privacy-preserving clustering of personal time series decentralized over personal devices. At Rennes 1, he continues to design privacy-preserving techniques for personal data management, in a variety of settings (crowdsourcing, cloud, peer-to-peer), by intertwining encryption and sanitization schemes.
Privacy-Preserving Data Publishing: Where Are We Now?
The massive personal datasets collected by today's companies and institutions are valuable resources, both for the entities that hold them and for society at large. Privacy-preserving data publishing aims at opening personal datasets to large-scale analysis without jeopardizing individuals' privacy. The problem is hard, ranging from the definition of an adequate privacy criterion to the design of efficient and useful privacy algorithms. Ten years after the publication of the two seminal l-Diversity and Differential Privacy works, this lecture is a guided tour of the main privacy-preserving data publishing models and algorithms. We will synthesize the partition-based and differential privacy families of models and algorithms, analyze their strengths and weaknesses, and try to extract strong tendencies from the past decade.
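As a small illustration of the differential privacy family (a sketch under illustrative assumptions, not the lecture's material), here is the Laplace mechanism applied to a counting query: noise calibrated to the query's sensitivity and the privacy budget epsilon.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    # Releasing a count with Laplace noise: adding or removing one record
    # changes a counting query by at most 1, hence sensitivity = 1.
    true_count = sum(1 for x in data if predicate(x))
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Illustrative dataset and privacy budget.
ages = [23, 35, 41, 29, 52, 67, 31]
print(laplace_count(ages, lambda a: a >= 40, epsilon=0.5))
```

A smaller epsilon gives stronger privacy but noisier answers; this accuracy/privacy trade-off is at the heart of the differential privacy literature.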
Christophe Pradal
Christophe Pradal is a researcher at CIRAD and at INRIA, in the VirtualPlants project in Montpellier. He is the project leader of OpenAlea, an international open-source scientific workflow system. Prior to that, he spent four years in industry at Dassault Systèmes (1998-2002), during which he designed topological and geometrical operators used in the automotive industry, aeronautical design, and shipbuilding. His research interests focus on scientific workflows, reproducibility, computational modeling of plant morphogenesis, and plant phenotyping.
Scientific Workflows
Analysing scientific data may involve very complex and interlinked steps in which several tools are combined. Scientific workflow systems have reached a level of maturity that makes them able to support the design and execution of such in-silico experiments. They provide a systematic way of describing scientific and data methods, and of executing complex analyses on a variety of distributed resources.
In this lecture, we will review the main features of scientific workflows (representation, composition, model of computation, execution, mapping), present different workflow systems, and illustrate how algebraic scientific workflows and provenance can enhance reproducibility in the analysis and simulation of complex systems in biology.
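As a toy illustration of these notions (not OpenAlea's actual model), the sketch below represents a workflow as a DAG of tasks, executes it in dependency order, and records simple provenance; all task names and data are assumptions.

```python
# Each task maps to (function, list of dependency task names).
workflow = {
    "load":      (lambda: [1.0, 2.0, 3.0], []),
    "normalize": (lambda xs: [x / max(xs) for x in xs], ["load"]),
    "stats":     (lambda xs: {"mean": sum(xs) / len(xs)}, ["normalize"]),
}

results, provenance = {}, []

def run(task):
    if task in results:               # memoize: each task runs once
        return results[task]
    func, deps = workflow[task]
    inputs = [run(d) for d in deps]   # run dependencies first
    results[task] = func(*inputs)
    provenance.append((task, deps))   # record what produced each result
    return results[task]

print(run("stats"))   # {'mean': 0.666...}
print(provenance)     # [('load', []), ('normalize', ['load']), ('stats', ['normalize'])]
```

The provenance trace is what lets a workflow system replay or audit an in-silico experiment, which is the basis of the reproducibility claims discussed in the lecture.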