*August 27th, 2016*

The MOD 2016 Industrial Session aims to bring together participants from academia and industry in a venue that highlights practical and real-world studies of machine learning, optimization and data science.

The ultimate goal of this event is to encourage mutually beneficial exchange between scientific researchers and practitioners working to improve data science analytics.

The session will consist of a series of invited presentations by leading industry experts on selected topics in machine learning, optimization and data science, from an industry perspective and with a special focus on real-world applications.

The session will then continue with a panel on the future research challenges and opportunities in the field.

All participants will be encouraged to share and discuss novel ideas, controversial issues, open problems and comparisons of competing approaches.

Experiences from practitioners will provide crucial input into future research directions.

**Panelists:**

Title TBA

*Amr Awadallah, Founder and CTO at Cloudera, San Francisco, USA*

“*Big Data-based Online Advertising*”

“*Data Fellas, Agile Data Science*”

Data Science was the buzzword we saw popping up everywhere in 2015. Why? Engineers discovered the value locked in Big Data and the discipline needed to extract it: Data Science, which spans mathematics, statistics, machine learning, data preparation, software development and more. Data Science came to the fore because data keeps accumulating, and exploiting its value is a key to competitiveness. Data Science, and Machine Learning in particular, had traditionally been a smart and helpful toolset designed and developed mostly in academia, which enterprises could only acquire at a high premium. Now the game is changing drastically: methods have matured, libraries are available and more data scientists are entering the market. Still, there are many friction points in the development of services that exploit data. Data scientists are developers, but usually they are not software developers, and even less often devops engineers, which leads to disrupted organizations and a lack of efficiency. In this talk, Data Fellas presents its vision of a unified environment, starting with the Spark Notebook, that helps people with different tasks and backgrounds develop a data service pipeline with minimal friction and maximal agility.

“*Good City Life*”

*Daniele Quercia, Head of Social Dynamics group at Bell Labs, Cambridge, UK*

“*On the Behavior of Deviant Communities in Online Social Networks*”

*Fabrizio Silvestri, Facebook*

Moderator: *Donato Malerba, University of Bari, Italy*

Session Chairs: TBA

Contact: TBA

*“Model Selection and Error Estimation Without the Agonizing Pain”*

*Luca Oneto and Davide Anguita, DIBRIS – Polytechnic School – University of Genova, Italy*

Some infamous Data Science failures, like the 2013 Google Flu Trends misprediction, reveal that large volumes of data are not enough for building effective and reliable predictive models. Even when huge datasets are available, Data Science needs Statistics in order to cope with the selection of optimal models and the estimation of their quality. In particular, Statistical Learning Theory (SLT) addresses these questions by deriving non-asymptotic bounds on the generalization error of a model or, in other words, by upper bounding the true error of the learned model based only on quantities computed from the available data. However, for a long time, SLT has been considered only an abstract theoretical framework, useful for inspiring new learning approaches, but with limited applicability to practical problems. The purpose of this tutorial is to give an intelligible overview of the problems of Model Selection and Error Estimation, focusing on the ideas behind the different SLT-based approaches and simplifying most of the technical aspects to make them more accessible and usable in practice, with particular reference to Big Data problems. We will start from the seminal works of the ’80s and move to the most recent results, then discuss open problems and finally outline future directions of this field of research.
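As a concrete illustration of the kind of guarantee the abstract refers to (a minimal sketch of our own, not the tutorial's material): the classical Hoeffding test-set bound states that, with probability at least 1 − δ, the true error of a classifier exceeds its empirical error on n held-out samples by at most √(ln(1/δ)/(2n)); selecting the best of k candidate models adds a union-bound term ln k.

```python
import math

def hoeffding_bound(emp_error, n, delta, num_models=1):
    # With probability at least 1 - delta, the true error of the selected
    # model is at most emp_error + sqrt(ln(num_models / delta) / (2 n)).
    # The ln(num_models) factor is the union bound over the candidates.
    return emp_error + math.sqrt(math.log(num_models / delta) / (2 * n))

# 10 candidate models; the best one misclassifies 8% of 10,000 held-out points:
print(round(hoeffding_bound(0.08, 10_000, 0.05, num_models=10), 4))  # → 0.0963
```

Note how the bound depends only on quantities computed from the data, exactly as the abstract describes; it says nothing about how the models were trained.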

*“Information Geometry: Applications in Machine Learning and Stochastic Optimization”*

*Luigi Malagò, Department of Electrical and Electronic Engineering, Shinshu University, Nagano, Japan*

Information Geometry is an interdisciplinary and expanding research field at the intersection of statistics, information theory and differential geometry, where statistical models are represented as manifolds of probability distributions. One of the most relevant contributions of Information Geometry is the definition of novel principles for the design of effective algorithms in machine learning and stochastic optimization, for the optimization of functions defined over statistical models. From this perspective, the most notable example is the natural gradient, which has a large record of successful applications in many heterogeneous contexts. The aim of this tutorial is to present to a broad range of researchers the theory of Information Geometry, with particular emphasis on the geometry of optimization over statistical models. First- and second-order optimization methods will be discussed within the theory of optimization over manifolds, together with the importance and advantages of taking into account the proper geometry of the search space. The tutorial will review from a unifying perspective the most popular applications of natural gradient methods in different fields of machine learning. Moreover, it will focus on the most recent advances in the theory of Information Geometry, which are expected to inspire the design of novel algorithms and in some cases may even open new lines of research. Information Geometry reached its maturity with the work of Amari and co-workers, who contributed most of the development of this research field from the ’80s onward, even if it was probably Hotelling (1930), and later Rao (1945), who first noticed that the geometry of statistical models is not Euclidean. Indeed, Information Geometry represents statistical models as differentiable manifolds, according to the standard definition from Riemannian geometry.
From this perspective, a parameterization of a statistical model defines a coordinate system over the corresponding statistical manifold, and the Fisher information metric is the unique metric invariant under reparameterization. In machine learning, statistical inference and stochastic optimization, one is often faced with the task of optimizing a function whose variables are the parameters of a statistical model. Consider, for example, the optimization of the expected value of a function with respect to a distribution in a statistical model, the maximization of the likelihood, or more generally the minimization of a loss function. Whenever a closed-form solution of these problems is not available, or is computationally infeasible, gradient-descent methods, or analogously second-order techniques, constitute a consolidated approach to optimization. However, the gradient of a function on a manifold, and similarly the Hessian, strongly depends on the geometry of the search space, and in particular on the chosen metric, which implies that the direction of maximum decrease does not in general correspond to the vector of partial derivatives. This observation led to the proposal of the natural gradient (Amari, 1998), which corresponds to the Riemannian gradient of a function defined over a statistical manifold, evaluated with respect to the Fisher information metric. It is a well-known result that the natural gradient guarantees better performance than the Euclidean gradient: it is invariant under reparameterization of the statistical model, it shows faster convergence rates, and it is able to speed up over plateaux. The machine learning and stochastic optimization communities have widely recognized the advantages of approaches based on Riemannian geometry in the design of gradient-descent methods over statistical models, as attested by the large and increasing number of applications in different fields.

In the first part of the tutorial the theoretical framework of Information Geometry will be reviewed, with rigorous mathematical definitions accompanied by geometric intuitions, to make the presentation appropriate for a general audience. After the introduction of the fundamental concepts of the Riemannian geometry of a statistical model, the presentation will focus on two alternative and equally important geometries for the set of probability distributions: the exponential and mixture geometries, defined by the family of dual affine connections introduced by Amari. The presentation will then turn to the design and analysis of first- and second-order optimization methods over statistical models, with special regard to models in the exponential family and in particular the Gaussian distribution. The tutorial will make frequent references to the well-established theory of optimization over manifolds (Absil et al., 2008), which will be adapted to the special case of dually flat manifolds studied in Information Geometry.

The second part of the tutorial will review, from a unifying perspective, different applications of Information Geometry in machine learning and stochastic optimization, in particular those based on natural gradient algorithms. These include successful and often state-of-the-art algorithms in reinforcement learning and natural policy learning, training of neural networks and deep learning, Bayesian optimization, Bayesian variational inference, stochastic relaxation (i.e., the optimization of a function through the minimization of its expected value), Hamiltonian Monte Carlo methods, and other related techniques. Finally, in the last part, the tutorial will discuss some recent advances in Information Geometry, such as novel and efficient formulae for the computation of the natural gradient over low-dimensional sub-models of the Gaussian distribution with large-dimensional sample spaces, and a proper definition of the Riemannian Hessian of a function over a statistical model, which underlies the design of a varied family of second-order optimization algorithms.
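To make the contrast with the Euclidean gradient concrete, here is a minimal self-contained sketch (our own illustration, not the tutorial's demo code): natural gradient ascent on the log-likelihood of a univariate Gaussian N(μ, v). The Fisher information matrix of this model is diag(1/v, 1/(2v²)), so preconditioning the plain gradient by its inverse turns each update into a simple move toward the sample mean and variance, regardless of how the model is parameterized.

```python
import random

def log_lik_grad(data, mu, var):
    # Average Euclidean gradient of the Gaussian log-likelihood w.r.t. (mu, var).
    n = len(data)
    g_mu = sum(x - mu for x in data) / (n * var)
    g_var = sum((x - mu) ** 2 for x in data) / (2 * n * var ** 2) - 1 / (2 * var)
    return g_mu, g_var

def natural_grad(var, g_mu, g_var):
    # Fisher information of N(mu, var) is diag(1/var, 1/(2 var^2));
    # the natural gradient is its inverse applied to the plain gradient.
    return var * g_mu, 2 * var ** 2 * g_var

random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(5000)]

mu, var = 0.0, 1.0
for _ in range(50):
    g_mu, g_var = log_lik_grad(data, mu, var)
    n_mu, n_var = natural_grad(var, g_mu, g_var)
    mu += 0.5 * n_mu    # equals 0.5 * (sample_mean - mu)
    var += 0.5 * n_var  # equals 0.5 * (sample_second_moment - var)
```

After a few dozen steps (mu, var) matches the sample mean and variance; a plain Euclidean gradient ascent on the same objective would need a step size tuned to the current value of var and slows down badly when var is small, which is exactly the pathology the natural gradient removes.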

*Interactive activity and demos:*

The tutorial will feature some live demos, aimed at showing the advantages of natural gradient and Riemannian second-order methods compared to the Euclidean counterparts, over simple benchmark tasks, in different domains in machine learning and stochastic optimization. The source code of the demos will be released before the tutorial.

*Potential target participants and audience:*

The tutorial is intended for, but not limited to, researchers working in machine learning, stochastic optimization and related domains who are faced with optimization problems defined over statistical models. Classical examples of applications include reinforcement learning and natural policy learning, training of neural networks, deep learning, Bayesian optimization, Bayesian variational inference, stochastic relaxation, Hamiltonian Monte Carlo methods, and related techniques in vision and robotics. More generally, the tutorial will be of interest to anyone interested in optimization over manifolds, such as matrix manifolds and in particular the cone of positive-semidefinite matrices.

**Deep Learning: Theory, Architectures, Algorithms and Applications**

Learning multi-level representations of data, and learning from very large amounts of data.

**Data-driven Algorithmics – Data for informed decisions**

Data-driven algorithmics is an emerging topic that requires combining prediction tools from machine learning with algorithms from theoretical computer science.

Topics: information theory in algorithm design, deep learning paradigms for data-driven optimization, convex optimization, generative models, barriers to implementation of algorithms in practice, new paradigms for balancing online and batch learning, space and precision tradeoffs, large-scale machine learning tasks.

**Multi-Objective Optimization Algorithms**

TBA