## Winter School on Machine Learning – WISMAL 2019

**Organisation**

Nicolai Petkov, program director

Nicola Strisciuglio, publicity

Carlos Travieso, local organisation

The Winter School on Machine Learning will take place on **10-12 January 2019**.

### Download the final program here

This short winter school consists of several tutorials that present different Machine Learning techniques. Each tutorial lasts two hours, introduces the main theoretical concepts, and presents typical practical applications and their implementation in popular programming environments.

Participation in the winter school is free of charge for registered participants of APPIS 2019. The number of participants is limited to 100, so early registration is encouraged to secure a place.

The fee for participation is **250 Euro** before 15 December 2018 and **325 Euro** after 15 December; it includes free registration to APPIS 2019.

### Registration

Registration for the Winter School on Machine Learning is open at the following URL:

http://fpctserver.upe.ulpgc.es/fpctcursos/matricula.php?curso=FP-0088

IMPORTANT: Please register as a participant in APPIS and, after confirming your email address, indicate in the second step of the registration form whether you are going to participate in both WISMAL and APPIS or only in WISMAL.

### Program

- **Prototype-based machine learning** – *Michael Biehl*
- **Clustering** – *Kerstin Bunte*
- **Tree-based ensemble learning** – *Ahmad Alsahaf*
- **Convolutional Neural Networks and Deep Learning** – *Maria Leyva, Nicolai Petkov, Nicola Strisciuglio*
- **Recurrent Neural Networks: From Universal Function Approximators to Big Data Tool** – *Danilo Mandic*
- **Deep Learning in the Wolfram Language** – *Markus van Almsick, Algorithms R&D, Wolfram Research*
- **Consensus Learning** – *Xiaoyi Jiang*

*Further tutorials may be added to the program soon.*

### Abstract

**Prototype-based machine learning – Michael Biehl**

An overview is given of prototype-based systems in machine learning.

In this framework, observations, i.e. the data, are represented in terms of typical representatives. Together with a suitable measure of similarity, such systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex data sets.

Prototype-based systems offer great flexibility, are easy to implement, and can be interpreted easily in the context of a given application.

Example schemes of unsupervised Vector Quantization will be introduced, including Kohonen’s Self-Organizing Maps (SOM). Supervised learning in prototype systems will be discussed mainly in terms of Learning Vector Quantization (LVQ) and its variants. In particular, the essential role of appropriate dissimilarity measures will be addressed and the concept of adaptive distances in relevance learning will be presented.

The presented concepts and methods will be illustrated in terms of benchmark problems and selected real world applications.
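As an illustration of the supervised prototype scheme mentioned above, the following is a minimal sketch of the basic LVQ1 update rule, not the tutorial's own material. The toy data, the initial prototypes, the learning rate and all function names are assumptions chosen for the example; the dissimilarity measure is plain squared Euclidean distance, without the adaptive distances used in relevance learning.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda j: sum((p - xi) ** 2
                                 for p, xi in zip(prototypes[j][0], x)))

def train_lvq1(data, prototypes, lr=0.1, epochs=20):
    """data: list of (vector, label); prototypes: list of [vector, label]."""
    for _ in range(epochs):
        for x, y in data:
            j = nearest(prototypes, x)
            w, c = prototypes[j]
            # Attract the winning prototype if its label matches, repel it otherwise.
            sign = 1.0 if c == y else -1.0
            prototypes[j][0] = [wi + sign * lr * (xi - wi) for wi, xi in zip(w, x)]
    return prototypes

def predict(prototypes, x):
    """Label of the nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]

# Hypothetical two-class toy data set.
data = [([0.0, 0.0], "a"), ([0.5, 0.2], "a"), ([0.1, 0.6], "a"),
        ([4.0, 4.0], "b"), ([3.6, 4.2], "b"), ([4.3, 3.7], "b")]
prototypes = train_lvq1(data, [[[1.0, 1.0], "a"], [[3.0, 3.0], "b"]])
print(predict(prototypes, [0.2, 0.1]), predict(prototypes, [3.9, 4.2]))
```

After training, each prototype can be inspected directly as a typical representative of its class, which is where the interpretability of such systems comes from.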

**Clustering – Kerstin Bunte**

With modern digitalisation and sensor technology the amount of data is increasing every year.

Tools for unsupervised exploratory data analysis are highly desirable to find interesting patterns.

One example is the task of clustering, which aims at finding groups of objects that are more similar to each other than to those belonging to other groups.

Generally this unsupervised grouping is an ill-posed problem facing non-trivial questions such as:

- What would be a good cluster?
- What is a suitable definition of similarity? and
- How many clusters are present?

A vast number of cluster analysis techniques have emerged; exemplary methods from hierarchical, prototype-based and density-based clustering will be introduced and discussed.

The presented concepts and methods will be illustrated in terms of benchmark problems and selected real world applications.

**Tree-based ensemble learning – Ahmad Alsahaf**

Tree ensembles are among the most widely used algorithms for supervised learning. Such algorithms combine a large number of decision trees in a single model to achieve better predictions than those achieved by the constituent trees.

This tutorial will present an overview of the different ways tree ensembles are constructed, namely bagging and boosting. Moreover, practical advice on implementation, parameter tuning, and model interpretation will be given, using selected real-world datasets.

Examples of the methods that will be discussed include random forests, AdaBoost, gradient boosted trees, and more recently developed methods such as XGBoost and LightGBM.
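To make the idea of bagging concrete, here is a minimal sketch that trains one-split trees (decision stumps) on bootstrap samples and combines them by majority vote. It is an illustration under simplifying assumptions (a single numeric feature, labels 0/1, hypothetical toy data and function names) and is not how random forests or the boosting libraries named above are implemented.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-feature threshold classifier to (x, label) pairs with labels 0/1."""
    best = None
    for thr, _ in sample:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= thr else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if x <= thr else right

def fit_bagged_stumps(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]
        trees.append(fit_stump(sample))
    return trees

def predict(trees, x):
    """Majority vote over the individual stumps."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical one-dimensional toy data: small values are class 0, large values class 1.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
trees = fit_bagged_stumps(data)
print(predict(trees, 0.2), predict(trees, 4.8))
```

Individual stumps may place their threshold poorly when a bootstrap sample is unrepresentative; the vote over many stumps is what stabilises the prediction.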

**Convolutional Neural Networks and Deep Learning – Maria Leyva, Nicolai Petkov, Nicola Strisciuglio**

Starting with a brief historical review of the development of artificial neural networks (perceptron, multi-layer feed-forward networks) we arrive at convolutional neural networks (CNN). A CNN for image or video processing is a pipeline of layers in which each stage typically includes a convolution, half-wave rectification (ReLU) and max-pooling. An input 3D array (e.g. a colour image) is transformed into a sequence of 3D arrays. In each such array, two indices correspond to image coordinates and the third index corresponds to some feature computed by the network. Typically, with progression through the layers, the extent of the image coordinates decreases while the number of features increases. The image representation that is computed at the end of this pipeline can be used for classification or regression. Training is achieved by adjusting the convolution coefficients through error backpropagation.
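The single stage described above (convolution, ReLU, max-pooling) can be sketched in plain Python as follows. This is a simplified illustration, assuming a single-channel input and two hand-picked edge kernels instead of learned coefficients; the function names and the toy image are hypothetical. It shows how the spatial extent shrinks (6x6 to 2x2) while the number of feature maps grows (1 to 2).

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, which deep learning frameworks call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Half-wave rectification: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling over size x size windows."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def cnn_stage(image, kernels):
    """One stage: each kernel yields one feature map (the 'third index')."""
    return [max_pool(relu(conv2d(image, k))) for k in kernels]

# Toy 6x6 image with a vertical edge: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge = [[-1, 0, 1]] * 3
horizontal_edge = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

features = cnn_stage(image, [vertical_edge, horizontal_edge])
print(len(features), len(features[0]), len(features[0][0]))
print(features)
```

Only the vertical-edge kernel responds to this image; the horizontal-edge feature map stays at zero. In a trained network the kernel values would be learned by backpropagation instead of being fixed by hand.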

We review architectures for image classification (e.g. LeNet, VGGNet and ResNetX) and for encoding/decoding of images. We show examples of successful applications such as large-scale image classification, (medical) image segmentation and medical diagnostics. We demonstrate how CNNs can be used in popular programming environments such as PyTorch and show how to train and test networks for classification and regression.

**Recurrent Neural Networks: From Universal Function Approximators to Big Data Tool – Danilo Mandic**

Recurrent neural networks (RNN) are learning machines which use memory in a very effective way. Of special interest are RNNs for temporal processes, where RNNs exhibit real-time adaptability and the ability to model infinite impulse responses of systems and to deal with nonlinear signals. This tutorial will start from their interpretation from the system theory perspective, as universal function approximators which also model truncated Volterra systems. Next, architectures and learning algorithms will be presented for fully connected RNNs, and their application to time series prediction will be discussed in detail. Complex-valued and quaternion-valued RNNs will be introduced for the amplitude-phase modelling of 2D and 4D processes, together with the corresponding calculi, statistics, and learning algorithms. A brief overview of Echo State Networks (ESN) will establish their role among learning machines. Finally, deep RNNs and their connection with tensor networks for Big Data analysis will conclude the talk.

Literature:

*[1] D. P. Mandic and J. A. Chambers, Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures, and Stability, research monograph, John Wiley & Sons, 2001.*

*[2] D. P. Mandic and S. L. Goh, Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models, research monograph, John Wiley & Sons, 2009.*

*[3] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P. Mandic, “Tensor networks for dimensionality reduction and large-scale optimisation. Part 1: Low-rank tensor decompositions”, Foundations and Trends in Machine Learning, vol. 9, no. 4-5, pp. 249-429, 2016.*

*[4] A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, I. Oseledets, M. Sugiyama, and D. P. Mandic, “Tensor networks for dimensionality reduction and large-scale optimization. Part 2: Applications and future perspectives”, Foundations and Trends in Machine Learning, vol. 9, no. 6, pp. 431-673, 2017.*

**Deep Learning in the Wolfram Language – Markus van Almsick, Algorithms R&D, Wolfram Research**

*Short bio:*

Markus van Almsick received his PhD in Biomedical Image Processing from the Technical University of Eindhoven, the Netherlands. He has been a member of the Theoretical Biophysics group at the Beckman Institute for Advanced Science and Technology at the University of Illinois and has worked for the Max Planck Institute for Biophysics in Frankfurt, Germany. He has been a consultant for Wolfram Research since 1988 and part of its image processing and computer vision team since 2009.

**Consensus Learning – Xiaoyi Jiang**

Consensus problems in various forms have a long history in computer science. In pattern recognition, for instance, there are no context- or problem-independent reasons to favor one classification method over another. Therefore, combining multiple classification methods towards a consensus decision can help compensate for the erroneous decisions of one classifier through the other classifiers. In practice, ensemble methods have turned out to be an effective means of improving classification performance in many applications. In general, this principle corresponds to combining multiple models into one consensus model, which helps, among other things, to reduce the uncertainty in the initial models. Consensus learning can be formulated and studied in numerous problem domains; ensemble classification is just one special instance. This tutorial will present an introduction to consensus learning. In particular, the focus will be on the formal framework of so-called generalized median computation, which is applicable to arbitrary domains. The concept of this framework, theoretical results, and computation algorithms will be discussed. A variety of applications in pattern recognition and other fields will be shown to demonstrate the power of consensus learning.
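As a small illustration of the median idea, the sketch below computes a set median: the input object with the smallest summed distance to all other inputs. This is a common simplification in which the search is restricted to the given objects, whereas the generalized median searches the whole domain; it is not the tutorial's algorithm. The choice of strings with edit distance as the domain, the toy inputs and the function names are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def set_median(objects, distance):
    """Input object minimising the sum of distances to all input objects."""
    return min(objects, key=lambda c: sum(distance(c, o) for o in objects))

# Hypothetical noisy readings of the same word, e.g. from several recognisers.
readings = ["pattern", "patern", "pattorn", "battern"]
consensus = set_median(readings, edit_distance)
print(consensus)
```

Because `set_median` only needs a distance function, the same code applies to any domain for which a distance can be defined, which mirrors the domain independence of the framework described above.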