## Winter School on Machine Learning – WISMAL 2020

**Organisation**

Nicolai Petkov, program director

Nicola Strisciuglio, publicity

Carlos Travieso, local organisation

The Winter School on Machine Learning will take place on **10-12 January 2020**.

This short winter school consists of several tutorials that present different techniques of Machine Learning. Each tutorial lasts two hours, introduces the main theoretical concepts, and presents typical practical applications and their implementation in popular programming environments.

Participation in the winter school is free of charge for registered participants of APPIS 2020. The number of participants is limited to 100, and early registration is encouraged to secure a place.

The fee for participation is **250 Euro** before 13th December and **325 Euro** after 13th December; it includes free registration to APPIS 2020.

### Registration

Registration for the Winter School on Machine Learning is open at the following URL:

https://fpctserver.upe.ulpgc.es/fpctcursos/matricula.php?curso=FP-0118

IMPORTANT: Please register as a participant in APPIS and, after confirming your email address, indicate in the second step of the registration form whether you will participate in both WISMAL and APPIS or only in WISMAL.

### Program

- **Neural Network Design in the Wolfram Language** – *Markus van Almsick, Algorithms R&D, Wolfram Research*
- **Prototype-based machine learning** – *Michael Biehl, University of Groningen*
- **Clustering** – *Kerstin Bunte, University of Groningen*
- **Multi-target prediction** – *Willem Waegeman, University of Gent*
- **Convolutional Neural Networks and Deep Learning** – *Maria Leyva, Nicolai Petkov (University of Groningen), Nicola Strisciuglio (University of Groningen – University of Twente)*
- **Recurrent Neural Networks: From Universal Function Approximators to Big Data Tool** – *Danilo Mandic, Imperial College London*
- **Consensus learning** – *Xiaoyi Jiang, University of Muenster*

*Further tutorials may be added to the program soon.*

### Sponsors

### Abstracts

**Prototype-based machine learning – Michael Biehl**

An overview is given of prototype-based systems in machine learning.

In this framework, observations, i.e. the data, are represented in terms of typical representatives. Together with a suitable measure of similarity, such systems can be employed in the unsupervised and supervised analysis of potentially high-dimensional, complex data sets.

Prototype-based systems offer great flexibility, are easy to implement, and can be interpreted easily in the context of a given application.

Example schemes of unsupervised Vector Quantization will be introduced, including Kohonen’s Self-Organizing Maps (SOM). Supervised learning in prototype systems will be discussed mainly in terms of Learning Vector Quantization (LVQ) and its variants. In particular, the essential role of appropriate dissimilarity measures will be addressed and the concept of adaptive distances in relevance learning will be presented.

The presented concepts and methods will be illustrated in terms of benchmark problems and selected real world applications.
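As a rough illustration of supervised prototype learning, the following minimal sketch implements the basic LVQ1 update: the prototype closest to a training sample is moved towards the sample if their labels agree and away from it otherwise. The toy data, the initial prototypes, the learning rate and all function names are assumptions made for this example only; it is not taken from the tutorial material.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance, used here as the dissimilarity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(x, prototypes):
    """Return the index of the prototype closest to x."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(x, prototypes[i][0]))


def train_lvq1(data, prototypes, learning_rate=0.1, epochs=30, seed=0):
    """Basic LVQ1: attract the winning prototype if labels match, repel otherwise."""
    rng = random.Random(seed)
    protos = [(list(w), label) for w, label in prototypes]
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w, label = protos[nearest_prototype(x, protos)]
            sign = 1.0 if label == y else -1.0
            for d in range(len(w)):
                w[d] += sign * learning_rate * (x[d] - w[d])
    return protos


def classify(x, prototypes):
    """Assign the label of the nearest prototype."""
    return prototypes[nearest_prototype(x, prototypes)][1]


# Hypothetical two-class toy data and one initial prototype per class.
data = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
        ((1.0, 0.9), "b"), ((0.8, 1.1), "b"), ((1.1, 1.0), "b")]
initial = [((0.4, 0.4), "a"), ((0.6, 0.6), "b")]

prototypes = train_lvq1(data, initial)
predictions = [classify(x, prototypes) for x, _ in data]
print(predictions)
```

Replacing `squared_distance` with a weighted distance whose weights are also adapted during training would be one way to approach the relevance learning mentioned in the abstract; that extension is not shown here.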

**Convolutional Neural Networks and Deep Learning – Maria Leyva, Nicolai Petkov, Nicola Strisciuglio**

Starting with a brief historical review of the development of artificial neural networks (perceptron, multi-layer feed-forward networks), we arrive at convolutional neural networks (CNN). A CNN for image or video processing is a pipeline of layers in which each stage typically includes a convolution, half-wave rectification (ReLU) and max-pooling. An input 3D array (e.g. a colour image) is transformed into a sequence of 3D arrays. In each such array, two indexes correspond to image coordinates and the third index corresponds to some feature computed by the network. Typically, with progression through the layers, the extent of the image coordinates decreases while the number of features increases. The image representation that is computed at the end of this pipeline can be used for classification or regression. Training is achieved by modifying the convolution coefficients through error backpropagation.

We review architectures for image classification (e.g. LeNet, VGGNet and ResNet) and for encoding/decoding of images. We show examples of successful applications such as large-scale image classification, (medical) image segmentation and medical diagnostics. We demonstrate how CNNs can be used in popular programming environments such as PyTorch and show how to train and test networks for classification and regression.
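To make the structure of a single CNN stage concrete, here is a minimal sketch of convolution, ReLU and max-pooling on a tiny single-channel image. It is written in plain Python rather than PyTorch so that it has no dependencies; the image, the two hand-picked kernels and all function names are assumptions for illustration, and, as in most deep learning frameworks, the "convolution" is computed without flipping the kernel. No training is shown.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Half-wave rectification: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max-pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def cnn_stage(image, kernels):
    """One stage: each kernel yields one feature map, indexed [feature][row][col]."""
    return [max_pool(relu(conv2d_valid(image, k))) for k in kernels]


# Hypothetical 6x6 image with a vertical edge: left half dark, right half bright.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
vertical_edge = [[-1.0, 0.0, 1.0] for _ in range(3)]
horizontal_edge = [[-1.0] * 3, [0.0] * 3, [1.0] * 3]

features = cnn_stage(image, [vertical_edge, horizontal_edge])
print(len(features), len(features[0]), len(features[0][0]))  # features, rows, columns
```

The 6x6 input with one channel becomes a 2x2 array with two features: the spatial extent shrinks while the number of features grows, as described above. In a trained network the kernel coefficients would be learned rather than chosen by hand.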

**Recurrent Neural Networks: From Universal Function Approximators to Big Data Tool – Danilo Mandic**

Recurrent neural networks (RNN) are learning machines which use memory in a very effective way. Of special interest are RNNs for temporal processes, where RNNs exhibit real-time adaptability, the ability to model infinite impulse responses of systems, and the ability to deal with nonlinear signals. This tutorial will start from their interpretation from the system theory perspective, as universal function approximators which also model truncated Volterra systems. Next, architectures and learning algorithms will be presented for fully connected RNNs, and their application for time series prediction will be discussed in detail. Complex-valued and quaternion-valued RNNs will be introduced for the amplitude-phase modelling of 2D and 4D processes, together with the corresponding calculi, statistics, and learning algorithms. A brief overview of Echo State Networks (ESN) will establish their role among learning machines. Finally, deep RNNs and their connection with tensor networks for Big Data analysis will conclude the talk.
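The link between recurrence and infinite impulse responses can be illustrated with a single recurrent unit whose state feeds back into itself: a one-off input keeps influencing the output at all later time steps. The sketch below is a simplified assumption for illustration (one unit, fixed hand-chosen weights, no learning), not the architectures covered in the tutorial.

```python
import math


def run_recurrent_unit(inputs, w_rec=0.5, w_in=1.0, activation=None):
    """Single recurrent unit: h[t] = f(w_rec * h[t-1] + w_in * x[t]), with h[-1] = 0."""
    h = 0.0
    outputs = []
    for x in inputs:
        pre_activation = w_rec * h + w_in * x
        h = activation(pre_activation) if activation else pre_activation
        outputs.append(h)
    return outputs


# A unit impulse followed by zeros.
impulse = [1.0] + [0.0] * 5

# Linear unit: the response decays geometrically as w_rec ** t and never reaches zero.
linear_response = run_recurrent_unit(impulse)
print(linear_response)

# Same unit with a tanh nonlinearity: still a decaying memory of the impulse.
nonlinear_response = run_recurrent_unit(impulse, activation=math.tanh)
print(nonlinear_response)
```

With `w_rec = 0.5` the linear response halves at every step; choosing `abs(w_rec) >= 1` in the linear case would make the response non-decaying, which hints at why stability matters for recurrent models.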

Literature:

*[1] D. P. Mandic and J. A. Chambers, Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures, and Stability, research monograph, John Wiley & Sons, 2001.*

*[2] D. P. Mandic and S. L. Goh, Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models, research monograph, John Wiley & Sons, 2009.*

*[3] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P. Mandic, “Tensor networks for dimensionality reduction and large-scale optimisation. Part 1: Low-rank tensor decompositions”, Foundations and Trends in Machine Learning, vol. 9, no. 4-5, pp. 249-429, 2016.*

*[4] A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, I. Oseledets, M. Sugiyama, and D. P. Mandic, “Tensor networks for dimensionality reduction and large-scale optimization. Part 2: Applications and future perspectives”, Foundations and Trends in Machine Learning, vol. 9, no. 6, pp. 431-673, 2017.*

**Neural Network Design in the Wolfram Language – Markus van Almsick, Algorithms R&D, Wolfram Research**

Mathematica and the Wolfram Language form a comprehensive software system for mathematical and scientific calculations. The language contains a high-performance neural network framework with CPU and GPU support. Constructing and training networks often requires only a few lines of code, putting deep learning in the hands of even non-expert users.

The tutorial will provide a short introduction to the above-mentioned neural network framework. We then use the software to introduce, experiment with, and discuss the merits and drawbacks of classical as well as advanced neural network designs, ranging from convolutional neural networks to the more recent capsule networks.

Neural network design depends on the type of data being processed. This tutorial will focus on image data and applications thereof. However, if time permits, audio and natural language processing (NLP) will be considered as well.

To obtain a free Mathematica demo version for the tutorial, follow the link https://www.wolfram.com/mathematica/trial/

Short bio:

Markus van Almsick studied physics at the Technical University of Munich and received his PhD in Biomedical Image Processing from the Technical University of Eindhoven, the Netherlands. He has been a member of the Theoretical Biophysics group at the Beckman Institute for Advanced Science and Technology at the University of Illinois and he has worked for the Max Planck Institute for Biophysics in Frankfurt, Germany. Since 1988 he has been a consultant for Wolfram Research and for the last 10 years he has been a member of their image processing and computer vision team.

**Clustering – Kerstin Bunte**

With modern digitalisation and sensor technology the amount of data is increasing every year.

Tools for unsupervised exploratory data analysis are highly desirable to find interesting patterns.

One example is the task of clustering, which aims at finding groups of objects that are more similar to each other than to those belonging to other groups.

Generally this unsupervised grouping is an ill-posed problem facing non-trivial questions such as:

- What would be a good cluster?
- What is a suitable definition of similarity? and
- How many clusters are present?

A vast number of cluster analysis techniques have emerged; exemplary methods from hierarchical, prototype-based and density-based clustering will be introduced and discussed.

The presented concepts and methods will be illustrated in terms of benchmark problems and selected real world applications.
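As one concrete instance of prototype-based clustering, the sketch below implements plain k-means (Lloyd's algorithm). The choice of k-means is ours for illustration; the abstract does not name specific algorithms. The toy points, the initial centres and the function names are assumptions. Note how the three questions above show up as inputs: the number of clusters and the dissimilarity measure must both be chosen by the user.

```python
def squared_distance(a, b):
    """Squared Euclidean distance as the (assumed) dissimilarity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def nearest_index(point, centers):
    """Index of the centre closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def k_means(points, initial_centers, max_iter=100):
    """Alternate between assigning points to centres and recomputing centres."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_index(p, centers)].append(p)
        # An empty cluster keeps its previous centre.
        new_centers = [mean_point(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest_index(p, centers) for p in points]
    return centers, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(points, initial_centers=[points[0], points[3]])
print(labels)
```

The result of k-means depends on the initial centres and on the chosen number of clusters, which is one practical face of the ill-posedness mentioned above.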

**Consensus Learning – Xiaoyi Jiang**

Consensus problems in various forms have a long history in computer science. In pattern recognition, for instance, there are no context- or problem-independent reasons to favor one classification method over another. Therefore, combining multiple classification methods towards a consensus decision can help compensate for the erroneous decisions of one classifier by means of the other classifiers. In practice, ensemble methods have turned out to be an effective means of improving classification performance in many applications. In general, this principle corresponds to combining multiple models into one consensus model, which helps, among other things, to reduce the uncertainty in the initial models. Consensus learning can be formulated and studied in numerous problem domains; ensemble classification is just one special instance. This tutorial will present an introduction to consensus learning. In particular, the focus will be on the formal framework of so-called generalized median computation, which is applicable to arbitrary domains. The concept of this framework, theoretical results, and computation algorithms will be discussed. A variety of applications in pattern recognition and other fields will be shown to demonstrate the power of consensus learning.
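A generalized median is commonly defined as an object of the domain that minimises the sum of distances to all input objects. The sketch below illustrates this on a deliberately simple case, equal-length binary strings under the Hamming distance, where the minimiser can be computed exactly by a position-wise majority vote. It also computes the set median, i.e. the best candidate restricted to the inputs themselves. The example strings and function names are assumptions; general domains require the more elaborate algorithms discussed in the tutorial.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def total_distance(candidate, objects, distance):
    """Consensus objective: sum of distances from the candidate to all objects."""
    return sum(distance(candidate, o) for o in objects)


def set_median(objects, distance):
    """Best consensus object when candidates are restricted to the inputs."""
    return min(objects, key=lambda c: total_distance(c, objects, distance))


def hamming_generalized_median(strings):
    """For Hamming distance the objective decomposes per position,
    so a position-wise majority vote minimises it over all strings."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*strings))


# Hypothetical outputs of four classifiers on five binary decisions.
votes = ["10110", "10011", "11010", "00010"]

consensus = hamming_generalized_median(votes)
best_input = set_median(votes, hamming)

print(consensus, total_distance(consensus, votes, hamming))
print(best_input, total_distance(best_input, votes, hamming))
```

Here the generalized median is not one of the inputs and achieves a lower total distance than the set median, which shows why searching the whole domain rather than only the given objects can pay off.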

**Multi-target prediction – Willem Waegeman**

Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse types. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this tutorial, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

The tutorial aims to give an overview of existing methods, while focussing on cross-domain methodologies. With the tutorial we aim to attract both researchers who are already active in one of the above domains and researchers with little or no prior experience in multi-target prediction.