class: hide-slide-number split-70 title-slide count: false .column.content[ <br> # Visual Diagnostics for Constrained Optimisation with Application to Guided Tours <br> .bottom_abs.width100[ <!-- ## H. Sherry Zhang --> <!-- Supervised by Dianne Cook, Ursula Laa, Nicolas Langrené, and Patricia Menéndez --> .monash-black[**H. Sherry Zhang**] .institute-size[.monash-gray80[**Monash**]] Dianne Cook .institute-size[.monash-gray80[Monash]] Ursula Laa .institute-size[.monash-gray80[BOKU]] Nicolas Langrené .institute-size[.monash-gray80[CSIRO Data61]] Patricia Menéndez .institute-size[.monash-gray80[Monash]] 6th Jul 2021 ] ] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('img/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> ??? - Hi everyone, - my name is Sherry Zhang, a PhD student from Monash University in Australia - Today I will be talking about ... - This is a work supervised by --- # Motivation > The work also reveals inadequacies in the tour optimization algorithm, that may benefit from newly developed techniques and software tools. (Laa & Cook, 2020) .left-col[ - For simulated data, the optimisers - often failed at finding the expected maxima, or - would only get close but not reach the maxima - For noisy index functions it failed completely ] .right-col[ <img src="figures/motivation-1.png" width="504" /> ] ??? - project, motivated by ursula and Di's work, that apply guided tour to physics problems. - this is what they write in the paper - they find that ... - on the right is an example where - the optimiser could have finished at a better position but it doesn't --- # How to solve this problem? To understand where the optimisers were failing, ideally we need to visualise the **space** and the **paths** that the optimisers take through the space. -- - the space is the set of all d-dimensional projections of p-dimensional space. ??? intro: - This is an interesting visualisation problem because --- # What is projection pursuit? .pull-left[ Data: `\(\mathbf{X}_{n \times p}\)` Basis: `\(\mathbf{A}_{p\times d}\)` Projection: `\(\mathbf{Y} = \mathbf{X} \cdot \mathbf{A}\)` Index function: `\(f: \mathbb{R}^{n \times d} \mapsto \mathbb{R}\)` <br> **holes index:** `\(\propto 1 -\frac{1}{n} \sum_{i = 1}^n \exp(-\frac{1}{2} y_i y_i^{\prime})\)` <br> density `\(\uparrow\)`, index value `\(\uparrow\)` <br> **optimisation: ** `$$\arg \max_{\mathbf{A} \in \mathcal{A}} f(\mathbf{X} \cdot \mathbf{A}) ~~~ s.t. ~~~ \mathbf{A}^{\prime} \mathbf{A} = I_d$$` ] .pull-right[ <img src="figures/calc-index-val-1.png" width="540" /> ] ??? Now I will introduce some background information on - what is a guided tour - what is the optimisation problem and - what are the optimisers that we have To understand guided tour, we first need to talk about projection pursuit - we denote the data as a matrix of dimension .... - Projection basis A, of dimension ..., characterises the direction from which the data get projected; - projection - Index function: maps the projection from an ... dimension space to a scalar - Throughout this presentation until the case study in the end, we will be using an index function called holes index - will not show the full formula here, but it is proportional to 1 - something that looks like a standardised normal density - on the right I have calculated the index value of the holes index for different 1D projections and we can observe that - for those projections that have high density regions, for examples, ..., it has an higher index value - The optimisation we have is to: max the index function, subject to orthonormality constraint --- # Three random search optimisers - Creeping random search (CRS): - randomly samples a basis and evaluates its index value - accepted if it has a higher index value, discarded if lower - Simulated annealing (SA): - a second chance to be accepted with probability `$$P= \min\left\{\exp\left[-\frac{\mid I_{\text{cur}} - I_{l} \mid}{T(l)}\right],1\right\}$$` where `\(I_{(\cdot)}\)` is the index value, `\(T(l) = \frac{T_0}{\log(l + 1)}\)`, `\(l\)` is the # of eval (in the current iter). - As `\(l\)` increases, `\(T(l)\)` decreases, and the probability for accepting an inferior basis decreases ??? To solve this optimisation problem, we have three optimisers. All of which are random search algorithms creeping random search: - that satisfies the orthonormality constraint - the basis, accept simulated annealing: - follows the same as CRS in sampling and accepting better bases, but - it has a different design for bases that have lower index value. These bases have - that is to say, it is less likely to accept a worse basis after more bases have been evaluated --- # Three random search optimisers (Cont.) - Pseudo derivative (PD): - randomly sample 5 directions close to the current and pick the most promising direction - search along a straight line (on the sphere) for the best candidate - same acceptance rule as CRS ??? Pseudo derivative: - flavour, gradient a'scent, as it first find a promising direction and then compute a step size - we don't use derivative here, hard to operate on a matrix. Instead, - PD then uses the same acceptance rule as CRS --- # Projection pursuit with guided tour <img src="./img/tour-path.png" width="521" style="display: block; margin: auto;" /> * projection pursuit maximises the index function to iteratively find better basis/ projection (**blue frames**) * guided tour chains these projections together through interpolation (**white frames**) and produces an smooth animation (**see next slide
<i class="fas fa-long-arrow-alt-right faa-horizontal animated " style=" color:D93F00;"></i>
**) ??? This illustration shows how projection pursuit and guided tour works together Those are shown in the blue frames --- # An example .pull-left[ ![](./img/tour-demo.gif)<!-- --> ] .pull-right[ 1D projection: histogram <br> 5 simulated variables - `\(x_2\)` is mixture normal `\(0.5 \mathcal{N}(-3, 1) + 0.5 \mathcal{N}(3, 1)\)` - others are random normal <br> We expect: - `\(x_2\)` has weight close to 1 - others are close to 0 ] ??? Here is how the animation looks like: - We use histogram to display 1D projection - The data include 5 variables, X2 is a mixture normal and others are random normal - We expect: X2 to have a weight close to 1 and others close to 0 <br> </br> - In this simple example, the optimiser works well to find the optimum - but this is not the case for all the problems, as we see in the literature - so we need some visual tools to help us diagnose where things go wrong - A side note here: this will also be the data that we used in later examples and we are always aiming to find `\(x_2\)` in this dataset --- # Ferrn package .left-col[ **Four functions to explore the trace and space of the optimisers: ** - `explore_trace_search()` - `explore_trace_interp()` - `explore_space_pca()` - `explore_space_tour()` **Botanical theme:** - `scale_color_continuous_botanical()` - `scale_color_discrete_botanical()` ] .right-col[ **To produce plots: ** ```r <data> %>% <explore_*>() %>% <scale_*>() ``` ] <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> .institute-size[ `*` The syntax to adjust color in `explore_space_tour()` is slightly different ] ??? - This work leads to an R package, ferrn, where we create 4 diagnostic plots. Two for exploring the trace and two for space - There is also a botanical theme for scale the color - To produce a plot, you need to pipe the data collected from the optimisation into one of the explore functions and make color adjustment using the botanical palette - Due to the time limit of this talk, I will go through the two space plots here and later you will have the chance to see the trace plot --- # `explore_space_pca()` .pull-left[ <img src="figures/pca-plot-1.png" width="504" /> ] ??? This is the first space plot First I'm going to tell you how this plot is made and then what we can learn from it - We have all the bases evaluated in the optimisation - these are the coloured dots - we also have random bases sampled from the 5D space, these are used to draw the space - we then perform PCA on these two sets of bases and take the first two PCs to create this plot In this plot - The star is the theoretical best basis, corresponds to 1 in `\(x_2\)` and 0 in others - we have two paths, CRS in brown and PD in green - CRS evaluates random points in the space so you can see dots everywhere - PD evaluates 5 directions locally, so you can see dots scattered around an accepted basis - The paths here are the interpolation between bases accepted by projection pursuit This plot tells us that ... and it helps us to ... we can also make an animated version of this plot - here we can see clearer where each optimiser starts - an additional info, learn -- .pull-right[ - Both optimisers find the optimum but PD gets closer - Visually understand how the optimisers work ] -- .pull-right[ <img src="./img/pca.gif" width="70%" style="display: block; margin: auto;" /> ] -- .pull-right[ CRS finds the optimum faster! ] --- # `explore_space_tour()` The same two paths but on the 5D full space. .pull-left[ ![](./img/tour.gif)<!-- --> ] ??? What I'm showing you now is the same two paths but on the full space. Let's watch this animation together: - The full space is a 5D unit sphere since the orthonormality constraint requires the square of all the entries adds up to one - Here is a frame in the animation that is most similar to the previous PCA view What we learn here is that - PCA, maximises variance, most spread, full space plot allows you to see the same path from different angles. -- .pull-right[ <img src="figures/1d-sphere-frame-1.png" width="110%" height="120%" /> ] --- # 2D basis space The basis space is no longer a sphere, but a torus! .pull-left[ <img src="./img/torus.gif" width="100%" /> ] .pull-right[ <img src="figures/2D-torus-frames-1.png" width="100%" /> Basis `\(\mathbf{A}_{p\times d}\)` where `\(d = 2\)` ] ??? - similarly we can generate random 2D bases and animate it with 2D path embeded - Here I have generated the 2D basis space and captured some frames from the animation - This time, --- # Case study: optimising a noisy index .pull-left[ ### Which optimiser(s) work and which didn't? New index: `$$I(n) = \max \left[F_{Y}(n) - F_{\mathcal{N}}(n)\right]$$` <small> where `\(F_{.}(n)\)` is the ECDF function </small> <img src="figures/noisy-index-compare-1.png" width="504" /> ] ??? Now I'm presenting a case study on how visual diagnostics informs us about optmising a noisy index. - The noisy index we use is based on the kolmogorove test, defined here. - The plot on the left shows how the index value changes when interpolating between two random bases - What I do is to generate two random bases in the 5D space and ask guided tour to interpolate 100 frames in between. The index value of all the frames are calculated and arranged by its natural order to make this plot. - what we see here is that, comparing to the holes index, the new index 1) has a different range, and 2) also it is non-smooth - These add challenges to the optimisation - On the right are the trace and space plot of the three optimisers I introduced earlier on the 5 variable dataset - In the trace plot, the x-axis shows the natural order of time, y axis is the index value. The dashed horizontal line on the top is the index value of the theoretical best basis. - we see here that the first optimiser, PD, fails to optimise the noisy index while CRS and SA are getting close to this theoretical best - From the space plot, we can also see the success of CRS and SA but they are taking different approaches to find the final basis. - CRS extensively evaluates points in the space before accepting a new basis to interpolate - thus we see lots of dots in the space but the interpolation path is short, while - SA widely accepts new basis and performs interpolation and only starts to reject towards the end - thus we see a long interpolation path but few points being evaluated -- .pull-right[ <img src="figures/noisy-performance-1.png" width="504" style="display: block; margin: auto;" /> ] --- # Summary - We have developed specialised visualisation tools to diagnose optimisation in projection pursuit guided tour - This allows us to understand algorithms that are not obvious to the end users <br> **For tourr developers:** - Provides tools to test new optimisers and optimisation of new indexes **For algorithm developers:** - Can be seen as a work to enhance the interpretability of black-box optimsimation algorithms --- class: hide-slide-number split-70 count: false .column.content[ <br> <p style = "font-size: 60px"> Questions? </p> Package:
<i class="fab fa-github faa-none animated "></i>
[huizezhang-sherry/ferrn](https://github.com/huizezhang-sherry/ferrn) Slides:
<i class="fab fa-github faa-none animated "></i>
[sherryzhang-user2021.netlify.app](https://sherryzhang-user2021.netlify.app) .bottom_abs.width100[ ## H. Sherry Zhang
<i class="fab fa-twitter faa-none animated "></i>
[huizezhangsh](https://twitter.com/huizezhangsh) <i class="fa fa-link"></i> https://huizezhangsh.netlify.com/ Supervised by Dianne Cook, Ursula Laa, Nicolas Langrené, and Patricia Menéndez ]] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('img/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div>