{"id":57,"date":"2018-07-04T02:34:56","date_gmt":"2018-07-04T02:34:56","guid":{"rendered":"http:\/\/wcwp.rf.gd\/2018\/07\/04\/https-landscapearchaeology-org-2018-numpy-loops\/"},"modified":"2018-07-04T02:34:56","modified_gmt":"2018-07-04T02:34:56","slug":"numpy-loops","status":"publish","type":"post","link":"https:\/\/landscapeanalysis.org\/?p=57","title":{"rendered":"Looping through numpy arrays (e.g. moving\/rolling window)"},"content":{"rendered":"<p>Numpy is the cornerstone of matrix based calculations in QGIS (and elsewhere). It does wonders with raster data (unless it hits the limit of available live memory\u2026).<\/p>\n<p>A recurrent problem with Numpy is the implementation of various looping routines, such as the sliding window which is frequently used in image filtering and other approaches focused on cell neighbourhood. Below is the illustration of the problem: for each cell the window needs to query a specified neighbourhood (square, circular or other).<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/landscapearchaeology.org\/wp-content\/uploads\/figures-jekyll\/2018\/06\/2017-11-numpy1b.png\" alt=\"\" \/><\/p>\n<p>Now, there exists a solution in the form of \u201cstride tricks\u201d within the Numpy library. However it is really cryptic (the <a href=\"http:\/\/scipy-cookbook.readthedocs.io\/items\/GameOfLifeStrides.html\">SciPy cookbook<\/a> uses the \u201cdevious\u201d epithet); after all the very name \u201ctricks\u201d implies it being \u2026 tricky. Numpy documentation actually <a href=\"https:\/\/docs.scipy.org\/doc\/numpy-1.13.0\/reference\/generated\/numpy.lib.stride_tricks.as_strided.html\">discourages its use<\/a>. We can do it in a more understandable manner &#8211; which also means more configurable &#8211; and without any particular computational overhead.<\/p>\n<p>Let\u2019s first examine the problem. Without Numpy we would need four nested loops: two for traversing the matrix and two for the analysed window.<\/p>\n<div class=\"language-python highlighter-rouge\">\n<div class=\"highlight\">\n<pre class=\"highlight\"><code><span class=\"c1\">#Level 1: traversing the matrix \n<\/span><span class=\"k\">for<\/span> <span class=\"n\">y<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">rows<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">x<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">columns<\/span><span class=\"p\">):<\/span>\n\n        <span class=\"c1\">#Level 2: traversing the window (3x3 size) \n<\/span>        <span class=\"k\">for<\/span> <span class=\"n\">win_y<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">y<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span><span class=\"n\">y<\/span><span class=\"o\">+<\/span><span class=\"mi\">2<\/span><span class=\"p\">):<\/span>\n            <span class=\"k\">for<\/span>  <span class=\"n\">win_x<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"o\">+<\/span><span class=\"mi\">2<\/span><span class=\"p\">):<\/span>\n                <span class=\"n\">temp_sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">matrix<\/span><span class=\"p\">[<\/span><span class=\"n\">win_y<\/span><span class=\"p\">,<\/span> <span class=\"n\">win_x<\/span><span class=\"p\">]<\/span>\n\n       <span class=\"c1\">#Analysis inside the window is finished\n<\/span>       <span class=\"c1\">#Write result to a copy of the input matrix\n<\/span>       <span class=\"n\">output<\/span><span class=\"p\">[<\/span><span class=\"n\">y<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">calculate_something<\/span><span class=\"p\">(<\/span><span class=\"n\">temp_sum<\/span><span class=\"p\">)<\/span> \n<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We can immediately see that the level 2 loop can be easily replaced with a Numpy window: <code class=\"language-python highlighter-rouge\"><span class=\"n\">win<\/span> <span class=\"o\">=<\/span> <span class=\"n\">matrix<\/span><span class=\"p\">[<\/span><span class=\"n\">y<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">:<\/span> <span class=\"n\">y<\/span><span class=\"o\">+<\/span><span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">:<\/span> <span class=\"n\">x<\/span><span class=\"o\">+<\/span><span class=\"mi\">2<\/span><span class=\"p\">]<\/span><\/code>. Easy enough \u2026 but this is not the ideal approach. The window has 9 cells, while the matrix would have millions: shouldn\u2019t we focus on the level 1 loop instead?<\/p>\n<p>To avoid the level 1 loop we need to move the entire matrix, as when overlapping two sheets of paper. We\u2019re now working on offsets between the two &#8211; which is the only tricky part. The (0,0) cell of the window now corresponds to all cells in the matrix that are on the upper left relative to analysed cells &#8211; which simply means shifting the entire matrix to the upper left. In order to move the matrix in that direction, we need to calculate offsets in relation to the central point. That would be <code class=\"language-python highlighter-rouge\"><span class=\"n\">window_index<\/span> <span class=\"o\">-<\/span> <span class=\"n\">window_radius<\/span><\/code> , which for the upper left cell gives (-1, -1) (see below).<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/landscapearchaeology.org\/wp-content\/uploads\/figures-jekyll\/2018\/06\/2017-11-numpy2.png\" alt=\"\" \/><\/p>\n<p>The only problem resides in handling these offsets on the level of the entire matrix. We need to extract matrix views that do not spill over the borders AND that match each other in size and shape. The overlap issue is represented on the illustration below.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/landscapearchaeology.org\/wp-content\/uploads\/figures-jekyll\/2018\/06\/2017-11-numpy3.png\" alt=\"\" \/><\/p>\n<p>This problem can be avoided with padding: adding extra row(s) and column(s) around the edges. The depth of the padded area should correspond to the maximum possible offset. The solution is now as follows:<\/p>\n<div class=\"language-python highlighter-rouge\">\n<div class=\"highlight\">\n<pre class=\"highlight\"><code><span class=\"kn\">import<\/span> <span class=\"nn\">numpy<\/span> <span class=\"k\">as<\/span> <span class=\"n\">np<\/span>\n\n<span class=\"c1\">#a mock dataset\n<\/span><span class=\"n\">data<\/span><span class=\"o\">=<\/span><span class=\"n\">np<\/span><span class=\"p\">.<\/span><span class=\"n\">random<\/span><span class=\"p\">.<\/span><span class=\"n\">rand<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">,<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n\n<span class=\"n\">rows<\/span><span class=\"p\">,<\/span> <span class=\"n\">columns<\/span> <span class=\"o\">=<\/span> <span class=\"n\">data<\/span><span class=\"p\">.<\/span><span class=\"n\">shape<\/span>\n<span class=\"n\">temp_sum<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"p\">.<\/span><span class=\"n\">zeros<\/span><span class=\"p\">((<\/span><span class=\"n\">rows<\/span><span class=\"p\">,<\/span> <span class=\"n\">columns<\/span><span class=\"p\">))<\/span>\n\n<span class=\"c1\"># create a padded copy\n<\/span><span class=\"n\">pad<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span> \n<span class=\"n\">matrix<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"p\">.<\/span><span class=\"n\">pad<\/span><span class=\"p\">(<\/span><span class=\"n\">data<\/span><span class=\"p\">,<\/span> <span class=\"n\">pad<\/span><span class=\"p\">,<\/span> <span class=\"s\">'edge'<\/span><span class=\"p\">)<\/span>\n\n<span class=\"c1\"># Level 2: traversing the window (3x3 size)\n# You could (should) use numpy.ndenumerate() function, as well\n<\/span><span class=\"k\">for<\/span> <span class=\"n\">y<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"mi\">3<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">x<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"mi\">3<\/span><span class=\"p\">):<\/span>\n\n        <span class=\"c1\"># Level 1: handling the matrix \n<\/span>        <span class=\"c1\"># (rows, columns = data.shape !)\n<\/span>        <span class=\"n\">temp_sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">matrix<\/span><span class=\"p\">[<\/span><span class=\"n\">y<\/span> <span class=\"p\">:<\/span> <span class=\"n\">rows<\/span> <span class=\"o\">+<\/span> <span class=\"n\">y<\/span><span class=\"p\">,<\/span>\n                          <span class=\"n\">x<\/span> <span class=\"p\">:<\/span> <span class=\"n\">columns<\/span> <span class=\"o\">+<\/span> <span class=\"n\">x<\/span><span class=\"p\">]<\/span>\n\n<span class=\"c1\">#Analysis inside the window is finished\n<\/span><span class=\"n\">output<\/span> <span class=\"o\">=<\/span> <span class=\"n\">calculate_something<\/span><span class=\"p\">(<\/span><span class=\"n\">temp_sum<\/span><span class=\"p\">)<\/span> \n<\/code><\/pre>\n<\/div>\n<\/div>\n<p>Note that the pad width should be half the window size: <code class=\"language-python highlighter-rouge\"><span class=\"n\">pad<\/span> <span class=\"o\">=<\/span> <span class=\"n\">window_size<\/span> <span class=\"o\">\/\/<\/span> <span class=\"mi\">2<\/span><\/code> where \/\/ is integer division. Window frame size should always be in odd numbers, otherwise it won\u2019t be placed symetrically.<\/p>\n<h3 id=\"refinement-unpadded-solution\">Refinement: unpadded solution<\/h3>\n<p>The above solution should be good enough for most of typical uses (for example, a small moving average filter). But, it\u2019s kind of ugly to use pads. We are producing copies of the original data, which can be particularly problematic for large offsets. Furthermore, padding may influence the result in edge zones in unexpected ways.<\/p>\n<p>Returning to the illustration above, we can see that the cropped window view cannot match the main view (the one pointing to the original and\/or temporary data). The latter also needs to be cropped, but in the opposite direction. This is logical: the top row cannot have a neighbour to the north anyway.<\/p>\n<p>Therefore, we need to adjust both views at the same time. Note that these views are opposite to each other, which means that we can play with swapping: when the window is placed to the north-west (-1, -1), the main view is to the south-east (1,1), but when the window is on the south-east (1,1), the main view will be on the north-west (-1, -1) (see the illustration above).<\/p>\n<p>To finish, let\u2019s throw in an additional parameter for steps &#8211; the range to be skipped between analysed cells (see <a href=\"https:\/\/docs.scipy.org\/doc\/numpy-1.13.0\/reference\/arrays.indexing.html\">numpy docs<\/a> for steps). Let\u2019s now wrap all this to a function which will return matching views:<\/p>\n<div class=\"language-python highlighter-rouge\">\n<div class=\"highlight\">\n<pre class=\"highlight\"><code><span class=\"s\">&quot;&quot;&quot;\nFunction returning two matching numpy views for moving window routines.\n- offset_y and offset_x refer to the shift in relation to the analysed (central) cell \n- size_y and size_x refer to the size of the data matrix (not of the window!)\n- view_in is the shifted view and view_out is the position of central cells\n\nAll comments are welcome at landscapeanalysis.org\/2018\/numpy-loops\n&quot;&quot;&quot;<\/span>\n<span class=\"kn\">import<\/span> <span class=\"nn\">numpy<\/span> <span class=\"k\">as<\/span> <span class=\"n\">np<\/span>\n\n<span class=\"k\">def<\/span> <span class=\"nf\">view<\/span> <span class=\"p\">(<\/span><span class=\"n\">offset_y<\/span><span class=\"p\">,<\/span> <span class=\"n\">offset_x<\/span><span class=\"p\">,<\/span> <span class=\"n\">size_y<\/span><span class=\"p\">,<\/span> <span class=\"n\">size_x<\/span><span class=\"p\">,<\/span> <span class=\"n\">step<\/span><span class=\"o\">=<\/span><span class=\"mi\">1<\/span><span class=\"p\">):<\/span>\n \n    <span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">abs<\/span><span class=\"p\">(<\/span><span class=\"n\">offset_x<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">y<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">abs<\/span><span class=\"p\">(<\/span><span class=\"n\">offset_y<\/span><span class=\"p\">)<\/span>\n \n    <span class=\"n\">x_in<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">slice<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"p\">,<\/span> <span class=\"n\">size_x<\/span><span class=\"p\">,<\/span> <span class=\"n\">step<\/span><span class=\"p\">)<\/span> \n    <span class=\"n\">x_out<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">slice<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"n\">size_x<\/span> <span class=\"o\">-<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">step<\/span><span class=\"p\">)<\/span>\n \n    <span class=\"n\">y_in<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">slice<\/span><span class=\"p\">(<\/span><span class=\"n\">y<\/span><span class=\"p\">,<\/span> <span class=\"n\">size_y<\/span><span class=\"p\">,<\/span> <span class=\"n\">step<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">y_out<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">slice<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"n\">size_y<\/span> <span class=\"o\">-<\/span> <span class=\"n\">y<\/span><span class=\"p\">,<\/span> <span class=\"n\">step<\/span><span class=\"p\">)<\/span>\n \n    <span class=\"c1\"># the swapping trick    \n<\/span>    <span class=\"k\">if<\/span> <span class=\"n\">offset_x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">:<\/span> <span class=\"n\">x_in<\/span><span class=\"p\">,<\/span> <span class=\"n\">x_out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">x_out<\/span><span class=\"p\">,<\/span> <span class=\"n\">x_in<\/span>                                 \n    <span class=\"k\">if<\/span> <span class=\"n\">offset_y<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">:<\/span> <span class=\"n\">y_in<\/span><span class=\"p\">,<\/span> <span class=\"n\">y_out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">y_out<\/span><span class=\"p\">,<\/span> <span class=\"n\">y_in<\/span>\n \n    <span class=\"c1\"># return window view (in) and main view (out)\n<\/span>    <span class=\"k\">return<\/span> <span class=\"n\">np<\/span><span class=\"p\">.<\/span><span class=\"n\">s_<\/span><span class=\"p\">[<\/span><span class=\"n\">y_in<\/span><span class=\"p\">,<\/span> <span class=\"n\">x_in<\/span><span class=\"p\">],<\/span> <span class=\"n\">np<\/span><span class=\"p\">.<\/span><span class=\"n\">s_<\/span><span class=\"p\">[<\/span><span class=\"n\">y_out<\/span><span class=\"p\">,<\/span> <span class=\"n\">x_out<\/span><span class=\"p\">]<\/span>\n \n<span class=\"c1\">#a mock dataset\n<\/span><span class=\"n\">data<\/span><span class=\"o\">=<\/span><span class=\"n\">np<\/span><span class=\"p\">.<\/span><span class=\"n\">random<\/span><span class=\"p\">.<\/span><span class=\"n\">rand<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">,<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">window<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">3<\/span>\n\n<span class=\"n\">radius<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">int<\/span><span class=\"p\">(<\/span><span class=\"n\">window<\/span><span class=\"o\">\/<\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span>\n \n<span class=\"n\">rows<\/span><span class=\"p\">,<\/span> <span class=\"n\">columns<\/span> <span class=\"o\">=<\/span> <span class=\"n\">data<\/span><span class=\"p\">.<\/span><span class=\"n\">shape<\/span>\n<span class=\"n\">temp_sum<\/span> <span class=\"o\">=<\/span> <span class=\"n\">np<\/span><span class=\"p\">.<\/span><span class=\"n\">zeros<\/span><span class=\"p\">((<\/span><span class=\"n\">rows<\/span><span class=\"p\">,<\/span> <span class=\"n\">columns<\/span><span class=\"p\">))<\/span>\n \n<span class=\"c1\"># Our window loop  \n<\/span><span class=\"k\">for<\/span> <span class=\"n\">y<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">window<\/span><span class=\"p\">):<\/span>\n\n    <span class=\"c1\">#we need offsets from centre !\n<\/span>    <span class=\"n\">y_off<\/span> <span class=\"o\">=<\/span> <span class=\"n\">y<\/span> <span class=\"o\">-<\/span> <span class=\"n\">radius<\/span>\n\n    <span class=\"k\">for<\/span> <span class=\"n\">x<\/span> <span class=\"ow\">in<\/span> <span class=\"nb\">range<\/span><span class=\"p\">(<\/span><span class=\"n\">window<\/span><span class=\"p\">):<\/span>\n        <span class=\"n\">x_off<\/span> <span class=\"o\">=<\/span> <span class=\"n\">x<\/span> <span class=\"o\">-<\/span> <span class=\"n\">radius<\/span>\n \n        <span class=\"n\">view_in<\/span><span class=\"p\">,<\/span> <span class=\"n\">view_out<\/span> <span class=\"o\">=<\/span> <span class=\"n\">view<\/span><span class=\"p\">(<\/span><span class=\"n\">y_off<\/span><span class=\"p\">,<\/span> <span class=\"n\">x_off<\/span><span class=\"p\">,<\/span> <span class=\"n\">rows<\/span><span class=\"p\">,<\/span> <span class=\"n\">columns<\/span><span class=\"p\">)<\/span>\n \n        <span class=\"n\">temp_sum<\/span><span class=\"p\">[<\/span><span class=\"n\">view_out<\/span><span class=\"p\">]<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">data<\/span><span class=\"p\">[<\/span><span class=\"n\">view_in<\/span><span class=\"p\">]<\/span>\n\n<span class=\"c1\">#Analysis inside the window is finished\n<\/span><span class=\"n\">output<\/span> <span class=\"o\">=<\/span> <span class=\"n\">calculate_something<\/span><span class=\"p\">(<\/span><span class=\"n\">temp_sum<\/span><span class=\"p\">)<\/span> \n<\/code><\/pre>\n<\/div>\n<\/div>\n<p>Perhaps we could do better, but the advantage of this approach is that we can adapt it to any imaginable shape of the moving window &#8211; because we conserve the loop over window cells. We are also working on views, as opposed to hard copies of the data, which is what Numpy likes a lot.<\/p>\n<h2 id=\"related-posts\">Related posts<\/h2>\n<p>You can find an implementation of sliding window logic in a script for GIS raster analysis <a href=\"https:\/\/landscapeanalysis.org\/2021\/python-tpi\/\">(Topographic position index)<\/a>.<\/p>\n<p>Any suggestions are welcome!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Numpy is the cornerstone of matrix based calculations in QGIS (and elsewhere). It does wonders with raster data (unless it hits the limit of available live memory\u2026). A recurrent problem with Numpy is the implementation of various looping routines, such as the sliding window which is frequently used in image filtering and other approaches focused [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[8],"class_list":["post-57","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-code"],"_links":{"self":[{"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=\/wp\/v2\/posts\/57","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=57"}],"version-history":[{"count":0,"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=\/wp\/v2\/posts\/57\/revisions"}],"wp:attachment":[{"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=57"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=57"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/landscapeanalysis.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=57"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}