Update documentation - kfr - Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)

commit db03a8d454149ea73c07091ec19fa1bdb8aed438
parent be5ad6e281bc25f8290af92fa009f3a1c255cd8c
Author: d.levin256@gmail.com <d.levin256@gmail.com>
Date:   Tue, 29 Nov 2022 21:26:01 +0000

Update documentation

Diffstat:
A KNOWNBUGS.md  | 9 +++++++++
M README.md  | 395 ++++++++++++++-----------------------------------------------------------------
M azure-pipelines.yml  | 2 +-
M docs/README.md  | 40 +---------------------------------------
M docs/cxxdox.yml  | 9 ++++++---
A docs/docs/basics.md  | 334 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A docs/docs/capi.md  | 27 +++++++++++++++++++++++++++
M docs/docs/conv_reverb.md  | 4 ++--
M docs/docs/convert_stereo.md  | 5 ++---
M docs/docs/dft.md  | 2 +-
M docs/docs/expressions.md  | 40 +++++++++++++++++++++-------------------
M docs/docs/index.md  | 103 ++++++++++++++++++++++++++++++++-----------------------------------------------
A docs/docs/installation.md  | 257 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M docs/docs/normalize.md  | 5 ++---
D docs/docs/types.md  | 82 -------------------------------------------------------------------------------
M docs/mkdocs.yml  | 39 +++++++++++++++++++++++++++++----------
M include/kfr/base/fraction.hpp  | 5 ++---
M include/kfr/base/handle.hpp  | 2 +-
M include/kfr/base/math_expressions.hpp  | 2 +-
M include/kfr/base/memory.hpp  | 2 +-
M include/kfr/base/reduce.hpp  | 2 +-
M include/kfr/base/shape.hpp  | 2 +-
M include/kfr/base/state_holder.hpp  | 2 +-
M include/kfr/base/tensor.hpp  | 27 ++++++++++++++++++++++++++-
M include/kfr/base/univector.hpp  | 2 +-
M include/kfr/cometa/memory.hpp  | 2 +-
M include/kfr/cometa/result.hpp  | 2 +-
M include/kfr/dsp/waveshaper.hpp  | 1 +
M include/kfr/dsp/weighting.hpp  | 1 +
M include/kfr/simd/impl/backend_clang.hpp  | 12 ++++++------
M include/kfr/simd/impl/backend_generic.hpp  | 3 +++
M include/kfr/simd/impl/basicoperators_complex.hpp  | 1 +
M include/kfr/simd/sort.hpp  | 2 +-
A requirements.txt  | 3 +++

34 files changed, 855 insertions(+), 571 deletions(-)
diff --git a/KNOWNBUGS.md b/KNOWNBUGS.md
@@ -0,0 +1,9 @@
+# Known bugs and limitations
+
+| Compiler | Architecture | ISA | Description |
+| - | - | - | - |
+| Visual Studio | x86/x86_64 | < SSE2 | Not supported, SSE2 is required |
+| GCC | x86/x86_64 | < SSE2 | Not supported, SSE2 is required |
+| Visual Studio 2022 | x86_64 | AVX512 | Internal compiler error (intermittent) |
+| Clang 14 | x86/x86_64 | Generic | Code generation bug in Clang |
+| GCC 12 | x86/x86_64 | AVX512 | Code generation bug in GCC |
diff --git a/README.md b/README.md
@@ -4,102 +4,96 @@
   <img width="300" height="auto" src="img/KFR1.png">
 </p>
 
-
 ![Build Status](https://img.shields.io/azure-devops/build/dlevin256/dlevin256/1/master.svg?style=flat-square)
-[![Gitter](https://img.shields.io/gitter/room/kfrlib/kfr.svg?maxAge=2592000&style=flat-square)](https://gitter.im/kfrlib/kfr) ![License](https://img.shields.io/github/license/kfrlib/kfr.svg?style=flat-square)
+![License](https://img.shields.io/github/license/kfrlib/kfr.svg?style=flat-square)
+
+https://www.kfr.dev
+
+KFR is an open source C++ DSP framework that contains high-performance building blocks for DSP, audio, scientific and other applications. It is dual-licensed under GPLv2/v3 and a [commercial license](https://kfr.dev/purchase).
+
+## [Installation](docs/docs/installation.md)
 
 Compiler support:
 
-![Clang 6+](https://img.shields.io/badge/Clang-6%2B-brightgreen.svg?style=flat-square)
-![Xcode 9+](https://img.shields.io/badge/Xcode-9%2B-brightgreen.svg?style=flat-square)
+![Clang 9+](https://img.shields.io/badge/Clang-9%2B-brightgreen.svg?style=flat-square)
+![Xcode 10.3+](https://img.shields.io/badge/Xcode-10%2B-brightgreen.svg?style=flat-square)
 ![GCC 7+](https://img.shields.io/badge/GCC-7%2B-brightgreen.svg?style=flat-square)
-![MSVC 2017](https://img.shields.io/badge/MSVC-2017%2B-brightgreen.svg?style=flat-square)
+![MSVC 2019](https://img.shields.io/badge/MSVC-2019%2B-brightgreen.svg?style=flat-square)
 
-https://www.kfr.dev
+KFR has no external dependencies except for a C++17-compatible standard C++ library.
+CMake is used as the build system.
+
+Clang is highly recommended, as it has been shown to provide the best performance for KFR. Clang can be used as a drop-in replacement for GCC on Linux and for MSVC on Windows. On macOS, Clang is the default compiler and is included in the official Xcode toolchain.
 
-KFR is an open source C++ DSP framework that focuses on high performance (see benchmark results section).
+_Note_: Building the DFT module currently requires Clang because of internal compiler errors and missing optimizations in GCC and MSVC.
 
-KFR has no external dependencies except C++17-compatible standard C++ library.
+:arrow_right: See [Installation](docs/docs/installation.md) for more details
 
-Some C++17 library features will be emulated if not present in the standard library.
+## Features
 
-# Features
+### FFT/DFT
+* Optimized DFT implementation for any size (non-power-of-two sizes are supported)
+* DFT performance on par with the fastest implementations currently available [See Benchmarks](#benchmark-results)
+* Real forward and inverse DFT
+* Discrete Cosine Transform type II (and its inverse, also called DCT type III)
+* Convolution using FFT
+* Convolution filter
+
+:arrow_right: See also [How to apply FFT](docs/docs/dft.md) with KFR
 
-## What's new in KFR 4.0
+### DSP
 
 * IIR filter design
   * Butterworth
   * Chebyshev type I and II
   * Bessel
   * Lowpass, highpass, bandpass and bandstop filters
-  * Conversion of arbitrary filter from Z,P,K to SOS format (suitable for biquad function and filter)
-* Discrete Cosine Transform type II (and its inverse, also called DCT type III)
-* cmake uninstall target (thank to [@acxz](https://github.com/acxz))
-* C API: DFT, real DFT, DCT, FIR and IIR filters and convolution, memory allocation
-  * Built for SSE2, SSE4.1, AVX, AVX2, AVX512, x86 and x86_64, architecture is selected at runtime
-  * Can be used with any compiler and any language with ability to call C functions
-  * Windows binaries will be available soon
-* C++17
-  * Inline variables
-  * Fold expressions
-  * Structured binding
-* New vector based types: color, rectangle, point, size, border, geometric vector, 2D matrix
-* Color space conversion (sRGB, XYZ, Lab, LCH)
-* MP3 file reading (using third party dr_lib library, see source code for details)
-* Various optimizations and fixes (thank to [@bmanga](https://github.com/bmanga), [@ncorgan](https://github.com/ncorgan), [@rotkreis](https://github.com/rotkreis), [@mujjingun](https://github.com/mujjingun) for fixes and bug reports)
-
-### Release notes
-
-* DFT is limited to Clang due to ICE in MSVC and broken AVX optimization in GCC 8 and 9. Once fixed, support will be added
-
-## What's new in KFR 3.0
-
-* Optimized non-power of two DFT implementation
-* GCC 7+ support
-* MSVC 2017 support
-* AVX-512 support (MSVC and Clang, GCC has incomplete support of AVX-512 instrinsics)
-* EBU R128
-* Ability to include KFR as a subdirectory in cmake project
-* Ability to link objects built for multiple architectures into one binary
-* Number of automatic tests has been increased
-* C API for DFT
-* GPL version changed from 3 to 2+
-
-## All features
-
-* All code in the library is optimized for Intel, AMD (SSE2, SSE3, SSE4.x, AVX and AVX2 and AVX512) and ARM (NEON) processors
-* Mathematical and statistical functions
+  * Conversion of an arbitrary filter from {Z, P, K} to SOS format (suitable for the biquad function and filter)
+* Biquad filter [See Benchmarks](#benchmark-results)
+* Simple biquad filter design
+* FIR filter design using window method
+* Loudness measurement according to EBU R128
+* Window functions: Triangular, Bartlett, Cosine, Hann, Bartlett-Hann, Hamming, Bohman, Blackman, Blackman-Harris, Kaiser, Flattop, Gaussian, Lanczos, Rectangular
+* Sample rate conversion with configurable quality and linear phase
+* Oscillators, fast incremental sine/cosine generation, Goertzel algorithm, fractional delay
+
+
+### Base
+
+* Tensors (multidimensional arrays)
+* Statistical functions
+* Random number generation
 * Template expressions (See examples)
+* Ring (Circular) buffer
+
+### Math
+
+* Mathematical functions such as `sin`, `log` and `cosh` built on top of SIMD primitives
+* Most of the standard library functions are re-implemented to support vector of any length and data type
+
+### SIMD
+
+* `vec<T, N>` class and related functions that abstract CPU-specific intrinsics
+* All code in the library is optimized for Intel, AMD (SSE2, SSE3, SSE4.x, AVX, AVX2 and AVX512) and ARM/AArch64 (NEON) processors
 * All data types are supported including complex numbers
 * All vector lengths are also supported. `vec<float,1>`, `vec<unsigned,3>`, `vec<complex<float>, 11>` all are valid vector types in KFR
-* Most of the standard library functions are re-implemented to support vector of any length and data type
-* Runtime cpu detection
-
-### Included DSP/audio algorithms:
-
-* FFT
-* Convolution
-* FIR filtering
-* FIR filter design using the window method
-* Resampling with configurable quality (See resampling.cpp from Examples directory)
-* Goertzel algorithm
-* Fractional delay
-* Biquad filtering
-* Biquad design functions
-* Oscillators: Sine, Square, Sawtooth, Triangle
-* Window functions: Triangular, Bartlett, Cosine, Hann, Bartlett-Hann, Hamming, Bohman, Blackman, Blackman-Harris, Kaiser, Flattop, Gaussian, Lanczos, Rectangular
-* Audio file reading/writing
-* Pseudorandom number generator
 * Sorting
-* Ring (Circular) buffer
-* Simple waveshaper
-* Fast incremental sine/cosine generation
-* EBU R128
 
-# Benchmark results
-## DFT
 
-### KFR 3.0.1
+### IO
+
+* Audio file reading/writing
+* WAV
+* FLAC
+* MP3
+
+### C API
+
+A C API is available. It covers a subset of KFR features, including FFT and filter processing.
+
+## Benchmark results
+
+### DFT
 
 Powers of 2, from 16 to 16777216 (*Higher is better*)
 
@@ -120,214 +114,12 @@ Random sizes from 120 to 30720000 (*Higher is better*)
 See [fft benchmark](https://github.com/kfrlib/fft-benchmark) for details about benchmarking process.
 
 
-## Biquad
+### Biquad
 
  (*Higher is better*)
 
 ![Biquad Performance](img/biquad.svg)
 
-# Usage
-
-## Common prerequisites
-
-* CMake 3.0 or newer for building tests and examples
-* Python 2.7 or 3.x for running examples
-* (Optional) Ninja (https://ninja-build.org/)
-
-For running examples and plotting frequency responses of filters the following python packages are required:
-
-```bash
-pip install matplotlib
-pip install numpy
-pip install scipy
-```
-Or download prebuilt python packages for windows
-
-To obtain the full code, including examples and tests, you can clone the git repository:
-
-```
-git clone https://github.com/kfrlib/kfr.git
-```
-
-## Building KFR C API
-
-### Windows
-
-These commands must be executed in MSVC2017 command prompt
-
-```bash
-cd <path_to_kfr_repository>
-mkdir build && cd build
-cmake -GNinja -DENABLE_CAPI_BUILD=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="<PATH_TO_LLVM_DIR>/bin/clang-cl.exe" ..
-ninja kfr_capi
-```
-
-### Linux, macOS, other
-
-```bash
-cd <path_to_kfr_repository>
-mkdir build && cd build
-cmake -GNinja -DENABLE_CAPI_BUILD=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ ..
-ninja kfr_capi
-```
-
-#### ArchLinux Package
-KFR is available on the [ArchLinux User Repository](https://wiki.archlinux.org/index.php/Arch_User_Repository) (AUR).
-You can install it with an [AUR helper](https://wiki.archlinux.org/index.php/AUR_helpers), like [`yay`](https://aur.archlinux.org/packages/yay/), as follows:
-
-```bash
-yay -S kfr
-```
-To discuss any issues related to this AUR package refer to the comments section of
-[`kfr`](https://aur.archlinux.org/packages/kfr/).
-
-Prebuilt binaries will be available soon.
-
-## Including in CMake project
-
-CMakeLists.txt contains these libraries:
-* `kfr` - header only interface library
-* `kfr_dft` - static library for DFT and related algorithms
-* `kfr_io` - static library for file IO and audio IO
-
-```cmake
-# Include KFR subdirectory
-add_subdirectory(kfr)
-
-# Add header-only KFR to your executable or library, this sets include directories etc
-target_link_libraries(your_executable_or_library kfr)
-
-# Add KFR DFT to your executable or library, (cpp file will be built for this)
-target_link_libraries(your_executable_or_library kfr_dft)
-
-# Add KFR IO to your executable or library, (cpp file will be built for this)
-target_link_libraries(your_executable_or_library kfr_io)
-```
-
-## Makefile, command line etc (Unix-like systems)
-
-```bash
-# Add this to command line
--Ipath_to_kfr/include
-
-# And this if needed
--lkfr_dft -lkfr_io
-
-# C++17 mode must be enabled
--std=c++17
-# or
--std=gnu++17
-```
-
-## Linux
-
-### Prerequisites
-* GCC 7 or newer
-* Clang 6.0 or newer
-
-### Command line
-```bash
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -DENABLE_TESTS=ON -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release ..
-make -- -j
-```
-Or using Ninja
-```bash
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release ..
-ninja
-```
-
-## macOS
-
-### Prerequisites
-* XCode 9.x, 10.x or 11.x
-
-### Command line
-Using Xcode project:
-```bash
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -GXcode -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
-cmake --build .
-```
-Using Unix Makefiles:
-```bash
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -G"Unix Makefiles" -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
-make -- -j
-```
-Or using Ninja:
-```bash
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
-ninja
-```
-
-## Visual Studio
-
-### Prerequisites
-* Visual Studio 2017
-* Latest Clang (http://llvm.org/)
-* Ninja is highly recommended because Visual Studio does not support parallel build with Clang at this moment.
-
-### Visual Studio IDE
-
-To work with KFR in Visual Studio you must add the path to the `include` directory inside KFR directory to the list of the project's include directories.<br>
-More details:
-https://docs.microsoft.com/en-us/cpp/ide/vcpp-directories-property-page?view=vs-2017
-
-Make sure that LLVM toolset is set for the project<br>
-
-Download and install official LLVM extension:
-* LLVM toolchain for Visual Studio https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain
-
-More details:
-https://docs.microsoft.com/en-us/cpp/ide/general-property-page-project?view=vs-2017
-
-LLVM/Clang has very good compatibility with MSVC ABI and it's widely used for building large projects on Windows (including Chrome), so switching to LLVM/Clang should not cause compatibility problems.
-
-### Command line
-Using Ninja:
-```
-cd <path_to_kfr>
-mkdir build && cd build
-call "C:\<path to your Visual Studio installation>\VC\Auxiliary\Build\vcvars64.bat"
-cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe" -DCMAKE_CXX_FLAGS=-m64 -DCMAKE_BUILD_TYPE=Release ..
-ninja
-```
-Or generate Visual Studio solution (building will be slower):
-```
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -G"Visual Studio 15 2017 Win64" -DENABLE_TESTS=ON -Tllvm -DCMAKE_BUILD_TYPE=Release ..
-```
-
-## MinGW/MSYS
-
-### Prerequisites
-* Latest MinGW or MSYS2
-* Clang 6.0 or newer
-
-Using Makefiles:
-```
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
-make -- -j
-```
-Using Ninja:
-```
-cd <path_to_kfr>
-mkdir build && cd build
-cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
-ninja
-```
-
 ## Documentation
 
 Documentation home:
@@ -358,57 +150,8 @@ cd tests
 ctest -V
 ```
 
-Tested on the following systems:
-
-
-### macOS
-* (**Intel AVX2**) macOS **10.13.6** / Xcode 10 / AppleClang 10.0.0.10001145
-* (**Intel AVX** Azure Pipelines) macOS **10.13.6** / Xcode 10.1 / AppleClang 10.0.0.10001145
-* (**Intel AVX** Azure Pipelines) macOS **10.13.6** / Xcode 10 / AppleClang 10.0.0.10001145
-* (**Intel AVX** Azure Pipelines) macOS **10.13.6** / Xcode 9.4.1 / AppleClang 9.1.0.9020039
-* (**Intel AVX** Azure Pipelines) macOS **10.13.6** / Xcode 9.0.1 / AppleClang 9.0.0.9000038
-* (**Intel AVX** Azure Pipelines) macOS **10.13.6** / Xcode 8.3.3 / AppleClang 8.1.0.8020042
-* (**Intel AVX2**) macOS **10.11.6** / Xcode 7.3 / AppleClang 7.3.0.7030031
-* (**Intel AVX2**) macOS **10.11.4** / Xcode 7.3 / AppleClang 7.3.0.7030031
-* (**ARMv7, ARMv7s, ARM64**) macOS **10.11.6** / Xcode 7.3 / AppleClang 7.3.0.7030031
-* (**Intel AVX**) macOS **10.10.5** / Xcode 7.1 / AppleClang 7.0.0.7000176
-* (**SSE4.2** Travis-CI) macOS **10.11.6** / Xcode 8 (beta4)  / AppleClang 8.0.0.8000035
-* (**SSE4.2** Travis-CI) macOS **10.11.5** / Xcode 7.3 / AppleClang 7.3.0.7030031
-* (**SSE4.2** Travis-CI) macOS **10.11.5** / Xcode 7.2 / AppleClang 7.0.2.7000181
-* (**SSE4.2** Travis-CI) macOS **10.10.5** / Xcode 7.1 / AppleClang 7.0.0.7000176
-* (**SSE4.2** Travis-CI) macOS **10.10.5** / Xcode 7 / AppleClang 7.0.0.7000072
-* (**SSE4.2** Travis-CI) macOS **10.10.5** / Xcode 6.4 / AppleClang 6.1.0.6020053
-* (**SSE4.2** Travis-CI) macOS **10.10.3** / Xcode 6.3 / AppleClang 6.1.0.6020049
-
-### Ubuntu
-* (**Intel AVX2**) Ubuntu **18.04** / gcc-7.x / clang version 7.0.0 (tags/RELEASE_700/final)
-* (**Intel AVX2**) Ubuntu **16.04** / gcc-5.4.0 / clang version 3.8.0 (tags/RELEASE_380/final)
-* (**ARMv7 NEON**) Ubuntu **16.04** / gcc-5.4.0 / clang version 3.8.0 (tags/RELEASE_380/final)
-* (**ARMv7 NEON**) Ubuntu **14.04** / gcc-4.8.4 / clang version 3.8.0 (tags/RELEASE_380/final)
-* (**ARMv7 NEON** Travis-CI) Ubuntu **14.04** / gcc-4.8.4 / clang version 3.8.0 (tags/RELEASE_380/final)
-* (**Intel AVX2** Travis-CI) Ubuntu **12.04** / gcc-5.4.0 / clang version 3.8.0 (tags/RELEASE_380/final)
-* (**Intel AVX2** Travis-CI) Ubuntu **14.04** / gcc-5.3.0 (Ubuntu 5.3.0-3ubuntu1~14.04) 5.3.0 20151204 / clang version 3.8.0 (tags/RELEASE_380/final)
-
-### Windows
-* (**Intel AVX512**) Windows **10** / Visual Studio 2017 / Clang 7.0
-* (**Intel AVX512**) Windows **10** / Visual Studio 2017 / Clang 6.0
-* (**Intel AVX2**) Windows **10** / MinGW-W64 5.2 / clang version 3.8.0 (branches/release_38)
-* (**Intel AVX2**) Windows **10** / MinGW-W64 4.8 / clang version 3.8.0 (branches/release_38)
-* (**Intel AVX**) Windows **8.1** / MinGW-W64 5.4 / clang version 3.8.0 (branches/release_38)
-* (**Intel AVX**) Windows **8.1** / Visual Studio 2015 Update 2 / clang version 3.9.0 (SVN r273898 (27 June 2016))
-
-### Linux on Windows 10
-* (**Intel AVX2**) Windows **10.0.17134.407** compatible with Ubuntu **18.04** / gcc-7.x / clang version 7.0.0 (tags/RELEASE_700/final)
-* (**Intel AVX2**) Windows **10.0.14393** compatible with Ubuntu **14.04** / gcc-5.4.0 / clang version 3.8.0 (tags/RELEASE_380/final)
-
-## Planned for future versions
-
-* Parallel execution of algorithms
-* Serialization/Deserialization of any expression
-* More formats for audio file reading/writing
-
 ## License
 
 KFR is dual-licensed, available under both commercial and open-source GPL 2+ license.
 
-If you want to use KFR in commercial product or a closed-source project, you need to [purchase a Commercial License](https://kfr.dev/purchase-license)
+If you want to use KFR in a commercial product or a closed-source project, you need to [purchase a Commercial License](https://kfr.dev/purchase-license)
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
@@ -356,4 +356,4 @@ jobs:
       set PATH=%PATH:C:\Program Files\LLVM\bin;=%
       set PATH=%PATH:C:\Strawberry\c\bin;=%
       set JOBS=-j2
-      ci\run.cmd build-release -DKFR_ARCH_TESTS=sse2,sse42,avx,avx2,avx512 -DKFR_ARCH=sse2 -DKFR_ENABLE_DFT=OFF -DCMAKE_BUILD_TYPE=Release
+      ci\run.cmd build-release -DKFR_ARCH_TESTS=sse2,sse42,avx,avx2 -DKFR_ARCH=sse2 -DKFR_ENABLE_DFT=OFF -DCMAKE_BUILD_TYPE=Release
diff --git a/docs/README.md b/docs/README.md
@@ -1,39 +1 @@
-# KFR 5 Documentation
-
-## Getting started
-
-* [What's new in KFR 5](docs/whatsnew5.md)
-* [Data types](docs/types.md)
-* [Expressions](docs/expressions.md)
-
-## Guides
-
-### DSP
-* [How to apply a FIR filter](docs/fir.md)
-* [How to apply a Biquad filter](docs/bq.md)
-* [How to do Sample Rate Conversion](docs/src.md)
-* [How to apply Convolution Reverb](docs/conv_reverb.md)
-* [How to measure loudness according to EBU R 128](docs/ebur128.md)
-* [How to convert sample type](docs/conversion.md)
-* [How to normalize audio](docs/normalize.md)
-* [How to mix stereo channels](docs/convert_stereo.md)
-* [FIR filters code & examples](docs/fir_gallery.md)
-* [IIR filters code & examples](docs/iir_gallery.md)
-* [Biquad filters code & examples](docs/bq_gallery.md)
-* [Sample Rate Converter code & examples](docs/src_gallery.md)
-* [Window functions code & examples](docs/window_gallery.md)
-* [Convolution filter details](docs/convolution.md)
-
-### FFT
-* [How to apply Fast Fourier Transform](docs/dft.md)
-* [More about FFT/DFT](docs/dtf2.md)
-* [DFT data layout](docs/dft_format.md)
-
-### I/O
-* [How to read WAV file](docs/read_audio.md)
-* [File types support](docs/file_support.md)
-* [How to plot filter impulse response](docs/plot.md)
-
-## Function reference
-
-Function reference is generated from source files and  not included here, see https://kfr.dev/newdocs
+See [KFR Documentation Index](docs/index.md)
diff --git a/docs/cxxdox.yml b/docs/cxxdox.yml
@@ -16,6 +16,7 @@ clang:
     - '-DKFR_ENABLE_FLAC=1'
     - '-DCMT_FORCE_GENERIC_CPU=1'
     - '-std=gnu++17'
+    - '-DDOCUMENTATION'
 
 input_directory: ../include/kfr
 
@@ -26,8 +27,10 @@ repository: https://github.com/kfrlib/kfr/blob/{TAG}/include/kfr/{FILE}#L{LINE}
 groups:
   filter: "Filter API"
   cpuid: "Runtime CPU detection"
-  cometa: "CoMeta - metaprogramming"
-  testo: "Testo - unit test"
+  cometa: "CoMeta"
+  testo: "Testo - Unit testing"
+  univector: "Vector"
+  tensor: "Tensor"
   dft: "DFT"
   binary_io: "Generic IO"
   audio_io: "Audio IO"
@@ -61,6 +64,6 @@ groups:
   expressions: "Expressions"
   generators: "Generator expressions"
   random: "PRNG functions and expressions"
-  array: "Array functions"
+  reducing: "Reducing functions"
   utility: "Utility functions"
   
 \ No newline at end of file
diff --git a/docs/docs/basics.md b/docs/docs/basics.md
@@ -0,0 +1,334 @@
+# Basics
+
+To include all KFR modules, use the `kfr/all.hpp` header. Note that the DFT and IO modules require linking the corresponding static libraries, `kfr_dft` and `kfr_io`.
+
+```c++
+#include <kfr/all.hpp>
+```
+
+Alternatively, you can include only the modules you need:
+
+```c++
+#include <kfr/base.hpp> // Functions, expressions
+#include <kfr/dft.hpp> // DFT/DCT and convolution
+#include <kfr/dsp.hpp> // DSP, Filters etc
+#include <kfr/io.hpp> // Audio file reading
+```
+
+## Types
+
+### SIMD vector
+
+`vec` is a template class that holds one or more elements and lets functions operate on them in a machine-efficient way.
+
+The class synopsis:
+
+```c++
+template <typename T, size_t N>
+struct alignas(...) vec
+{
+    using value_type = T;
+
+    // Constructors
+    // broadcast scalar to all elements
+    vec(T broadcasting);
+    // initialize all elements 
+    template <typename... Ts>
+    vec(Ts... elements);
+    // concatenate vectors (sum(Ns...) == N)
+    template <size_t... Ns>
+    vec(const vec<T, Ns>&... concatenation);
+    // implicit element type conversion
+    template <typename U>
+    vec(const vec<U, N>& conversion);
+    ...
+
+    // element is a proxy class to get/set specific element
+    struct element
+    {
+    };
+
+    // Functions to access elements, const-versions omitted
+    // access by index
+    element operator[](size_t index);
+    // access first element
+    element front();
+    // access last element
+    element back();
+
+    // return size
+    size_t size() const;
+};
+template <typename... T>
+vec(T&&...) -> vec<std::common_type_t<T...>, sizeof...(T)>;
+```
+
+!!! note
+    The class implementation is specific to the target CPU, so the `vec` class definition resides in the `kfr::CMT_ARCH_NAME` namespace. For the AVX2 architecture it is `kfr::avx2`. The architecture namespace is declared inline, so you should not use it directly: the compiler treats `kfr::vec` as an alias for `kfr::CMT_ARCH_NAME::vec`.
+
+
+You can omit the template parameters and let the compiler deduce them:
+```c++
+vec x{ 10, 5, 2.5, 1.25 }; // vec<double, 4>
+
+some_function(vec{ 3, 2, 1 }); // vec<int, 3>
+```
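The deduction guide at the end of the synopsis is what makes these declarations work. The sketch below reproduces only that guide on a hypothetical `toy_vec` type (not KFR's `vec`, which also deals with SIMD storage and alignment), so the deduced types can be checked with `static_assert` in plain C++17.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// toy_vec is a hypothetical stand-in for kfr::vec: it stores N elements of type T
// and exists only to demonstrate class template argument deduction.
template <typename T, std::size_t N>
struct toy_vec
{
    std::array<T, N> elements;

    // Accept exactly N arguments, each convertible to T
    template <typename... Ts,
              typename = std::enable_if_t<sizeof...(Ts) == N &&
                                          std::conjunction_v<std::is_convertible<Ts, T>...>>>
    constexpr toy_vec(Ts... values) : elements{ { static_cast<T>(values)... } }
    {
    }

    constexpr std::size_t size() const { return N; }
    constexpr T operator[](std::size_t index) const { return elements[index]; }
};

// Same form as the deduction guide in the synopsis:
// element type = common type of the arguments, size = number of arguments
template <typename... T>
toy_vec(T&&...) -> toy_vec<std::common_type_t<T...>, sizeof...(T)>;
```

For `toy_vec{ 10, 5, 2.5, 1.25 }` the argument types are `int, int, double, double`, their common type is `double` and `sizeof...(T)` is 4, which yields `toy_vec<double, 4>`.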
+
+Vectors can be nested. `vec<vec<float, 2>, 2>` is a valid declaration.
+
+### 1D array
+
+`univector` is a template class that, depending on its template parameter, holds data on the heap (like `std::vector`), in its own storage (like `std::array`) or as a pointer to external data (like `std::span`).
+
+#### `univector<T>`
+
+This specialization holds data on the heap. Memory is automatically aligned.
+
+`univector<T>` is derived from `std::vector` but uses KFR's own allocator, which aligns allocated memory. It provides all member functions and constructors of `std::vector<T, ...>`.
+
+#### `univector<T, Size>`
+
+This specialization holds data in its own storage. `univector<T, Size>` is derived from `std::array<T, Size>` and provides all member functions and constructors of `std::array<T, Size>`, but is properly aligned.
+
+`Size` must not be zero.
+
+#### `univector<T, 0>`
+
+This specialization works like `std::span` from C++20 and holds only a pointer to external memory and a size. The alignment of the external data is preserved but cannot be enforced.
+
+For all specializations, data is always contiguous in memory.
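Selecting heap, inline or non-owning storage from a single size parameter can be sketched with partial specialization. This is a hypothetical illustration, not KFR's implementation: `storage_sketch`, `storage_t`, `external_ref` and the `dynamic_size` sentinel are invented names, and `std::span` is replaced by a pointer/size pair so the example stays within C++17.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical sentinel meaning "size known only at runtime" (heap storage)
inline constexpr std::size_t dynamic_size = static_cast<std::size_t>(-1);

// Non-owning pointer/size pair, a C++17 stand-in for std::span
template <typename T>
struct external_ref
{
    T* data          = nullptr;
    std::size_t size = 0;
};

// Primary template: fixed-size storage held inside the object, like std::array
template <typename T, std::size_t Size = dynamic_size>
struct storage_sketch
{
    using type = std::array<T, Size>;
};

// Runtime size: heap storage, like std::vector
template <typename T>
struct storage_sketch<T, dynamic_size>
{
    using type = std::vector<T>;
};

// Size 0: reference to external data, like std::span
template <typename T>
struct storage_sketch<T, 0>
{
    using type = external_ref<T>;
};

template <typename T, std::size_t Size = dynamic_size>
using storage_t = typename storage_sketch<T, Size>::type;
```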
+
+#### Alignment
+
+For SIMD operations to be effective, data should be aligned to a 16-, 32- or 64-byte boundary. The default STL allocator does not guarantee such alignment, so holding data in a `std::vector<T>` with the default allocator may be suboptimal.
+
+KFR has its own STL-compatible allocator, `kfr::data_allocator`, that aligns memory to a 64-byte boundary. Using it with STL containers may improve performance.
+
+!!! note
+    Define the `KFR_USE_STD_ALLOCATION` macro to make `data_allocator` an alias for `std::allocator`. This makes `univector`s interchangeable with `std::vector`s.
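The idea behind an aligning allocator can be sketched with C++17 aligned `operator new`. `aligned64_allocator`, `aligned_vector` and `is_aligned64` are hypothetical names used only for this illustration; this is not how `kfr::data_allocator` is implemented, just a minimal allocator that gives the same 64-byte guarantee to a standard container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator: every block starts on a 64-byte boundary
template <typename T>
struct aligned64_allocator
{
    using value_type                      = T;
    static constexpr std::size_t alignment = 64;

    aligned64_allocator() noexcept = default;
    template <typename U>
    aligned64_allocator(const aligned64_allocator<U>&) noexcept
    {
    }

    T* allocate(std::size_t n)
    {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();
        // C++17 aligned operator new
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{ alignment }));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p, std::align_val_t{ alignment }); }
};

// Stateless allocators always compare equal
template <typename T, typename U>
bool operator==(const aligned64_allocator<T>&, const aligned64_allocator<U>&) noexcept
{
    return true;
}
template <typename T, typename U>
bool operator!=(const aligned64_allocator<T>&, const aligned64_allocator<U>&) noexcept
{
    return false;
}

template <typename T>
using aligned_vector = std::vector<T, aligned64_allocator<T>>;

// True if p lies on a 64-byte boundary
inline bool is_aligned64(const void* p) { return reinterpret_cast<std::uintptr_t>(p) % 64 == 0; }
```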
+
+#### Passing 1D data to KFR functions
+
+Many KFR functions, such as the DFT, receive and return data through the `univector` class. Where possible, use `univector` in your code as storage for all data that may be passed to KFR functions. If you already have data and need to pass it to KFR, use the `make_univector` function, which constructs a `univector<T, 0>` from a pointer and a size, or from an STL-compatible container (one that defines `data()` and `size()`).
+
+```c++
+std::vector<float> data; // existing data, or std::array<float, N>
+
+float val = rms(make_univector(data)); // No data copy
+```
+
+```c++
+const float* data; // existing data
+size_t size;       // number of elements
+
+float val = rms(make_univector(data, size)); // No data copy
+```
+
+```c++
+const float data[1024] = {}; // existing data
+
+float val = rms(make_univector(data)); // No data copy
+```
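Creating such a view copies nothing; it only records where the data lives. A hypothetical C++17 sketch of the three overloads used above, with `view_sketch`, `make_view` and `rms_sketch` standing in for `univector<T, 0>`, `make_univector` and `rms` (none of them is KFR code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical non-owning view playing the role of univector<T, 0>:
// it stores only a pointer and a size, so creating it never copies data.
template <typename T>
struct view_sketch
{
    T* ptr            = nullptr;
    std::size_t count = 0;

    T* data() const { return ptr; }
    std::size_t size() const { return count; }
    T* begin() const { return ptr; }
    T* end() const { return ptr + count; }
};

// From a pointer and a size
template <typename T>
view_sketch<T> make_view(T* data, std::size_t size) { return { data, size }; }

// From a C array
template <typename T, std::size_t N>
view_sketch<T> make_view(T (&array)[N]) { return { array, N }; }

// From any container that defines data() and size()
template <typename Container>
auto make_view(Container& container) -> view_sketch<std::remove_pointer_t<decltype(container.data())>>
{
    return { container.data(), container.size() };
}

// Root mean square, sqrt(sum(x^2) / n), computed through the view
template <typename T>
double rms_sketch(view_sketch<T> x)
{
    assert(x.size() > 0);
    double sum_of_squares = 0;
    for (const auto& value : x)
        sum_of_squares += static_cast<double>(value) * value;
    return std::sqrt(sum_of_squares / x.size());
}

// Usage mirroring the first snippet above: no copy of the vector is made
inline double rms_of_vector(const std::vector<float>& data) { return rms_sketch(make_view(data)); }
```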
+
+#### Slice
+
+You can get a subrange of an array using the `slice` function, which is defined in all specializations of the `univector` class.
+
+```c++
+univector<float, 100> v;
+// ...
+const float s1 = sum(v); // Sum all elements
+const float s2 = sum(v.slice(2, 50)); // Sum 50 elements starting at index 2
+```
+
+The result of a call to `slice` is always a `univector<T, 0>`, a reference to external data.
+Note that the reference is only valid as long as the original data exists.
+
+!!! note
+    The `univector` class is also an [Expression](expressions.md) and can be used wherever an expression is required.
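A slice can be modelled as nothing more than an offset pointer and a length into the parent's memory. The sketch below is hypothetical (`span_sketch` and `sum_sketch` are not KFR types or functions), and clamping the requested length to the elements that remain is a choice made for this sketch. Because the slice only points into the parent's storage, it must not outlive it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <type_traits>

// Hypothetical non-owning view; slice() returns another view into the same memory
template <typename T>
struct span_sketch
{
    T* ptr            = nullptr;
    std::size_t count = 0;

    T* begin() const { return ptr; }
    T* end() const { return ptr + count; }
    std::size_t size() const { return count; }

    // Up to `length` elements starting at index `start`, clamped to the end of the view
    span_sketch slice(std::size_t start, std::size_t length) const
    {
        start = std::min(start, count);
        return { ptr + start, std::min(length, count - start) };
    }
};

// Sum of all elements in a view
template <typename T>
std::remove_const_t<T> sum_sketch(span_sketch<T> x)
{
    return std::accumulate(x.begin(), x.end(), std::remove_const_t<T>{});
}
```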
+
+### Tensor (Multidimensional array)
+
+`tensor` is a class that holds or references multidimensional data and provides 
+a way to access individual elements and perform complex operations on the data.
+
+The number of elements in each axis of the array is defined by its _shape_.
+The number of dimensions is fixed at compile time.
+
+Tensor class synopsis:
+
+```c++
+
+struct memory_finalizer;
+
+// T is the element type
+// Dims is the number of dimensions
+template <typename T, index_t Dims>
+struct tensor
+{
+    using value_type = T;
+    using shape_type = shape<Dims>;
+
+    // iterates through flattened array
+    struct tensor_iterator;
+    
+    // iterates nested arrays
+    struct nested_iterator;
+
+    // construct from external pointer, shape, strides and finalizer
+    tensor(T* data, const shape_type& shape, const shape_type& strides,
+           memory_finalizer finalizer);
+
+    // construct from external pointer, shape and finalizer with default strides
+    tensor(T* data, const shape_type& shape, memory_finalizer finalizer);
+
+    // construct from shape and allocate memory
+    tensor(const shape_type& shape);
+
+    // construct from shape, strides and allocate memory
+    tensor(const shape_type& shape, const shape_type& strides);
+
+    // construct from shape, allocate memory and fill with value
+    tensor(const shape_type& shape, T value);
+
+    // construct from shape, strides, allocate memory and fill with value
+    tensor(const shape_type& shape, const shape_type& strides, T value);
+    
+    // construct from shape, allocate memory and fill with flat list
+    tensor(const shape_type& shape, const std::initializer_list<T>& values);
+    
+    // initialize with braced list. defined for 1D tensor only
+    template <typename U>
+    tensor(const std::initializer_list<U>& values);
+
+    // initialize with nested braced list. defined for 2D tensor only
+    template <typename U>
+    tensor(const std::initializer_list<std::initializer_list<U>>& values);
+    
+    // initialize with nested braced list. defined for 3D tensor only
+    template <typename U>
+    tensor(const std::initializer_list<std::initializer_list<std::initializer_list<U>>>& values);
+
+    // shape of tensor
+    shape_type shape() const;
+    // strides
+    shape_type strides() const;
+    
+    pointer data() const;
+    size_type size() const;
+    bool empty() const;
+    tensor_iterator begin() const;
+    tensor_iterator end() const;
+
+    // access individual element by index
+    value_type& access(const shape_type& index) const;
+
+    // access individual element by list of indices
+    value_type& operator()(size_t... index) const;
+    
+    // return subrange, individual axis or slice
+    template <typename... Index>
+    tensor<T, ...> operator()(const Index&...) const;
+
+    // return flattened array, see Reshaping below
+    tensor<T, 1> flatten() const;
+    // return reshaped array, see Reshaping below
+    template <index_t dims>
+    tensor<T, dims> reshape(const shape<dims>& new_shape) const;
+
+    // convert multidimensional tensor to string
+    template <typename Fmt = void>
+    std::string to_string(int max_columns = 16, int max_dimensions = INT_MAX, std::string separator = ", ",
+                          std::string open = "{", std::string close = "}") const;
+};
+```
+
+Iteration always goes from the first axis to the last axis.
+
+By default, the last axis is contiguous in memory, but this can be changed with custom `strides`.
+
+```c++
+tensor<double, 1> t1{ 1, 2, 3, 4, 5, 6 };
+tensor<double, 2> t2{ {1, 2}, {3, 4}, {5, 6} };
+tensor<double, 3> t3{ {{1}, {2}}, {{3}, {4}}, {{5}, {6}} };
+// Memory layout for all these tensors is: 1, 2, 3, 4, 5, 6
+```
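The default layout is row-major: the stride of the last axis is 1, and each earlier axis's stride is the product of the lengths of all axes after it. The hypothetical helpers below (`default_strides` and `flat_offset` are not KFR functions) compute those strides and an element's memory offset, which confirms that `t2` (shape {3, 2}) and `t3` (shape {3, 2, 1}) share the layout shown above.

```cpp
#include <array>
#include <cstddef>

// Row-major strides: the last axis is contiguous (stride 1)
template <std::size_t Dims>
constexpr std::array<std::size_t, Dims> default_strides(const std::array<std::size_t, Dims>& shape)
{
    std::array<std::size_t, Dims> strides{};
    std::size_t step = 1;
    for (std::size_t axis = Dims; axis-- > 0;)
    {
        strides[axis] = step;
        step *= shape[axis];
    }
    return strides;
}

// Memory offset of an element: dot product of its index and the strides
template <std::size_t Dims>
constexpr std::size_t flat_offset(const std::array<std::size_t, Dims>& index,
                                  const std::array<std::size_t, Dims>& strides)
{
    std::size_t offset = 0;
    for (std::size_t axis = 0; axis < Dims; ++axis)
        offset += index[axis] * strides[axis];
    return offset;
}
```

For example, element (2, 1) of `t2` sits at offset 2 * 2 + 1 * 1 = 5, the sixth value in memory, which is 6.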
+
+!!! important
+    `const`-qualified tensors are writable. This makes it possible to pass a writable subrange to a function without first converting it to an lvalue.
+
+A tensor behaves like a shared pointer to memory (possibly allocated outside the tensor class, see [Constructing tensor from external data](#constructing-tensor-from-external-data)) with automatic reference counting. Copying or assigning a tensor increments the internal reference counter, and the copy still references the original data.
+
+_Important_: writing to one shared copy modifies all other copies of this tensor too.
+To get a deep copy, call the `copy` member function:
+
+```c++
+tensor<float, 2> t = other; // shallow copy: t shares data with other
+t = t.copy();               // deep copy: t now references its own data
+```
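+
+The sharing semantics are similar to those of `std::shared_ptr`. The sketch below uses a hypothetical `shared_buffer` type (not a KFR class) to show why a write through one copy is visible through another, and how a deep `copy()` breaks the link:
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <memory>
+#include <vector>
+
+// Hypothetical type mimicking the reference-counted sharing described above.
+struct shared_buffer
+{
+    std::shared_ptr<std::vector<float>> data;
+
+    explicit shared_buffer(std::size_t size) : data(std::make_shared<std::vector<float>>(size, 0.0f)) {}
+
+    // The implicitly defined copy constructor and copy assignment only copy
+    // the shared_ptr, so both objects reference the same storage.
+
+    // Deep copy: allocate new storage and copy the elements into it.
+    shared_buffer copy() const
+    {
+        shared_buffer result(data->size());
+        *result.data = *data;
+        return result;
+    }
+
+    // const object, writable elements (the pointee is not const).
+    float& operator[](std::size_t i) const
+    {
+        assert(i < data->size());
+        return (*data)[i];
+    }
+
+    long use_count() const { return data.use_count(); }
+};
+```
+
+With `shared_buffer b = a;`, a write to `b[0]` is visible as `a[0]`, while `a.copy()` returns a buffer whose elements can be changed independently.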
+
+#### Reshaping
+
+`reshape` and `flatten` perform reshaping and return a new tensor that shares data with the original tensor.
+
+Not every tensor can be reshaped to any shape: the total number of elements must be the same before and after reshaping.
+
+Also, to share data, the original tensor must be contiguous. If this requirement isn't met, `reshape` and `flatten` throw a `kfr::logic_error` exception.
+The variants `reshape_may_copy` and `flatten_may_copy` handle that case by returning a new tensor that does not share data with the original tensor.
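+
+Both conditions can be expressed as a standalone check. In this simplified sketch, the helpers `element_count`, `default_strides`, `is_contiguous` and `can_reshape_in_place` are hypothetical. The contiguity test mirrors the comparison between the given strides and the default strides for the shape that the `tensor` constructor performs:
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <functional>
+#include <numeric>
+#include <vector>
+
+using dims_t = std::vector<std::size_t>;
+
+// Total number of elements described by a shape.
+std::size_t element_count(const dims_t& shape)
+{
+    return std::accumulate(shape.begin(), shape.end(), std::size_t{ 1 }, std::multiplies<std::size_t>());
+}
+
+// Default strides: the last axis is contiguous.
+dims_t default_strides(const dims_t& shape)
+{
+    dims_t strides(shape.size());
+    std::size_t step = 1;
+    for (std::size_t i = shape.size(); i-- > 0;)
+    {
+        strides[i] = step;
+        step *= shape[i];
+    }
+    return strides;
+}
+
+// Treated as contiguous when the strides equal the default ones for the shape.
+bool is_contiguous(const dims_t& shape, const dims_t& strides) { return strides == default_strides(shape); }
+
+// Reshaping without a copy needs the same element count and contiguous data.
+bool can_reshape_in_place(const dims_t& shape, const dims_t& strides, const dims_t& new_shape)
+{
+    return element_count(shape) == element_count(new_shape) && is_contiguous(shape, strides);
+}
+```
+
+For example, a `{3, 2}` tensor with strides `{2, 1}` can be flattened to `{6}` in place. With transposed strides `{1, 3}`, it cannot, and a copy is needed.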
+
+#### Slicing
+
+To slice a tensor, pass a special value constructed by the `trange`, `tstart`, `tstop` or `tall` functions to the tensor's `operator()`.
+
+```c++
+constexpr tensor_range trange(std::optional<signed_index_t> start = std::nullopt,
+                              std::optional<signed_index_t> stop  = std::nullopt,
+                              std::optional<signed_index_t> step  = std::nullopt)
+{
+    return { start, stop, step };
+}
+```
+
+If `start` is nullopt, the slice starts from the first element (or the last one if step is negative). If `stop` is nullopt, the slice ends at the last element (or the first one if step is negative).
+If `step` is nullopt or omitted, the step will be equal to 1.
+
+`tstart(start)` and `tstart(start, step)` are equivalent to calling `trange(start, nullopt, nullopt)` and `trange(start, nullopt, step)`. They return the range starting at `start` along the given axis.
+
+`tstop(stop)` and `tstop(stop, step)` are equivalent to calling `trange(nullopt, stop, nullopt)` and `trange(nullopt, stop, step)`. They return the range stopping at `stop` along the given axis.
+
+`tall()` is equivalent to `trange(nullopt, nullopt, nullopt)` and returns the whole range of the given axis.
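+
+The defaulting rules above can be modelled for a single axis. The sketch below uses a hypothetical helper, not KFR's implementation. It follows Python's `slice.indices` conventions, where negative indices count from the end and `stop` is excluded. Whether KFR excludes `stop` is an assumption here, so check the library's behavior before relying on it:
+
+```c++
+#include <cassert>
+#include <cstdint>
+#include <optional>
+#include <vector>
+
+using signed_index = std::int64_t;
+
+// Hypothetical helper: lists the indices selected along an axis of length `size`.
+// Assumption: Python-style half-open range [start, stop), negative indices wrap.
+std::vector<signed_index> slice_indices(signed_index size, std::optional<signed_index> start,
+                                        std::optional<signed_index> stop, std::optional<signed_index> step)
+{
+    const signed_index st = step.value_or(1); // omitted step means 1
+    assert(st != 0);
+    const signed_index lower = st > 0 ? 0 : -1;
+    const signed_index upper = st > 0 ? size : size - 1;
+
+    auto clamp_index = [&](signed_index i) {
+        if (i < 0)
+            i += size; // negative index counts from the end
+        if (i < lower)
+            return lower;
+        if (i > upper)
+            return upper;
+        return i;
+    };
+
+    // nullopt start: first element (or last one if step is negative)
+    const signed_index b = start ? clamp_index(*start) : (st > 0 ? lower : upper);
+    // nullopt stop: past the last element (or before the first one if step is negative)
+    const signed_index e = stop ? clamp_index(*stop) : (st > 0 ? upper : lower);
+
+    std::vector<signed_index> result;
+    for (signed_index i = b; st > 0 ? i < e : i > e; i += st)
+        result.push_back(i);
+    return result;
+}
+```
+
+Under these assumptions, `slice_indices(6, std::nullopt, std::nullopt, -1)` walks the axis backwards from index 5 to index 0.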
+
+Examples:
+
+```c++
+tensor<double, 2> t1(shape{ 6, 8 });
+// initialize tensor
+t1 = counter(0, 10, 1);
+// t1 =
+// {{ 0,  1,  2,  3,  4,  5,  6,  7},
+//  {10, 11, 12, 13, 14, 15, 16, 17},
+//  {20, 21, 22, 23, 24, 25, 26, 27},
+//  {30, 31, 32, 33, 34, 35, 36, 37},
+//  {40, 41, 42, 43, 44, 45, 46, 47},
+//  {50, 51, 52, 53, 54, 55, 56, 57}}
+
+// slice tensor
+tensor<double, 2> t2 = t1(tstart(2), trange(2, 4));
+// t2 =
+// {{22, 23, 24},
+//  {32, 33, 34},
+//  {42, 43, 44},
+//  {52, 53, 54}}
+```
+
+#### Constructing tensor from external data
+
+```c++
+tensor<float, 1> fn(std::vector<float>&& v)
+{
+    tensor<float, 1> t = tensor_from_container(std::move(v));
+    // No data is copied: v is moved into the finalizer
+    // and the tensor references the original vector data
+    return t;
+}
+```
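+
+The container is moved into a finalizer that lives as long as the tensor, so the tensor can point at the container's buffer without copying it. The standalone sketch below shows that ownership pattern using a hypothetical `float_view` type, with `std::shared_ptr<void>` standing in for the finalizer. This is not KFR's `memory_finalizer`:
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <memory>
+#include <utility>
+#include <vector>
+
+// Hypothetical view that keeps its data alive through a type-erased finalizer.
+struct float_view
+{
+    float* data      = nullptr;
+    std::size_t size = 0;
+    std::shared_ptr<void> finalizer; // owns whatever keeps `data` valid
+};
+
+float_view view_from_vector(std::vector<float>&& v)
+{
+    // Move-constructing a std::vector transfers its buffer, so no elements are copied.
+    auto owner = std::make_shared<std::vector<float>>(std::move(v));
+    float_view result;
+    result.data      = owner->data();
+    result.size      = owner->size();
+    result.finalizer = owner; // shared_ptr<std::vector<float>> converts to shared_ptr<void>
+    return result;
+}
+```
+
+The vector is destroyed only when the last copy of the view, and therefore of the finalizer, goes away.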
+
diff --git a/docs/docs/capi.md b/docs/docs/capi.md
@@ -0,0 +1,27 @@
+# KFR C API
+
+## Building KFR C API
+
+Clang is required. See [Installation](installation.md).
+
+### Windows
+
+These commands must be executed in an MSVC 2019 command prompt.
+
+```bash
+cd <path_to_kfr_repository>
+mkdir build && cd build
+cmake -GNinja -DENABLE_CAPI_BUILD=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="<PATH_TO_LLVM_DIR>/bin/clang-cl.exe" ..
+ninja kfr_capi
+```
+
+### Linux, macOS, other
+
+```bash
+cd <path_to_kfr_repository>
+mkdir build && cd build
+cmake -GNinja -DENABLE_CAPI_BUILD=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ ..
+ninja kfr_capi
+```
+
+Optionally, you can install the binaries into your system using `ninja install`.
diff --git a/docs/docs/conv_reverb.md b/docs/docs/conv_reverb.md
@@ -2,7 +2,7 @@
 
 ### Mono version
 
-Input/Output data: [See how to pass data to KFR](types.md)
+Input/Output data: [See how to pass data to KFR](basics.md)
 ```c++
 univector<float> audio;
 univector<float> impulse_response;
@@ -14,7 +14,7 @@ reverb.apply(audio);
 ```
 
 !!! note
-    `convolve_filter` uses [Filter API](filters.md) and preserves its internal state between calls to `apply`.
+    `convolve_filter` uses [Filter API](auto/filter.md) and preserves its internal state between calls to `apply`.
     Audio can be processed in chunks.
     Use `reset` function to reset its internal state.
 
diff --git a/docs/docs/convert_stereo.md b/docs/docs/convert_stereo.md
@@ -2,7 +2,7 @@
 
 ## L/R to M/S
 
-Input/output data. [See how to pass data to KFR](types.md)
+Input/output data. [See how to pass data to KFR](basics.md)
 ```c++
 univector<float> left;
 univector<float> right;
@@ -42,4 +42,4 @@ mono = (left + right) * 0.5f;
 or, depending on what results you want to get
 ```c++
 mono = left + right;
-```
-\ No newline at end of file
+```
diff --git a/docs/docs/dft.md b/docs/docs/dft.md
@@ -10,7 +10,7 @@ For power of 2 sizes, it uses Fast Fourier Transform.
 
 ### Complex input/output
 
-Apply the FFT to the complex input data: [See how to pass data to KFR](types.md)
+Apply the FFT to the complex input data: [See how to pass data to KFR](basics.md)
 
 ```c++
 // prepare data (0, 1, 2, ..., 255)
diff --git a/docs/docs/expressions.md b/docs/docs/expressions.md
@@ -6,12 +6,12 @@ Expression can have specific size or have infinite size in any dimension. Its si
 
 The number of dimensions must be known at compile time.
 
-Normally, expressions do not own any data and can be seen as _data generators_ with arbitrary algorithm. But classes owning data (`tensor` and `univector`) provide expression interface as well. Since KFR5 you can make expression from any user defined or `std` type.
+Normally, expressions do not own any data and can be seen as _data generators_ with any algorithm under the hood. But classes owning data (`tensor` and `univector`) provide the expression interface as well. Since KFR5 you can make an expression from any user-defined or `std` type.
 
 Expressions can refer to other expressions as its arguments.
 prvalue expressions are captured by value and moved to expression storage.
 lvalue expressions are captured by reference. 
-The latter may cause dangling references if resulting expression is used outside of its arguments scope. As always, `std::move` forces variable to be captured by value.
+The latter may cause dangling references if the resulting expression is used outside the scope of its arguments. As always, `std::move` forces a variable to be captured by value.
 
 The following function creates Expression that represents a virtual 3-dimensional array with elements starting from 0 at $(0,0,0)$ index
 and incremented by $1$, $10$ and $100$ along each axis.
@@ -29,27 +29,19 @@ This allows better optimization and does not require saving temporary data.
 
 Internally a C++ technique called [Expression templates](https://en.wikipedia.org/wiki/Expression_templates) is used but expressions processing is explicitly vectorized in KFR. You can control some aspects of vectorization.
 
-## univector - 1D data with compatibility with std::array or std::vector
-
-`univector<T, tag>` contains 1D data.
-
-## tensor - multidimensional data
-
-`tensor<T, dims>` contains multidimensional data.
-
 ## Functions and operators
 
-For example, subtracting one univector from another gives expression type, not univector:
+For example, subtracting one univector from another produces an expression, not a univector:
 
 ```c++
 univector<int, 5> x{1, 2, 3, 4, 5};
 univector<int, 5> y{0, 0, 1, 10, -5};
 
-auto z = x - y; // z is of type expression, not univector. 
+auto z = x - y; // z is of type expression_function<...>, not univector. 
                 // This only constructs an expression and does not perform any calculation
 ```
 
-But you can always convert expression back to univector to get actual data:
+To get the data, assign the expression to a univector (or tensor):
 
 ```c++
 univector<int, 5> x{1, 2, 3, 4, 5};
@@ -59,15 +51,14 @@ univector<int, 5> z = x - y;
 ```
 
 !!! note
-    when an expression is assigned to a `univector` variable, expression is evaluated
-    and values are being written to the variable.
+    When an expression is assigned to a `univector` variable, the expression is evaluated by the `process` function and the values are written to the target storage.
 
 Same applies to calling KFR functions on univectors, this doesn't calculate values immediately. Instead, new expression will be created.
 
 ```c++
 univector<float, 5> x{1, 2, 3, 4, 5};
 sqrt(x);                                // only constructs an expression
-univector<float, 5> values = sqrt(x);   // constructs an expression and writes data to univector  
+univector<float, 5> values = sqrt(x);   // constructs an expression and writes data to univector
 ```
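+
+This lazy behavior is the expression template pattern. The deliberately tiny sketch below shows that `x - y` only builds an object holding references, and that the work happens when the result is written to storage. The types `sub_expr` and `vec_data` are hypothetical and much simpler than KFR's vectorized implementation:
+
+```c++
+#include <cassert>
+#include <cstddef>
+#include <vector>
+
+// Hypothetical toy expression: stores references to its operands (like an
+// lvalue captured by reference) and computes an element only on request.
+template <typename L, typename R>
+struct sub_expr
+{
+    const L& lhs;
+    const R& rhs;
+    std::size_t size() const { return lhs.size(); }
+    int operator[](std::size_t i) const { return lhs[i] - rhs[i]; }
+};
+
+// Hypothetical toy container that owns its data.
+struct vec_data
+{
+    std::vector<int> values;
+    std::size_t size() const { return values.size(); }
+    int operator[](std::size_t i) const { return values[i]; }
+
+    // Assigning an expression evaluates it element by element into this storage.
+    template <typename L, typename R>
+    vec_data& operator=(const sub_expr<L, R>& e)
+    {
+        values.resize(e.size());
+        for (std::size_t i = 0; i < e.size(); ++i)
+            values[i] = e[i];
+        return *this;
+    }
+};
+
+// x - y builds an expression object; no subtraction is performed here.
+sub_expr<vec_data, vec_data> operator-(const vec_data& lhs, const vec_data& rhs)
+{
+    assert(lhs.size() == rhs.size());
+    return { lhs, rhs };
+}
+```
+
+Because `sub_expr` holds references, it must not outlive `x` and `y`. This is the same dangling-reference caveat that applies to expressions capturing lvalues.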
 
 Element type of an input expressions can be determined by using `expression_value_type<Expr>` (In KFR4 it was `value_type_of<Expr>`). Since KFR5 all expressions have their type specified.
@@ -133,10 +124,11 @@ int main()
     // And 0 dimensions collapsed (to_string).
     println(trender(identity_matrix<float, 9>{}).to_string(16, 0));
 }
-
 ```
+
 **Output**:
-```
+
+```c++
 {{1, 0, 0, 0, 0, 0, 0, 0, 0},
  {0, 1, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 1, 0, 0, 0, 0, 0, 0},
@@ -170,7 +162,7 @@ struct identity_matrix : expression_traits_defaults
 
 ```
 
-Now with size defined at runtime.
+The same class with the size defined at runtime.
 
 ```c++
 template <typename T>
@@ -194,3 +186,13 @@ struct identity_matrix : expression_traits_defaults
 };
 
 ```
+
+### Reducing functions
+
+Reducing functions accept a 1D expression and produce a scalar.
+
+Examples of reducing functions are `sum`, `rms`, `mean`, `dotproduct`, `product` and `sumsqr`.
+
+Some reducing functions share their names with the corresponding element-wise functions but carry an `of` suffix to distinguish them:
+`minof`, `maxof`, `absmaxof`, `absminof`
diff --git a/docs/docs/index.md b/docs/docs/index.md
@@ -1,62 +1,41 @@
-# KFR
-
-## Features
-
-* All code in the library is optimized for Intel, AMD (SSE2, SSE3, SSE4.x, AVX and AVX2 and AVX512) and ARM (NEON) processors
-* Mathematical and statistical functions
-* Template expressions (See examples)
-* All data types are supported including complex numbers
-* All vector lengths are also supported. `vec<float,1>`, `vec<unsigned,3>`, `vec<complex<float>, 11>` all are valid vector types in KFR
-* Most of the standard library functions are re-implemented to support vector of any length and data type and to support expressions 
-* Runtime cpu detection
-* C API: DFT, real DFT, DCT, FIR and IIR filters and convolution, memory allocation
-  * Built for SSE2, SSE4.1, AVX, AVX2, AVX512, x86 and x86_64, architecture is selected at runtime
-  * Can be used with any compiler and any language with ability to call C functions
-  * [Prebuilt Windows binaries](https://github.com/kfrlib/kfr/releases)
-
-### DSP/Audio algorithms
-
-* [FFT](dft.md)
-* DCT
-* [Convolution](convolution.md)
-* [FIR filtering](fir.md)
-* [FIR filter design using the window method](fir.md)
-* [Resampling with configurable quality](src.md) (See resampling.cpp from Examples directory)
-* IIR filter design
-  * Butterworth
-  * Chevyshev I and Chevyshev II
-  * Bessel
-  * Low pass, high pass, band pass, band stop
-  * Convert arbitrary filter from Z, P, K format to SOS suitable for biquad function
-* Goertzel algorithm
-* Fractional delay
-* [Biquad filtering](bq.md)
-* [Biquad design functions](bq.md)
-* Oscillators: Sine, Square, Sawtooth, Triangle
-* Window functions: Triangular, Bartlett, Cosine, Hann, Bartlett-Hann, Hamming, Bohman, Blackman, Blackman-Harris, Kaiser, Flattop, Gaussian, Lanczos, Rectangular
-* [Audio file reading/writing (wav, flac, mp3)](read_audio.md)
-* Pseudorandom number generator
-* Sorting
-* Color space conversion (sRGB, XYZ, Lab, LCH)
-* Ring (Circular) buffer
-* Simple waveshaper
-* Fast incremental sine/cosine generation
-* [EBU R 128](ebur128.md)
-
-## Installation
-
-[GitHub:kfrlib/kfr/blob/master/README.md](https://github.com/kfrlib/kfr/blob/master/README.md#usage)
-
-## Compiler support
-
-Xcode | Visual Studio | Clang | GCC | Intel Compiler
------ | ------------- | ----- | --- | --------------
-9+    | 2017          | 6+    | 7+  | Experimental
-
-Tested on macOS, Windows (MinGW, MSYS and MSVC), Linux, iOS, Android.
-
-## Architecture support
-
-x86, x86_64 | ARM, ARM64 |
------ | -------------
-Scalar, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512  | Scalar, NEON, NEON64
+# KFR 5 Documentation
+
+## Getting started
+
+* [What's new in KFR 5](whatsnew5.md)
+* [Installation](installation.md)
+* [Basics](basics.md)
+* [Expressions](expressions.md)
+* [C API](capi.md)
+
+## Guides
+
+### DSP
+* [How to apply a FIR filter](fir.md)
+* [How to apply a Biquad filter](bq.md)
+* [How to do Sample Rate Conversion](src.md)
+* [How to apply Convolution Reverb](conv_reverb.md)
+* [How to measure loudness according to EBU R 128](ebur128.md)
+* [How to convert sample type](conversion.md)
+* [How to normalize audio](normalize.md)
+* [How to mix stereo channels](convert_stereo.md)
+* [FIR filters code & examples](fir_gallery.md)
+* [IIR filters code & examples](iir_gallery.md)
+* [Biquad filters code & examples](bq_gallery.md)
+* [Sample Rate Converter code & examples](src_gallery.md)
+* [Window functions code & examples](window_gallery.md)
+* [Convolution filter details](convolution.md)
+
+### FFT
+* [How to apply the Fast Fourier Transform](dft.md)
+* [More about FFT/DFT](dft2.md)
+* [DFT data layout](dft_format.md)
+
+### I/O
+* [How to read a WAV file](read_audio.md)
+* [File types support](file_support.md)
+* [How to plot filter impulse response](plot.md)
+
+## Function reference
+
+* [Function reference](auto/refindex.md)
diff --git a/docs/docs/installation.md b/docs/docs/installation.md
@@ -0,0 +1,257 @@
+# Installation
+
+## Support
+
+KFR is tested and supported on the following systems and architectures:
+
+**OS** • Windows • Linux • macOS • iOS • Android
+
+**CPU** • x86 • x86_64 • ARM • ARM64 (AArch64)
+
+**x86 extensions** • SSE2 • SSE3 • SSSE3 • SSE4.1 • SSE4.2 • AVX • AVX2 • AVX512 • FMA
+
+**ARM extensions** • NEON
+
+**Compiler** • GCC 7+ • Clang 9+ • MSVC 2019+ • Xcode 10.3+
+
+## Prerequisites
+
+* CMake 3.10 or newer for building tests and examples
+* Python 3.6+ for running examples
+* (recommended) Ninja (https://ninja-build.org/) for faster builds
+
+Running the examples and generating the frequency responses of the filters requires some Python packages:
+
+```bash
+# in the kfr directory
+pip install -r requirements.txt
+```
+
+### Clang
+
+Clang is highly recommended, as it provides the best performance for KFR.
+
+#### Linux
+
+Install Clang using your package manager and add the following definitions to the CMake command line:
+
+```bash
+cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang ...
+```
+
+#### macOS
+
+On macOS, Clang is the default compiler and is already included in the official Xcode toolchain. No additional setup is required.
+
+#### Windows
+
+Download and install the latest `win64` build from the official LLVM GitHub page:
+
+https://github.com/llvm/llvm-project/releases
+
+## Getting the source code and binaries
+
+### Git (recommended)
+
+To obtain the full source code, including examples and tests, you can clone the git repository:
+
+```bash
+git clone https://github.com/kfrlib/kfr.git
+```
+
+The repository default branch `master` is stable and passes all tests. Latest features reside in `dev`.
+
+#### Update
+
+```bash
+# in the kfr directory
+git pull
+```
+
+### Tarball/zip
+
+Download the latest release package from the GitHub releases:
+
+https://github.com/kfrlib/kfr/releases
+
+#### Update
+
+Re-download the tarball and unpack it to the same location.
+
+### vcpkg
+
+#### Linux/macOS
+```bash
+./vcpkg install kfr
+```
+
+#### Windows
+
+```cmd
+vcpkg install kfr
+```
+
+### ArchLinux Package
+KFR is available on the [ArchLinux User Repository](https://wiki.archlinux.org/index.php/Arch_User_Repository) (AUR).
+You can install it with an [AUR helper](https://wiki.archlinux.org/index.php/AUR_helpers), like [`yay`](https://aur.archlinux.org/packages/yay/), as follows:
+
+```bash
+yay -S kfr
+```
+To discuss any issues related to this AUR package refer to the comments section of
+[`kfr`](https://aur.archlinux.org/packages/kfr/).
+
+Prebuilt binaries will be available soon.
+
+## Usage
+
+### Including in CMake project
+
+`CMakeLists.txt` defines these libraries:
+
+* `kfr` - header only interface library
+* `kfr_dft` - static library for DFT and related algorithms
+* `kfr_io` - static library for file IO and audio IO
+
+```cmake
+# Include KFR subdirectory
+add_subdirectory(kfr)
+
+# Add header-only KFR to your executable or library; this sets include directories, etc.
+target_link_libraries(your_executable_or_library kfr)
+
+# Add KFR DFT to your executable or library (its .cpp files will be built)
+target_link_libraries(your_executable_or_library kfr_dft)
+
+# Add KFR IO to your executable or library (its .cpp files will be built)
+target_link_libraries(your_executable_or_library kfr_io)
+```
+
+### Makefile, command line etc (Unix-like systems)
+
+```bash
+# Add this to the compiler command line
+-Ipath_to_kfr/include
+
+# C++17 mode must be enabled
+-std=c++17
+# or
+-std=gnu++17
+
+# Linker options, if kfr_dft or kfr_io is used (requires KFR to be installed)
+-lkfr_dft -lkfr_io
+```
+
+### Linux
+
+#### Prerequisites
+
+* GCC 7 or newer
+* Clang 9.0 or newer (recommended)
+
+#### Command line
+
+```bash
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -DENABLE_TESTS=ON -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release ..
+make -j
+```
+Or, using Ninja (recommended):
+```bash
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release ..
+ninja
+```
+
+### macOS
+
+#### Prerequisites
+
+* Xcode 10.3 or later
+
+#### Command line
+Using an Xcode project:
+```bash
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -GXcode -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
+cmake --build .
+```
+Using Unix Makefiles:
+```bash
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -G"Unix Makefiles" -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
+make -j
+```
+Or, using Ninja (recommended):
+```bash
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
+ninja
+```
+
+### Visual Studio
+
+#### Prerequisites
+* Visual Studio 2019 or later
+* Latest Clang (https://llvm.org/)
+* Ninja is highly recommended because Visual Studio does not currently support parallel builds with Clang.
+
+#### Visual Studio IDE
+
+To use KFR in Visual Studio, add the path to KFR's `include` directory to the project's list of include directories.<br>
+More details:
+https://learn.microsoft.com/en-us/cpp/build/reference/vcpp-directories-property-page?view=msvc-160
+
+Make sure that the LLVM toolset is selected for the project.
+
+Download and install the official LLVM toolchain extension:
+https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain
+
+More details:
+https://docs.microsoft.com/en-us/cpp/ide/general-property-page-project?view=vs-2017
+
+LLVM/Clang is highly compatible with the MSVC ABI and is widely used for building large projects on Windows (including Chrome), so switching to LLVM/Clang should not cause any compatibility problems.
+
+#### Command line
+Using Ninja:
+```cmd
+cd <path_to_kfr>
+mkdir build && cd build
+call "C:\<path to your Visual Studio installation>\VC\Auxiliary\Build\vcvars64.bat"
+cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe" -DCMAKE_BUILD_TYPE=Release ..
+ninja
+```
+Or generate a Visual Studio solution (building will be slower):
+```cmd
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -G"Visual Studio 16 2019" -A x64 -Tllvm -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
+cmake --build . --config Release
+```
+
+### MinGW/MSYS
+
+#### Prerequisites
+* Latest MinGW or MSYS2
+* Clang 9.0 or newer
+
+Using Makefiles:
+```bash
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
+make -j
+```
+Using Ninja:
+```bash
+cd <path_to_kfr>
+mkdir build && cd build
+cmake -GNinja -DENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
+ninja
+```
diff --git a/docs/docs/normalize.md b/docs/docs/normalize.md
@@ -2,7 +2,7 @@
 
 ## One channel
 
-Input/Output data: [See how to pass data to KFR](types.md)
+Input/Output data: [See how to pass data to KFR](basics.md)
 ```c++
 univector<float> audio;
 ```
@@ -45,4 +45,4 @@ unpack(stereo[0], stereo[1]) = pack(stereo[0], stereo[1]) / pack(peak, peak);
 1. `absmaxof` function calculates the absolute maximum of the concatenated arrays
 1. `pack` function combines two arrays as if it were a single array containing pairs of values
 1. `operator/` divides pairs by the peak value
-1. `unpack` function unpacks the pairs to two target arrays
-\ No newline at end of file
+1. `unpack` function unpacks the pairs to two target arrays
diff --git a/docs/docs/types.md b/docs/docs/types.md
@@ -1,81 +0,0 @@
-# Data types
-
-`univector` class is a base of all containers in KFR.
-
-`univector` can have both static and dynamic size and can even hold only reference to an external data (just like `array_view` or `string_view`)
-
-`univector<float>` is derived from `std::vector<float>` and contains all its member functions and constructors.
-
-`univector<float, 10>` is derived from `std::array<float, 10>` and contains all its member functions and constructors.
-
-`univector<float, 0>` is only reference to data and doesn’t contain any values.
-
-Such universal template allows functions in KFR to get data in any format.
-
-You can get subrange of an array using slice function:
-
-```c++
-univector<float, 100> v;
-// ...
-const float s1 = sum(v); // Sum all elements
-const float s2 = sum(v.slice(2, 50)); // Sum 50 elements starting from 2
-```
-Result of the call to slice is always `univector<T, 0>`, a reference to external data.
-
-The lifetime of the reference is limited to the lifetime of the original data.
-
-!!! note
-    `univector` class is also [Expression](expressions.md) and can be used whereever expression is required.
-
-## Pass data to KFR functions
-
-If you don't use `univector` for data representation, you can still pass the data to KFR functions and filters. 
-
-Examples (`rms` is used as an example function that takes `univector`):
-
-### `std::vector` or `std::array`
-```c++
-std::vector<float> data; // existing data, or std::array<N, float>
-
-float val = rms(make_univector(data)); // No data copy
-```
-
-### Plain pointer
-```c++
-const float* data; // existing data
-size_t size;       // 
-
-float val = rms(make_univector(data, size)); // No data copy
-```
-
-### array
-```c++
-const float data[1024];
-
-float val = rms(make_univector(data)); // No data copy
-```
-
-
-## Data Types
-
-Unsigned:
-
-``u8``, ``u16``, ``u32`` and ``u64``
-
-Signed:
-
-``i8``, ``i16``, ``i32`` and ``i64``
-
-Floating point:
-
-``f32`` and ``f64``
-
-Complex:
-
-``complex<f32>`` and ``complex<f64>``
-
-Vector:
-
-``vec<u8, 4>``, ``vec<f32, 3>``, ``vec<i64, 1>``, ``vec<complex<float>, 15>``, ``vec<u8, 256>``, ``vec<vec<int, 3>, 3>``
-
-You are not limited to sizes of SIMD registers and basic types.
-\ No newline at end of file
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
@@ -4,6 +4,18 @@ theme:
   features:
     - navigation.tabs
     - navigation.top
+  palette: 
+    # Palette toggle for light mode
+    - scheme: default
+      toggle:
+        icon: material/brightness-7 
+        name: Switch to dark mode
+
+    # Palette toggle for dark mode
+    - scheme: slate
+      toggle:
+        icon: material/brightness-4
+        name: Switch to light mode
     
 extra:
   search:
@@ -32,6 +44,7 @@ markdown_extensions:
   - pymdownx.superfences
   - pymdownx.highlight
   - pymdownx.details
+  - pymdownx.magiclink
   - pymdownx.tabbed:
       alternate_style: true
 
@@ -47,9 +60,12 @@ repo_url: https://github.com/kfrlib/kfr
 repo_name: KFR
 
 nav:
-  - index.md
-  - types.md
-  - expressions.md
+  - KFR:
+    - index.md
+    - installation.md
+    - basics.md
+    - expressions.md
+    - whatsnew5.md
   - DSP:
     - fir.md
     - bq.md
@@ -76,9 +92,6 @@ nav:
   - Reference:
     - auto/refindex.md
     - Math:
-      - auto/types.md
-      - auto/memory.md
-      - auto/conversion.md
       - auto/complex.md
       - auto/constants.md
       - auto/logical.md
@@ -90,12 +103,19 @@ nav:
       - auto/hyperbolic.md
       - auto/horizontal.md
       - auto/other_math.md
+    - Base:
+      - auto/types.md
+      - auto/univector.md
+      - auto/tensor.md
       - auto/expressions.md
       - auto/generators.md
+      - auto/reducing.md
       - auto/random.md
-      - auto/array.md
+      - auto/memory.md
+      - auto/conversion.md
+      - auto/sort.md
       - auto/utility.md
-    - DSP:      
+    - DSP:
       - auto/filter.md
       - auto/biquad.md
       - auto/fir.md
@@ -105,7 +125,7 @@ nav:
     - DFT:
       - auto/convolution.md
       - auto/dft.md
-    - IO:      
+    - IO:
       - auto/binary_io.md
       - auto/audio_io.md
       - auto/plotting.md
@@ -115,4 +135,3 @@ nav:
     - auto/cometa.md
 
   - kfr.dev: https://kfr.dev
-  - github.com/kfrlib/kfr: https://github.com/kfrlib/kfr
diff --git a/include/kfr/base/fraction.hpp b/include/kfr/base/fraction.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup types
+/** @addtogroup base
  *  @{
  */
 /*
@@ -149,4 +149,4 @@ struct representation<kfr::fraction>
             return as_string(value.numerator, "/", value.denominator);
     }
 };
-} // namespace cometa
-\ No newline at end of file
+} // namespace cometa
diff --git a/include/kfr/base/handle.hpp b/include/kfr/base/handle.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup expressions
+/** @addtogroup base
  *  @{
  */
 /*
diff --git a/include/kfr/base/math_expressions.hpp b/include/kfr/base/math_expressions.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup expressions
+/** @addtogroup base
  *  @{
  */
 /*
diff --git a/include/kfr/base/memory.hpp b/include/kfr/base/memory.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup memory
+/** @addtogroup base
  *  @{
  */
 /*
diff --git a/include/kfr/base/reduce.hpp b/include/kfr/base/reduce.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup array
+/** @addtogroup reducing
  *  @{
  */
 /*
diff --git a/include/kfr/base/shape.hpp b/include/kfr/base/shape.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup array
+/** @addtogroup types
  *  @{
  */
 /*
diff --git a/include/kfr/base/state_holder.hpp b/include/kfr/base/state_holder.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup fir
+/** @addtogroup filter
  *  @{
  */
 /**
diff --git a/include/kfr/base/tensor.hpp b/include/kfr/base/tensor.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup expressions
+/** @addtogroup tensor
  *  @{
  */
 /*
@@ -90,6 +90,12 @@ struct tensor_subscript<T, Derived, std::integer_sequence<index_t, Dims...>>
     }
 };
 
+/// @brief tensor holds or references multidimensional data and 
+/// provides a way to access individual elements and perform complex operations on the data.
+///
+/// The number of elements in each axis of the array is defined by its shape.
+/// @tparam T element type
+/// @tparam NDims number of dimensions
 template <typename T, index_t NDims>
 struct tensor : public tensor_subscript<T, tensor<T, NDims>, std::make_integer_sequence<index_t, NDims>>
 {
@@ -105,6 +111,7 @@ public:
 
     using shape_type = kfr::shape<dims>;
 
+    /// @brief Tensor iterator. Iterates through flattened array
     struct tensor_iterator
     {
         using iterator_category = std::forward_iterator_tag;
@@ -154,11 +161,13 @@ public:
     using contiguous_iterator       = pointer;
     using const_contiguous_iterator = pointer;
 
+    /// @brief Default constructor. Creates tensor with null shape
     KFR_MEM_INTRINSIC constexpr tensor()
         : m_data(0), m_size(0), m_is_contiguous(false), m_shape{}, m_strides{}
     {
     }
 
+    /// @brief Construct from external pointer, shape, strides and finalizer
     KFR_MEM_INTRINSIC tensor(T* data, const shape_type& shape, const shape_type& strides,
                              memory_finalizer finalizer)
         : m_data(data), m_size(size_of_shape(shape)),
@@ -167,6 +176,7 @@ public:
     {
     }
 
+    /// @brief Construct from external pointer, shape and finalizer with default strides 
     KFR_MEM_INTRINSIC tensor(T* data, const shape_type& shape, memory_finalizer finalizer)
         : m_data(data), m_size(size_of_shape(shape)), m_is_contiguous(true), m_shape(shape),
           m_strides(internal_generic::strides_for_shape(shape)), m_finalizer(std::move(finalizer))
@@ -177,6 +187,7 @@ public:
 
     KFR_INTRINSIC static void deallocate(T* ptr) { aligned_deallocate(ptr); }
 
+    /// @brief Construct from shape and allocate memory
     KFR_INTRINSIC explicit tensor(const shape_type& shape)
         : m_size(size_of_shape(shape)), m_is_contiguous(true), m_shape(shape),
           m_strides(internal_generic::strides_for_shape(shape))
@@ -185,6 +196,8 @@ public:
         m_data      = ptr;
         m_finalizer = make_memory_finalizer([ptr]() { deallocate(ptr); });
     }
+
+    /// @brief Construct from shape, strides and allocate memory
     KFR_INTRINSIC tensor(const shape_type& shape, const shape_type& strides)
         : m_size(size_of_shape(shape)),
           m_is_contiguous(strides == internal_generic::strides_for_shape(shape)), m_shape(shape),
@@ -194,15 +207,20 @@ public:
         m_data      = ptr;
         m_finalizer = make_memory_finalizer([ptr]() { deallocate(ptr); });
     }
+
+    /// @brief Construct from shape, allocate memory and fill with value
     KFR_INTRINSIC tensor(const shape_type& shape, T value) : tensor(shape)
     {
         std::fill(contiguous_begin_unsafe(), contiguous_end_unsafe(), value);
     }
 
+    /// @brief Construct from shape, strides, allocate memory and fill with value
     KFR_INTRINSIC tensor(const shape_type& shape, const shape_type& strides, T value) : tensor(shape, strides)
     {
         std::fill(begin(), end(), value);
     }
+
+    /// @brief Construct from shape, allocate memory and fill with flat list
     KFR_INTRINSIC tensor(const shape_type& shape, const std::initializer_list<T>& values) : tensor(shape)
     {
         if (values.size() != m_size)
@@ -210,23 +228,30 @@ public:
         std::copy(values.begin(), values.end(), contiguous_begin_unsafe());
     }
 
+    /// @brief Initialize with braced list. Defined for 1D tensor only
     template <typename U, KFR_ENABLE_IF(std::is_convertible_v<U, T>&& dims == 1)>
     KFR_INTRINSIC tensor(const std::initializer_list<U>& values) : tensor(shape_type(values.size()))
     {
         internal_generic::list_copy_recursively(values, contiguous_begin_unsafe());
     }
+    
+    /// @brief Initialize with braced list. Defined for 2D tensor only
     template <typename U, KFR_ENABLE_IF(std::is_convertible_v<U, T>&& dims == 2)>
     KFR_INTRINSIC tensor(const std::initializer_list<std::initializer_list<U>>& values)
         : tensor(shape_type(values.size(), values.begin()->size()))
     {
         internal_generic::list_copy_recursively(values, contiguous_begin_unsafe());
     }
+    
+    /// @brief Initialize with braced list. Defined for 3D tensor only
     template <typename U, KFR_ENABLE_IF(std::is_convertible_v<U, T>&& dims == 3)>
     KFR_INTRINSIC tensor(const std::initializer_list<std::initializer_list<std::initializer_list<U>>>& values)
         : tensor(shape_type(values.size(), values.begin()->size(), values.begin()->begin()->size()))
     {
         internal_generic::list_copy_recursively(values, contiguous_begin_unsafe());
     }
+    
+    /// @brief Initialize with braced list. Defined for 4D tensor only
     template <typename U, KFR_ENABLE_IF(std::is_convertible_v<U, T>&& dims == 4)>
     KFR_INTRINSIC tensor(
         const std::initializer_list<std::initializer_list<std::initializer_list<std::initializer_list<U>>>>&
diff --git a/include/kfr/base/univector.hpp b/include/kfr/base/univector.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup expressions
+/** @addtogroup univector
  *  @{
  */
 /*
diff --git a/include/kfr/cometa/memory.hpp b/include/kfr/cometa/memory.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup cometa
+/** @addtogroup memory
  *  @{
  */
 #pragma once
diff --git a/include/kfr/cometa/result.hpp b/include/kfr/cometa/result.hpp
@@ -25,7 +25,7 @@ struct result
 
     constexpr result(ErrEnum error) CMT_NOEXCEPT : m_error(error) {}
 
-    template <typename ValueInit, CMT_ENABLE_IF(is_constructible<value_type, ValueInit>)>
+    template <typename ValueInit, CMT_ENABLE_IF(std::is_constructible_v<value_type, ValueInit>)>
     constexpr result(ValueInit&& value) CMT_NOEXCEPT : m_value(std::forward<ValueInit>(value)),
                                                        m_error(OkValue)
     {
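This hunk switches the constraint on `result`'s forwarding constructor to the standard C++17 trait `std::is_constructible_v`. The sketch below shows what that constraint provides, using hypothetical `toy_result` and `toy_errc` types rather than cometa's real ones; `noexcept` and the accessor names are omitted or assumed. With the constraint, the forwarding constructor leaves overload resolution for arguments that cannot initialize the value. As a result, `std::is_constructible_v` on the result type itself reports `false` instead of `true`, which would otherwise lead to a hard error inside the member initializer.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical error enum; ok is the "no error" value.
enum class toy_errc
{
    ok,
    out_of_memory,
    invalid_argument
};

// Hypothetical simplified result: holds a value together with an error code.
template <typename T, typename ErrEnum, ErrEnum OkValue>
struct toy_result
{
    using value_type = T;

    // Error constructor: the value is value-initialized, the error is stored.
    constexpr toy_result(ErrEnum error) : m_value(), m_error(error) {}

    // Forwarding value constructor, viable only when value_type is constructible from ValueInit.
    template <typename ValueInit, std::enable_if_t<std::is_constructible_v<value_type, ValueInit>, int> = 0>
    constexpr toy_result(ValueInit&& value) : m_value(std::forward<ValueInit>(value)), m_error(OkValue)
    {
    }

    constexpr bool ok() const { return m_error == OkValue; }
    constexpr ErrEnum error() const { return m_error; }
    constexpr const T& value() const { return m_value; }

private:
    T m_value;
    ErrEnum m_error;
};

using int_result = toy_result<int, toy_errc, toy_errc::ok>;
```

Because `toy_errc` is a scoped enum, `std::is_constructible_v<int, toy_errc>` is false. An error code therefore always selects the error constructor and is never captured by the forwarding template.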
diff --git a/include/kfr/dsp/waveshaper.hpp b/include/kfr/dsp/waveshaper.hpp
@@ -28,6 +28,7 @@
 #include "../math/hyperbolic.hpp"
 #include "../simd/clamp.hpp"
 #include "../simd/operators.hpp"
+#include "../base/expression.hpp"
 
 namespace kfr
 {
diff --git a/include/kfr/dsp/weighting.hpp b/include/kfr/dsp/weighting.hpp
@@ -27,6 +27,7 @@
 
 #include "../math/sqrt.hpp"
 #include "../simd/operators.hpp"
+#include "../base/expression.hpp"
 
 namespace kfr
 {
diff --git a/include/kfr/simd/impl/backend_clang.hpp b/include/kfr/simd/impl/backend_clang.hpp
@@ -60,7 +60,7 @@ KFR_INTRINSIC simd<Tout, N> simd_make(ctype_t<Tout>, const Args&... args)
     return (simd<Tout, N>){ static_cast<unwrap_bit<Tout>>(args)... };
 }
 
-/// @brief Returns vector with undefined value
+// @brief Returns vector with undefined value
 template <typename Tout, size_t N>
 KFR_INTRINSIC simd<Tout, N> simd_undefined()
 {
@@ -68,21 +68,21 @@ KFR_INTRINSIC simd<Tout, N> simd_undefined()
     return x;
 }
 
-/// @brief Returns vector with all zeros
+// @brief Returns vector with all zeros
 template <typename Tout, size_t N>
 KFR_INTRINSIC simd<Tout, N> simd_zeros()
 {
     return Tout();
 }
 
-/// @brief Returns vector with all ones
+// @brief Returns vector with all ones
 template <typename Tout, size_t N>
 KFR_INTRINSIC simd<Tout, N> simd_allones()
 {
     return special_constants<Tout>::allones();
 }
 
-/// @brief Converts input vector to vector with subtype Tout
+// @brief Converts input vector to vector with subtype Tout
 template <typename Tout, typename Tin, size_t N, size_t Nout = (sizeof(Tin) * N / sizeof(Tout))>
 KFR_INTRINSIC simd<Tout, Nout> simd_bitcast(simd_cvt_t<Tout, Tin, N>, const simd<Tin, N>& x)
 {
@@ -158,14 +158,14 @@ KFR_INTRINSIC simd<T, N1 + N2 + Nscount> simd_concat(const simd<T, N1>& x, const
                         csizeseq<N1 + N2 + Nscount>, overload_auto);
 }
 
-/// @brief Converts input vector to vector with subtype Tout
+// @brief Converts input vector to vector with subtype Tout
 template <typename Tout, typename Tin, size_t N>
 KFR_INTRINSIC simd<Tout, N> simd_convert(simd_cvt_t<Tout, Tin, N>, const simd<Tin, N>& x)
 {
     return __builtin_convertvector(x, simd<Tout, N>);
 }
 
-/// @brief Converts input vector to vector with subtype Tout
+// @brief Converts input vector to vector with subtype Tout
 template <typename T, size_t N>
 KFR_INTRINSIC simd<T, N> simd_convert(simd_cvt_t<T, T, N>, const simd<T, N>& x)
 {
diff --git a/include/kfr/simd/impl/backend_generic.hpp b/include/kfr/simd/impl/backend_generic.hpp
@@ -22,6 +22,8 @@
  */
 #pragma once
 
+#ifndef CMT_CLANG_EXT
+
 #include "simd.hpp"
 
 CMT_PRAGMA_GNU(GCC diagnostic push)
@@ -1920,3 +1922,4 @@ KFR_INTRINSIC simd<T, Nout> universal_shuffle(simd_t<T, Nin>, const simd<T, Nin>
 } // namespace kfr
 
 CMT_PRAGMA_GNU(GCC diagnostic pop)
+#endif
diff --git a/include/kfr/simd/impl/basicoperators_complex.hpp b/include/kfr/simd/impl/basicoperators_complex.hpp
@@ -26,6 +26,7 @@
 #pragma once
 
 #include "../complex_type.hpp"
+#include "../operators.hpp"
 #include "../vec.hpp"
 
 namespace kfr
diff --git a/include/kfr/simd/sort.hpp b/include/kfr/simd/sort.hpp
@@ -1,4 +1,4 @@
-/** @addtogroup utility
+/** @addtogroup sort
  *  @{
  */
 /*
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,3 @@
+numpy
+scipy
+matplotlib