Releases: zeux/meshoptimizer
v0.15
This release focuses on gltfpack improvements and also features small changes to the simplifier that improve quality in some edge cases.
gltfpack highlights
gltfpack substantially improves support for instanced meshes in this release. Previously, all instances of the same mesh were merged together unconditionally, which could result in large file sizes and/or memory consumption; by default, instances are now kept as is. -mm can be used to merge the geometry of the instances together, or alternatively -mi can be used to encode the instance data using EXT_mesh_gpu_instancing, which, given a compatible loader, can significantly reduce the transmission size and improve loading and rendering performance.
To improve support for large scenes even further, gltfpack is now much more memory efficient, requiring ~40% less memory for processing on average.
The extension that's used by gltfpack to compress geometry, animation and instance data is now part of glTF and is called EXT_meshopt_compression; gltfpack was changed accordingly to output compressed files conforming to the up-to-date specification. This requires loaders to update to the new extension; https://github.com/zeux/meshoptimizer/tree/master/js contains plugins for three.js and Babylon.js, and work is underway to integrate these directly upstream.
For texture compression, gltfpack is switching to toktx from KTX-Software; this enables support for super-compressed UASTC textures and support for texture scaling during encoding (via -ts option) which can further reduce the file size. Additionally when using toktx, gltfpack now pads the textures to a multiple of 4 to ensure compatibility with WebGL, and can optionally (via -tp option) pad to a power of 2 for older browsers. basisu command-line tool is still supported for now and automatically used if toktx is not available.
Finally, gltfpack is now available as a JS library in addition to having command-line executables; the library uses a filesystem-like interface. Please refer to gltf/library.js for documentation on the two exposed functions.
gltfpack improvements
- Improve support for scenes with many instances of the same mesh; -mm is now required to merge these instances together
- Implement support for EXT_mesh_gpu_instancing via -mi command line option
- -km can now be used to keep unused materials
- -ke now keeps extras on nodes in addition to materials
- Improve memory consumption when packing large scenes by 40% on average
- node.js version of gltfpack now supports texture compression if basisu or toktx are available
- Update KTX2 support to track latest KTX2 specification, including DFD changes for ETC1S/UASTC
- Implement support for various PBR.Next extensions including KHR_materials_transmission, KHR_materials_ior, KHR_materials_specular and KHR_materials_sheen
- Implement support for toktx when compressing textures
- Implement support for -ts that can be used to rescale textures to reduce transmission and memory size
- Instead of using a 1-255 range for texture quality, -tq now accepts a level from 1 to 10, which is tuned to balance compression ratio vs quality for both ETC1S and UASTC
- Fix processing for files with unused texture coordinate 0
- Implement support for -tp that can be used to rescale images to power-of-two when using texture compression
- Remove command line option -tb in favor of -tc; the latter uses KHR_texture_basisu which should be more widely supported
- Remove command line option -te; textures are now automatically embedded into .glb files
- Implement JSON report via -r option which contains various stats about the resulting glTF scene
- Fix texture embedding for images with spaces in the URI
- Fix issues with non-uniform and negative mesh scale
- Implement support for multiple scenes; all scenes are now preserved along with their own node hierarchy
- Implement support for higher bitrate colors via -vc option
- Fix animation range in some cases; in particular, starting time is now preserved when it's not 0, and ending time is preserved when animation doesn't have motion
Miscellaneous improvements
- Improve meshopt_simplify edge analysis to track edge loops more carefully; this fixes simplification for some cases where an open border would previously get collapsed incorrectly
- Fix a few issues with CMake configuration when meshoptimizer is used as a dependent library
- Fix compilation for old Apple Clang versions
- Reduce size of meshopt_decoder.js by 40% before gzip and 5% after gzip
- meshopt_decoder.js now has an ES6-friendly variant, meshopt_decoder.module.js, that can be imported
v0.14
This release features several new algorithms, mainly aimed at improving the geometry compression, as well as many gltfpack changes with the same goal.
New algorithms
- meshopt_optimizeVertexCacheStrip optimizes triangle lists for vertex cache, favoring long triangle strips over vertex transform efficiency. This function is recommended as a replacement for meshopt_optimizeVertexCache when reducing the compressed geometry size is more valuable than reducing vertex transform cost, or when using meshopt_stripify to produce shorter triangle strip sequences.
- meshopt_encodeIndexBuffer now supports the new strip-optimized order better; this required some bitstream changes that can be enabled with meshopt_encodeIndexVersion(1). Version 1 will become the default encoding version in a later release.
- meshopt_encodeIndexSequence can be used to compress index buffer data that doesn't represent triangle lists; the encoding is recommended for triangle strip or line lists, but can work with any index sequence (it's less efficient than meshopt_encodeIndexBuffer at compressing triangle lists)
When compressing geometry, using meshopt_optimizeVertexCacheStrip and meshopt_encodeIndexVersion(1) is recommended to minimize the distribution size of the resulting meshes; this can make the encoded data ~10% smaller before gzip/zstd compression and up to 20% smaller after gzip/zstd.
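For reference, here is a minimal sketch of this pipeline in C++ (the helper function name is made up for illustration; error handling is omitted):

```cpp
#include "meshoptimizer.h"
#include <vector>

// reorder a triangle list for strip-friendly locality and encode it with the v1 bitstream
std::vector<unsigned char> encodeIndicesForTransmission(const std::vector<unsigned int>& indices, size_t vertex_count)
{
    std::vector<unsigned int> reordered(indices.size());
    meshopt_optimizeVertexCacheStrip(reordered.data(), indices.data(), indices.size(), vertex_count);

    meshopt_encodeIndexVersion(1); // opt in to the updated bitstream

    std::vector<unsigned char> buffer(meshopt_encodeIndexBufferBound(indices.size(), vertex_count));
    buffer.resize(meshopt_encodeIndexBuffer(buffer.data(), buffer.size(), reordered.data(), reordered.size()));
    return buffer;
}
```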
Additionally, a set of vertex filters (meshopt_decodeFilterOct, meshopt_decodeFilterQuat, meshopt_decodeFilterExp) was added to support MESHOPT_compression glTF extension; these are not as useful outside of glTF, and are described in detail in the extension draft. Cumulatively these can substantially reduce the geometry and animation data in glTF files compressed using the extension.
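As an illustration, a minimal sketch of decoding a filter-encoded normal stream follows; the 4-bytes-per-normal layout is an assumption for this example, not a prescribed format:

```cpp
#include "meshoptimizer.h"
#include <vector>

// decode a compressed normal stream, then undo the octahedral filter in place
bool decodeFilteredNormals(const std::vector<unsigned char>& encoded, size_t normal_count, std::vector<signed char>& normals)
{
    normals.resize(normal_count * 4); // 4 bytes per normal assumed

    if (meshopt_decodeVertexBuffer(normals.data(), normal_count, 4, encoded.data(), encoded.size()) != 0)
        return false;

    // reconstructs unit normals as signed normalized components
    meshopt_decodeFilterOct(normals.data(), normal_count, 4);
    return true;
}
```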
gltfpack highlights
gltfpack incorporates the new algorithms and filters to substantially improve the compression ratios for geometry and animation data. For example, the Corset model from the glTF-Sample-Models repository is 20% smaller, and the BrainStem model from the same repository is 30% smaller. Most of the changes currently require using a higher compression mode, activated via the -cc command-line option; in a future release -cc may replace -c.
The texture compression support was updated to incorporate the latest changes in the KTX2 / KHR_texture_basisu specification; additionally, gltfpack now supports Basis UASTC encoding via the -tu flag. Note that since gltfpack doesn't support UASTC RDO yet, UASTC compressed files will be much larger (but much higher quality) compared to ETC1S encoded files.
For easier distribution, gltfpack is now available as an npm package.
gltfpack improvements
- Support all primitive topology modes, except indexed point lists, as an input
- Support for line lists as an output; line meshes were previously discarded
- Improve filtering of redundant geometry streams (removing color/morph delta streams as necessary)
- Implement support for KHR_materials_clearcoat extension
- Preserve extras data on material instances when -ke flag is used
- Add fine-grained control over quantization parameters for animations (-at, -ar, -as)
- Add -noq option that can be used to disable quantization (resulting in much larger files)
- Improve performance on large scenes with lots of mesh instances
- Improve validation and error messages for invalid input files
- Fix invalid output for files with meshes that don't produce any geometry
Miscellaneous improvements
- meshopt_decodeVertexBuffer now automatically enables SSSE3 SIMD implementation for clang/gcc using __cpuid-based runtime detection without the need to use extra compile flags
- meshopt_encodeVertexBuffer now works correctly on empty inputs (count = 0)
- CMake scripts now support CMake versions older than 3.7
- CMake options are now prefixed with MESHOPT_ (note: this breaks shared library builds, fixed in #129)
v0.13
This release has several new algorithms, SIMD improvements for vertex codec and a lot of gltfpack changes including Basis support.
New algorithms
- meshopt_simplifyPoints can be used to simplify point clouds. The algorithm is a variant of the sloppy simplifier, which means it's fast and not attribute-aware (for now).
- meshopt_spatialSortRemap and meshopt_spatialSortTriangles can be used to reorder vertices or triangles to increase spatial locality. This is helpful when working with point clouds and triangle meshes with redundant connectivity, and can improve clusterization results.
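As a minimal sketch (assuming a tightly packed float3 point layout; the helper name is illustrative), spatial sorting produces a remap table that can then be applied with the existing remap functions:

```cpp
#include "meshoptimizer.h"
#include <vector>

struct Point { float x, y, z; };

// reorder a point cloud so that points that are close in space are close in memory
void spatialSortPoints(std::vector<Point>& points)
{
    if (points.empty())
        return;

    std::vector<unsigned int> remap(points.size());
    meshopt_spatialSortRemap(remap.data(), &points[0].x, points.size(), sizeof(Point));

    std::vector<Point> sorted(points.size());
    meshopt_remapVertexBuffer(sorted.data(), points.data(), points.size(), sizeof(Point), remap.data());
    points.swap(sorted);
}
```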
Performance improvements
- meshopt_decodeVertexBuffer now has an experimental AVX512 implementation, which is ~10% faster than the SSSE3 implementation (it uses 128b vectors and as such carries no extra power cost). It requires AVX512-VBMI2 and AVX512-VL (available on Ice Lake CPUs).
- meshopt_decodeVertexBuffer now has an experimental WebAssembly SIMD implementation, which is ~3x faster than the scalar implementation. It requires a compatible WebAssembly implementation with SIMD enabled (Chrome Canary was used for testing).
- WebAssembly decoders are now compiled using the upstream Emscripten compiler backend, which results in ~5% faster decoding across the board.
Miscellaneous improvements
- All allocations now use allocation callbacks that can be set through meshopt_setAllocator; previously, allocations from meshopt_IndexAdapter were using global operator new/delete.
- CMake build system now supports BUILD_SHARED_LIBS
- CMake build system now can install gltfpack and libmeshoptimizer upon request
gltfpack highlights
This release includes a lot of work on extension specifications. As a result, the MESHOPT_quantized_geometry extension that was being used before got replaced with a new KHR_mesh_quantization extension (extension PR), and the details of the MESHOPT_compression extension have changed substantially to allow for fallback data (extension PR), requiring updates to glTF loaders. Both three.js (r111) and Babylon.JS (4.1) can be used to load these files, with a custom demo/GLTFLoader.js for three.js and an extension demo/babylon.MESHOPT_compression.js for Babylon.JS.
As a result, gltfpack-produced files now validate cleanly with the most recent glTF validator build (2.0.0-dev.3.0 (November 2019)).
gltfpack also now supports Basis Universal texture supercompression. Encoding files with these textures requires the basisu executable, which can be built from the official repository. Two container format options are provided:
- .basis - native container format for Basis; this is supported by three.js and Babylon.JS today, but is likely to be removed in the future because it is not compatible with the glTF specification
- .ktx - KTX2 container format from Khronos that supports Basis supercompression; this is not supported by any renderer at the time of this writing, but this is the route that is being specified (spec PR)
In addition, there were a lot of changes aimed at increasing efficiency and extending feature support, with the full list below.
gltfpack improvements
- Switch from MESHOPT_quantized_geometry to KHR_mesh_quantization
- gltfpack-produced files now validate cleanly with the most recent build of glTF validator (PR)
- Update MESHOPT_compression specification, requires updating JSON loaders (GLTFLoader.js)
- Implement support for arbitrary number of input bone influences (largest 4 weights are preserved)
- Implement degenerate triangle filtering (5% triangle/size savings on some models)
- Use 8-bit morph target deltas when possible (depending on the model, up to 2x memory savings, ~3% size savings); requires three.js r111 to work correctly
- Add -cf command line option to support compressed data fallback; files produced with this option don't require MESHOPT_compression extension, but loaders that support it will not need to load uncompressed data
- Add -si R and -sa flags that simplify the meshes using default/aggressive (sloppy) simplification
- By default, gltfpack now produces normalized normals/tangents; this results in larger but specification-compliant files. This will be improved later; for now, you can use -vu to get better compression by using unnormalized normals/tangents.
- Implement support for Basis / KTX2 compression (-tb to compress textures using basisu into .basis container; -tc to compress textures using KTX2 container which requires extra extensions and isn't supported by renderers yet)
- Implement support for embedding texture files into buffers (-te flag)
- Implement support for point clouds
- Improve animation compression efficiency for translation/scale data by reducing output precision slightly.
- Improve efficiency of bone influence encoding (~1% size savings)
- A few correctness fixes, including non-uniform scale handling and quantized color/weight data parsing
- Morph target names are now preserved using extras.targetNames JSON array
v0.12
This release contains a few improvements for various algorithms, introduces support for triangle strips with degenerate triangles and adds gltfpack (alpha).
Interface changes:
- meshopt_stripify and meshopt_unstripify now require an extra argument, restart_index
Improvements:
- Improve meshopt_simplifySloppy performance by up to 10% by using three-point interpolation search
- Improve results of meshopt_optimizeVertexCache by up to 0.5% by using a new data set obtained with differential evolution
- meshopt_stripify now supports stitching strips using degenerate triangles instead of restart indices; this typically results in a 10% larger index buffer compared to restart indices, but on some GPUs it can be substantially faster to render (see the sketch after this list)
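For reference, a minimal sketch of the updated meshopt_stripify call; my understanding is that passing 0 as restart_index requests degenerate-triangle stitching, while a sentinel such as ~0u requests primitive restart — treat that convention as an assumption:

```cpp
#include "meshoptimizer.h"
#include <vector>

// convert a triangle list to a triangle strip using the chosen stitching mode
std::vector<unsigned int> makeStrip(const std::vector<unsigned int>& indices, size_t vertex_count, unsigned int restart_index)
{
    std::vector<unsigned int> strip(meshopt_stripifyBound(indices.size()));
    strip.resize(meshopt_stripify(strip.data(), indices.data(), indices.size(), vertex_count, restart_index));
    return strip;
}
```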
gltfpack:
This release introduces an alpha version of gltfpack. gltfpack is a command-line tool that converts .obj or .gltf files to glTF files that are optimized for render performance and transmission time. gltfpack merges meshes and materials to reduce draw call count, merges buffers to reduce draw setup cost, quantizes vertex attributes to reduce GPU memory footprint, optimizes vertex and index data for more efficient GPU rendering, resamples and quantizes animation data to reduce memory footprint, and can optionally compress the vertex/index/animation buffers in the output using meshoptimizer codecs to further reduce the file size.
The resulting files rely on two not-yet-standardized extensions; when compression is not used, the resulting files can be loaded using three.js (r107+) and Babylon.js (4.1+) glTF loaders. Loading compressed files requires integrating JavaScript decoders (js/meshopt_decoder.js); demo/GLTFLoader.js contains a custom version of three.js loader that can be used to load them.
v0.11
This release contains a few improvements for simplifier, introduces a new simplification algorithm, adds support for custom allocators and improves performance and code size of JavaScript decoders.
Interface changes:
- meshopt_computeMeshletBounds now takes the meshlet parameter by pointer instead of by value.
New algorithms:
- Introduce a new simplification algorithm, meshopt_simplifySloppy, that performs decimation without concern for topological integrity. The algorithm can and will merge small disjoint features together, and is extremely fast at ~20M triangles/sec on large meshes on modern desktop CPUs.
- Memory allocation can now be configured to use custom allocation callbacks using meshopt_setAllocator (see the sketch after this list).
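A minimal sketch of plugging in custom callbacks; the tracking wrappers are hypothetical and only illustrate the expected signatures, assuming the default calling convention:

```cpp
#include "meshoptimizer.h"
#include <cstdlib>

static size_t live_allocations = 0; // hypothetical counter for illustration

static void* trackedAllocate(size_t size)
{
    ++live_allocations;
    return std::malloc(size);
}

static void trackedDeallocate(void* ptr)
{
    if (ptr)
        --live_allocations;
    std::free(ptr);
}

void installTrackingAllocator()
{
    // subsequent temporary allocations made by the library go through these callbacks
    meshopt_setAllocator(trackedAllocate, trackedDeallocate);
}
```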
Improvements:
- Default simplifier now uses a normalized error metric, which makes it much easier to consistently configure the target_error parameter - it now corresponds to linear error, normalized to mesh radius (0.01 means 1% deviation).
- Fix edge cases where the default simplifier could run many passes in vain, resulting in poor performance.
- Improve JavaScript decoder performance: vertex decoding is 17% faster, index decoding is 1.7x faster.
- Improve JavaScript decoder size: decoder.js is now 2.4x smaller (3.5 KB after gzip)
Compatibility:
- Fix gcc -Wshadow warnings
- Work around a bug in the Edge ChakraCore compiler that could result in indices being incorrectly decoded with decoder.js.
v0.10
This release contains a number of fixes and improvements for vertex codec, substantially improves performance of several algorithms in Debug builds and introduces support for decompressing vertex/index data from JavaScript.
New algorithms:
- Introduce an experimental algorithm, meshopt_generateVertexRemapMulti, that generates the same remap table as meshopt_generateVertexRemap for indexing a mesh, but supports vertex data stored as multiple independent streams (deinterleaved); see the sketch after this list
- Introduce an experimental algorithm, meshopt_generateShadowIndexBufferMulti, that can generate a second index buffer that shares the vertex data with the original index buffer, but supports vertex data stored as multiple independent streams (deinterleaved)
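A minimal sketch of re-indexing a deinterleaved mesh with meshopt_generateVertexRemapMulti, assuming a meshopt_Stream of the form {data, size, stride} and two float3 streams:

```cpp
#include "meshoptimizer.h"
#include <vector>

// deinterleaved vertex data: one stream for positions, one for normals (3 floats per vertex each)
void reindexDeinterleaved(std::vector<unsigned int>& indices,
                          std::vector<float>& positions,
                          std::vector<float>& normals)
{
    size_t vertex_count = positions.size() / 3;

    meshopt_Stream streams[] = {
        {positions.data(), sizeof(float) * 3, sizeof(float) * 3},
        {normals.data(), sizeof(float) * 3, sizeof(float) * 3},
    };

    std::vector<unsigned int> remap(vertex_count);
    size_t unique = meshopt_generateVertexRemapMulti(remap.data(), indices.data(), indices.size(),
                                                     vertex_count, streams, 2);

    // apply the remap table in place to the index buffer and both vertex streams
    meshopt_remapIndexBuffer(indices.data(), indices.data(), indices.size(), remap.data());
    meshopt_remapVertexBuffer(positions.data(), positions.data(), vertex_count, sizeof(float) * 3, remap.data());
    meshopt_remapVertexBuffer(normals.data(), normals.data(), vertex_count, sizeof(float) * 3, remap.data());

    positions.resize(unique * 3);
    normals.resize(unique * 3);
}
```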
Improvements:
- Optimize NEON code in meshopt_decodeVertexBuffer, making it 1-2% faster
- Improve compatibility of SIMD code in meshopt_decodeVertexBuffer, fixing compilation issues on ARM64, MSVC ARM, and clang for Windows
- Fix a bug in meshopt_encodeVertexBuffer that resulted in incorrectly encoded data on platforms where char is unsigned (this mostly affected ARM hosts such as Android)
- Substantially improve performance of multiple algorithms in Debug:
  - meshopt_analyzeVertexCache is 6x faster
  - meshopt_optimizeVertexCache is 4.7x faster
  - meshopt_analyzeOverdraw is 3.9x faster
  - meshopt_optimizeOverdraw is 1.4x faster
  - meshopt_simplify is 1.3x faster
JavaScript support:
- Introduce js/decoder.js that contains a WebAssembly version of vertex and index decoders with a JavaScript-friendly interface. The decoders run at 200-400 MB/s on modern desktop CPUs.
- Introduce tools/OptMeshLoader.js that contains an example mesh loader for THREE.js that uses vertex/index codecs for compression and quantizes vertex data for efficient storage; the meshes for this loader can be produced by tools/meshencoder.cpp using .OBJ files as an input.
v0.9
This release substantially improves mesh simplification and introduces experimental algorithms for advanced GPU mesh rendering (cone culling, meshlet construction). The library can also now be used from Rust via https://crates.io/crates/meshopt.
Interface changes:
- meshopt_simplify has an extra argument, target_error, that can be used to limit the geometric error introduced by the simplifier
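A minimal sketch of the updated call; the signature is assumed from the library headers of this era, and the exact interpretation of target_error is refined in later releases:

```cpp
#include "meshoptimizer.h"
#include <vector>

// reduce a mesh to roughly half its triangles, capping the error the simplifier may introduce
std::vector<unsigned int> simplifyHalf(const std::vector<unsigned int>& indices,
                                       const std::vector<float>& positions, // 3 floats per vertex
                                       float target_error)
{
    std::vector<unsigned int> result(indices.size());
    result.resize(meshopt_simplify(result.data(), indices.data(), indices.size(),
                                   positions.data(), positions.size() / 3, sizeof(float) * 3,
                                   indices.size() / 2, target_error));
    return result;
}
```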
New algorithms:
- Introduce an experimental algorithm, meshopt_buildMeshlets, that can create meshlet data from an index buffer that can be used to efficiently drive the mesh shading pipeline on NVidia RTX GPUs
- Introduce experimental algorithms, meshopt_computeClusterBounds and meshopt_computeMeshletBounds, that can compute a bounding sphere and bounding normal cone for use in GPU cluster culling (see the sketch after this list)
- Introduce an experimental algorithm, meshopt_generateShadowIndexBuffer, that can generate a second index buffer that shares the vertex data with the original index buffer, but is more efficient when a subset of vertex attributes is needed
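As a sketch of how the cluster bounds can be used for backface cone culling; field names follow meshopt_Bounds as I understand it, and the rejection test mirrors the sphere-based test documented in the repository README:

```cpp
#include "meshoptimizer.h"
#include <cmath>
#include <vector>

// compute bounds for a cluster of triangles and test it against a camera position;
// if the whole cluster faces away from the camera, it can be skipped entirely
bool clusterPotentiallyVisible(const std::vector<unsigned int>& cluster_indices,
                               const std::vector<float>& positions, // 3 floats per vertex
                               const float camera[3])
{
    meshopt_Bounds bounds = meshopt_computeClusterBounds(cluster_indices.data(), cluster_indices.size(),
                                                         positions.data(), positions.size() / 3,
                                                         sizeof(float) * 3);

    // vector from camera to cluster center
    float dx = bounds.center[0] - camera[0];
    float dy = bounds.center[1] - camera[1];
    float dz = bounds.center[2] - camera[2];
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);

    float dot = dx * bounds.cone_axis[0] + dy * bounds.cone_axis[1] + dz * bounds.cone_axis[2];

    // the cluster is rejected when it is entirely backfacing from this viewpoint
    bool backfacing = dot >= bounds.cone_cutoff * dist + bounds.radius;
    return !backfacing;
}
```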
Improvements:
- Significantly rework meshopt_simplify to improve simplification quality, including error metric improvements, attribute-guided collapse that preserves UV seam structure better, and other tweaks
- Significantly rework and optimize meshopt_simplify, making it ~4x faster
- Optimize meshopt_generateVertexRemap, making it 1.25x faster
- Optimize meshopt_decodeVertexBuffer for platforms without SIMD support, making it 1.1x faster
- Fix undefined behavior (left shift of negative integer) in meshopt_encodeVertexBuffer
v0.8
This release introduces vertex buffer encoder and a stable version of index buffer encoder.
New algorithms:
- Introduce a vertex encoder that compresses vertex buffers; it can be invoked using meshopt_encodeVertexBuffer and meshopt_decodeVertexBuffer. The algorithm typically provides a 1.5-2x compression ratio for quantized vertex data, and the resulting data can be compressed further by a general purpose compressor like zstd. Decoding is highly optimized using SSSE3/NEON and runs at 2 GB/s on a modern desktop CPU (see the round-trip sketch after this list).
- Introduce a stable index encoder that compresses index buffers; it can be invoked using meshopt_encodeIndexBuffer and meshopt_decodeIndexBuffer. The algorithm typically encodes index buffers using ~3-4 bits per index, and the resulting data can be compressed further by a general purpose compressor like zstd, yielding ~2-3 bits per index for most meshes. Decoding is highly optimized and runs at 2 GB/s on a modern desktop CPU for 32-bit indices (1 GB/s for 16-bit indices).
- Introduce a new algorithm to optimize for vertex fetch, meshopt_optimizeVertexFetchRemap; it generates a remap table that can be used with meshopt_remapVertexBuffer/meshopt_remapIndexBuffer and helps optimize meshes with several vertex streams.
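A minimal round-trip sketch for the vertex codec; the quantized vertex layout is an arbitrary example:

```cpp
#include "meshoptimizer.h"
#include <cassert>
#include <vector>

struct Vertex { unsigned short px, py, pz, pw; }; // quantized position, 8 bytes

// encode quantized vertex data and decode it back; the encoded buffer is what you would
// store on disk (optionally compressed further with a general purpose codec)
void roundtripVertices(const std::vector<Vertex>& vertices)
{
    std::vector<unsigned char> encoded(meshopt_encodeVertexBufferBound(vertices.size(), sizeof(Vertex)));
    encoded.resize(meshopt_encodeVertexBuffer(encoded.data(), encoded.size(),
                                              vertices.data(), vertices.size(), sizeof(Vertex)));

    std::vector<Vertex> decoded(vertices.size());
    int rc = meshopt_decodeVertexBuffer(decoded.data(), vertices.size(), sizeof(Vertex),
                                        encoded.data(), encoded.size());
    assert(rc == 0); // 0 indicates the buffer decoded successfully
    (void)rc;
}
```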
Improvements:
- Optimize cluster sorting in meshopt_optimizeOverdraw, making the function 10% faster
- Optimize index decoder, making it 15% faster for 32-bit indices and 40% faster for 16-bit indices
- Fix meshopt_analyzeVertexCache and meshopt_analyzeVertexFetch results for sparse vertex buffers (with unused vertices)
- Support in-place optimization in meshopt_remapVertexBuffer
- Improve CMake build files to make the library easier to integrate
v0.7
This release has large interface changes and introduces several new algorithms and tweaks to existing algorithms.
Interface:
- All C++ function wrappers have been moved out of the meshopt namespace and gained a meshopt_ prefix to simplify documentation & interface
- All structs used by the interface have been renamed and now also have a meshopt_ prefix to avoid name conflicts
- meshopt_quantizeX functions now use function arguments instead of template parameters for better compatibility
- cache_size argument has been removed from meshopt_optimizeVertexCache and meshopt_optimizeOverdraw; to perform optimization for a FIFO cache of a fixed size, use meshopt_optimizeVertexCacheFifo
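A minimal sketch of the two optimizers after the interface change; the 16-entry FIFO size is just an example:

```cpp
#include "meshoptimizer.h"
#include <vector>

// reorder triangles for the post-transform vertex cache; the general optimizer needs no
// cache size, while the FIFO variant targets a fixed-size FIFO cache
void optimizeForCache(std::vector<unsigned int>& indices, size_t vertex_count)
{
    std::vector<unsigned int> result(indices.size());

    // hardware-agnostic optimizer (recommended default)
    meshopt_optimizeVertexCache(result.data(), indices.data(), indices.size(), vertex_count);

    // or: optimize for a FIFO cache with 16 entries
    // meshopt_optimizeVertexCacheFifo(result.data(), indices.data(), indices.size(), vertex_count, 16);

    indices.swap(result);
}
```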
New algorithms:
- Introduce an algorithm that compresses index buffers; it can be invoked using meshopt_encodeIndexBuffer and meshopt_decodeIndexBuffer. The algorithm typically encodes index buffers using ~3-4 bits per index, and the resulting data can be compressed further by a general purpose compressor like zstd, yielding ~2-3 bits per index for most meshes.
- Introduce an algorithm that can convert an index buffer to a triangle strip that is still reasonably cache efficient; indexed triangle strips are faster to render on some hardware and can reduce the index buffer size. The algorithm can be invoked using meshopt_stripify and typically produces buffers with around 60-65% of the indices of a triangle list, at a 5-10% ACMR penalty on GPUs with small caches.
- Introduce a new quantization function, meshopt_quantizeFloat, that can reduce the precision of a floating-point number while keeping the floating-point representation. This can be useful to generate vertex data that can be compressed more effectively using a general purpose compression algorithm.
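A small sketch of the quantization helpers; the bit counts are arbitrary examples, and meshopt_quantizeUnorm is from the same family of helpers:

```cpp
#include "meshoptimizer.h"

// quantize a position component: keep a float representation but drop mantissa bits so the
// value compresses better, or convert to a fixed-width normalized integer
void quantizeExamples(float v /* assumed to be in [0, 1] for the unorm example */)
{
    // keep only 14 mantissa bits; the result is still a regular float
    float rounded = meshopt_quantizeFloat(v, 14);

    // map [0, 1] to a 12-bit unsigned integer
    int fixed = meshopt_quantizeUnorm(v, 12);

    (void)rounded;
    (void)fixed;
}
```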
Improvements:
- Overdraw analyzer (meshopt_analyzeOverdraw) now uses a pixel center fill convention to match hardware rendering more closely.
- Vertex cache analyzer (meshopt_analyzeVertexCache) now models a cache that matches real hardware a bit more closely, and requires additional parameters to configure (namely, primitive group size and warp/wavefront size).
- Vertex cache optimizer (meshopt_optimizeVertexCache) has been tuned to generate better output that performs well on real hardware, especially given meshes that have topology similar to that of a uniform grid as an input.
- Various algorithms have been optimized for performance and memory consumption.
v0.6
This release has significant interface changes and introduces several new algorithms and tweaks to existing algorithms.
Interface:
- The library now has a C89 interface; meshoptimizer.hpp has been renamed to meshoptimizer.h accordingly. Templated functions are still available in namespace meshopt for C++.
- optimizeVertexFetch, optimizeOverdraw and analyzeOverdraw parameter order has changed - make sure to revise existing calls to these functions.
New algorithms:
- Introduce an alternative vertex cache optimizer based on Tom Forsyth's algorithm; it can be invoked by setting the cache_size parameter of optimizeVertexCache to 0. It generally takes ~3x longer to optimize meshes but usually produces more efficient output, with the exception of regular grids.
- Introduce a mesh simplification algorithm based on edge collapses, see meshopt::simplify. This is an early version of the algorithm - expect to see performance and quality improvements in future versions.
Fixes:
- remapVertexBuffer now correctly handles indexed vertex buffers where some vertices are not referenced
- optimizeOverdraw now correctly handles index buffers with degenerate triangles
- optimizeOverdraw is able to preserve the vertex cache efficiency much better