fatedier 071cbf4b15 vendor: update		2018-05-09 01:05:14 +08:00
.gitignore	add packages	2017-10-25 02:29:04 +08:00
.travis.yml	add packages	2017-10-25 02:29:04 +08:00
LICENSE	add packages	2017-10-25 02:29:04 +08:00
README.md	vendor: update	2018-05-09 01:05:14 +08:00
matrix.go	add packages	2017-10-25 02:29:04 +08:00
rs.go	add packages	2017-10-25 02:29:04 +08:00
rs_amd64.go	add packages	2017-10-25 02:29:04 +08:00
rs_amd64.s	add packages	2017-10-25 02:29:04 +08:00
rs_other.go	add packages	2017-10-25 02:29:04 +08:00
tbl.go	add packages	2017-10-25 02:29:04 +08:00

README.md

Reed-Solomon

Introduction:

Reed-Solomon Erasure Code engine in pure Go.
Super fast: more than 10GB/s per physical core (10+4, 4KB per vector, MacBook Pro 2.8 GHz Intel Core i7)

Installation

To get the package use the standard:

go get github.com/templexxx/reedsolomon

Documentation

See the associated GoDoc

Specification

GOARCH

All architectures are supported
Version 0.1.0 needs Go 1.9 or later because it uses sync.Map on AMD64

Math

Coding over GF(2^8)
Primitive polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x1d)
mathtool/gentbls.go: generates the primitive polynomial and its log table, exp table, multiplication table, inverse table, etc. Reading it gives more detail on how Galois fields work
mathtool/cntinverse.go: counts how many inverse matrices exist for different RS code configurations
Both Cauchy and Vandermonde matrices are supported. Vandermonde needs more operations to preserve the property that any square subset of rows is invertible
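
As a rough illustration of the tables mentioned above, here is a minimal sketch that builds exp and log tables for GF(2^8) with the same primitive polynomial (written as 0x11d once the x^8 term is included). The names buildTables, gfMul and gfInv are made up for this example and are not taken from mathtool/gentbls.go, whose actual layout may differ.

```go
package main

import "fmt"

// poly is x^8 + x^4 + x^3 + x^2 + 1 with the x^8 bit included (0x1d | 0x100).
const poly = 0x11d

var (
	expTbl [510]byte // expTbl[i] = 2^i; stored twice so gfMul needs no modulo
	logTbl [256]byte // logTbl[x] = i with 2^i = x; logTbl[0] is unused
)

// buildTables fills both tables by repeatedly multiplying by the generator 2.
func buildTables() {
	x := 1
	for i := 0; i < 255; i++ {
		expTbl[i] = byte(x)
		expTbl[i+255] = byte(x)
		logTbl[x] = byte(i)
		x <<= 1
		if x&0x100 != 0 {
			x ^= poly // reduce modulo the primitive polynomial
		}
	}
}

// gfMul multiplies two field elements using log(a) + log(b).
func gfMul(a, b byte) byte {
	if a == 0 || b == 0 {
		return 0
	}
	return expTbl[int(logTbl[a])+int(logTbl[b])]
}

// gfInv returns the multiplicative inverse of a non-zero element.
func gfInv(a byte) byte {
	return expTbl[255-int(logTbl[a])]
}

func main() {
	buildTables()
	a := byte(0x53)
	fmt.Printf("%#x * %#x = %#x\n", a, gfInv(a), gfMul(a, gfInv(a)))
}
```

A full 256x256 multiplication table can be filled from gfMul in the same way; the sketch leaves it out to stay short.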

Why so fast?

Three parts account for most of the time:

1. looking up Galois field tables
2. reading and writing memory
3. calculating the inverse matrix during reconstruction

SIMD solves no. 1: table lookups are done for many bytes per instruction.
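
The usual trick behind SIMD table lookups in GF(2^8), described in the GF SIMD paper linked at the end, is to split every byte into two 4-bit halves and use two 16-entry tables per coefficient, which fit into a vector register for a byte-shuffle instruction. The scalar sketch below only shows the arithmetic; it is not the package's assembly, and gfMulSlow, nibbleTables and mulAddVect are names invented for this example.

```go
package main

import "fmt"

// gfMulSlow multiplies in GF(2^8) by shift-and-xor, reducing with 0x1d.
func gfMulSlow(a, b byte) byte {
	var p byte
	for b != 0 {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d
		}
		b >>= 1
	}
	return p
}

// nibbleTables returns the products of c with every low and high nibble.
func nibbleTables(c byte) (low, high [16]byte) {
	for i := 0; i < 16; i++ {
		low[i] = gfMulSlow(c, byte(i))
		high[i] = gfMulSlow(c, byte(i<<4))
	}
	return low, high
}

// mulAddVect computes dst[i] ^= c * src[i]; dst must be at least as long as src.
// A SIMD version does the two lookups for 16 or 32 bytes at once.
func mulAddVect(c byte, src, dst []byte) {
	low, high := nibbleTables(c)
	for i, x := range src {
		dst[i] ^= low[x&0x0f] ^ high[x>>4]
	}
}

func main() {
	src := []byte{0x01, 0x53, 0xff}
	dst := make([]byte, len(src))
	mulAddVect(0x02, src, dst)
	fmt.Printf("%#v\n", dst)
}
```

The split works because multiplication by a constant is linear over XOR, so c*x equals c*(low nibble) XOR c*(high nibble shifted left by 4).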

Cache-friendly code helps with no. 2 and no. 3. In addition, a sync.Map caches inverse matrices, which saves about 1000ns whenever the same matrix is needed again.
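
A minimal sketch of such a cache, assuming the key is a bitmask of the surviving vector positions and the value is the flattened inverse matrix. The README does not say how the package builds its keys, so inverseCache, maskOf and get are hypothetical names used only here.

```go
package main

import (
	"fmt"
	"sync"
)

// inverseCache maps a bitmask of surviving positions to an inverse matrix.
type inverseCache struct {
	m sync.Map // uint64 -> []byte (assumed key and value types)
}

// maskOf packs positions (each below 64) into a bitmask, so the order
// of the positions does not matter.
func maskOf(has []int) uint64 {
	var k uint64
	for _, p := range has {
		k |= uint64(1) << uint(p)
	}
	return k
}

// get returns the cached matrix for has and calls compute only on a miss.
func (c *inverseCache) get(has []int, compute func() []byte) []byte {
	k := maskOf(has)
	if v, ok := c.m.Load(k); ok {
		return v.([]byte)
	}
	// LoadOrStore keeps the first stored value if two goroutines race here.
	v, _ := c.m.LoadOrStore(k, compute())
	return v.([]byte)
}

func main() {
	cache := &inverseCache{}
	calls := 0
	invert := func() []byte {
		calls++ // stands in for the expensive matrix inversion
		return []byte{1, 0, 0, 1}
	}
	cache.get([]int{0, 1, 3}, invert)
	cache.get([]int{3, 1, 0}, invert)
	fmt.Println("inversions computed:", calls)
}
```

sync.Map suits this pattern because each key is written once and then read many times.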

Performance

Performance depends mainly on:

CPU instruction set extension (AVX2, SSSE3, or none)
number of data/parity vectors
unit size of calculation (see rs_amd64.go)
size of shards
memory speed (a lot of time goes into reading and writing memory :D)
CPU performance
usage pattern (whether memory is reused)

Keep in mind that benchmark results are quite different from encoding/decoding in practice.

In benchmark loops the CPU cache helps a lot, because the same buffers are processed again and again. In practice, memory must be reused to get performance as good as in the benchmarks.
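
A minimal sketch of what reusing memory can look like: every vector is allocated once and refilled in place on each round. To stay self-contained it uses plain XOR parity as a stand-in for the real encoder; xorParity is not a function of this package.

```go
package main

import "fmt"

// xorParity overwrites parity with the XOR of all data vectors.
// It is only a placeholder for a real Reed-Solomon encode call.
func xorParity(data [][]byte, parity []byte) {
	for i := range parity {
		parity[i] = 0
	}
	for _, d := range data {
		for i, b := range d {
			parity[i] ^= b
		}
	}
}

func main() {
	const dataNum, size = 10, 4096

	// Allocate all vectors once, outside the hot loop.
	data := make([][]byte, dataNum)
	for i := range data {
		data[i] = make([]byte, size)
	}
	parity := make([]byte, size)

	for round := 0; round < 3; round++ {
		for i := range data {
			for j := range data[i] {
				data[i][j] = byte(i + j + round) // refill in place
			}
		}
		xorParity(data, parity) // writes into the same parity buffer
		fmt.Println("round", round, "done, parity length", len(parity))
	}
}
```

Allocating fresh vectors for every call would add allocator and garbage collector work and give up the warm cache lines that the benchmark loops benefit from.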

Example performance on my 2017 MacBook (i7, 2.8GHz), 10+4, with version 0.1.0.

Encoding:

Vector size	Speed (MB/s)
1400B	7655.02
4KB	10551.37
64KB	9297.25
1MB	6829.89
16MB	6312.83

Reconstruct (nil entries mark the vectors that need repair):

Vector size	Speed (MB/s)
1400B	4124.85
4KB	5715.45
64KB	6050.06
1MB	5001.21
16MB	5043.04

ReconstructWithPos (a position list marks the vectors that need repair, and memory is reused):

Vector size	Speed (MB/s)
1400B	6170.24
4KB	9444.86
64KB	9311.30
1MB	6781.06
16MB	6285.34

The reconstruct benchmarks here run with the inverse matrix cache. Without the cache, each reconstruction costs more time (about 1000ns).
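
The two reconstruct variants above differ in how the caller marks missing vectors: nil entries in one case, an explicit position list in the other. The sketch below shows how the nil convention could be turned into position lists. splitPositions is a hypothetical helper, not part of the package API, and the package's real signatures should be taken from the GoDoc.

```go
package main

import "fmt"

// splitPositions scans vects and returns the indexes of present vectors
// and of missing (nil) vectors. Hypothetical helper for illustration only.
func splitPositions(vects [][]byte) (has, lost []int) {
	for i, v := range vects {
		if v == nil {
			lost = append(lost, i)
		} else {
			has = append(has, i)
		}
	}
	return has, lost
}

func main() {
	vects := [][]byte{{1, 2}, nil, {5, 6}, nil}
	has, lost := splitPositions(vects)
	fmt.Println("has:", has, "lost:", lost)
}
```

With a position list the caller can keep a buffer in every slot, including the lost ones, so a repaired vector can be written into existing memory. With nil entries a new vector has to be provided for each repaired slot.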

Who is using this?

https://github.com/xtaci/kcp-go -- A Production-Grade Reliable-UDP Library for golang

Links & Thanks

Klauspost ReedSolomon
Intel ISA-L
GF SIMD (http://www.ssrc.ucsc.edu/papers/plank-fast13.pdf)