qinv 是 A 的主对角线上元素的倒数构成的 n 维向量 QmA 是 A 的主对角线为零, 其余元素取相反数构成的 n * n 矩阵 for (i = 1; i <= limit; i++) { y <- x; x <- b; gemv(1, QmA, y, x) // x <- QmA * y + x x <- x * qinv // 非标准算子, 向量按位乘法
delta <- x axpy(-1, y, delta) // delta <- -1 * y + delta norm = nrm2(delta) if (norm < eps) { result <- x return true; } } return false;
minusU 是 A 的上三角不含对角线, 取相反数, 其余元素为零 for (int i = 1; i <= limit; i++) { y <- x trmv(UPPER_WITH_DIAG, minusU, x) // x <- minusU * x axpy(1, b, x) // x <- b + x trsv(LOWER_WITH_DIAG, A, x) // Solve At = x for t, x <- t if (omega != 1) { scal(omega, x) // x <- omega * x axpy(1 - omega, y, x) // x <- (1 - omega) * y + x }
delta <- x axpy(-1, y, delta) // delta <- -1 * y + delta norm = nrm2(delta) if (norm < eps) { result <- x return true; } } return false;
具体的 CUDA C++ 实现请参考代码文件 lsim.cu. 求解上面的 3x3 矩阵, 输出
1 2 3 4 5 6 7 8 9 10
Testing small matrix Converged in 19 iterations -1.24489744 -0.57142823 1.16326494 Jacobi: 12.536ms Converged in 12 iterations -1.24489786 -0.57142853 1.16326528 GaussSeidel: 2.285ms Converged in 10 iterations -1.24489786 -0.57142858 1.16326529 SOR: 2.782ms
SOR 迭代取 . 观察到三种算法都收敛到了期望的值.
求解更大的矩阵, 例如将下面的矩阵扩大到 , 简单取
1 2 3 4 5 6 7 8 9 10
Testing large matrix n=10000 Converged in 140 iterations ok Jacobi: 570.889ms Converged in 78 iterations ok GaussSeidel: 543.641ms Converged in 70 iterations ok SOR: 528.95ms