CUBLAS_Library.pdf - CSDN Download

Uploaded 2019-10-08 22:26:03 | 2.89 MB | PDF

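The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It gives developers access to the computational resources of NVIDIA GPUs. Since CUDA 6.0, the library has exposed two sets of API: the regular cuBLAS API and the cuBLASXt API. With the cuBLAS API, the application allocates the required matrices and vectors in GPU memory, fills them with data, calls the desired cuBLAS functions, and then copies the results from GPU memory back to the host. The cuBLAS API also provides helper functions for writing data to and reading data from the GPU.

For maximum compatibility with existing Fortran environments, cuBLAS uses column-major storage and 1-based indexing. C and C++ use row-major storage, so applications written in these languages cannot use native two-dimensional array semantics for cuBLAS matrices. Instead, macros or inline functions are defined to implement matrices on top of one-dimensional arrays. For Fortran code ported mechanically to C, one may keep 1-based indexing to avoid transforming loops. In that case the array index of a matrix element can be computed with the macro #define IDX2F(i,j,ld) ((((j)-1)*(ld))+((i)-1)), where ld is the leading dimension of the matrix, which for column-major storage is the number of rows of the allocated matrix (not the number of columns). Code written natively in C or C++ would most likely use 0-based indexing, in which case the macro #define IDX2C(i,j,ld) (((j)*(ld))+(i)) computes the array index.

Starting with version 4.0, cuBLAS provides a new, updated API alongside the legacy API. The cuBLASXt API, introduced later in CUDA 6.0, works differently from both: the application keeps its data in host memory, and the library dispatches each operation to one or more GPUs in the system according to the user's request. It suits applications that want to keep data on the host and let the library manage distribution across GPUs.

Data layout needs attention in either language. Fortran code already matches the column-major layout that cuBLAS expects, while C and C++ code must index matrices through macros such as the ones above, using either 0-based or 1-based indexing consistently. Both the cuBLAS API and the cuBLASXt API let applications offload matrix and vector operations to the GPU through ordinary function calls.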
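As a small illustration of the 1-based IDX2F macro quoted above, the following sketch sums one column of a column-major matrix using Fortran-style loop bounds. The helper name `column_sum` is hypothetical and only serves the example; the sketch assumes that `ld` is the number of rows of the allocated matrix.

```c
/* 1-based, column-major index, as in the IDX2F macro quoted above. */
#define IDX2F(i,j,ld) ((((j)-1)*(ld))+((i)-1))

/* Hypothetical helper (not part of cuBLAS): sum the first m entries of
   column j (1-based) of a column-major matrix with leading dimension ld. */
static float column_sum(const float *a, int m, int j, int ld)
{
    float s = 0.0f;
    for (int i = 1; i <= m; i++) {   /* Fortran-style 1..m loop bounds */
        s += a[IDX2F(i, j, ld)];
    }
    return s;
}
```

For a 3-row matrix stored as `{1, 2, 3, 4, 5, 6}`, column 2 holds 4, 5, 6. Passing `m = 2` with `ld = 3` addresses only the top two rows of that column, which is how a submatrix of a larger allocation is reached.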


CUBLAS LIBRARY

DU-06702-001_v9.0 | September 2017

User Guide

Introduction


For natively written C and C++ code, one would most likely choose 0-based indexing, in which case the array index of a matrix element in row “i” and column “j” can be computed via the following macro

#define IDX2C(i,j,ld) (((j)*(ld))+(i))
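To make the 0-based macro concrete, here is a minimal sketch that repacks an ordinary row-major C array into the column-major layout cuBLAS expects. The function name `row_major_to_col_major` is hypothetical, and the sketch assumes the destination is tightly packed, so its leading dimension equals the number of rows.

```c
#define IDX2C(i,j,ld) (((j)*(ld))+(i))

/* Hypothetical helper (not part of cuBLAS): copy a rows x cols matrix
   from row-major order (native C layout) into column-major order.
   The destination uses ld = rows, i.e. no padding between columns. */
static void row_major_to_col_major(const float *src, float *dst,
                                   int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            /* element (i, j): row-major offset on the right,
               column-major offset on the left */
            dst[IDX2C(i, j, rows)] = src[i * cols + j];
        }
    }
}
```

A 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) is stored by C as `{1, 2, 3, 4, 5, 6}`. After the copy, the buffer holds the columns one after another: `{1, 4, 2, 5, 3, 6}`.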

1.2. New and Legacy cuBLAS API

Starting with version 4.0, the cuBLAS Library provides a new updated API, in addition to the existing legacy API. This section discusses why a new API is provided, the advantages of using it, and the differences with the existing legacy API.

The new cuBLAS library API can be used by including the header file “cublas_v2.h”. It has the following features that the legacy cuBLAS API does not have:

‣ the handle to the cuBLAS library context is initialized using the function cublasCreate() and is explicitly passed to every subsequent library function call. This allows the user to have more control over the library setup when using multiple host threads and multiple GPUs. This also allows the cuBLAS APIs to be reentrant.

‣ the scalars α and β can be passed by reference on the host or the device, instead of only being allowed to be passed by value on the host. This change allows library functions to execute asynchronously using streams even when α and β are generated by a previous kernel.

‣ when a library routine returns a scalar result, it can be returned by reference on the host or the device, instead of only being allowed to be returned by value on the host. This change allows library routines to be called asynchronously when the scalar result is generated and returned by reference on the device, resulting in maximum parallelism.

‣ the error status cublasStatus_t is returned by all cuBLAS library function calls. This change facilitates debugging and simplifies software development. Note that cublasStatus was renamed cublasStatus_t to be more consistent with other types in the cuBLAS library.

‣ the cublasAlloc() and cublasFree() functions have been deprecated. This change removes these unnecessary wrappers around cudaMalloc() and cudaFree(), respectively.

‣ the function cublasSetKernelStream() was renamed cublasSetStream() to be more consistent with the other CUDA libraries.
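The features listed above can be seen together in a short host program. This is a minimal sketch rather than a reference implementation: it assumes a CUDA-capable GPU and an installed CUDA toolkit, the `CHECK_CUBLAS` macro is a hypothetical helper introduced only for this example, and CUDA runtime error handling is kept deliberately terse.

```c
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>   /* header of the new cuBLAS API */

/* Hypothetical helper for this sketch (not part of cuBLAS):
   every v2 call returns a cublasStatus_t that can be checked. */
#define CHECK_CUBLAS(call)                                        \
    do {                                                          \
        cublasStatus_t s_ = (call);                               \
        if (s_ != CUBLAS_STATUS_SUCCESS) {                        \
            fprintf(stderr, "cuBLAS error %d at line %d\n",       \
                    (int)s_, __LINE__);                           \
            return 1;                                             \
        }                                                         \
    } while (0)

int main(void)
{
    const int n = 4;
    float h_x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float alpha = 2.0f;       /* scalar passed by reference on the host */
    float h_result = 0.0f;
    float *d_x = NULL;        /* vector in GPU memory */
    float *d_result = NULL;   /* scalar result kept on the device */
    cublasHandle_t handle;
    cudaStream_t stream;

    /* cudaMalloc() is used directly instead of the deprecated cublasAlloc(). */
    if (cudaMalloc((void **)&d_x, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_result, sizeof(float)) != cudaSuccess) return 1;
    if (cudaStreamCreate(&stream) != cudaSuccess) return 1;

    CHECK_CUBLAS(cublasCreate(&handle));            /* explicit context handle */
    CHECK_CUBLAS(cublasSetStream(handle, stream));  /* was cublasSetKernelStream() */
    CHECK_CUBLAS(cublasSetVector(n, sizeof(float), h_x, 1, d_x, 1));

    /* x = alpha * x, with alpha read from host memory (default pointer mode). */
    CHECK_CUBLAS(cublasSscal(handle, n, &alpha, d_x, 1));

    /* Switch to device pointer mode: the dot product is written to d_result,
       so the call does not have to wait for the result on the host. */
    CHECK_CUBLAS(cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE));
    CHECK_CUBLAS(cublasSdot(handle, n, d_x, 1, d_x, 1, d_result));

    /* Fetch the scalar on the same stream, then wait for the stream. */
    cudaMemcpyAsync(&h_result, d_result, sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    printf("dot(x, x) after scaling = %f\n", h_result);

    CHECK_CUBLAS(cublasDestroy(handle));
    cudaStreamDestroy(stream);
    cudaFree(d_result);
    cudaFree(d_x);
    return 0;
}
```

With a typical toolkit installation this would be built with something like `nvcc example.c -lcublas`; the file name is a placeholder. In device pointer mode any scalar inputs such as α would also have to live in GPU memory, which is why the scaling step runs before the mode switch.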

The legacy cuBLAS API, explained in more detail in Appendix A, can be used by including the header file “cublas.h”. Since the legacy API is identical to the previously released cuBLAS library API, existing applications will work out of the box and automatically use this legacy API without any source code changes. In general, new applications should not use the legacy cuBLAS API, and existing applications should convert to the new API if they require sophisticated and optimal stream parallelism or if they call cuBLAS routines concurrently from multiple threads. For the rest of the document, the new cuBLAS Library API will simply be referred to as the cuBLAS Library API.

As mentioned earlier, the interfaces to the legacy and the cuBLAS library APIs are the header files “cublas.h” and “cublas_v2.h”, respectively. In addition, applications using the cuBLAS library need to link against the DSO cublas.so (Linux), the DLL cublas.dll
