Kernel programming

This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.

Metal.jl interfaces with the Metal libraries by wrapping a subset of the Objective-C APIs using ObjectiveC.jl. These low-level wrappers are available in the MTL submodule exported by Metal.jl.

Indexing and dimensions

Metal.thread_position_in_grid_1d - Function
thread_position_in_grid_1d()::UInt32
thread_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
thread_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current thread's position in an N-dimensional grid of threads.
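As a minimal sketch of how the grid position is typically used, the kernel below adds two vectors elementwise, with each thread handling the element at its own grid position. It assumes an Apple GPU with a working Metal.jl installation. The kernel name `vadd!` and the launch sizes are illustrative, and `MtlArray` and the `@metal` launch macro are Metal.jl host-side features not described in this section.

```julia
using Metal

# Each thread handles one element; its grid position doubles as the array index.
function vadd!(c, a, b)
    i = thread_position_in_grid_1d()
    if i <= length(c)          # skip threads that fall past the end of the arrays
        c[i] = a[i] + b[i]
    end
    return                     # kernels must not return a value
end

a = MtlArray(Float32[1, 2, 3, 4])
b = MtlArray(Float32[10, 20, 30, 40])
c = similar(a)

# 2 threadgroups of 2 threads each cover all 4 elements.
@metal threads=2 groups=2 vadd!(c, a, b)

result = Array(c)              # copy the result back to the host
```

The bounds check is not strictly needed here because the launch covers exactly four elements, but it keeps the kernel safe when the grid is larger than the data.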

Metal.thread_position_in_threadgroup_1d - Function
thread_position_in_threadgroup_1d()::UInt32
thread_position_in_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
thread_position_in_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current thread's unique position within a threadgroup.

Metal.threadgroup_position_in_grid_1d - Function
threadgroup_position_in_grid_1d()::UInt32
threadgroup_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threadgroup_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the current threadgroup's unique position within the grid.

Metal.threadgroups_per_grid_1d - Function
threadgroups_per_grid_1d()::UInt32
threadgroups_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threadgroups_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the number of threadgroups per grid.

Metal.threads_per_grid_1d - Function
threads_per_grid_1d()::UInt32
threads_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threads_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the grid size, that is, the number of threads in the grid along each dimension.

Metal.threads_per_threadgroup_1d - Function
threads_per_threadgroup_1d()::UInt32
threads_per_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threads_per_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the thread execution width of a threadgroup, that is, the number of threads in each threadgroup along each dimension.
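To see how these queries relate to the launch configuration, the following sketch records, for every thread, its position within its threadgroup, the position of its threadgroup, and the threadgroup size. It assumes an Apple GPU with a working Metal.jl installation. The kernel name `record_indices!` and the array names are hypothetical, and `MtlArray` and `@metal` are Metal.jl host-side features not described in this section. No particular index base is asserted for the position values; inspect the output on your own system.

```julia
using Metal

# Every thread stores what the index queries return for it.
function record_indices!(tpos, gpos, tsize)
    i = thread_position_in_grid_1d()
    if i <= length(tpos)
        tpos[i]  = thread_position_in_threadgroup_1d()
        gpos[i]  = threadgroup_position_in_grid_1d()
        tsize[i] = threads_per_threadgroup_1d()
    end
    return
end

n = 8
tpos  = MtlArray(zeros(UInt32, n))
gpos  = MtlArray(zeros(UInt32, n))
tsize = MtlArray(zeros(UInt32, n))

# 2 threadgroups of 4 threads each, 8 threads in total.
@metal threads=4 groups=2 record_indices!(tpos, gpos, tsize)

host_tpos  = Array(tpos)
host_gpos  = Array(gpos)
host_tsize = Array(tsize)
```

With this launch, every entry of `host_tsize` should equal the `threads` value passed to `@metal`.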

Metal.grid_size_1d - Function
grid_size_1d()::UInt32
grid_size_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
grid_size_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the maximum size of the grid for threads that read per-thread stage-in data.

Metal.grid_origin_1d - Function
grid_origin_1d()::UInt32
grid_origin_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
grid_origin_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}

Return the origin offset of the grid for threads that read per-thread stage-in data.


Device arrays

Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. This is the device counterpart to MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:

Metal.MtlDeviceArray - Type
MtlDeviceArray(dims, ptr)
MtlDeviceArray{T}(dims, ptr)
MtlDeviceArray{T,A}(dims, ptr)
MtlDeviceArray{T,A,N}(dims, ptr)

Construct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.

dims may be a single scalar or a tuple of integers corresponding to the lengths in each dimension. If the rank N is supplied explicitly, as in Array{T,N}(dims), then it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.

Metal.Const - Type
Const(A::MtlDeviceArray)

Mark an MtlDeviceArray as constant (read-only) and have it use the constant address space.

Warning

Experimental API. Subject to change without deprecation.


Shared memory

Metal.MtlThreadGroupArray - Function
MtlThreadGroupArray(::Type{T}, dims)

Create an array local to each threadgroup launched during kernel execution.


Synchronization

Metal.MemoryFlags - Type
MemoryFlags

Flags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.

Possible values:

None: Set barriers to only act as an execution barrier and not apply a memory fence.

Device: Ensure the GPU correctly orders the memory operations to device memory for threads in the threadgroup or simdgroup.

ThreadGroup: Ensure the GPU correctly orders the memory operations to threadgroup memory for threads in a threadgroup or simdgroup.

Texture: Ensure the GPU correctly orders the memory operations to texture memory for threads in a threadgroup or simdgroup for a texture with the read_write access qualifier.

ThreadGroup_ImgBlock: Ensure the GPU correctly orders the memory operations to threadgroup imageblock memory for threads in a threadgroup or simdgroup.
Metal.threadgroup_barrier - Function
threadgroup_barrier(flag::MemoryFlags=MemoryFlagNone)

Synchronize all threads in a threadgroup.

The flags that control the memory synchronization behavior are listed under MemoryFlags.
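The sketch below shows one common pattern that combines threadgroup memory with a barrier: each thread writes one value into an MtlThreadGroupArray, all threads wait at the barrier, and each thread then reads a value written by a different thread. It assumes an Apple GPU with a working Metal.jl installation, that thread positions are 1-based as Julia indices, and that the threadgroup flag is spelled `Metal.MemoryFlagThreadGroup` (inferred from the `MemoryFlagNone` default in the signature above). The kernel name `reverse_in_group!` and the fixed size of 8 are illustrative.

```julia
using Metal

# Reverse the elements handled by a single threadgroup via threadgroup memory.
function reverse_in_group!(out, inp)
    tid = thread_position_in_threadgroup_1d()
    n = threads_per_threadgroup_1d()

    # Scratch storage shared by the threads of this threadgroup.
    tile = MtlThreadGroupArray(Float32, 8)
    tile[tid] = inp[tid]

    # Wait for all threads and order their writes to threadgroup memory
    # before any thread reads a slot written by another thread.
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)

    out[tid] = tile[n - tid + 1]
    return
end

inp = MtlArray(Float32.(1:8))
out = similar(inp)

# A single threadgroup of 8 threads, matching the size of the scratch array.
@metal threads=8 groups=1 reverse_in_group!(out, inp)

reversed = Array(out)
```

Without the barrier, a thread could read a slot of `tile` before the thread responsible for it has written its value.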

Metal.simdgroup_barrier - Function
simdgroup_barrier(flag::MemoryFlags=MemoryFlagNone)

Synchronize all threads in a SIMD-group.

The flags that control the memory synchronization behavior are listed under MemoryFlags.
