Kernel programming

This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.

Metal.jl interfaces with the Metal libraries by wrapping a subset of their Objective-C APIs with ObjectiveC.jl. These low-level wrappers are available in the MTL submodule exported by Metal.jl.

Indexing and dimensions

Metal.thread_index_in_quadgroup — Function

thread_index_in_quadgroup()::UInt32

Return the index of the current thread in its quadgroup.

Metal.thread_index_in_simdgroup — Function

thread_index_in_simdgroup()::UInt32

Return the index of the current thread in its simdgroup.

Metal.thread_index_in_threadgroup — Function

thread_index_in_threadgroup()::UInt32

Return the index of the current thread in its threadgroup.

Metal.thread_position_in_grid — Function

thread_position_in_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the current thread's position in an N-dimensional grid of threads.
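As a minimal sketch of how this position is typically used, the kernel below lets each thread process one element of a vector. The kernel name `vadd!` is hypothetical, and the `@metal` launch macro with its `threads` and `groups` keywords, as well as `MtlArray`, are assumed from the rest of Metal.jl rather than documented in this section. The sketch also assumes positions are 1-based, so they can index Julia arrays directly.

```julia
using Metal

# Hypothetical kernel: each thread adds one pair of elements.
# `.x` selects the first dimension of the returned named tuple.
function vadd!(c, a, b)
    i = thread_position_in_grid().x
    if i <= length(c)          # guard against threads past the end of the data
        c[i] = a[i] + b[i]
    end
    return
end

a = MtlArray(Float32[1, 2, 3, 4])
b = MtlArray(Float32[10, 20, 30, 40])
c = similar(a)

# 2 threadgroups of 2 threads each cover the 4 elements.
@metal threads=2 groups=2 vadd!(c, a, b)

result = Array(c)              # copying back to the CPU waits for the kernel
```

The bounds check is not strictly needed here because the launch covers exactly four elements, but it is a common safeguard when the grid is rounded up to a multiple of the threadgroup size.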

Metal.thread_position_in_threadgroup — Function

thread_position_in_threadgroup()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the current thread's unique position within a threadgroup.

Metal.threadgroup_position_in_grid — Function

threadgroup_position_in_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the current threadgroup's unique position within the grid.

Metal.threadgroups_per_grid — Function

threadgroups_per_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the number of threadgroups per grid.

Metal.threads_per_grid — Function

threads_per_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the grid size.

Metal.threads_per_simdgroup — Function

threads_per_simdgroup()::UInt32

Return the thread execution width of a simdgroup.

Metal.threads_per_threadgroup — Function

threads_per_threadgroup()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the size of a threadgroup, i.e. the number of threads along each dimension.
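The position and size functions above are related: a thread's grid position can be rebuilt from its threadgroup's position, the threadgroup size, and its position within the threadgroup. The sketch below checks this on the GPU for a one-dimensional launch. It assumes that positions are 1-based while sizes are plain counts; the kernel name `rebuild_index!` and the `@metal` launch syntax are likewise assumptions, not part of this section.

```julia
using Metal

# Hypothetical kernel: rebuild the global index from group-level information
# and store it at the slot given by the intrinsic global index.
function rebuild_index!(out)
    i  = thread_position_in_grid().x          # global position (assumed 1-based)
    tg = threadgroup_position_in_grid().x     # which threadgroup (assumed 1-based)
    t  = thread_position_in_threadgroup().x   # position inside it (assumed 1-based)
    n  = threads_per_threadgroup().x          # threadgroup size (a count)
    out[i] = (tg - UInt32(1)) * n + t
    return
end

out = MtlArray(zeros(UInt32, 8))
@metal threads=4 groups=2 rebuild_index!(out)

rebuilt = Array(out)
```

If the 1-based assumption holds, `rebuilt` contains the values 1 through 8 in order, which means the rebuilt index agrees with `thread_position_in_grid().x` for every thread.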

Metal.simdgroups_per_threadgroup — Function

simdgroups_per_threadgroup()::UInt32

Return the number of simdgroups in a threadgroup.

Metal.simdgroup_index_in_threadgroup — Function

simdgroup_index_in_threadgroup()::UInt32

Return the index of a simdgroup within a threadgroup.
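To see how threads are distributed over simdgroups, a kernel can record both indices for every thread. This is an illustrative sketch only: the kernel name `simd_layout!` is hypothetical, the `@metal` launch syntax is assumed from the rest of Metal.jl, and the actual values depend on the SIMD width of the device.

```julia
using Metal

# Hypothetical kernel: record, per thread, its lane within the simdgroup
# and the index of its simdgroup within the threadgroup.
function simd_layout!(lane, group)
    i = thread_position_in_grid().x
    lane[i]  = thread_index_in_simdgroup()
    group[i] = simdgroup_index_in_threadgroup()
    return
end

lane  = MtlArray(zeros(UInt32, 64))
group = MtlArray(zeros(UInt32, 64))

# A single threadgroup of 64 threads.
@metal threads=64 simd_layout!(lane, group)

# Every thread should have a distinct (simdgroup, lane) pair.
pairs = collect(zip(Array(group), Array(lane)))
```

Inspecting `pairs` on a given device shows how many simdgroups the threadgroup was split into and how many lanes each one has.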

Metal.quadgroup_index_in_threadgroup — Function

quadgroup_index_in_threadgroup()::UInt32

Return the index of a quadgroup within a threadgroup.

Metal.quadgroups_per_threadgroup — Function

quadgroups_per_threadgroup()::UInt32

Return the number of quadgroups in a threadgroup.

Metal.grid_size — Function

grid_size()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the maximum size of the grid for threads that read per-thread stage-in data.

Metal.grid_origin — Function

grid_origin()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}

Return the origin offset of the grid for threads that read per-thread stage-in data.

Metal.thread_execution_width — Function

thread_execution_width()::UInt32

Return the execution width of the compute unit.

This function has been deprecated as of Metal 3.

Use threads_per_simdgroup instead.

Device arrays

Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. This is the device counterpart to MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:

Metal.MtlDeviceArray — Type

MtlDeviceArray(dims, ptr)
MtlDeviceArray{T}(dims, ptr)
MtlDeviceArray{T,A}(dims, ptr)
MtlDeviceArray{T,A,N}(dims, ptr)

Construct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.

dims may be a single scalar, or a tuple of integers corresponding to the lengths in each dimension. If the rank N is supplied explicitly, as in Array{T,N}(dims), then it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.
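In practice a device array is rarely constructed by hand. The sketch below assumes that passing an `MtlArray` to a kernel through the `@metal` launch macro hands the kernel the corresponding `MtlDeviceArray`, so the kernel can use `size` and multi-dimensional indexing on it. The kernel name `scale!` is hypothetical.

```julia
using Metal

# Hypothetical kernel: inside the kernel, `A` is assumed to be an
# MtlDeviceArray, which supports `size` and indexing like a regular Array.
function scale!(A, s)
    pos = thread_position_in_grid()
    i, j = pos.x, pos.y
    if i <= size(A, 1) && j <= size(A, 2)
        A[i, j] *= s
    end
    return
end

A = MtlArray(Float32[1 2 3; 4 5 6])

# One thread per element of the 2x3 matrix, in a single threadgroup.
@metal threads=(2, 3) scale!(A, 2f0)

scaled = Array(A)
```

The two-dimensional launch mirrors the shape of the matrix, so the `x` and `y` components of the thread position map directly onto row and column indices.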

Metal.Const — Type

Const(A::MtlDeviceArray)

Mark an MtlDeviceArray as constant/read-only and have it use the constant address space.

Warning

Experimental API. Subject to change without deprecation.

Shared memory

Metal.MtlThreadGroupArray — Function

MtlThreadGroupArray(::Type{T}, dims)

Create an array local to each threadgroup launched during kernel execution.
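A typical use is staging data that the threads of one threadgroup exchange with each other. The sketch below reverses eight values within a single threadgroup. It assumes the array size must be a compile-time constant, that the flag value is spelled `Metal.MemoryFlagThreadGroup` (inferred from the `MemoryFlagNone` default in the barrier signatures), and that the `@metal` launch syntax is as elsewhere in Metal.jl. The kernel name `reverse_in_group!` is hypothetical.

```julia
using Metal

# Hypothetical kernel: reverse the input using threadgroup-local storage.
function reverse_in_group!(out, inp)
    t = thread_position_in_threadgroup().x
    n = threads_per_threadgroup().x

    tile = MtlThreadGroupArray(Float32, 8)   # shared by all threads of the group
    tile[t] = inp[t]

    # Wait until every thread has written its element before reading others'.
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)

    out[t] = tile[n - t + 1]
    return
end

inp = MtlArray(Float32.(1:8))
out = similar(inp)
@metal threads=8 reverse_in_group!(out, inp)

reversed = Array(out)
```

Without the barrier, a thread could read a slot of `tile` before the thread responsible for it has written its value.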

Synchronization

Metal.MemoryFlags — Type

MemoryFlags

Flags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.

Possible values:

None: Set barriers to only act as an execution barrier and not apply a memory fence.

Device: Ensure the GPU correctly orders the memory operations to device memory for threads in the threadgroup or simdgroup.

ThreadGroup: Ensure the GPU correctly orders the memory operations to threadgroup memory for threads in a threadgroup or simdgroup.

Texture: Ensure the GPU correctly orders the memory operations to texture memory for threads in a threadgroup or simdgroup for a texture with the read_write access qualifier.

ThreadGroup_ImgBlock: Ensure the GPU correctly orders the memory operations to threadgroup imageblock memory for threads in a threadgroup or simdgroup.

Metal.threadgroup_barrier — Function

threadgroup_barrier(flag=MemoryFlagNone)

Synchronize all threads in a threadgroup.

Possible flags that affect the memory synchronization behavior are found in MemoryFlags.
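Barriers matter most when threads cooperate over several rounds. The following sketch sums 16 values with a tree reduction in threadgroup memory, placing a barrier after every round. It is a sketch under stated assumptions: the kernel name `group_sum!` is hypothetical, the flag is assumed to be spelled `Metal.MemoryFlagThreadGroup`, and the `@metal` launch syntax is taken from the rest of Metal.jl.

```julia
using Metal

# Hypothetical kernel: sum 16 values within one threadgroup.
function group_sum!(out, inp)
    t = thread_position_in_threadgroup().x

    shared = MtlThreadGroupArray(Float32, 16)
    shared[t] = inp[t]
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)

    # Halve the number of active threads each round: 8, 4, 2, 1.
    stride = UInt32(8)
    while stride > 0
        if t <= stride
            shared[t] += shared[t + stride]
        end
        # The barrier sits outside the `if` so that every thread reaches it.
        threadgroup_barrier(Metal.MemoryFlagThreadGroup)
        stride >>= 1
    end

    if t == 1
        out[1] = shared[1]
    end
    return
end

inp = MtlArray(Float32.(1:16))
out = MtlArray(zeros(Float32, 1))
@metal threads=16 group_sum!(out, inp)

total = Array(out)[1]
```

Keeping the barrier outside the divergent branch is a deliberate choice: all threads of the threadgroup are expected to reach the same barrier, including those that have no work left in a given round.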

Metal.simdgroup_barrier — Function

simdgroup_barrier(flag=MemoryFlagNone)

Synchronize all threads in a simdgroup.

Possible flags that affect the memory synchronization behavior are found in MemoryFlags.