Kernel programming
This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.
This is made possible by wrapping a subset of Metal's Objective-C APIs using ObjectiveC.jl. These low-level wrappers are available in the MTL submodule exported by Metal.jl.
Indexing and dimensions
Metal.thread_execution_width — Function
thread_execution_width()::UInt32
Return the execution width of the compute unit.
Metal.thread_index_in_quadgroup — Function
thread_index_in_quadgroup()::UInt32
Return the index of the current thread in its quadgroup.
Metal.thread_index_in_simdgroup — Function
thread_index_in_simdgroup()::UInt32
Return the index of the current thread in its simdgroup.
Metal.thread_index_in_threadgroup — Function
thread_index_in_threadgroup()::UInt32
Return the index of the current thread in its threadgroup.
Metal.thread_position_in_grid_1d — Function
thread_position_in_grid_1d()::UInt32
thread_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
thread_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the current thread's position in an N-dimensional grid of threads.
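As a minimal sketch of how the position intrinsics are typically used, the kernel below adds two vectors with one thread per element. The kernel name `vadd!`, the array contents, and the launch sizes are illustrative choices. The host-side pieces (`MtlArray`, `@metal`, `Metal.@sync`) are assumed from Metal.jl's host API and are not documented in this section.

```julia
using Metal

# Each thread processes one element. The index returned by Metal.jl is
# used directly to index the (1-based) device arrays.
function vadd!(c, a, b)
    i = thread_position_in_grid_1d()
    if i <= length(c)          # skip threads that fall past the end of the data
        c[i] = a[i] + b[i]
    end
    return
end

a = MtlArray(Float32[1, 2, 3, 4, 5])
b = MtlArray(Float32[10, 20, 30, 40, 50])
c = similar(a)

# Launch 2 threadgroups of 4 threads each (8 threads for 5 elements).
Metal.@sync @metal threads=4 groups=2 vadd!(c, a, b)

result = Array(c)              # copy the result back to the CPU
```

The bounds check is needed here because the launch provides more threads than there are elements.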
Metal.thread_position_in_threadgroup_1d — Function
thread_position_in_threadgroup_1d()::UInt32
thread_position_in_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
thread_position_in_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the current thread's unique position within a threadgroup.
Metal.threadgroup_position_in_grid_1d — Function
threadgroup_position_in_grid_1d()::UInt32
threadgroup_position_in_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threadgroup_position_in_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the current threadgroup's unique position within the grid.
Metal.threadgroups_per_grid_1d — Function
threadgroups_per_grid_1d()::UInt32
threadgroups_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threadgroups_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the number of threadgroups per grid.
Metal.threads_per_grid_1d — Function
threads_per_grid_1d()::UInt32
threads_per_grid_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threads_per_grid_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the grid size.
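One common use of the grid size is a grid-stride loop, where a fixed number of threads walks over an array of any length. The sketch below assumes that `threads_per_grid_1d()` reports the total number of launched threads (threadgroups times threads per threadgroup). The kernel name `scale!` and the launch sizes are hypothetical.

```julia
using Metal

# Each thread starts at its own grid position and then advances by the
# total number of threads in the grid until it runs past the array.
function scale!(y, x, factor)
    i = thread_position_in_grid_1d()
    stride = threads_per_grid_1d()
    while i <= length(y)
        y[i] = factor * x[i]
        i += stride
    end
    return
end

x = MtlArray(Float32.(1:100))
y = similar(x)

# Only 16 threads are launched for 100 elements; the loop covers the rest.
Metal.@sync @metal threads=8 groups=2 scale!(y, x, 2f0)

scaled = Array(y)
```

This keeps the launch configuration independent of the problem size, at the cost of a loop inside the kernel.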
Metal.threads_per_simdgroup — Function
threads_per_simdgroup()::UInt32
Return the thread execution width of a simdgroup.
Metal.threads_per_threadgroup_1d — Function
threads_per_threadgroup_1d()::UInt32
threads_per_threadgroup_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
threads_per_threadgroup_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the number of threads in a threadgroup (its thread execution width).
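The grid, threadgroup, and thread positions are related: a thread's position in the grid can be rebuilt from its threadgroup's position and its position inside that threadgroup. The sketch below checks this relation, assuming the 1-based positions that Metal.jl returns. The kernel name `rebuild_index!` is illustrative.

```julia
using Metal

# Recompute the global position from the threadgroup-level intrinsics and
# store it at the slot given by the directly queried global position.
function rebuild_index!(out)
    i     = thread_position_in_grid_1d()
    group = threadgroup_position_in_grid_1d()
    local_pos = thread_position_in_threadgroup_1d()
    width = threads_per_threadgroup_1d()

    # With 1-based positions, the first group contributes an offset of zero.
    out[i] = (group - UInt32(1)) * width + local_pos
    return
end

out = MtlArray(zeros(UInt32, 12))
Metal.@sync @metal threads=4 groups=3 rebuild_index!(out)

rebuilt = Array(out)
```

If the relation holds, every slot of `rebuilt` contains its own index.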
Metal.simdgroups_per_threadgroup — Function
simdgroups_per_threadgroup()::UInt32
Return the number of simdgroups in a threadgroup (its simdgroup execution width).
Metal.simdgroup_index_in_threadgroup — Function
simdgroup_index_in_threadgroup()::UInt32
Return the index of a simdgroup within a threadgroup.
Metal.quadgroup_index_in_threadgroup — Function
quadgroup_index_in_threadgroup()::UInt32
Return the index of a quadgroup within a threadgroup.
Metal.quadgroups_per_threadgroup — Function
quadgroups_per_threadgroup()::UInt32
Return the number of quadgroups in a threadgroup (its quadgroup execution width).
Metal.grid_size_1d — Function
grid_size_1d()::UInt32
grid_size_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
grid_size_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the maximum size of the grid for threads that read per-thread stage-in data.
Metal.grid_origin_1d — Function
grid_origin_1d()::UInt32
grid_origin_2d()::NamedTuple{(:x, :y), Tuple{UInt32, UInt32}}
grid_origin_3d()::NamedTuple{(:x, :y, :z), Tuple{UInt32, UInt32, UInt32}}
Return the origin offset of the grid for threads that read per-thread stage-in data.
Device arrays
Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. This is the device counterpart to the MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:
Metal.MtlDeviceArray — Type
MtlDeviceArray(dims, ptr)
MtlDeviceArray{T}(dims, ptr)
MtlDeviceArray{T,A}(dims, ptr)
MtlDeviceArray{T,A,N}(dims, ptr)
Construct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.
dims may be a single scalar or a tuple of integers corresponding to the lengths in each dimension. If the rank N is supplied explicitly, as in Array{T,N}(dims), then it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.
Metal.Const — Type
Const(A::MtlDeviceArray)
Mark an MtlDeviceArray as constant/read-only and have it use the constant address space.
Experimental API. Subject to change without deprecation.
Shared memory
Metal.MtlThreadGroupArray — Function
MtlThreadGroupArray(::Type{T}, dims)
Create an array local to each threadgroup launched during kernel execution.
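A threadgroup-local array is typically used as scratch space that the threads of one threadgroup fill and then read from each other. The sketch below reverses the elements handled by each threadgroup. The kernel name `reverse_in_group!` is hypothetical, the array size is assumed to be a compile-time constant matching the threadgroup size, and the flag spelling `Metal.MemoryFlagThreadGroup` is assumed from the `MemoryFlagNone` default shown in the barrier signatures.

```julia
using Metal

function reverse_in_group!(out, inp)
    i = thread_position_in_grid_1d()
    t = thread_position_in_threadgroup_1d()
    n = threads_per_threadgroup_1d()

    # Scratch buffer shared by the 8 threads of one threadgroup.
    tmp = MtlThreadGroupArray(Float32, 8)

    tmp[t] = inp[i]
    # Wait until every thread in the threadgroup has written its element.
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)

    # Read the element written by the mirrored thread.
    out[i] = tmp[n - t + 1]
    return
end

inp = MtlArray(Float32.(1:16))
out = similar(inp)
Metal.@sync @metal threads=8 groups=2 reverse_in_group!(out, inp)

reversed = Array(out)
```

Each threadgroup gets its own copy of `tmp`, so the two groups reverse their halves of the input independently.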
Synchronization
Metal.MemoryFlags — Type
MemoryFlags
Flags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.
Possible values:
None: Set barriers to only act as an execution barrier and not apply a memory fence.
Device: Ensure the GPU correctly orders the memory operations to device memory for threads in the threadgroup or simdgroup.
ThreadGroup: Ensure the GPU correctly orders the memory operations to threadgroup memory for threads in a threadgroup or simdgroup.
Texture: Ensure the GPU correctly orders the memory operations to texture memory for threads in a threadgroup or simdgroup for a texture with the read_write access qualifier.
ThreadGroup_ImgBlock: Ensure the GPU correctly orders the memory operations to threadgroup imageblock memory for threads in a threadgroup or simdgroup.
Metal.threadgroup_barrier — Function
threadgroup_barrier(flag::MemoryFlags=MemoryFlagNone)
Synchronize all threads in a threadgroup.
Possible flags that affect the memory synchronization behavior are found in MemoryFlags.
Metal.simdgroup_barrier — Function
simdgroup_barrier(flag::MemoryFlags=MemoryFlagNone)
Synchronize all threads in a simdgroup.
Possible flags that affect the memory synchronization behavior are found in MemoryFlags.
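Barriers matter most when threads exchange data through threadgroup memory over several steps. The sketch below sums the elements handled by each threadgroup with a tree reduction, placing a barrier after every step. The kernel name `group_sum!`, the fixed threadgroup size of 8, and the flag spelling `Metal.MemoryFlagThreadGroup` (inferred from the `MemoryFlagNone` default above) are assumptions made for illustration.

```julia
using Metal

function group_sum!(sums, x)
    i = thread_position_in_grid_1d()
    t = thread_position_in_threadgroup_1d()
    g = threadgroup_position_in_grid_1d()

    # One slot per thread; the size matches the threadgroup size used at launch.
    shared = MtlThreadGroupArray(Float32, 8)
    shared[t] = x[i]
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)

    # Halve the number of active threads each step: 4, 2, 1.
    stride = UInt32(4)
    while stride > 0
        if t <= stride
            shared[t] += shared[t + stride]
        end
        # Every thread reaches this barrier, including the inactive ones.
        threadgroup_barrier(Metal.MemoryFlagThreadGroup)
        stride >>= 1
    end

    # The first thread of each threadgroup writes that group's total.
    if t == 1
        sums[g] = shared[1]
    end
    return
end

x = MtlArray(Float32.(1:16))
sums = MtlArray(zeros(Float32, 2))
Metal.@sync @metal threads=8 groups=2 group_sum!(sums, x)

totals = Array(sums)
```

The barrier sits outside the `if` so that all threads of the threadgroup execute it the same number of times.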