Kernel programming
This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.
Metal.jl interfaces with the Metal libraries by wrapping a subset of the Objective-C APIs using ObjectiveC.jl. These low-level wrappers are available in the MTL submodule exported by Metal.jl.
Indexing and dimensions
Metal.thread_index_in_quadgroup — Function
thread_index_in_quadgroup()::UInt32
Return the index of the current thread in its quadgroup.
Metal.thread_index_in_simdgroup — Function
thread_index_in_simdgroup()::UInt32
Return the index of the current thread in its simdgroup.
Metal.thread_index_in_threadgroup — Function
thread_index_in_threadgroup()::UInt32
Return the index of the current thread in its threadgroup.
Metal.thread_position_in_grid — Function
thread_position_in_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the current thread's position in an N-dimensional grid of threads.
Metal.thread_position_in_threadgroup — Function
thread_position_in_threadgroup()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the current thread's unique position within a threadgroup.
Metal.threadgroup_position_in_grid — Function
threadgroup_position_in_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the current threadgroup's unique position within the grid.
Metal.threadgroups_per_grid — Function
threadgroups_per_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the number of threadgroups per grid.
Metal.threads_per_grid — Function
threads_per_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the grid size.
Metal.threads_per_simdgroup — Function
threads_per_simdgroup()::UInt32
Return the thread execution width of a simdgroup.
Metal.threads_per_threadgroup — Function
threads_per_threadgroup()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the thread execution width of a threadgroup.
Metal.simdgroups_per_threadgroup — Function
simdgroups_per_threadgroup()::UInt32
Return the simdgroup execution width of a threadgroup.
Metal.simdgroup_index_in_threadgroup — Function
simdgroup_index_in_threadgroup()::UInt32
Return the index of a simdgroup within a threadgroup.
Metal.quadgroup_index_in_threadgroup — Function
quadgroup_index_in_threadgroup()::UInt32
Return the index of a quadgroup within a threadgroup.
Metal.quadgroups_per_threadgroup — Function
quadgroups_per_threadgroup()::UInt32
Return the quadgroup execution width of a threadgroup.
Metal.grid_size — Function
grid_size()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the maximum size of the grid for threads that read per-thread stage-in data.
Metal.grid_origin — Function
grid_origin()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the origin offset of the grid for threads that read per-thread stage-in data.
Metal.thread_execution_width — Function
thread_execution_width()::UInt32
Return the execution width of the compute unit.
This function has been deprecated as of Metal 3.
Use threads_per_simdgroup instead.
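As an illustration of how the indexing functions above are typically combined, the following is a minimal sketch of a one-dimensional kernel. The kernel name `vadd!`, the problem size, and the launch configuration are illustrative choices rather than part of the API. The sketch assumes that positions are 1-based, as is usual for Julia indexing in Metal.jl, and it requires a Metal-capable device to run.

```julia
using Metal

# Hypothetical kernel: add two vectors element-wise, one element per thread.
function vadd!(c, a, b)
    i = thread_position_in_grid().x   # position of this thread along x
    if i <= length(c)                 # guard threads that fall past the end of the data
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

n = 1000
a = MtlArray(rand(Float32, n))
b = MtlArray(rand(Float32, n))
c = similar(a)

threads = 256                # threads per threadgroup (illustrative choice)
groups = cld(n, threads)     # enough threadgroups to cover all n elements

# Metal.@sync blocks until the kernel has finished.
Metal.@sync @metal threads=threads groups=groups vadd!(c, a, b)

result = Array(c)            # copy the result back to the CPU
```

Because `groups * threads` exceeds `n` here, the bounds guard keeps the surplus threads from writing out of range.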
Device arrays
Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. This is the device counterpart to the MtlArray, and implements (part of) the array interface as well as other functionality for use on the GPU:
Metal.MtlDeviceArray — Type
MtlDeviceArray(dims, ptr)
MtlDeviceArray{T}(dims, ptr)
MtlDeviceArray{T,A}(dims, ptr)
MtlDeviceArray{T,A,N}(dims, ptr)
Construct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.
dims may be a single scalar, or a tuple of integers corresponding to the lengths in each dimension. If the rank N is supplied explicitly, as in Array{T,N}(dims), it must match the length of dims. The same applies to the element type T, which should match the type of the pointer ptr.
Metal.Const — Type
Const(A::MtlDeviceArray)
Mark an MtlDeviceArray as constant (read-only) so that it uses the constant address space.
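To illustrate the part of the array interface that a device array offers inside a kernel, here is a minimal sketch that scales a matrix in place. It assumes that `@metal` converts the `MtlArray` argument to an `MtlDeviceArray` at launch, and that `size` and multi-dimensional indexing work on the device side. The kernel name `scale!` and the array and threadgroup sizes are illustrative only.

```julia
using Metal

# Hypothetical kernel: multiply every element of a matrix by a scalar.
# Inside the kernel, A is the device-side counterpart of the MtlArray passed in.
function scale!(A, s)
    pos = thread_position_in_grid()
    i, j = pos.x, pos.y
    if i <= size(A, 1) && j <= size(A, 2)   # skip threads outside the matrix
        @inbounds A[i, j] *= s
    end
    return
end

A = MtlArray(ones(Float32, 64, 48))
threads = (16, 16)                   # 2D threadgroup (illustrative choice)
groups = cld.(size(A), threads)      # threadgroups needed along each dimension

Metal.@sync @metal threads=threads groups=groups scale!(A, 2f0)

scaled = Array(A)                    # copy the result back to the CPU
```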
Shared memory
Metal.MtlThreadGroupArray — Function
MtlThreadGroupArray(::Type{T}, dims)
Create an array local to each threadgroup launched during kernel execution.
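A common use of threadgroup-local memory is to stage data so that threads in the same threadgroup can read each other's values. The sketch below reverses the elements handled by each threadgroup. The kernel name `reverse_in_group!` and the sizes are illustrative, and the flag name `Metal.MemoryFlagThreadGroup` is assumed from the `MemoryFlagNone` naming used in the barrier signatures on this page.

```julia
using Metal

# Hypothetical kernel: each threadgroup reverses its own 64-element slice.
function reverse_in_group!(out, inp)
    tid = thread_position_in_threadgroup().x   # position within the threadgroup
    gid = thread_position_in_grid().x          # position within the whole grid
    nthreads = threads_per_threadgroup().x

    # Array shared by the threads of this threadgroup; the size is a constant.
    tile = MtlThreadGroupArray(Float32, 64)

    tile[tid] = inp[gid]
    # Wait until every thread in the group has written its element.
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)
    out[gid] = tile[nthreads - tid + 1]
    return
end

input = Float32.(1:256)
inp = MtlArray(input)
out = similar(inp)

# 4 threadgroups of 64 threads cover the 256 elements exactly, so no guard is needed.
Metal.@sync @metal threads=64 groups=4 reverse_in_group!(out, inp)

reversed = Array(out)
```

Without the barrier, a thread could read a slot of `tile` before the thread responsible for that slot has written it.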
Synchronization
Metal.MemoryFlags — Type
MemoryFlags
Flags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.
Possible values:
None: Set barriers to only act as an execution barrier and not apply a memory fence.
Device: Ensure the GPU correctly orders the memory operations to device memory for threads in the threadgroup or simdgroup.
ThreadGroup: Ensure the GPU correctly orders the memory operations to threadgroup memory for threads in a threadgroup or simdgroup.
Texture: Ensure the GPU correctly orders the memory operations to texture memory for threads in a threadgroup or simdgroup for a texture with the read_write access qualifier.
ThreadGroup_ImgBlock: Ensure the GPU correctly orders the memory operations to threadgroup imageblock memory for threads in a threadgroup or simdgroup.
Metal.threadgroup_barrier — Function
threadgroup_barrier(flag=MemoryFlagNone)
Synchronize all threads in a threadgroup.
Possible flags that affect the memory synchronization behavior are found in MemoryFlags.
Metal.simdgroup_barrier — Function
simdgroup_barrier(flag=MemoryFlagNone)
Synchronize all threads in a SIMD-group.
Possible flags that affect the memory synchronization behavior are found in MemoryFlags.
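To show where barriers are needed when threads cooperate over several steps, the following sketch computes one sum per threadgroup with a tree reduction. The kernel name `group_sum!` and the sizes are illustrative, and the flag name `Metal.MemoryFlagThreadGroup` is assumed from the `MemoryFlagNone` naming used in the signatures above. It is a teaching sketch under those assumptions, not a tuned reduction.

```julia
using Metal

# Hypothetical kernel: each 128-thread threadgroup sums its slice of x.
function group_sum!(sums, x)
    tid = thread_position_in_threadgroup().x
    gid = thread_position_in_grid().x
    group = threadgroup_position_in_grid().x

    scratch = MtlThreadGroupArray(Float32, 128)
    scratch[tid] = x[gid]
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)

    # Halve the number of active threads at each step.
    stride = 64
    while stride >= 1
        if tid <= stride
            scratch[tid] += scratch[tid + stride]
        end
        # Every thread reaches this barrier, including the inactive ones.
        threadgroup_barrier(Metal.MemoryFlagThreadGroup)
        stride ÷= 2
    end

    if tid == 1
        sums[group] = scratch[1]
    end
    return
end

xs = Float32.(1:512)
x = MtlArray(xs)
sums = MtlArray(zeros(Float32, 4))

# 4 threadgroups of 128 threads cover the 512 elements exactly.
Metal.@sync @metal threads=128 groups=4 group_sum!(sums, x)

group_sums = Array(sums)
```

The barrier sits outside the `if` so that all threads in the threadgroup execute it the same number of times; placing it inside divergent control flow would leave some threads waiting on a barrier the others never reach.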