Kernel programming
This section lists the package's public functionality that corresponds to special Metal functions for use in device code. For more information about these functions, please consult the Metal Shading Language specification.
Metal.jl interfaces with the Metal libraries by wrapping a subset of their Objective-C APIs using ObjectiveC.jl. These low-level wrappers are available in the MTL submodule exported by Metal.jl.
Indexing and dimensions
Metal.thread_index_in_quadgroup — Function
thread_index_in_quadgroup()::UInt32
Return the index of the current thread in its quadgroup.

Metal.thread_index_in_simdgroup — Function
thread_index_in_simdgroup()::UInt32
Return the index of the current thread in its simdgroup.

Metal.thread_index_in_threadgroup — Function
thread_index_in_threadgroup()::UInt32
Return the index of the current thread in its threadgroup.

Metal.thread_position_in_grid — Function
thread_position_in_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the current thread's position in an N-dimensional grid of threads.

Metal.thread_position_in_threadgroup — Function
thread_position_in_threadgroup()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the current thread's unique position within its threadgroup.

Metal.threadgroup_position_in_grid — Function
threadgroup_position_in_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the current threadgroup's unique position within the grid.

Metal.threadgroups_per_grid — Function
threadgroups_per_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the number of threadgroups in the grid along each dimension.

Metal.threads_per_grid — Function
threads_per_grid()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the grid size, i.e. the number of threads in the grid along each dimension.

Metal.threads_per_simdgroup — Function
threads_per_simdgroup()::UInt32
Return the thread execution width of a simdgroup, i.e. the number of threads it contains.

Metal.threads_per_threadgroup — Function
threads_per_threadgroup()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the size of a threadgroup, i.e. the number of threads it contains along each dimension.

Metal.simdgroups_per_threadgroup — Function
simdgroups_per_threadgroup()::UInt32
Return the number of simdgroups in a threadgroup.

Metal.simdgroup_index_in_threadgroup — Function
simdgroup_index_in_threadgroup()::UInt32
Return the index of the current simdgroup within its threadgroup.

Metal.quadgroup_index_in_threadgroup — Function
quadgroup_index_in_threadgroup()::UInt32
Return the index of the current quadgroup within its threadgroup.

Metal.quadgroups_per_threadgroup — Function
quadgroups_per_threadgroup()::UInt32
Return the number of quadgroups in a threadgroup.

Metal.grid_size — Function
grid_size()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the maximum size of the grid for threads that read per-thread stage-in data.

Metal.grid_origin — Function
grid_origin()::@NamedTuple{x::UInt32, y::UInt32, z::UInt32}
Return the origin offset of the grid for threads that read per-thread stage-in data.

Metal.thread_execution_width — Function
thread_execution_width()::UInt32
Return the execution width of the compute unit.
This function has been deprecated as of Metal 3; use threads_per_simdgroup instead.
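As a rough illustration of the indexing functions above, the following sketch adds two vectors with a grid-stride loop. It assumes a Metal.jl version in which thread_position_in_grid() and threads_per_grid() return the NamedTuples documented here, that grid positions are 1-based (following Julia's indexing convention), and that a Metal-capable Apple GPU is available. The kernel name vadd! is hypothetical.

```julia
using Metal

# Hypothetical kernel: grid-stride vector addition.
# Assumes thread_position_in_grid() is 1-based, like Julia arrays.
function vadd!(c, a, b)
    i = thread_position_in_grid().x   # first element handled by this thread
    stride = threads_per_grid().x     # total number of threads along x
    while i <= length(c)
        @inbounds c[i] = a[i] + b[i]
        i += stride
    end
    return
end

n = 1000
a = MtlArray(rand(Float32, n))
b = MtlArray(rand(Float32, n))
c = similar(a)

# 4 threadgroups of 256 threads give 1024 >= n threads; the loop bound
# keeps the surplus threads from indexing past the end of the arrays.
@metal threads=256 groups=cld(n, 256) vadd!(c, a, b)
```

The loop also keeps the kernel correct when it is launched with fewer threads than elements, since each thread then processes several elements.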
Device arrays
Metal.jl provides a primitive, lightweight array type to manage GPU data organized in a plain, dense fashion. This is the device counterpart to MtlArray, and it implements (part of) the array interface as well as other functionality for use on the GPU:

Metal.MtlDeviceArray — Type
MtlDeviceArray(dims, ptr)
MtlDeviceArray{T}(dims, ptr)
MtlDeviceArray{T,A}(dims, ptr)
MtlDeviceArray{T,A,N}(dims, ptr)
Construct an N-dimensional dense Metal device array with element type T wrapping a pointer, where N is determined from the length of dims and T is determined from the type of ptr.
dims may be a single scalar or a tuple of integers giving the length of each dimension. If the rank N is supplied explicitly, as in Array{T,N}(dims), it must match the length of dims. The same applies to the element type T, which must match the element type of the pointer ptr.
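Kernels rarely call these constructors directly: when an MtlArray is passed to a kernel launched with @metal, Metal.jl converts it to an MtlDeviceArray, which supports size, length and multidimensional indexing like an ordinary dense array. The sketch below scales a matrix in place over a 2D grid. It assumes 1-based grid positions, and scale2d! is a hypothetical kernel name.

```julia
using Metal

# Hypothetical kernel: multiply every element of a matrix by s.
# Inside the kernel, A is an MtlDeviceArray, so size(A, d) works as usual.
function scale2d!(A, s)
    pos = thread_position_in_grid()   # assumed 1-based in x and y
    i, j = pos.x, pos.y
    # The grid may be larger than the matrix, so guard both dimensions.
    if i <= size(A, 1) && j <= size(A, 2)
        @inbounds A[i, j] *= s
    end
    return
end

A = MtlArray(ones(Float32, 100, 50))

# 16x16 threads per threadgroup; enough threadgroups to cover 100x50.
@metal threads=(16, 16) groups=(cld(100, 16), cld(50, 16)) scale2d!(A, 2f0)
```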
Metal.Const — Type
Const(A::MtlDeviceArray)
Mark an MtlDeviceArray as constant (read-only) so that it uses the constant address space.
Experimental API. Subject to change without deprecation.
Shared memory
Metal.MtlThreadGroupArray — Function
MtlThreadGroupArray(::Type{T}, dims)
Create an array local to each threadgroup launched during kernel execution.
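A minimal sketch of threadgroup memory: each threadgroup copies its block of the input into an MtlThreadGroupArray, waits at threadgroup_barrier (documented under Synchronization), and then writes the block back in reverse order. It assumes a fixed threadgroup size of 64, 1-based positions, that the array size must be known at compile time (hence the literal 64), and that the ThreadGroup memory flag is spelled Metal.MemoryFlagThreadGroup. The kernel name reverse_blocks! is hypothetical.

```julia
using Metal

# Hypothetical kernel: reverse each consecutive block of 64 elements.
# Assumes a launch with exactly 64 threads per threadgroup.
function reverse_blocks!(out, inp)
    tid = thread_position_in_threadgroup().x   # 1-based position within the threadgroup
    gid = thread_position_in_grid().x          # 1-based position within the grid
    n = threads_per_threadgroup().x            # threadgroup size (64 here)
    shared = MtlThreadGroupArray(Float32, 64)  # one buffer per threadgroup
    @inbounds shared[tid] = inp[gid]
    # Wait until every thread in the threadgroup has stored its element.
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)
    @inbounds out[gid] = shared[n - tid + 1]
    return
end

host = Float32.(1:256)
inp = MtlArray(host)
out = similar(inp)
@metal threads=64 groups=4 reverse_blocks!(out, inp)
```

Without the barrier, a thread could read a slot of shared before the thread responsible for it has written it.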
Synchronization
Metal.MemoryFlags — Type
MemoryFlags
Flags to set the memory synchronization behavior of threadgroup_barrier and simdgroup_barrier.
Possible values:
None: Make barriers act only as an execution barrier, without applying a memory fence.
Device: Ensure the GPU correctly orders memory operations to device memory for threads in the threadgroup or simdgroup.
ThreadGroup: Ensure the GPU correctly orders memory operations to threadgroup memory for threads in a threadgroup or simdgroup.
Texture: Ensure the GPU correctly orders memory operations to texture memory for threads in a threadgroup or simdgroup, for textures with the read_write access qualifier.
ThreadGroup_ImgBlock: Ensure the GPU correctly orders memory operations to threadgroup imageblock memory for threads in a threadgroup or simdgroup.

Metal.threadgroup_barrier — Function
threadgroup_barrier(flag=MemoryFlagNone)
Synchronize all threads in a threadgroup.
The flags that control its memory synchronization behavior are listed under MemoryFlags.

Metal.simdgroup_barrier — Function
simdgroup_barrier(flag=MemoryFlagNone)
Synchronize all threads in a SIMD-group.
The flags that control its memory synchronization behavior are listed under MemoryFlags.
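The following sketch combines threadgroup memory with repeated threadgroup_barrier calls to sum a vector using a per-threadgroup tree reduction: each threadgroup writes one partial sum, and the host adds the partial sums. It assumes 1-based positions, a power-of-two threadgroup size, and the enum value name Metal.MemoryFlagThreadGroup for the ThreadGroup flag; group_sum! and GROUP are hypothetical names.

```julia
using Metal

const GROUP = 256   # threads per threadgroup; must be a power of two here

# Hypothetical kernel: each threadgroup sums GROUP consecutive elements
# of inp into partial[g] with a tree reduction in threadgroup memory.
function group_sum!(partial, inp)
    tid = thread_position_in_threadgroup().x
    gid = thread_position_in_grid().x
    buf = MtlThreadGroupArray(Float32, GROUP)
    # Threads past the end of the input contribute zero.
    @inbounds buf[tid] = gid <= length(inp) ? inp[gid] : 0f0
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)
    s = GROUP ÷ 2
    while s > 0
        if tid <= s
            @inbounds buf[tid] += buf[tid + s]
        end
        # Every thread must reach the barrier, so it sits outside the `if`.
        threadgroup_barrier(Metal.MemoryFlagThreadGroup)
        s ÷= 2
    end
    if tid == 1
        @inbounds partial[threadgroup_position_in_grid().x] = buf[1]
    end
    return
end

n = 1000
inp = MtlArray(ones(Float32, n))
ngroups = cld(n, GROUP)
partial = MtlArray(zeros(Float32, ngroups))
@metal threads=GROUP groups=ngroups group_sum!(partial, inp)
total = sum(Array(partial))
```

Placing the barrier inside a branch that only some threads of the threadgroup take would leave the other threads waiting, which is why the loop keeps it unconditional.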