Compiler
Execution
The main entry-point to the compiler is the @metal macro:

Metal.@metal — Macro

@metal threads=... groups=... [kwargs...] func(args...)
High-level interface for executing code on a GPU.
The @metal macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a Metal function upon first use, and to a certain extent arguments will be converted and managed automatically using mtlconvert. Finally, a call to mtlcall is performed, creating a command buffer in the current global command queue and then committing it.
Several keyword arguments influence the behavior of @metal:

- launch: whether to launch this kernel, defaults to true. If false, the returned kernel object should be launched by calling it and passing arguments again (see the sketch after this list).
- name: the name of the kernel in the generated code. Defaults to an automatically-generated name.
- queue: the command queue to use for this kernel. Defaults to the global command queue.
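For illustration, a minimal use might look like the following sketch. The kernel, array sizes, and launch configuration are illustrative; the call syntax of the kernel object returned with launch=false is assumed to accept the same threads/groups keywords as @metal.

```julia
using Metal

# simple element-wise addition kernel; kernels must return nothing
function vadd!(c, a, b)
    i = thread_position_in_grid_1d()
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

a = MtlArray(rand(Float32, 1024))
b = MtlArray(rand(Float32, 1024))
c = similar(a)

# compile (on first use) and launch: 256 threads per threadgroup,
# with enough groups to cover the whole array
@metal threads=256 groups=cld(length(c), 256) vadd!(c, a, b)

# with launch=false the kernel is compiled but not run; the returned
# object can be launched later by calling it with the arguments again
kernel = @metal launch=false vadd!(c, a, b)
kernel(c, a, b; threads=256, groups=cld(length(c), 256))
```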
If needed, you can use a lower-level API that lets you inspect the compiled kernel:
Metal.mtlconvert — Function

mtlconvert(x, [cce])

This function is called for every argument to be passed to a kernel, allowing it to be converted to a GPU-friendly format. By default, the function does nothing and returns the input object x as-is.
Do not add methods to this function, but instead extend the underlying Adapt.jl package and register methods for the Metal.Adaptor type.
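For instance, a hypothetical wrapper type can opt into this conversion by extending Adapt.adapt_structure for Metal.Adaptor; the Wrapped type below is purely illustrative.

```julia
using Adapt, Metal

# hypothetical wrapper carrying an array plus some extra metadata
struct Wrapped{T}
    data::T
    scale::Float32
end

# teach Adapt.jl how to convert the wrapper for Metal kernels,
# rather than adding methods to mtlconvert directly
Adapt.adapt_structure(to::Metal.Adaptor, w::Wrapped) =
    Wrapped(Adapt.adapt(to, w.data), w.scale)
```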
Metal.mtlfunction — Function

mtlfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @metal.
The following keyword arguments are supported:

- macos, metal and air: override the macOS, Metal language, and AIR bitcode versions used during compilation. The value should be a valid version number.
The output of this function is automatically cached, i.e. you can simply call mtlfunction in a hot path without degrading performance. New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.
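A sketch of this lower-level path is shown below. The fill kernel is illustrative, and it is assumed that the returned kernel object accepts the same threads/groups keywords as @metal when called.

```julia
using Metal

function fill_kernel!(a, val)
    i = thread_position_in_grid_1d()
    if i <= length(a)
        @inbounds a[i] = val
    end
    return
end

a = MtlArray{Float32}(undef, 256)

# determine the device-side argument types via mtlconvert, then compile;
# the result is cached, so repeated calls are cheap
args = map(Metal.mtlconvert, (a, 1f0))
kernel = Metal.mtlfunction(fill_kernel!, Tuple{map(typeof, args)...})

# launch the compiled kernel object (call syntax assumed to mirror @metal)
kernel(args...; threads=length(a))
```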
Reflection
If you want to inspect generated code, you can use macros that resemble functionality from the InteractiveUtils standard library:
@device_code_lowered
@device_code_typed
@device_code_warntype
@device_code_llvm
@device_code_native
@device_code_agx
@device_code
For more information, please consult the GPUCompiler.jl documentation. Note that code_agx is actually code_native under a different name.
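As an illustrative sketch, these macros prefix a kernel launch in the same way their InteractiveUtils counterparts prefix an ordinary call; the kernel below is hypothetical.

```julia
using Metal

# trivial kernel, only used as something to inspect
function dummy_kernel(a)
    a[1] = 42f0
    return
end

a = MtlArray{Float32}(undef, 1)

# print the optimized LLVM IR generated for this launch
@device_code_llvm @metal dummy_kernel(a)
```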