Compiler

Execution

The main entry point to the compiler is the @metal macro:

Metal.@metal (Macro)
@metal threads=... groups=... [kwargs...] func(args...)

High-level interface for executing code on a GPU.

The @metal macro should prefix a call, where func is a callable function or object that returns nothing. It will be compiled to a Metal function upon first use, and, to a certain extent, arguments will be converted and managed automatically using mtlconvert. Finally, a call to mtlcall is performed, which creates a command buffer in the current global command queue and then commits it.

The following keyword arguments influence the behavior of @metal:

  • launch: whether to launch this kernel; defaults to true. If false, the returned kernel object should be launched later by calling it and passing the arguments again.
  • name: the name of the kernel in the generated code. Defaults to an automatically generated name.
  • queue: the command queue to use for this kernel. Defaults to the global command queue.
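As a minimal sketch, assuming a Mac with a Metal-capable GPU and Metal.jl installed, the kernel below adds two vectors element-wise. The names vadd, a, b, c and n are illustrative only; thread_position_in_grid_1d is the Metal.jl intrinsic for the global thread index, and Metal.synchronize waits for queued work. The second launch uses launch=false to obtain the kernel object and then calls it with the same arguments, as described for the launch keyword.

```julia
using Metal

# Hypothetical example kernel: c[i] = a[i] + b[i]. Kernels must return nothing.
function vadd(a, b, c)
    i = thread_position_in_grid_1d()   # 1-based global thread index
    if i <= length(c)                  # guard against threads past the end
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

n = 1024
a = MtlArray(rand(Float32, n))
b = MtlArray(rand(Float32, n))
c = similar(a)

# Compiles vadd on first use, then launches 4 threadgroups of 256 threads.
@metal threads=256 groups=cld(n, 256) vadd(a, b, c)

# With launch=false, @metal only compiles and returns the kernel object;
# launching it means calling it with the arguments (and launch configuration) again.
kernel = @metal launch=false vadd(a, b, c)
kernel(a, b, c; threads=256, groups=cld(n, 256))

Metal.synchronize()   # wait for the committed command buffers to finish
```

Each launch is queued asynchronously on the command queue, which is why the sketch synchronizes before the results are read back.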

If needed, you can use a lower-level API that lets you inspect the compiled kernel:

Metal.mtlconvert (Function)

mtlconvert(x, [cce])

This function is called for every argument to be passed to a kernel, allowing it to be converted to a GPU-friendly format. By default, the function does nothing and returns the input object x as-is.

Do not add methods to this function; instead, extend the underlying Adapt.jl package and register methods for the Metal.Adaptor type.
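The sketch below shows how a custom wrapper type can become a valid kernel argument through Adapt.jl rather than through new mtlconvert methods. It assumes a Metal-capable Mac; the Scaled struct and scale_kernel are hypothetical. For a wrapper struct like this, a generic Adapt.adapt_structure method (Adapt.jl's hook for recursing into composite types) is usually sufficient, because it adapts the inner MtlArray with whatever adaptor Metal passes in; methods specialized on Metal.Adaptor are mainly relevant for custom array storage types.

```julia
using Metal, Adapt

# Hypothetical wrapper holding an array and a scalar factor.
# The array type is a parameter so it can become a device array on the GPU.
struct Scaled{A}
    data::A
    factor::Float32
end

# Adapt rule: convert the wrapped array (e.g. MtlArray -> device array)
# and keep the scalar unchanged.
Adapt.adapt_structure(to, s::Scaled) = Scaled(adapt(to, s.data), s.factor)

function scale_kernel(out, s)
    i = thread_position_in_grid_1d()
    if i <= length(out)
        @inbounds out[i] = s.data[i] * s.factor
    end
    return
end

s = Scaled(MtlArray(ones(Float32, 64)), 2f0)
out = MtlArray(zeros(Float32, 64))

# When @metal converts its arguments, the rule above is applied to s.
@metal threads=64 scale_kernel(out, s)
Metal.synchronize()
```

Without the adapt_structure method, s would be passed with its host-side MtlArray field, which is not a GPU-friendly argument.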

Metal.mtlfunction (Function)
mtlfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @metal.

The output of this function is automatically cached, so you can call mtlfunction in a hot path without degrading performance. New code is generated automatically when the function changes, or when different argument types or keyword arguments are provided.
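The following sketch performs by hand, roughly, the steps that @metal automates: convert each argument with mtlconvert, build the tuple type of the converted arguments, compile with mtlfunction, and launch by calling the returned kernel object. It assumes a Metal-capable Mac; vadd and the array names are illustrative, and the exact expansion of @metal may differ from this sequence.

```julia
using Metal

# Illustrative kernel: c[i] = a[i] + b[i].
function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

a = MtlArray(rand(Float32, 256))
b = MtlArray(rand(Float32, 256))
c = similar(a)

# Convert host arguments to their GPU-friendly form, then collect their types.
dev_args = map(Metal.mtlconvert, (a, b, c))
tt = Tuple{map(Core.Typeof, dev_args)...}

# Compile (or fetch from the cache) a kernel for vadd with these argument types.
kernel = Metal.mtlfunction(vadd, tt)

# Launch by calling the kernel object with the original arguments.
kernel(a, b, c; threads=256)
Metal.synchronize()
```

Because the result is cached, repeating the mtlfunction call with the same function and argument types should not trigger recompilation.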


Reflection

If you want to inspect generated code, you can use macros that resemble functionality from the InteractiveUtils standard library:

@device_code_lowered
@device_code_typed
@device_code_warntype
@device_code_llvm
@device_code_native
@device_code_agx
@device_code
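As a usage sketch, assuming a Metal-capable Mac, wrap a kernel launch in one of these macros to print the code generated for the kernels compiled while the expression runs. Here @device_code_llvm prints the LLVM IR for the illustrative vadd kernel; the expression is still evaluated, so the kernel also executes.

```julia
using Metal

# Illustrative kernel: c[i] = a[i] + b[i].
function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

a = MtlArray(rand(Float32, 16))
b = MtlArray(rand(Float32, 16))
c = similar(a)

# Prints the device-side LLVM IR for vadd, then runs the launch as usual.
@device_code_llvm @metal threads=16 vadd(a, b, c)
Metal.synchronize()
```

Swapping in @device_code_typed or @device_code_warntype in the same position is a quick way to look for type instabilities in a kernel.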

For more information, please consult the GPUCompiler.jl documentation. Note that code_agx is actually code_native: