Crate safe_arch

A crate that safely exposes arch intrinsics via #[cfg()].

safe_arch lets you safely use CPU intrinsics, the things in the core::arch modules. It works purely via #[cfg()] and compile time CPU feature declaration. If you want to check for a feature at runtime and then call an intrinsic or use a fallback path based on that, then this crate is sadly not for you.

SIMD register types are “newtype’d” so that better trait impls can be given to them, but the inner value is a pub field so feel free to just grab it out if you need to. Trait impls of the newtypes include: Default (zeroed), From/Into of appropriate data types, and appropriate operator overloading.

Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
- Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
Some intrinsics are not safe unless you’re very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren’t exposed here.
Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn’t really support you messing with that within a high level language, so those operations aren’t exposed here. Use assembly or something if you want to do that.

Naming Conventions

The safe_arch crate does not simply use the “official” names for each intrinsic, because the official names are generally poor. Instead, the operations have been given better names that hopefully make things easier to understand when you’re reading the code.

For a full explanation of the naming used, see the Naming Conventions page.

Current Support

x86 / x86_64 (Intel, AMD, etc)
- 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
- 256-bit: avx, avx2
- Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed

Compile Time CPU Target Features

At the time of me writing this, Rust enables the sse and sse2 CPU features by default for all i686 (x86) and x86_64 builds. Those CPU features are built into the design of x86_64, and you’d need a super old x86 CPU for it to not support at least sse and sse2, so they’re a safe bet for the language to enable all the time. In fact, because the standard library is compiled with them enabled, simply trying to disable those features would actually cause ABI issues and fill your program with UB (link).

If you want additional CPU features available at compile time you’ll have to enable them with an additional arg to rustc. For a feature named name you pass -C target-feature=+name, such as -C target-feature=+sse3 for sse3.

You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you’re building a program you’ll only run on your own system.

It’s sometimes hard to know if your target platform will support a given feature set, but the Steam Hardware Survey is generally taken as a guide to what you can expect people to have available. If you click “Other Settings” it’ll expand into a list of CPU target features and how common they are. These days, it seems that sse3 can be safely assumed, and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff above 128-bit isn’t as common yet; give it another few years.

Please note that executing a program on a CPU that doesn’t support the target features it was compiled for is Undefined Behavior.

Currently, Rust doesn’t actually support an easy way for you to check that a feature enabled at compile time is actually available at runtime. There is the “feature_detected” family of macros, but if you enable a feature they will evaluate to a constant true instead of actually deferring the check for the feature to runtime. This means that, if you did want a check at the start of your program, to confirm that all the assumed features are present and error out when the assumptions don’t hold, you can’t use that macro. You gotta use CPUID and check manually. rip. Hopefully we can make that process easier in a future version of this crate.
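As a rough illustration of the manual CPUID check mentioned above, here is a minimal sketch that uses core::arch directly. The names Leaf1Features, decode_leaf1, and query_leaf1 are hypothetical helpers made up for this example and are not part of safe_arch. The bit positions are the standard CPUID leaf 1 feature bits (EDX bit 26 for sse2; ECX bits 0, 9, 19, and 20 for sse3, ssse3, sse4.1, and sse4.2).

```rust
/// Feature bits decoded from CPUID leaf 1.
/// (Hypothetical helper type for this sketch, not part of safe_arch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Leaf1Features {
    pub sse2: bool,
    pub sse3: bool,
    pub ssse3: bool,
    pub sse4_1: bool,
    pub sse4_2: bool,
}

/// Decodes the ECX and EDX registers returned by CPUID leaf 1.
pub fn decode_leaf1(ecx: u32, edx: u32) -> Leaf1Features {
    Leaf1Features {
        sse2: (edx & (1 << 26)) != 0,
        sse3: (ecx & (1 << 0)) != 0,
        ssse3: (ecx & (1 << 9)) != 0,
        sse4_1: (ecx & (1 << 19)) != 0,
        sse4_2: (ecx & (1 << 20)) != 0,
    }
}

/// Runs CPUID leaf 1 on the current CPU and decodes the result.
/// Only exists on x86_64 builds.
#[cfg(target_arch = "x86_64")]
pub fn query_leaf1() -> Leaf1Features {
    // `__cpuid` is an unsafe fn on older toolchains; the allow keeps
    // newer toolchains quiet if the block is no longer required.
    #[allow(unused_unsafe)]
    let regs = unsafe { core::arch::x86_64::__cpuid(1) };
    decode_leaf1(regs.ecx, regs.edx)
}
```

A program could call query_leaf1() at the top of main and exit with an error message if a feature it was compiled for is reported as missing.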

A Note On Working With Cfg

There’s two main ways to use cfg:

Via an attribute placed on an item, block, or expression:
- #[cfg(debug_assertions)] println!("hello");
Via a macro used within an expression position:
- if cfg!(debug_assertions) { println!("hello"); }

The difference might seem small but it’s actually very important:

The attribute form will include code or not before deciding if all the items named and so forth really exist or not. This means that code that is configured via attribute can safely name things that don’t always exist as long as the things they name do exist whenever that code is configured into the build.
The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.

This crate uses cfg via the attribute, so the functions it exposes don’t exist at all when the appropriate CPU target features aren’t enabled. Accordingly, if you plan to call this crate or not depending on what features are enabled in the build you’ll also need to control your use of this crate via cfg attribute, not cfg macro.
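The attribute-based gating described above might look like the following sketch. To stay self-contained it calls core::arch intrinsics directly rather than this crate, and the function name add4 is made up for the example. With safe_arch, the gated body would call the crate's own functions instead, and it would need the same kind of cfg attribute around it.

```rust
/// SIMD path: this item only exists when the build targets x86_64 with sse2.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use core::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0_f32; 4];
    // The pointers come from arrays of exactly four f32 values, so the
    // unaligned 16-byte loads and the store stay in bounds.
    unsafe {
        let sum = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Fallback path: compiled in only when the SIMD version above is not.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Because exactly one of the two definitions is configured into any given build, callers can use add4 unconditionally. Writing the same choice with `if cfg!(...)` would fail to compile on targets where the intrinsics don't exist, since both branches must still name real items.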

m128d

The data for a 128-bit SSE register of two f64 values.

m128i

The data for a 128-bit SSE register of integer data.

m256

The data for a 256-bit AVX register of eight f32 lanes.

m256d

The data for a 256-bit AVX register of four f64 values.

m256i

The data for a 256-bit AVX register of integer data.

Functions

add_i8_m128i

Lanewise a + b with lanes as i8.

add_i16_m128i

Lanewise a + b with lanes as i16.

add_i32_m128i

Lanewise a + b with lanes as i32.

add_i64_m128i

Lanewise a + b with lanes as i64.

add_m128

Lanewise a + b.

add_m128_s

Low lane a + b, other lanes unchanged.

add_m128d

Lanewise a + b.

add_m128d_s

Lowest lane a + b, high lane unchanged.

add_saturating_i8_m128i

Lanewise saturating a + b with lanes as i8.

add_saturating_i16_m128i

Lanewise saturating a + b with lanes as i16.

add_saturating_u8_m128i

Lanewise saturating a + b with lanes as u8.

add_saturating_u16_m128i

Lanewise saturating a + b with lanes as u16.
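For readers unfamiliar with saturating arithmetic, the following scalar model sketches what the saturating adds above compute per lane. It is plain Rust over arrays, not the crate's implementation, and the function names are hypothetical.

```rust
/// Scalar model of a lanewise saturating add over sixteen i8 lanes.
/// Results clamp to i8::MIN / i8::MAX instead of wrapping around.
pub fn add_saturating_i8_lanes(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0_i8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Scalar model of a lanewise saturating add over sixteen u8 lanes.
/// Results clamp to u8::MAX instead of wrapping around.
pub fn add_saturating_u8_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0_u8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}
```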

average_u8_m128i

Lanewise average of the u8 values.

average_u16_m128i

Lanewise average of the u16 values.
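The underlying x86 average instructions round up rather than truncate, computing (a + b + 1) >> 1 in a wider intermediate. A scalar sketch of that behavior for the u8 case, with a hypothetical function name, could look like this:

```rust
/// Scalar model of a lanewise rounding average over sixteen u8 lanes.
/// The sum is widened to u16 so that `a + b + 1` cannot overflow.
pub fn average_u8_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0_u8; 16];
    for i in 0..16 {
        let wide = a[i] as u16 + b[i] as u16 + 1;
        out[i] = (wide >> 1) as u8;
    }
    out
}
```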

Bitwise a & b.

Bitwise a & b.

Bitwise a & b.

Bitwise (!a) & b.

Bitwise (!a) & b.

Bitwise (!a) & b.

Bitwise a | b.

Bitwise a | b.

Bitwise a | b.

Bitwise a ^ b.

Bitwise a ^ b.

Bitwise a ^ b.

byte_shl_imm_u128_m128i

Shifts all bits in the entire register left by a number of bytes.

byte_shr_imm_u128_m128i

Shifts all bits in the entire register right by a number of bytes.

byte_swap_i32

Swap the bytes of the given 32-bit value.

byte_swap_i64

Swap the bytes of the given 64-bit value.

cast_to_m128_from_m128d

Bit-preserving cast to m128 from m128d

cast_to_m128_from_m128i

Bit-preserving cast to m128 from m128i

cast_to_m128d_from_m128

Bit-preserving cast to m128d from m128

cast_to_m128d_from_m128i

Bit-preserving cast to m128d from m128i

cast_to_m128i_from_m128

Bit-preserving cast to m128i from m128

cast_to_m128i_from_m128d

Bit-preserving cast to m128i from m128d

cmp_eq_i32_m128_s

Low lane equality.

cmp_eq_i32_m128d_s

Low lane f64 equal to.

cmp_eq_mask_i8_m128i

Lanewise a == b with lanes as i8.

cmp_eq_mask_i16_m128i

Lanewise a == b with lanes as i16.

cmp_eq_mask_i32_m128i

Lanewise a == b with lanes as i32.
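The "mask" comparisons produce a lane with every bit set where the comparison holds and every bit clear where it doesn't, which is what makes the result usable with the bitwise operations. A scalar sketch for four i32 lanes, using a made-up function name rather than the crate's code:

```rust
/// Scalar model of a lanewise equality mask over four i32 lanes.
/// A true lane becomes all ones (-1 as i32); a false lane becomes 0.
pub fn cmp_eq_mask_i32_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0_i32; 4];
    for i in 0..4 {
        out[i] = if a[i] == b[i] { -1 } else { 0 };
    }
    out
}
```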

cmp_eq_mask_m128

Lanewise a == b.

cmp_eq_mask_m128_s

Low lane a == b, other lanes unchanged.

cmp_eq_mask_m128d

Lanewise a == b, mask output.

cmp_eq_mask_m128d_s

Low lane a == b, other lanes unchanged.

cmp_ge_i32_m128_s

Low lane greater than or equal to.

cmp_ge_i32_m128d_s

Low lane f64 greater than or equal to.

cmp_ge_mask_m128

Lanewise a >= b.

cmp_ge_mask_m128_s

Low lane a >= b, other lanes unchanged.

cmp_ge_mask_m128d

Lanewise a >= b.

cmp_ge_mask_m128d_s

Low lane a >= b, other lanes unchanged.

cmp_gt_i32_m128_s

Low lane greater than.

cmp_gt_i32_m128d_s

Low lane f64 greater than.

cmp_gt_mask_i8_m128i

Lanewise a > b with lanes as i8.

cmp_gt_mask_i16_m128i

Lanewise a > b with lanes as i16.

cmp_gt_mask_i32_m128i

Lanewise a > b with lanes as i32.

cmp_gt_mask_m128

Lanewise a > b.

cmp_gt_mask_m128_s

Low lane a > b, other lanes unchanged.

cmp_gt_mask_m128d

Lanewise a > b.

cmp_gt_mask_m128d_s

Low lane a > b, other lanes unchanged.

cmp_le_i32_m128_s

Low lane less than or equal to.

cmp_le_i32_m128d_s

Low lane f64 less than or equal to.

cmp_le_mask_m128

Lanewise a <= b.

cmp_le_mask_m128_s

Low lane a <= b, other lanes unchanged.

cmp_le_mask_m128d

Lanewise a <= b.

cmp_le_mask_m128d_s

Low lane a <= b, other lanes unchanged.

cmp_lt_i32_m128_s

Low lane less than.

cmp_lt_i32_m128d_s

Low lane f64 less than.

cmp_lt_mask_i8_m128i

Lanewise a < b with lanes as i8.

cmp_lt_mask_i16_m128i

Lanewise a < b with lanes as i16.

cmp_lt_mask_i32_m128i

Lanewise a < b with lanes as i32.

cmp_lt_mask_m128

Lanewise a < b.

cmp_lt_mask_m128_s

Low lane a < b, other lanes unchanged.

cmp_lt_mask_m128d

Lanewise a < b.

cmp_lt_mask_m128d_s

Low lane a < b, other lane unchanged.

cmp_neq_i32_m128_s

Low lane not equal to.

cmp_neq_i32_m128d_s

Low lane f64 not equal to.

cmp_neq_mask_m128

Lanewise a != b.

cmp_neq_mask_m128_s

Low lane a != b, other lanes unchanged.

cmp_neq_mask_m128d

Lanewise a != b.

cmp_neq_mask_m128d_s

Low lane a != b, other lane unchanged.

cmp_nge_mask_m128

Lanewise !(a >= b).

cmp_nge_mask_m128_s

Low lane !(a >= b), other lanes unchanged.

cmp_nge_mask_m128d

Lanewise !(a >= b).

cmp_nge_mask_m128d_s

Low lane !(a >= b), other lane unchanged.

cmp_ngt_mask_m128

Lanewise !(a > b).

cmp_ngt_mask_m128_s

Low lane !(a > b), other lanes unchanged.

cmp_ngt_mask_m128d

Lanewise !(a > b).

cmp_ngt_mask_m128d_s

Low lane !(a > b), other lane unchanged.

cmp_nle_mask_m128

Lanewise !(a <= b).

cmp_nle_mask_m128_s

Low lane !(a <= b), other lanes unchanged.

cmp_nle_mask_m128d

Lanewise !(a <= b).

cmp_nle_mask_m128d_s

Low lane !(a <= b), other lane unchanged.

cmp_nlt_mask_m128

Lanewise !(a < b).

cmp_nlt_mask_m128_s

Low lane !(a < b), other lanes unchanged.

cmp_nlt_mask_m128d

Lanewise !(a < b).

cmp_nlt_mask_m128d_s

Low lane !(a < b), other lane unchanged.

cmp_ordered_mask_m128

Lanewise (!a.is_nan()) & (!b.is_nan()).

cmp_ordered_mask_m128_s

Low lane (!a.is_nan()) & (!b.is_nan()), other lanes unchanged.

cmp_ordered_mask_m128d

Lanewise (!a.is_nan()) & (!b.is_nan()).

cmp_ordered_mask_m128d_s

Low lane (!a.is_nan()) & (!b.is_nan()), other lane unchanged.

cmp_unord_mask_m128

Lanewise a.is_nan() | b.is_nan().

cmp_unord_mask_m128_s

Low lane a.is_nan() | b.is_nan(), other lanes unchanged.

cmp_unord_mask_m128d

Lanewise a.is_nan() | b.is_nan().

cmp_unord_mask_m128d_s

Low lane a.is_nan() | b.is_nan(), other lane unchanged.

convert_i32_replace_m128_s

Convert i32 to f32 and replace the low lane of the input.

convert_i32_replace_m128d_s

Convert i32 to f64 and replace the low lane of the input.

convert_i64_replace_m128d_s

Convert i64 to f64 and replace the low lane of the input.

convert_m128_s_replace_m128d_s

Converts the lower f32 to f64 and replaces the low lane of the input.

convert_m128d_s_replace_m128_s

Converts the low f64 to f32 and replaces the low lane of the input.

convert_to_i32_m128i_from_m128

Rounds the f32 lanes to i32 lanes.

convert_to_i32_m128i_from_m128d

Rounds the two f64 lanes to the low two i32 lanes.

convert_to_m128_from_i32_m128i

Rounds the four i32 lanes to four f32 lanes.

convert_to_m128_from_m128d

Rounds the two f64 lanes to the low two f32 lanes.

convert_to_m128d_from_lower2_i32_m128i

Converts the lower two i32 lanes to two f64 lanes.

convert_to_m128d_from_lower2_m128

Converts the lower two f32 lanes to two f64 lanes.

copy_i64_m128i_s

Copy the low i64 lane to a new register, upper bits 0.

copy_replace_low_f64_m128d

Copies the a value and replaces the low lane with the low b value.

div_m128

Lanewise a / b.

div_m128_s

Low lane a / b, other lanes unchanged.

div_m128d

Lanewise a / b.

div_m128d_s

Lowest lane a / b, high lane unchanged.

extract_i16_as_i32_m128i

Gets an i16 value out of an m128i, returns as i32.

get_f32_from_m128_s

Gets the low lane as an individual f32 value.

get_f64_from_m128d_s

Gets the lower lane as an f64 value.

get_i32_from_m128_s

Converts the low lane to i32 and extracts as an individual value.

get_i32_from_m128d_s

Converts the lower lane to an i32 value.

get_i32_from_m128i_s

Converts the lower lane to an i32 value.

get_i64_from_m128d_s

Converts the lower lane to an i64 value.

get_i64_from_m128i_s

Converts the lower lane to an i64 value.

insert_i16_from_i32_m128i

Inserts the low 16 bits of an i32 value into an m128i.

load_f32_m128_s

Loads the f32 reference into the low lane of the register.

load_f32_splat_m128

Loads the f32 reference into all lanes of a register.

load_f64_m128d_s

Loads the reference into the low lane of the register.

load_f64_splat_m128d

Loads the f64 reference into all lanes of a register.

load_i64_m128i_s

Loads the low i64 into a register.

load_m128

Loads the reference into a register.

load_m128d

Loads the reference into a register.

load_m128i

Loads the reference into a register.

load_replace_high_m128d

Loads the reference into a register, replacing the high lane.

load_replace_low_m128d

Loads the reference into a register, replacing the low lane.

load_reverse_m128

Loads the reference into a register with reversed order.

load_reverse_m128d

Loads the reference into a register with reversed order.

load_unaligned_m128

Loads the reference into a register.

load_unaligned_m128d

Loads the reference into a register.

load_unaligned_m128i

Loads the reference into a register.

max_i16_m128i

Lanewise max(a, b) with lanes as i16.

max_m128

Lanewise max(a, b).

max_m128_s

Low lane max(a, b), other lanes unchanged.

max_m128d

Lanewise max(a, b).

max_m128d_s

Low lane max(a, b), other lanes unchanged.

max_u8_m128i

Lanewise max(a, b) with lanes as u8.

min_i16_m128i

Lanewise min(a, b) with lanes as i16.

min_m128

Lanewise min(a, b).

min_m128_s

Low lane min(a, b), other lanes unchanged.

min_m128d

Lanewise min(a, b).

min_m128d_s

Low lane min(a, b), other lanes unchanged.

min_u8_m128i

Lanewise min(a, b) with lanes as u8.

move_high_low_m128

Move the high lanes of b to the low lanes of a, other lanes unchanged.

move_low_high_m128

Move the low lanes of b to the high lanes of a, other lanes unchanged.

move_m128_s

Move the low lane of b to a, other lanes unchanged.

move_mask_i8_m128i

Gathers the i8 sign bit of each lane.

move_mask_m128

Gathers the sign bit of each lane.

move_mask_m128d

Gathers the sign bit of each lane.
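As a sketch of what "gathering the sign bits" means, the scalar model below packs the top bit of each f32 lane into the low bits of an integer. It assumes index 0 of the array is the low lane and uses a hypothetical function name; it is not the crate's implementation.

```rust
/// Scalar model of a sign-bit gather over four f32 lanes.
/// Bit `i` of the result is the sign bit of lane `i`.
pub fn move_mask_f32_lanes(a: [f32; 4]) -> i32 {
    let mut mask = 0_i32;
    for (i, lane) in a.iter().enumerate() {
        // The sign is bit 31 of the raw IEEE-754 representation,
        // so -0.0 and negative NaNs count as "set" too.
        let sign = (lane.to_bits() >> 31) as i32;
        mask |= sign << i;
    }
    mask
}
```

Combined with a mask comparison, this turns a per-lane result into a plain integer that ordinary branching code can test.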

mul_i16_horizontal_add_m128i

Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.

mul_i16_keep_high_m128i

Lanewise a * b with lanes as i16, keep the high bits of the i32 intermediates.

mul_i16_keep_low_m128i

Lanewise a * b with lanes as i16, keep the low bits of the i32 intermediates.
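The keep-high and keep-low multiplies are two halves of the same widened product. A per-lane scalar sketch, with hypothetical function names, might be:

```rust
/// Upper 16 bits of the 32-bit product of two i16 values.
pub fn mul_i16_keep_high(a: i16, b: i16) -> i16 {
    let wide = (a as i32) * (b as i32);
    (wide >> 16) as i16
}

/// Lower 16 bits of the 32-bit product of two i16 values.
pub fn mul_i16_keep_low(a: i16, b: i16) -> i16 {
    let wide = (a as i32) * (b as i32);
    // `as i16` truncates, keeping only the low 16 bits.
    wide as i16
}
```

For example, 1000 * 1000 is 0x000F_4240 as a 32-bit value, so the high half is 0x000F and the low half is 0x4240.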

mul_m128

Lanewise a * b.

mul_m128_s

Low lane a * b, other lanes unchanged.

mul_m128d

Lanewise a * b.

mul_m128d_s

Lowest lane a * b, high lane unchanged.

mul_u16_keep_high_m128i

Lanewise a * b with lanes as u16, keep the high bits of the u32 intermediates.

mul_widen_u32_odd_m128i

Multiplies the odd u32 lanes and gives the widened (u64) results.

pack_i16_to_i8_m128i

Saturating convert i16 to i8, and pack the values.

pack_i16_to_u8_m128i

Saturating convert i16 to u8, and pack the values.

pack_i32_to_i16_m128i

Saturating convert i32 to i16, and pack the values.
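To illustrate the saturating packs, here is a scalar sketch of the i16 cases. It assumes the usual x86 layout in which the lanes converted from a fill the low half of the output and the lanes from b fill the high half. The function names are made up for the example.

```rust
/// Scalar model of packing two sets of eight i16 lanes into sixteen i8 lanes.
/// Each value is clamped to the i8 range before narrowing.
pub fn pack_i16_to_i8_lanes(a: [i16; 8], b: [i16; 8]) -> [i8; 16] {
    let mut out = [0_i8; 16];
    for i in 0..8 {
        out[i] = a[i].clamp(-128, 127) as i8;
        out[i + 8] = b[i].clamp(-128, 127) as i8;
    }
    out
}

/// Scalar model of packing two sets of eight i16 lanes into sixteen u8 lanes.
/// Negative inputs clamp to 0 and inputs above 255 clamp to 255.
pub fn pack_i16_to_u8_lanes(a: [i16; 8], b: [i16; 8]) -> [u8; 16] {
    let mut out = [0_u8; 16];
    for i in 0..8 {
        out[i] = a[i].clamp(0, 255) as u8;
        out[i + 8] = b[i].clamp(0, 255) as u8;
    }
    out
}
```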

read_timestamp_counter

Reads the CPU’s timestamp counter value.

read_timestamp_counter_p

Reads the CPU’s timestamp counter value and stores the processor signature.

reciprocal_m128

Lanewise 1.0 / a approximation.

reciprocal_m128_s

Low lane 1.0 / a approximation, other lanes unchanged.

reciprocal_sqrt_m128

Lanewise 1.0 / sqrt(a) approximation.

reciprocal_sqrt_m128_s

Low lane 1.0 / sqrt(a) approximation, other lanes unchanged.

set_i8_m128i

Sets the args into an m128i, first arg is the high lane.

set_i16_m128i

Sets the args into an m128i, first arg is the high lane.

set_i32_m128i

Sets the args into an m128i, first arg is the high lane.

set_i32_m128i_s

Set an i32 as the low 32-bit lane of an m128i, other lanes blank.

set_i64_m128i

Sets the args into an m128i, first arg is the high lane.

set_i64_m128i_s

Set an i64 as the low 64-bit lane of an m128i, other lanes blank.

set_m128

Sets the args into an m128, first arg is the high lane.

set_m128_s

Sets the arg into the low lane of an m128.

set_m128d

Sets the args into an m128d, first arg is the high lane.

set_m128d_s

Sets the args into the low lane of a m128d.

set_reversed_i8_m128i

Sets the args into an m128i, first arg is the low lane.

set_reversed_i16_m128i

Sets the args into an m128i, first arg is the low lane.

set_reversed_i32_m128i

Sets the args into an m128i, first arg is the low lane.

set_reversed_m128

Sets the args into an m128, first arg is the low lane.

set_reversed_m128d

Sets the args into an m128d, first arg is the low lane.

set_splat_i8_m128i

Splats the i8 to all lanes of the m128i.

set_splat_i16_m128i

Splats the i16 to all lanes of the m128i.

set_splat_i32_m128i

Splats the i32 to all lanes of the m128i.

set_splat_i64_m128i

Splats the i64 to both lanes of the m128i.

set_splat_m128

Splats the value to all lanes.

set_splat_m128d

Splats the args into both lanes of the m128d.

shl_all_u16_m128i

Shift all u16 lanes to the left by the count in the lower u64 lane.

shl_all_u32_m128i

Shift all u32 lanes to the left by the count in the lower u64 lane.

shl_all_u64_m128i

Shift all u64 lanes to the left by the count in the lower u64 lane.

shl_imm_u16_m128i

Shifts all u16 lanes left by an immediate.

shl_imm_u32_m128i

Shifts all u32 lanes left by an immediate.

shl_imm_u64_m128i

Shifts both u64 lanes left by an immediate.

shr_all_i16_m128i

Shift each i16 lane to the right by the count in the lower i64 lane.

shr_all_i32_m128i

Shift each i32 lane to the right by the count in the lower i64 lane.

shr_all_u16_m128i

Shift each u16 lane to the right by the count in the lower u64 lane.

shr_all_u32_m128i

Shift each u32 lane to the right by the count in the lower u64 lane.

shr_all_u64_m128i

Shift each u64 lane to the right by the count in the lower u64 lane.

shr_imm_i16_m128i

Shifts all i16 lanes right by an immediate.

shr_imm_i32_m128i

Shifts all i32 lanes right by an immediate.

shr_imm_u16_m128i

Shifts all u16 lanes right by an immediate.

shr_imm_u32_m128i

Shifts all u32 lanes right by an immediate.

shr_imm_u64_m128i

Shifts both u64 lanes right by an immediate.

shuffle_abi_f32_all_m128

Shuffle the f32 lanes from $a and $b together using an immediate control value.

shuffle_abi_f64_all_m128d

Shuffle the f64 lanes from $a and $b together using an immediate control value.

shuffle_ai_f32_all_m128i

Shuffle the i32 lanes in $a using an immediate control value.

shuffle_ai_i16_h64all_m128i

Shuffle the high i16 lanes in $a using an immediate control value.

shuffle_ai_i16_l64all_m128i

Shuffle the low i16 lanes in $a using an immediate control value.

sqrt_m128

Lanewise sqrt(a).

sqrt_m128_s

Low lane sqrt(a), other lanes unchanged.

sqrt_m128d

Lanewise sqrt(a).

sqrt_m128d_s

Low lane sqrt(b), upper lane is unchanged from a.

store_high_m128d_s

Stores the high lane value to the reference given.

store_i64_m128i_s