Bitwise ternary logic instruction
Bitwise ternary logic instructions can logically implement all possible bitwise operations between three inputs (256 permutations). They take three registers as input and an 8-bit immediate field. Each bit in the output is generated using an 8-bit Lookup table of the three corresponding bits in the inputs to select one of the 8 positions in the 8-bit immediate. Since only 8 combinations are possible using three bits, this allow all possible 3-input bitwise operations to be performed. In mathematical terminology: each corresponding bit of the three inputs is a ternary Boolean function with a Hasse diagram of order n=8.[1] Also known as minterms.
A full table showing all 256 possible 3-operand logical bitwise instruction may be found in the Power ISA description of xxeval .[2] An additional insight is that if the 8-bit immediate were an operand (register) then in FPGA terminology, bitwise ternary logical instructions would implement an array of Hardware LUT3s.
One of the typical applications is an implementation bit manipulation for the symmetric cyphers.[3] [4][5][6]
Description
In pseudocode the output from three single-bit inputs is illustrated by using r2, r1 and r0 as three binary digits of a 3-bit index, to treat the 8-bit immediate as a lookup table and to simply return the indexed bit:
result := imm8(r2<<2 + r1<<1 + r0)
A readable implementation in Python of three single-bit inputs (r0 r1 and r2) is shown below:
def ternlut8(r0, r1, r2, imm8):
"""Implementation of a LUT3 (ternary lookup)"""
# index will be in range 0 to 7
lut_index = 0
# r0 sets bit0, r1 bit1, and r2 bit2
if r0: lut_index |= 1 << 0
if r1: lut_index |= 1 << 1
if r2: lut_index |= 1 << 2
# return the requested indexed bit of imm8
return imm8 & (1 << lut_index) != 0
If the input registers are 64-bit then the output is correspondingly 64-bit, and would be constructed from selecting each indexed bit of the three inputs to create the corresponding indexed bit of the output:
def ternlut8_64bit(R0, R1, R2, imm8):
"""Implementation of a 64-bit ternary lookup instruction"""
result = 0
for i in range(64):
m = 1 << i # single mask bit of inputs
r0, r1, r2 = (R0 & m), (R1 & m), (R2 & m)
result |= ternlut8(r0, r1, r2, imm8) << i
return result
An example table of just three possible permutations out of the total 256 for the 8-bit immediate is shown below - Double-AND, Double-OR and Bitwise-blend. The immediate (the 8-bit lookup table) is named imm8, below. Note that the column has the value in binary of its corresponding header: imm8:0xCA is binary 11001010 in the "Bitwise blend" column:
| A0 | A1 | A2 | Double AND (imm8=0x80) |
Double OR (imm8=0xFE) |
Bitwise blend (imm8=0xCA) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 |
Uses
The number of uses is significant: anywhere that three logical bitwise operations are used in algorithms. Carry-save, SHA-1 SHA-2, MD5, and exactly-one and exactly-two bitcounting used in Harley-Seal Popcount.[7] vpternlog speeds up MD5 by 20%[8]
Implementations
Although unusual due to the high cost in hardware this instruction is found in a number of instruction set architectures
- The 1985 Amiga blitter capability in Agnus: the 8-bit immediate was termed "minterm", and the operation was memory-to-memory.[9]
- The AVX-512 extension calls it
vpternlog[10] - Power ISA v3.1 calls the instruction
xxeval.[11] - Intel Larrabee also implemented this instruction as
vpternlog: Tom Forsyth explains, amusingly, the Intel test engineers being happy to have one instruction to test rather than 256.[12][13][14][15]
See also
- Bit manipulation instruction set – Type of computer instructions
- Bitwise operation – Computer science topic
- Boolean function – Function returning one of only two values
References
- ^ "Hasse Diagram".
- ^ power3.1, pp. Power ISA Book I Chapter 7. Vector-Scalar Extension Facility p968, Table 145.
- ^ Mercadier & Dagand 2019.
- ^ Sovyn, Khoma & Podpora 2023.
- ^ Xu 2017.
- ^ Biham 1997.
- ^ "AVX512: Ternary functions evaluation".
- ^ "Animetosho/Md5-optimisation". GitHub.
- ^ "AVX Bitwise ternary logic instruction busted!". 6 October 2024.
- ^ "Intel Architecture Instruction Set Extensions Programming Reference" (PDF). Intel. Retrieved 2014-01-29.
- ^ power3.1, pp. Power ISA Book I Chapter 7. Vector-Scalar Extension Facility p967.
- ^ "TomF's talks and papers".
- ^ https://tomforsyth1000.github.io/papers/LRBNI%20origins%20v4%20full%20fat.pdf
- ^ "Tom Forsyth - the Lifecycle of an Instruction Set on Vimeo".
- ^ "Tom Forsyth - the Lifecycle of an Instruction Set". 22 August 2020.
Sources
- Biham, Eli (1997). "A Fast New DES Implementation in Software" (PDF). Fast Software Encryption. Lecture Notes in Computer Science. Vol. 1267. Berlin, Heidelberg: Springer. pp. 260–272. doi:10.1007/BFb0052352.
- Mercadier, Darius; Dagand, Pierre-Évariste (June 2019). "Usuba: high-throughput and constant-time ciphers, by construction". Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Phoenix, AZ, USA: Association for Computing Machinery. pp. 157–173. doi:10.1145/3314221.3314636. ISBN 978-1-4503-6712-7.
- Power ISA™ Version 3.1 (PDF) (v3.1 ed.). IBM. May 1, 2020. SA22-7832-14. Retrieved Aug 7, 2025.
- Sovyn, Yaroslav; Khoma, Volodymyr; Podpora, Michal (2023). "Bitsliced Implementation of Non-Algebraic 8 × 8 Cryptographic S-Boxes Using x86-64 Processor SIMD Instructions". IEEE Transactions on Information Forensics and Security. 18: 491–500. doi:10.1109/TIFS.2022.3223782.
- Xu, Shixiong (September 2017). Data Layout Oriented Compilation Techniques in Vectorization for Multi-/Many-cores (Dissertation). Trinity College Dublin.
External links
- AVX-512 ternary functions
- Libre-SOC instructions - includes boolean variants as well as ternary
- ternary function implemented in Python