top of page
Physics Tomorrow Letters icon.jpg
ORCID_iD.svg.png

Locked

applsciletetrsa>vol-01>issue-05>Physics-inspired Ising Computing with Ring Oscillator Activated p-bits

ORCID_iD.svg.png
Copy of SCOPUS INDEXED (1).png
GIF featured item.gif
Editorial Choice.PNG

Letter

Physics-inspired Ising Computing with Ring Oscillator Activated p-bits

Navid Anjum Aadit
Andrea Grimaldiy

Applied Science Letters

2022 ° 09(06) ° 01-04

https://www.wikipt.org/applscilettersa

DOI: 10.1490/665877402.570applsci

Abstract

The nearing end of Moore’s Law has been driving the development of domain-specific hardware tailored to solve a special set of problems. Along these lines, probabilistic computing with inherently stochastic building blocks (p-bits) have shown significant promise, particularly in the context of hard optimization and statistical sampling problems. p-bits have been proposed and demonstrated in different hardware substrates ranging from small-scale stochastic magnetic tunnel junctions (sMTJs) in asynchronous architectures to large-scale CMOS in synchronous architectures. Here, we design and implement a truly asynchronous and medium-scale p-computer (with  800 pbits) that closely emulates the asynchronous dynamics of sMTJs in Field Programmable Gate Arrays (FPGAs). Using hard instances of the planted Ising glass problem on the Chimera lattice, we evaluate the performance of the asynchronous architecture against an ideal, synchronous design that performs parallelized (chromatic) exact Gibbs sampling. We find that despite the lack of any careful synchronization, the asynchronous design achieves parallelism with comparable algorithmic scaling in the ideal, carefully tuned and parallelized synchronous design. Our results highlight the promise of massively scaled p-computers with millions of free-running p-bits made out of nanoscale building blocks such as stochastic magnetic tunnel junctions.

Introduction


With the nearing end of Moore’s Law, domain-specific hardware and architectures are growing rapidly. The notion of performing some tasks more efficiently (area, speed and/or energy) rather than improving performance for general purpose computing has led to the proliferation of special-purpose accelerators. With their widespread use, hard optimization problems have been a primary target of this approach and a variety of different domain-specific hardware architectures have emerged (see, Ref. [1] for a general and recent review). As an example of this growing trend, probabilistic bits or p-bits were introduced [2] as a building block which can accelerate a broad family of algorithms including Monte Carlo, Markov Chain Monte Carlo [3], Quantum Monte Carlo, statistical sampling for Bayesian inference and Boltzmann machine learning [4] methods. p-bits have been shown to be compatible with powerful optimization techniques such asparallel tempering [5] with competitive performance relative to all other Ising machines (classical and quantum) in select problems such as integer factorization and Boolean satisfiability [6]. Their combination with sophisticated algorithms [7] could yield further advantages. A natural advantage of the p-bit model is its native mapping to the Ising model and to the natural generalization of Ising Models. This ensures that coupled p-bits can systematically probe the exact Boltzmann distribution through Gibbs or Metropolis sampling without any approximations or reductions.

Unlock Only

Read-only this publication

This option will drive you towards only the selected publication. If you want to save money then choose the full access plan from the right side.

Unlock all

Get access to entire database

This option will unlock the entire database of us to you without any limitations for a specific time period.
This offer is limited to 100000 clients if you make delay further, the offer slots will be booked soon. Afterwards, the prices will be 50% hiked.

Conclusion


We present the TTS for solving 100 instances of the planted Ising problems of 9 different sizes in FIG. 3. The standard Gibbs (CPU) was implemented in Python using optimized libraries for matrix calculations. The final two points were not computed because of time limitations. We solved the exact same instances on the FPGA programmed with the asynchronous ROSC activated 800 p-bits with a fixed point representation using 10-bits. As a reference, the synchronous solver performing chromatic Gibbs sampling solves the same instances on the same FPGA where careful phase shifting ensures no simultaneous or incorrect updates (where Ii calculation is not complete) occur between neighboring p-bits (as in Ref. [6]). On the other hand, the asynchronous solver is expected to take samples with both of those errors when p-bit clock edges happen to be closely separated. Our experiment investigates the usefulness of such samples. We defined a common linear simulated annealing schedule for all the architectures with = 0:5 to 7:0 with a step of 0:5 where, at each , a total of 937 sweeps (attempted flips of all p-bits) are executed. In the FPGA the annealing time was fixed to  = 1:4 ms for each trial. To obtain a configuration comparable to the asynchronous architecture, the synchronous architecture was set up with two stable and oppositely phase shifted clocks with the average frequency (9.375 MHz) of the 10 ROSC clocks. We believe this arrangement made the two designs equivalent beyond the asynchronous and inexact dynamics of the ROSC since both designs approximately take the same amount of samples within the fixed annealing time  . The key result we obtained is shown in FIG. 3. We observe a clear scaling difference between the CPU implementation of standard serialized Gibbs sampling and the massively parallel FPGA implementations which (ideally) obtain a scaling factor of  N in their flips/second due to their massively parallel architecture. Both solvers provide a roughly 5-orders of magnitude prefactor improvement over the CPU. Intriguingly, the scaling of the synchronous and asynchronous FPGA remain similar, despite the possibility of many collisions (parallel or incorrect updates) in the asynchronous design. Indeed, the carefully tuned synchronous design performs strictly better than the asynchronous one in all instances. Nevertheless, it is encouraging to observe that the asynchronous design without any carefully engineered clocks or tuning performs nearly as well, leading to the promising possibility of truly asynchronous, million-bit p-computers with stochastic MTJs or other nanodevices.

bottom of page