AlphaChip (controversy)

The AlphaChip controversy refers to a series of public and scholarly disputes surrounding a 2021 Nature paper by Google-affiliated researchers. The paper describes an approach to macro placement, a stage of chip floorplanning, based on reinforcement learning (RL), a machine learning method in which a system iteratively improves its decisions by optimizing performance-based reward signals.[1] The lead researchers of the Nature paper were affiliated with Google Brain and later became part of Google DeepMind.

Following publication, several researchers and commentators raised concerns about the paper’s methodology, reproducibility, and scientific integrity. Additional criticism focused on whether the reported performance gains were meaningful and whether they were in fact caused by the proposed techniques rather than by confounding factors. Nature released the peer-review file, made several corrections, posted an editor’s note announcing an investigation, and published an addendum stating that previously undisclosed coordinate data from commercial design software had been used in the study.

Coverage in major media outlets and technical publications reported on criticisms, as well as on responses from the authors and editorial actions taken by Nature. In Communications of the ACM, Goth, Halper, and other commentators linked the dispute to broader concerns about reproducibility, selective reporting, and reliance on proprietary data and large-scale computational resources. The controversy has since included calls for independent replication, discussion within the computer-aided design research community, and legal proceedings related to public statements about the work. Google terminated a researcher who criticized the work, and he then filed a lawsuit against the company in the Santa Clara County Superior Court in California.

Four years after publication, commentators noted the absence of positive replications of the disputed results in peer-reviewed literature and continued to question the scientific integrity of the Nature paper as well as its technical significance in electronic design automation (EDA).

Background

The AlphaChip controversy has multiple dimensions, including legal disputes, questions about the fairness of the paper's comparisons and the replicability of its methods and experimental results, and technical disputes over whether the proposed approach improves on the state of the art in chip design.

Fair comparisons in computational optimization

In research literature, optimization algorithms are typically compared under equal computational budgets—such as identical limits on running time, memory usage, or numbers of function evaluations—to ensure that observed performance differences reflect algorithmic properties rather than disparities in available resources.[2][3] Standard benchmarking methodologies therefore control termination criteria and resource limits across methods when evaluating relative performance.[2]

Similar considerations are adopted in reinforcement learning research, where algorithmic performance is commonly compared under matched conditions such as equal numbers of environment interactions, episodes, or training steps, so that differences reflect learning dynamics rather than unequal data or compute budgets.[4]
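The matched-budget principle described above can be illustrated with a small, self-contained sketch that compares two toy optimizers under an identical number of objective evaluations. The objective function, method names, and budget here are hypothetical, chosen only to show how an equal-budget comparison is set up; they do not model any placement tool.

```python
import random

calls = {"n": 0}

def objective(x):
    # Toy 1-D cost to minimize (a stand-in for a placement cost);
    # the counter tracks how many evaluations each method consumes.
    calls["n"] += 1
    return (x - 3.0) ** 2

def random_search(budget, seed=0):
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(budget):
        best = min(best, objective(rng.uniform(-10, 10)))
    return best

def hill_climb(budget, seed=0):
    rng = random.Random(seed)
    x = rng.uniform(-10, 10)
    best = objective(x)
    for _ in range(budget - 1):
        cand = x + rng.gauss(0, 1)
        c = objective(cand)
        if c < best:
            x, best = cand, c
    return best

# Matched budgets: each method gets exactly BUDGET objective
# evaluations, so observed differences reflect the algorithms
# rather than unequal resources.
BUDGET = 1000
calls["n"] = 0
rs = random_search(BUDGET)
used_rs = calls["n"]
calls["n"] = 0
hc = hill_climb(BUDGET)
used_hc = calls["n"]
assert used_rs == used_hc == BUDGET
```

Benchmarking methodologies in the cited literature generalize this idea to wall-clock time, memory, and training-step budgets, but the core requirement is the same: hold the resource budget fixed across the methods being compared.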

Reproducibility and evaluation norms

The dispute touched on issues contributing to the replication crisis such as incomplete disclosure of methods, reliance on surrogate objectives, limited benchmark coverage, and difficulty reproducing results that depend on unusually large computational resources.[5][6] U.S. federal funding agencies have established formal expectations for research conduct. Their recent guidance emphasizes that science supported by government funding should be reproducible, transparent in its methods and uncertainties, and skeptical of assumptions.[7][8] Nature Portfolio journals require authors to make materials, data, code, and associated protocols available to readers at publication without undue qualifications.[9]

Reproducibility in scientific research faces significant challenges from questionable research practices (QRPs)—behaviors that fall into an ethical gray zone between sound science and outright misconduct. In artificial intelligence and machine learning research, QRPs undermine the integrity and replicability of findings by introducing systematic bias into reported outcomes. These practices include data manipulation, selective reporting, data leakage, and methodological shortcuts that enable researchers to present artificially positive results that other investigators cannot independently reproduce.[10][11] When researchers cherry-pick, selectively evaluating their methods on inputs or datasets with unusually good performance while excluding unfavorable results, the reported metrics no longer reflect true generalization performance. Other researchers attempting to validate the work using comprehensive test sets or different dataset selections will typically observe more modest results, making replication impossible. Similarly, omitting critical procedural details from methodology descriptions or experimental documentation prevents independent verification.[10]

Macro placement in chip layout

Chip design for modern integrated circuits is typically a complex, expert-driven process that relies on electronic design automation and can take weeks or months to complete; therefore, advances that reduce key stages of this process from weeks to hours through computational automation are considered significant.[12][13][1]

Macro placement is a step during chip layout that determines the geometric locations of large circuit components (macros) within a chip floorplan subject to predetermined rectangular boundaries, prior to detailed placement and wire routing. The number of macros per circuit ranges from several to many hundreds. Mixed-size placement generalizes macro placement by simultaneously placing both large macros and millions of small interconnected standard cells, requiring algorithms to handle objects that differ by several orders of magnitude in area and mobility.

Circuit components, including macros and standard cells, are interconnected by wiring. Because each macro is connected by wires to many other circuit components, their placement impacts the circuit’s routed wirelength, routing congestion, operating power, and timing performance. Macro placement and mixed-size placement are known to strongly influence downstream power, performance, and area outcomes during circuit layout and optimization. Prior methods include combinatorial optimization techniques such as simulated annealing, as well as analytical placement and hierarchical heuristics. These methods can relocate multiple circuit components at the same time and can relocate some components many times.[14][15] Commentators noted that because macro placement is largely geometric and its fundamental algorithms are not tied to a specific process node, competing approaches can be evaluated on public benchmarks (tests) across technologies, rather than primarily on proprietary internal designs.[13][16][17][18]
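As a minimal illustration of one of the prior methods named above, the sketch below applies simulated annealing to a toy macro placement problem, relocating macros on a small grid to reduce half-perimeter wirelength (HPWL), a standard cheap wirelength estimate. The grid size, netlist, and cooling schedule are invented for illustration, and the sketch ignores macro shapes and overlap constraints that real placers must handle.

```python
import math
import random

GRID = 8
MACROS = list(range(6))
# Nets: groups of macros connected by wiring (hypothetical example).
NETS = [(0, 1), (1, 2, 3), (3, 4), (4, 5, 0)]

def hpwl(pos):
    """Half-perimeter wirelength over all nets."""
    total = 0
    for net in NETS:
        xs = [pos[m][0] for m in net]
        ys = [pos[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(seed=0, steps=5000, t0=5.0, cooling=0.999):
    rng = random.Random(seed)
    pos = {m: (rng.randrange(GRID), rng.randrange(GRID)) for m in MACROS}
    cost, t = hpwl(pos), t0
    for _ in range(steps):
        m = rng.choice(MACROS)
        old = pos[m]
        pos[m] = (rng.randrange(GRID), rng.randrange(GRID))
        new_cost = hpwl(pos)
        # Metropolis rule: always accept improvements; accept worse
        # moves with probability exp(-delta / t) to escape local minima.
        if new_cost > cost and rng.random() >= math.exp((cost - new_cost) / t):
            pos[m] = old  # reject the move
        else:
            cost = new_cost
        t *= cooling  # geometric cooling schedule
    return pos, cost
```

Unlike the sequential placement formulation discussed later in this article, a move here can revisit any macro at any time, so each macro may be relocated many times over the course of the search.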

EDA vendor companies introduced automated software tools for floorplanning and mixed-size placement. For instance, Cadence’s Innovus implementation software added a Concurrent Macro Placer (CMP) feature by 2019 to automatically place large blocks and standard cells, though it did not disclose AI use.[19][20] Academic researchers in physical design of integrated circuits had been exploring reinforcement learning and broader machine learning techniques as early as 2019.[21] Despite progress, skepticism remained about the impact of such techniques. In late 2024, an IEEE Spectrum article by Intel researchers examining AI-driven floorplanning concluded that purely machine-learning-based methods were not yet sufficient for the full complexity of chip design. The authors found that conventional algorithms (such as classical search and optimization) still outperformed or needed to complement AI in handling multiple design constraints, suggesting that hybrid approaches combining AI with traditional EDA techniques would be more effective going forward.[22]

The 2021 Nature paper and its claims

In 2021, Nature published a paper titled "A graph placement methodology for fast chip design", co-authored by 21 Google-affiliated researchers. The paper reported that an RL agent could generate macro placements for integrated circuits "in under six hours" and achieve improvements over human-designed layouts in power, timing performance, and area (PPA), standard chip-quality metrics referring respectively to energy consumption, chip operating speed, and silicon footprint (evaluated after wire routing).[1] Circuit examples used in the study were parts of proprietary Google TPU designs, called blocks (or floorplan partitions). The paper reported results on five blocks and described the approach as generalizable across chip designs.

The paper introduced a sequential macro placement algorithm in which macros are placed one at a time instead of optimizing their locations concurrently. At each step, the algorithm selects a location for a single macro on a discretized chip canvas, conditioning its decision on the placements of previously placed macros. This sequential formulation converts macro placement into a long-horizon decision process in which early placement choices constrain later ones. After macro placement, force-directed placement is applied to place standard cells connected to the macros. Deep reinforcement learning is used to train a policy network to place macros by maximizing a reward that reflects final placement quality (for example, wirelength and congestion). Policy learning occurs during self-play on one or multiple circuit designs. Further placement optimizations refine the overall layout by balancing wirelength, density, and overlap constraints, while treating the macro locations produced by the RL policy as fixed obstacles. The approach relies on pre-training, in which the RL model is first trained on a corpus of prior designs (twenty in the Nature paper) to learn general placement patterns before being fine-tuned on a specific chip. Pre-training requires a significant upfront time investment, but it reduces convergence time and improves stability in subsequent uses by initializing the policy with parameters that encode common structural regularities in macro placement problems.[23]
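The sequential formulation can be sketched as follows, with a simple greedy heuristic standing in for the learned RL policy that scores candidate locations in the paper. The grid size and connectivity are hypothetical, and the sketch omits standard-cell placement and reward learning entirely; it only shows the one-macro-at-a-time decision structure.

```python
GRID = 4
NETS = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical connectivity

def score(cell, macro, placed):
    """Stand-in for a policy's location score: negative total
    Manhattan distance from `cell` to already-placed net neighbors."""
    s = 0
    for net in NETS:
        if macro in net:
            for other in net:
                if other in placed:
                    (x, y), (ox, oy) = cell, placed[other]
                    s -= abs(x - ox) + abs(y - oy)
    return s

def place_sequentially(macros):
    placed = {}
    for m in macros:
        free = [(x, y) for x in range(GRID) for y in range(GRID)
                if (x, y) not in placed.values()]
        # Early placements constrain later ones: each macro occupies a
        # cell, shrinking the feasible set for subsequent decisions,
        # and each decision conditions on all earlier placements.
        placed[m] = max(free, key=lambda c: score(c, m, placed))
    return placed
```

In the paper's formulation, the greedy `score` above is replaced by a trained policy network, and the eventual reward is computed only after all macros (and the standard cells) are placed, which is what makes the problem a long-horizon decision process.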

The paper reported results that relied on proprietary Google chip designs, limiting independent verification and comparison with prior methods on common benchmark cases.[16][18]

Claims with questionable substantiation

Critics questioned whether the data presented were sufficient to support the paper’s claims. They raised concerns about the consistency between the stated claims and the described methodology, as well as the fairness of the experimental comparisons reported in the Nature paper. In particular, critics argued that the reported runtime and quality comparisons between the reinforcement learning (RL) method and prior placement tools did not assess equivalent tasks under comparable conditions.

Concerns about insufficient data to back claims

The Nature paper described the reduction in design-process time as going from "days or weeks" to "hours", but did not provide per-design time breakdowns or specify the number of engineers, their level of expertise, or the baseline tools and workflow against which this comparison was made. It was also unclear whether the "days or weeks" baseline included time spent on functional design changes, idle time, or the use of inferior EDA tools. Critics argued that the paper’s framing of "fast chip design" was not backed by standardized wall-clock timing comparisons on individual benchmarks against established methodologies.[17] Commentaries noted that the paper evaluated the method on fewer benchmarks (five) than is common in the field, showed mixed results across different evaluation goals, and did not report results of statistical hypothesis testing to rule out attribution of improvements to chance.[20][17]

Consistency between claims and implementation

While the approach was described as improving circuit area, the RL optimization did not alter the overall circuit area, as it adjusted only the locations of fixed-shape non-overlapping circuit components within a fixed rectangular layout boundary.[24][20][17]

Concerns about fairness of comparisons

Claims about RL runtimes were reported for macro placement only, whereas baseline tools such as RePlAce and commercial systems (such as Cadence CMP) perform both macro placement and placement of large numbers of standard cells. The Nature paper did not report the additional runtime required to place standard cells after the RL-generated macro placements, complicating direct runtime comparisons.[1][24]

The claimed six-hour runtime bound per circuit example did not account for pre-training. In the described experiments, RL policies were trained on twenty circuit blocks and then evaluated on five additional blocks, but the reported runtime reflected only the evaluation phase.[1][20][17][note 1]

Head‑to‑head comparisons in computational optimization typically show either higher solution quality at the same runtime or shorter runtime at the same quality.[2][3] Critics observed that the Google Nature paper provided no such comparison and that its RL method consumed "exorbitant" computational resources.[17]

In 2022, Reuters and The New York Times reported that Satrajit Chatterjee, a Google engineer involved in reviewing the AlphaChip work, raised concerns internally and participated in drafting an alternative analysis. Chatterjee, S. Bae, A. Yazdanbakhsh, and other co-authors prepared a manuscript titled "Stronger Baselines for Evaluating Deep Reinforcement Learning in Chip Placement", also referred to as Stronger Baselines. An early version of the manuscript was leaked anonymously in 2022, contributing to public controversy around the AlphaChip claims. In this work, Chatterjee and his co-authors argued that simpler or established methods could outperform the RL approach under fair comparisons. In March 2022, Google declined to publish this analysis and terminated Chatterjee's employment.[12][13][25] Chatterjee filed a wrongful dismissal lawsuit, alleging that representations related to the AlphaChip research involved fraud and scientific misconduct. He referred to whistleblower protections in California law. In particular, under Labor Code §1102.5, an employer may not punish an employee who discloses information to a manager or internal investigator, if the employee has reasonable cause to believe that the information reveals a violation of state or federal law.[26] Labor Code §1102.5 explicitly protects employees who refuse to engage in activities that would violate a law and bars retaliation for such refusals.[26] A whistleblower’s motive is irrelevant under California law.[27]

According to court documents, Chatterjee's study was conducted "in the context of a large potential Google Cloud deal" and he noted that it "would have been unethical to imply that we had revolutionary technology when our tests showed otherwise."[28] Furthermore, the committee that reviewed his paper and disapproved its publication was allegedly convened and chaired by subordinates of Jeff Dean, a senior co-author of the Nature paper and a senior vice president, and therefore lacked independence.[28]: 30  The lawsuit alleged that Google was "deliberately withholding material information from Company S to induce it to sign a cloud computing deal" using what Chatterjee viewed as questionable technology.[28][12] The court denied Google’s motion to dismiss, holding that Chatterjee had plausibly alleged retaliation for refusing to engage in conduct he believed would violate state or federal law.[29][25]

Reproducibility and compute scale

In October 2021, the editors of Nature were informed that the code behind the paper was unavailable. According to the article's change history, the code-availability concern was later resolved, and a correction notice published on 31 March 2022 linked to the authors' GitHub repository.[30][31] However, later publications noted important omissions[24] and the continued unavailability of proprietary data used in the paper for training and evaluation.[16] Four years after publication, parts of the source code necessary to reproduce the results were still missing.[17][20]

When describing its experiments, the 2021 Nature paper mentioned only "up to a few hundred" macros per circuit used in the study. It also withheld the sizes and shapes of macros, as well as other key design parameters such as area utilization.[17] In the 2024 addendum, the estimate for macro counts was revised to "up to 107" in pre-training and "up to 131" during evaluation.[32]

The evaluation reported in the paper relied on resources (multiple computers in Google Cloud) that were orders of magnitude larger than those typically used by academic or commercial placement tools, hindering fair comparison.[17]

Parts of the chip implementation effort necessary for evaluating chip metrics (PPA) were performed by an unnamed third party. This complicated the reproduction of results and attribution of reported improvements to the methods described in the Nature paper. Later, it became known that Google's TPU chips were co-designed with Broadcom.[33][34]

According to peer-reviewed journal publications, no positive independent replication of the Nature results had been reported in peer-reviewed literature three[17] and four[20] years after publication. Commentators continue to highlight unresolved challenges in reproducing the reported outcomes due to incomplete methodological disclosure, questionable techniques and evaluation methodologies, and reliance on proprietary inputs. In the absence of positive replications, researchers question the scientific significance and technical validity of the original claims.[20][18][35]

Independent empirical assessment

The Stronger Baselines manuscript was produced by a team within Google led by Satrajit Chatterjee that shared no co-authors with the Nature paper. According to reports, the manuscript compared the RL approach to a simulated annealing implementation in terms of the proxy objective and found that simulated annealing significantly outperformed the Google method on both proprietary Google designs and public IBM benchmarks.[24]

Researchers at the University of California, San Diego (UC San Diego) led by professors Chung-Kuan Cheng and Andrew B. Kahng published a paper at the 2023 International Symposium on Physical Design (ISPD) that examined RL-based placement using public benchmarks and additional test cases. They used established circuit benchmarks for macro placement and one non-proprietary example released by Google researchers. To reflect contemporary circuit design practices, they additionally prepared modern circuit benchmarks with appropriate macros and released them on GitHub. They performed evaluation with several baselines: manual (human) macro placement, macro placement by simulated annealing described in the Stronger Baselines manuscript, and an academic mixed-size placer RePlAce. Compared with the corresponding baselines in the Nature paper, these three methods produced more competitive results. The fourth baseline—Cadence CMP, a commercial mixed‑size placer—produced the best results overall (in terms of the proxy objective and the PPA metrics) while spending considerably less runtime and using a single server, unlike the Nature experiments. Their studies reached conclusions consistent with those of the Stronger Baselines manuscript: simulated annealing was faster and achieved comparable or better quality than the RL-based approach in terms of the proxy objectives. They also validated these results for the PPA metrics. However, the researchers reported that most of the measured Kendall rank-correlation values between the proxy and the PPA metrics were close to zero.[24][28][36]

In the 2023 IEEE/ACM MLCAD Workshop contest on macro placement, RL-based approaches were largely absent from the competitive results. The contest was explicitly motivated by recent deep RL work on chip placement and specifically the Google Nature paper. Nevertheless, the contest organizers reported that all but one of the submitted solutions relied on classical optimization techniques rather than machine learning. The single team that attempted an RL-augmented approach used an RL-parameterized simulated annealing algorithm, but this entry did not produce competitive results.[37]

Scientific scrutiny and technical criticism

Several researchers and commentators criticized the approach described in the Nature paper and its reporting of computational experiments. The criticisms included deficiencies in the approach itself, arguments that the proposed method did not demonstrate improvements over existing state-of-the-art placement techniques, and concerns about its technical significance.

Proxy objective and method design

The proxy cost is a simplified objective function used to guide the RL agent during training and evaluation. It combines estimates of wirelength, cell density, and routing congestion that can be computed quickly without running full chip design tools. However, the actual quality of a chip design is measured by 'ground truth' metrics obtained after running commercial place-and-route tools, including total routed wirelength, power consumption, timing (worst negative slack and total negative slack), and design rule check violations.[1]

A key methodological concern is whether optimizing the proxy cost actually leads to better ground truth outcomes—if the proxy cost does not correlate well with ground truth metrics, the RL agent may be optimizing for the wrong objective. Subsequent experimental studies found that the correlation between the proxy objective and standard PPA metrics was low. Markov further argued that because the proxy function omits circuit timing information, optimizing it is unlikely to improve overall circuit performance.[24][17][20]
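The correlation check at issue can be illustrated with a small sketch of Kendall's rank correlation between proxy scores and post-route metrics across a set of placements. The numeric values below are invented for illustration, not data from any of the cited studies.

```python
def kendall_tau(a, b):
    """Kendall rank correlation: concordant minus discordant pairs,
    divided by the total number of pairs (ties counted as neither)."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

proxy  = [1.0, 2.0, 3.0, 4.0, 5.0]   # proxy cost per placement (made up)
routed = [3.1, 1.2, 4.8, 2.0, 3.5]   # post-route metric (made up)

tau = kendall_tau(proxy, routed)
# A tau near zero means that ranking placements by the proxy says
# little about their ranking under the ground-truth metric.
```

Values of tau range from +1 (identical rankings) through 0 (unrelated rankings) to -1 (reversed rankings); the UC San Diego study cited above reported that most measured values between the proxy and PPA metrics were close to zero.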

Questionable research practices

In October 2024, sixteen methodological concerns were grouped into categories and itemized as "initial doubts" in a detailed critique by chip design researcher and former University of Michigan professor Igor L. Markov in Communications of the ACM. The critique was initially published as an arXiv preprint in 2023.[38] Markov joined Synopsys in 2024. The critique described multiple questionable research practices in the evaluation of AlphaChip, particularly around selective reporting of benchmarks and outcomes, selective use of metrics, and selective choice of baselines.[17]

The critique identified a pattern of cherry-picking across multiple dimensions of the research. Regarding benchmarks, the Nature paper reported results with unclear statistical significance on only five chip blocks, despite discussing twenty blocks total. The peer review file revealed that a reviewer had requested results on specific public benchmarks, but those were never published by the authors.[1] The choice of baselines also raised concerns: the Nature paper compared AlphaChip to a weaker variant of simulated annealing and to an unspecified "human baseline," while later publications indicated that the Google method consistently underperformed both the Cadence CMP tool and established algorithms including a stronger implementation of simulated annealing. Additionally, the paper reported chip-quality metrics for its own method but omitted these same metrics for simulated annealing, instead evaluating it using a proxy function that did not accurately represent or correlate with actual chip-quality metrics.

Beyond selective reporting, the critique highlighted issues with misreporting and data leakage. The criticized results were produced using undisclosed data from a commercial software tool.[24] Years after the original paper and following multiple rounds of criticism,[12] Nature published an authors' addendum disclosing this usage.[32] Researchers also reported additional discrepancies between the Nature paper's description and the methods and data actually used to produce results and published as source code.[24][20] Furthermore, researchers outside Google were unable to replicate the original findings.[18][35] A subsequent Google publication noted that pre-training on "diverse TPU designs" did not improve result quality, whereas pre-training on "previous netlist versions" produced some quality gains.[23] Although the Nature paper did not disclose this, using the same or similar data for both pre-training and testing could represent a significant methodological flaw known as data leakage or contamination.[17]

Nature editorial actions

In April 2022, the peer review file for the Nature article was included as a supplementary information file.[30]

In September 2023, Nature added an editor's note to "A graph placement methodology for fast chip design" stating that the paper's performance claims had been called into question and that the editors were investigating the concerns. On 21 September 2023, Andrew B. Kahng's accompanying News & Views article was retracted; the retraction notice said that new information about the methods used in the Google paper had become available after publication and had changed the author's assessment, and that Nature was conducting an independent investigation of the paper's performance claims. By late September 2024, the editor's note had been removed without explanation,[36][17] and Nature published an addendum to the original paper (dated 26 September 2024). The addendum introduced the name AlphaChip for the proposed RL technique and described methodological details that critics had previously identified as missing, including the use of initial locations.[32] The addendum addressed some methodological details but lacked the full training and evaluation inputs needed for independent replication.[17][18]

Author responses and ensuing debate

A variety of author responses appear in reliable sources, including news media coverage,[12][13][35] Nature article corrections and addenda,[31][32] secondary coverage in scholarly publications,[17] and commentary in Communications of the ACM.[39] Lead authors Azalia Mirhoseini and Anna Goldie rejected internal allegations of fraud or serious methodological flaws, describing whistleblower Satrajit Chatterjee's complaints as a "campaign of misinformation."[40] Google spokespeople stated that the method had been vetted, open-sourced, independently replicated, and deployed "around the world."[35] However, published scholarly analyses noted that, three years after publication, no external replications or positive peer-reviewed confirmations of the Nature results had been reported; an independent UC San Diego implementation found commercial tools to be superior.[20][17] Google researchers rebutted criticisms by arguing that critics omitted pre-training and used insufficient compute. In response, the 2024 CACM paper noted that the impact of pre-training on chip metrics had not been established, cited prior studies indicating that its impact on the proxy objective was limited, and pointed out that Google's code release included no support for pre-training. It also explained that the possible use of AlphaChip in production does not, by itself, demonstrate that it outperformed prior state-of-the-art methods in macro placement.[17] Goldie, Mirhoseini, and Dean responded to the peer-reviewed CACM paper in a letter to the editor, describing its meta-analysis as "regurgitating… unpublished, non-peer-reviewed arguments" and containing "thinly veiled fraud allegations already found to be without merit by Nature."[39] No independent, publicly available reliable sources were cited in support of this characterization.
Nature does not normally conduct formal investigations into allegations of research fraud itself, instead referring such cases to authors’ institutions or funding bodies, which the journal has stated are responsible for carrying out full investigations and determining whether misconduct has occurred.[41]

Deployment claims and attempts at verification

In September 2024, Google DeepMind introduced the name AlphaChip in a corporate blog post, where lead authors Anna Goldie and Azalia Mirhoseini claimed that AlphaChip had been used internally to generate chip layouts for multiple generations of Google's Tensor Processing Units (TPUs) and had also been applied to other Alphabet chips. They additionally claimed that external organizations were adopting or building on AlphaChip, citing MediaTek as an example. Similar claims were attributed to statements by Google’s Senior VP and Nature paper co-author Jeff Dean in coverage by Communications of the ACM and New Scientist that also described how Google's claims were met with skepticism from university professors.[18][35]

Multiple reliable sources report an ongoing collaboration between Google and MediaTek on the design and production of Google’s Tensor Processing Units (TPUs)[42] as well as MediaTek securing orders for multiple generations of TPUs.[43] Scholarly standards in scientific and engineering research generally emphasize independent replication or validation by third parties as a means of strengthening confidence in research findings beyond a single report or internal adoption, reflecting broader norms of reproducibility and replicability in science and engineering research.[44] A 2024 paper on macro placement published at the International Symposium on Physical Design (ISPD) by a MediaTek researcher briefly discussed RL-based macro placement approaches such as the Nature paper, without mentioning adoption, and argued that these academic methods were not yet sufficient for modern system-on-chip floorplanning in industrial design flows.[45]

At ISPD 2023, researchers from Nvidia published a peer-reviewed paper describing an unrelated macro placement approach that combined GPU-accelerated analytical placement with automated parameter tuning.[46] Independent benchmarking by Cheng and Kahng reported at the same conference found that the Nvidia approach achieved better results than Google's open-source implementation of the Nature macro placement method on shared modern circuit benchmarks.[24]

October 2024 coverage in Communications of the ACM (CACM) reported that Google did not provide public benchmark evidence for AlphaChip performance, and quoted computer scientist Moshe Vardi saying that the community has not been able to verify Google's claims.[18] A peer-reviewed CACM research article argued that, despite the Nature paper claiming applications to industry chips, "aside from vague general claims, no chip-metric improvements were reported for specific production chips", and emphasized that publicly reported benchmark and product metrics are necessary to substantiate claims of state-of-the-art performance.[17]

Another peer-reviewed assessment noted that Google's public codebase had been updated to "enable[] use of DREAMPlace … to finalize soft macro placement," but explained that DREAMPlace was avoided because the assessment evaluated the claims of the original Nature (2021) paper and "not newer variations that mix RL with other techniques already known to work well for macro placement."[20]

Scholarly responses and peer-reviewed resolution

Starting in 2022, multiple researchers and commentators called for results on publicly available benchmarks to settle the dispute through independent verification and fair comparison.[12][13][16][35] Coverage in Communications of the ACM noted that scientific controversies are normally resolved through independent replication of published results, peer-reviewed critique, and independent scrutiny rather than internal processes or corporate communications. In December 2024, the magazine’s editor-in-chief, James Larus, publicly invited Jeff Dean and his co-authors to submit their technical response to critiques for peer review, emphasizing that open scrutiny is the appropriate mechanism for resolving such disputes.[16][18][47]

Following the initial critique by researchers such as Cheng and Kahng at the ISPD 2023 conference, peer-reviewed discussion revisited these concerns. In particular, a 2025 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) article by the UC San Diego team led by Cheng and Kahng compiled and addressed technical objections raised by Goldie, Mirhoseini, and Dean against the ISPD analysis, including issues of training methodology, computational scale, convergence, benchmark selection, and implementation differences, while re-examining the experimental methodology and replicability of the Nature paper's reported results:[20]

  • The journal paper incorporated explicit pre-training studies following instructions later released by Google and additionally evaluated fine-tuning from Google's August 2024 pre-trained AlphaChip checkpoint, alongside training from scratch.
  • Experiments were repeated with substantially greater computational resources than in the original ISPD study, allowing the number of RL data-collection jobs to be increased from 26 to 256.
  • To avoid early termination, experiments doubled the number of training iterations and performed multiple attempts when convergence was uncertain.
  • To address concerns regarding benchmarks, evaluations included the Ariane design released by Google, implemented in 7 nm chip technology, as well as additional modern circuit benchmarks in 7 nm technology, evaluated with standard commercial place-and-route tools to report post-route performance, power, and area (PPA) metrics.
  • The journal study clarified that its main evaluation flow did not rely on a reimplementation of Google's method or on comparisons with tools unavailable at the time of the original Nature publication; comparisons included earlier, commercially available versions of Cadence CMP.
  • The UC San Diego team noted that several research groups, including an anonymous reviewer, replicated their results.

Across these expanded experiments and methodological revisions, the TCAD study reported results consistent with those of the ISPD 2023 paper and Markov’s 2024 Communications of the ACM article, concluding that the reinforcement-learning approach described in the Nature publication did not consistently outperform established placement methods and typically required significantly greater computational effort.

Notes

  1. ^ By contrast, large language models (LLMs) used in systems such as ChatGPT reuse their pre-training across numerous inference tasks over long deployment periods, thereby amortizing the cost of pre-training.

References

  1. ^ a b c d e f g Mirhoseini, Azalia; Goldie, Anna; Yazgan, Mustafa; et al. (2021). "A graph placement methodology for fast chip design". Nature. 594 (7862): 207–212. doi:10.1038/s41586-021-03544-w. PMID 34108699.
  2. ^ a b c Dolan, Elizabeth D.; Moré, Jorge J. (2002). "Benchmarking optimization software with performance profiles". Mathematical Programming. 91 (2): 201–213. doi:10.1007/s101070100263.
  3. ^ a b Nocedal, Jorge; Wright, Stephen J. (2006). Numerical Optimization (2nd ed.). Springer. ISBN 978-0-387-30303-1.
  4. ^ Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. ISBN 978-0-262-03924-6.
  5. ^ Baker, Monya (2016-05-26). "1,500 scientists lift the lid on reproducibility". Nature. 533 (7604): 452–454. doi:10.1038/533452a.
  6. ^ Committee on Reproducibility and Replicability in Science (2019). Reproducibility and Replicability in Science. National Academies Press. ISBN 9780309486163.
  7. ^ "US Government Resources Related to Research Rigor and Reproducibility". Journal of Library Administration. 2018-10-07. Retrieved 2026-01-28.
  8. ^ "Guidelines for the Conduct of Research Supported by NIH" (PDF). Office of Intramural Research, National Institutes of Health. Retrieved 2026-01-28.
  9. ^ "Availability of data, materials, code and protocols". Nature Portfolio. Retrieved 2026-01-25.
  10. ^ a b Knight, Will (2022-08-10). "Sloppy Use of Machine Learning Is Causing a 'Reproducibility Crisis'". Wired. Retrieved 2026-01-28.
  11. ^ Kapoor, Sayash; Narayanan, Arvind (2023-08-03). "Leakage and the reproducibility crisis in machine-learning-based research". Patterns. 4 (9). Retrieved 2026-01-28.
  12. ^ a b c d e f Wakabayashi D, Metz C (2022-05-02). "Another Firing Among Google's A.I. Brain Trust, and More Discord". The New York Times. ISSN 0362-4331. Archived from the original on 2022-06-12. Retrieved 2022-06-12.
  13. ^ a b c d e Dave, Paresh (2022-05-03). "Google faces internal battle over research on AI to speed chip design". Reuters.
  14. ^ Chang, Yao-Wen; Jou, Jing-Yang (2004). "VLSI placement algorithms: A survey". Integration, the VLSI Journal. 38 (1): 1–30. doi:10.1016/j.vlsi.2003.11.001.
  15. ^ Markov, Igor L.; Hu, Jin; Kim, Myung-Chul (November 2015). "Progress and Challenges in VLSI Placement Research" (PDF). Proceedings of the IEEE. 103 (11). IEEE: 1985–2003. doi:10.1109/JPROC.2015.2478963.
  16. ^ a b c d e Goth, Gregory (2023-03-29). "More details, but not enough". Communications of the ACM.
  17. ^ a b c d e f g h i j k l m n o p q r s Markov, Igor L. (2024). "Reevaluating Google's Reinforcement Learning for IC Macro Placement". Communications of the ACM. 67 (12): 54–71. arXiv:2306.09633. doi:10.1145/3676845.
  18. ^ a b c d e f g h Halper, Mark (2024-11-04). "Updates Spark Uproar". Communications of the ACM.
  19. ^ Esmaeilzadeh, Hadi; Ghodrati, Soroush; Kahng, Andrew B.; Kim, Joon Kyung; Kinzer, Sean; Kundu, Sayak; Mahapatra, Rohan; Manasi, Susmita Dey; Sapatnekar, Sachin S.; Wang, Zhiang; Zeng, Ziqing (2024). "An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators". ACM Transactions on Design Automation of Electronic Systems. 29 (4): 68:1–68:33. doi:10.1145/3664652. Retrieved 2026-02-07. For macro-heavy designs, we employ Innovus' concurrent macro placer to automatically place all the macros.
  20. ^ a b c d e f g h i j k l Cheng, Chung-Kuan; Kahng, Andrew B.; Kundu, Sayak; Wang, Zhiang (2025). "An Updated Assessment of Reinforcement Learning for Macro Placement". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. doi:10.1109/TCAD.2025.3644293.
  21. ^ Gandhi, Upma; Bustany, Ismail; Swartz, William; Behjat, Laleh (2019). "A Reinforcement Learning-Based Framework for Solving Physical Design Routing Problem in the Absence of Large Test Sets". Proc. ACM/IEEE Workshop on Machine Learning for CAD (MLCAD).
  22. ^ Majumdar, Somdeb; Mallappa, Uday; Mostafa, Hesham (21 November 2024). "AI Alone Isn't Ready for Chip Design". IEEE Spectrum.
  23. ^ a b Yue, Summer; Songhori, Ebrahim M.; Jiang, Joe Wenjie; Boyd, Toby; Goldie, Anna; Mirhoseini, Azalia; Guadarrama, Sergio (2022). "Scalability and generalization of circuit training for chip floorplanning". Proceedings of the International Symposium on Physical Design (ISPD). ACM. pp. 65–70. doi:10.1145/3505170.3511478.
  24. ^ a b c d e f g h i Cheng, Chung-Kuan; Kahng, Andrew B.; Kundu, Sayak; Wang, Yucheng; Wang, Zhiang (2023). "Assessment of Reinforcement Learning for Macro Placement". Proceedings of the 2023 International Symposium on Physical Design (ISPD ’23). ACM: 158–166. arXiv:2302.11014. doi:10.1145/3569052.3578926.
  25. ^ a b "Fired Google AI engineer's whistleblower lawsuit moves ahead". Hindustan Times. 2023-07-21.
  26. ^ a b "California Labor Code § 1102.5". FindLaw. Retrieved 2026-02-21. An employer…shall not retaliate against an employee for disclosing information…if the employee has reasonable cause to believe that the information discloses a violation of state or federal statute… An employer…shall not retaliate against an employee for refusing to participate in an activity that would result in a violation of state or federal statute…
  27. ^ Lipinsky, Daren H. (April 2022). "The profound potency of Labor Code section 1102.5(b)". Advocate (Consumer Attorneys Association of Los Angeles). Retrieved January 28, 2026.
  28. ^ a b c d Satrajit Chatterjee v. Google, LLC (Case no. 22CV398683) (Superior Court of California, County of Santa Clara 21 February 2023) ("The full complaint as filed in Santa Clara County, detailing Chatterjee’s whistleblower allegations and claims against Google."), Text.
  29. ^ Burnson, Robert (2023-07-20). "Fired Google engineer's whistleblower lawsuit moves forward". Bloomberg.
  30. ^ a b
  31. ^ a b Mirhoseini, Azalia; Goldie, Anna; et al. (2022). "Author Correction: A graph placement methodology for fast chip design". Nature. 604 (7906): E24. doi:10.1038/s41586-022-04657-6.
  32. ^ a b c d Mirhoseini, Azalia; Goldie, Anna; Yazgan, Mustafa; et al. (2024-09-26). "Addendum: A graph placement methodology for fast chip design". Nature. 634: E10. doi:10.1038/s41586-024-08032-5.
  33. ^ Mann, Tobias (22 September 2023). "For your info, Broadcom helped Google make those TPU chips". The Register. Archived from the original on 13 August 2025. Retrieved 1 September 2025.
  34. ^ "Google expects no change in its relationship with AI chip supplier Broadcom". Reuters. 2023-09-21. Retrieved 2025-09-04.
  35. ^ a b c d e f Hsu J (2024-10-14). "Google says its AI designs chips better than humans - Experts disagree". New Scientist. Archived from the original on March 5, 2025. Retrieved 2025-07-27.
  36. ^ a b Joelving, Fredrik (2023-09-26). "Nature flags doubts over Google AI study, pulls commentary". Retraction Watch.
  37. ^ Bustany, I.; Gasparyan, G.; Gupta, A.; et al. (2023). "The 2023 MLCAD FPGA Macro Placement Benchmark Design Suite and Contest Results". Proc. ACM/IEEE Workshop on Machine Learning for CAD (MLCAD). pp. 1–6. doi:10.1109/mlcad58807.2023.10299868.
  38. ^ Markov, Igor L. (2023). "The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement". arXiv:2306.09633 [cs.LG].
  39. ^ a b Goldie, Anna; Mirhoseini, Azalia; Dean, Jeff (2024-12-20). "Is It Math or CS? Or Is It Both?". Communications of the ACM. Analyzing 'Meta-analysis'.
  40. ^ Simonite, Tom (2022-05-31). "Tension Inside Google Over a Fired AI Researcher's Conduct". Wired. Retrieved 2026-01-25.
  41. ^ "Editorial: Handling scientific fraud". Nature. 439 (7076): 520. 2006. doi:10.1038/439520a.
  42. ^ Sriram, Akash; Sophia, Deborah (17 March 2025). "Google preparing to partner with Taiwan's MediaTek on next AI chip, The Information reports". Reuters.
  43. ^ "MediaTek reportedly secures major orders for two generations of Google TPUs". TechNode. 15 December 2025.
  44. ^ "Reproducibility and Replicability in Science". National Academies Press. Retrieved 28 February 2026.
  45. ^ Tseng, I.-L. (2024). "Challenges in Floorplanning and Macro Placement for Modern SoCs". Proceedings of the International Symposium on Physical Design (ISPD 2024). ACM. pp. 71–72. doi:10.1145/3626184.3639695. Retrieved 2026-02-13.
  46. ^ Agnesina, A.; Rajvanshi, P.; Yang, T.; Pradipta, G.; Jiao, A.; Keller, B.; Khailany, B.; Ren, H. (2023). "AutoDMP: Automated DREAMPlace-based Macro Placement". Proceedings of the International Symposium on Physical Design (ISPD '23). NVIDIA Research. Retrieved 2026-02-13.
  47. ^ Larus, James (2024-12-20). "Is It Math or CS? Or Is It Both?". Communications of the ACM. Editor-in-Chief’s response.