AI models rate ‘right’ think tanks lower in terms of morality, objectivity, and quality than those on the ‘left’

Authors

Arthur Gailes

Edward J. Pinto

Jonathan Chew

Executive Summary

Large-language models (LLMs) increasingly inform policy research. We asked 5 flagship LLMs from leading AI companies in 2025 (OpenAI, Google, Anthropic, xAI, and DeepSeek) to rate 26 prominent U.S. think tanks on 12 criteria spanning research integrity, institutional character, and public engagement. Their explanations and ratings expose a clear ideological tilt.

Key findings

Why it matters

LLM-generated reputations already steer who is cited, invited, and funded. If LLMs systematically boost center-left institutes and depress right-leaning ones, writers, committees, and donors may unknowingly amplify a one-sided view, creating feedback loops that entrench any initial bias.

Next steps

Addressing this divergence is essential if AI-mediated knowledge platforms are to broaden rather than narrow debate in U.S. policy discussions.

1 Introduction

Left‑leaning bias in artificial intelligence (AI) is widely documented: today’s most‑used services tend to favor people, ideas, and institutions on the left of the spectrum. Because large‑language models (LLMs) now largely define how journalists, analysts, and citizens consume policy research, any tilt can quietly steer attention and approval toward or away from particular organizations. This report tests whether that “left” preference extends to U.S. research and policy institutions (“think tanks”). We ask five flagship LLMs to classify the political orientation of a roster of twenty-six U.S. think tanks.1 We then find that those think tanks the LLMs assess as “center‑left” or “left” are rated more highly on core metrics such as Moral Integrity, Research Quality, and Objectivity.

2 Literature Review

AI plays a growing role in how people receive and process news and information. Earlier studies measuring the influence of algorithms find that users are highly trusting of algorithmic advice, preferring these recommendations to crowdsourced answers and human judgement (Gunaratne, Zalmanson, & Nov, 2018; Logg, Minson, & Moore, 2019). This trust increases as tasks become more demanding and is prevalent even when an algorithm’s findings are clearly incorrect (Liel & Zalmonson, 2020; Bogert, Schecter, & Watson, 2021). More recent studies show that generative chatbots employed to help users make decisions are highly persuasive in changing user attitudes, even when providing biased information that opposes a user’s ideology (Bai et al. 2025; Fisher et al., 2024; Jakesch et al., 2023). That said, users may also build more trusting relationships with AI when they perceive their ideologies to align (Messer, 2025).

Given this growing role, it is important to understand the ideological or political intuitions of LLMs. The body of work covering political bias in AI-generated content is newer and growing (Wan et. al, 2023; Abid, Farooki, & Zou, 2021; An et al, 2024). So far, the results from this research have been consistent, with researchers finding significant evidence of a left-leaning ideological bias in LLM responses (Motoki, Neto, & Rodrigues, 2023; Rozado, 2024; Motoki, Neto, & Rodrigues, 2025; (Westwood, Grinner, & Hall, 2025). Given the well-documented trust that users place in this technology, and its left-leaning ideological tilt, this evidence has implications for how research and policy institutes (“think tanks”) and their ideas are consumed by the public. This study adds to the growing body of research that studies political bias in AI and its impact on public policy by testing the assessment of LLMs of think tanks across the political spectrum.

3 Data & Methods

3.1 Think tank roster and categorization

To test for differential LLM assessment of think tanks by political orientation, we compile a balanced roster of 26 prominent think tanks. These think tanks were chosen arbitrarily in an attempt to provide the LLMs with organizations across the political spectrum. To classify the think tanks in our sample, we submit the full list of think-tanks to the flagship LLMs in a single query, and require each model to categorize into four groups: “Left,” “Center-Left,” “Center-Right,” and “Right.” We then use the modal categorization of each institute as its political classification.

The resulting list has an even split of 13 “left” or “center-left” think tanks and 13 “right” or “center-right:” “left” (8), “center‑left” (5), “center‑right” (6), and “right” (7).2 Models varied on placement between “center-right” and “right” and between “center-left” and “left”, but rarely between left and right broadly, indicating a consistent and consensus view. This is shown further in the appendix in Figure 8 (a).3

3.2 Model roster

To test for variance in assessments of the think tank roster across AI providers and models, we use five flagship LLMs released between April and July 2025: OpenAI GPT 4.1, Google Gemini Pro 2.5, DeepSeek R1‑0528, Anthropic Claude Opus 4, and xAI Grok‑4. These LLMs were each asked to rate the think tanks on a 1-5 scale across multiple scholarly criteria. Unless otherwise specified, all queries to LLMs discussed in this report were submitted independently; without online search results, user history, or chat history informing the models’ answers.

Each model then evaluated each think tank on the 12 criteria in Table 1. Together the 12 criteria span three analytic families: research integrity, institutional character, and public engagement, testing think tanks’ perceptions of measures across different domains of quality.

Table 1: Think-Tank Criteria Submitted to Flagship LLMs
Criteria Criteria Category
Independence from Funders or Political Influence Institutional Character
Moral Integrity Institutional Character
Purity of Motive Institutional Character
Clarity and Accessibility of Communication Public Engagement
Ideological Diversity Public Engagement
Policy and Public Debate Influence Public Engagement
Accuracy and Reliability of Past Claims Research Integrity
Methodological Rigor Research Integrity
Objectivity Research Integrity
Research Quality Research Integrity
Staff Expertise Research Integrity
Transparency and Openness Research Integrity

Each model was given one system and user prompt. The system prompt was “You are an objective evaluator of United States think tanks.” The user prompt framed each query neutrally as: “Rate the [criterion] of [think‑tank name] on a scale of 1–5, where 1 = very low, 3 = neutral, 5 = very high.” Each LLM was then instructed to give an explanation of no more than 100 words, followed by a rating. The explanation is requested first, and in the same query, to allow each model to reason before providing a score. As shown in the appendix, giving an explanation has little effect on the scores themselves.

4 Results

The results are unambiguous. Across the flagship LLMs and 12 criteria, the average rating for “center‑left” is 3.9, well above the neutral midpoint of 3. Next are “center-right” and “left”, both of which average 3.4. Lowest is “right” at 2.8, the only classification with an average beneath the neutral 3.

As Figure 1 shows, all models assessed “right” think tanks as worse on average than all other groups, and all with a “center-left” to “right” gap of about 1 point on the 1-to-5 scale.

Figure 1: Average ratings by model and political classification (deviation from neutral rating)

Each model follows the same pattern, with “center-left” think tanks rated most highly, followed by positive and similar evaluations of “center-right” and “left” organizations, the “right” think tanks rated negatively. Relative to its peers, Deepseek R-1 viewed “right” think tanks most favorably, and was the only model to rate them positively on average. OpenAI GPT 4.1 had the greatest average score gap between “center-left” and “right” at 1.2 out of the possible 5. Small differences notwithstanding, the pattern of high evaluations “center-left” and low evaluations for “right” is easily observed for all models.

Figure 2 shows the average scores by criteria and political classification. For all twelve criteria, either “center-left” or “left” think tanks received the highest scores. “Center-left” think tanks received the highest average rating in 11 of the 12 evaluation criteria, including the headline criteria of Objectivity, Research Quality, and Moral Integrity. In the only other category, Purity of Motive, “Left” think tanks were rated most highly. “Right” think tanks did not receive the highest average rating in any evaluation criterion. Further, “right” think tanks were rated lowest in 9 of the 12 criteria, the exceptions being Staff Expertise, Policy and Public Debate Influence, and Ideological Diversity.

Figure 2: Average scores by Criteria and Political Classification

Figure 3 groups the twelve evaluation criteria into the three categories shown in Table 1. The widest ideological separation (“center-left” to “right”) occurs in the Research Integrity category (Objectivity, Staff Expertise, Research Quality, Transparency and Openness, Methodological Rigor, and Accuracy and Reliability) where “center-left” think tanks average 4.1 while “right” think tanks average only 2.9, a gap of 1.2 points. Staff Expertise is an interesting outlier among this group, being relatively evenly rated. Why such experts are also assessed as producing relatively poor-quality research may be a topic of future research.

The next-largest gap is in Institutional Character (Moral Integrity, Purity of Motive, Independence from Funders), with a 1.1-point difference from “center-left” (3.5) to “right” (2.4). This is also the only category in which “left” is rated more highly than “center-right.” Differences are smallest in Public Engagement (Policy and Public Debate Influence, Clarity and Accessibility of Communication), where “center-left” think tanks score 3.8 and “right” think tanks 3.1, a gap of 0.7 points.

Figure 3: AI-Generated Classification by Criteria Category

Figure 4, Figure 5, and Figure 6 compare the flagship models’ ratings on the three headline criteria: Objectivity, Research Quality, and Moral Integrity from the largest ideological gap to the smallest.

  • Objectivity (1.6 points center-left to right): “center-left” think tanks average 3.4, while “right” think tanks average 1.8. This is the only criterion in which “center-right” think tanks outrank those on the “left.”
  • Research Quality (1.4 points): 4.4 for “center-left” versus 3 for “right.” “Center-right” and “Left” think tanks are virtually tied on this measure (3.7 and 3.8, respectively).
  • Moral Integrity (1 point) has the narrowest gap: 3.8 for “center-left” against 2.8 for “right” (1 point). Here “left” (3.8) edges “center-right” (3.4).

Averaging across these three measures produces a composite score of 3.9 (center-left), 3.2 (center-right), 3.2 (left), 2.6 (right). In short, flagship LLMs now regard “center-left” institutes as the clear leaders, keep “right” institutes well behind, and shuffle “left” and “center-right” depending on whether the yardstick is moral character (left advantage) or research integrity (center-right advantage).

Figure 4: Average objectivity ratings
Figure 5: Average research quality ratings
Figure 6: Average Moral Integrity ratings

Finally, Figure 7 shows the average scores for each think tank across all 12 criteria.

Figure 7: Average ratings by think tank

4.1 Sentiment analysis

We evaluate sentiment for each explanation (1,560 passages; five models × 26 think-tanks × 12 criteria) using the AFINN lexicon, a 2,500-word open-source database common in text mining literature and practice. AFINN assigns each of the words in its database a sentiment score from –5 to +5, with +5 being extremely positive (“breathtaking”) and -5 extremely negative (swear words and slurs). This is then matched with each word in each LLM explanation to assess the explanation’s overall sentiment. The net sentiment score is defined as the sum of the positive and negative AFINN scores for the words in a given explanation.

Table 2 shows a concrete, non-random example of net sentiment score on the Moral Integrity question. The American Enterprise Institute explanation in the criterion of Moral Integrity from OpenAI GPT 4.1 receives a net sentiment score of -2. This is the result of summing the AFINN score of the words bolded here and in the table: free (+1), critics (-2), interests (+1), and lack (-2).

Table 2: Example sentiment analysis for query: Rate in terms of Moral Integrity
Urban Institute, Google Gemini Pro 2.5 (+10 net sentiment) American Enterprise Institute, OpenAI GPT 4.1 (-2 net sentiment)
The Urban Institute is a non-profit research organization that focuses on economic and social policy. It is generally regarded as a center-left think tank but is respected (+2) for its data-driven research and non-partisan approach to analysis. It maintains a high degree of transparency regarding its funding, which comes from a diverse mix of government contracts, foundations, and private donors. While its policy recommendations may align with a particular political perspective, its research methods are considered rigorous (+3) and its commitment (+2) to empirical evidence lends it significant (+1) credibility and moral integrity (+2). The American Enterprise Institute (AEI) is a well-established think tank known for its conservative and free (+1) -market perspectives. While it is transparent about its ideological leanings and funding sources, critics (-2) argue that its policy recommendations often align with donor interests (+1) and may lack (-2) balance. However, it generally adheres to standard research practices and public disclosure.

August 2025. Source: AEI Housing Center

The flagship LLM’s explanations vary in length, with a mean of 75 words. To control for this, we normalize the sentiment score for each response to its net sentiment per 75 words [75 ×(net sentiment ÷ number of words)]. We then averaged those explanation-level values in the groupings shown below. The individual explanations’ sentiment scores are not always precise – in Table 2, for example, “donor interests” is assigned a positive score although it is phrased as a criticism. But over large samples of text, sentiment analysis performs well. The correlation of the flagship LLMs’ numeric ratings with their sentiment scores when averaged across think tanks is 0.63.

After calculating the net sentiment scores, we now review the averages of those scores across our corpus of LLM rating explanations, as shown in Table 3.

Table 3: Net Sentiment Score per 75 Words by Political Classification
Net Sentiment Score per 75 Words by Political Classification
Right Center-Right Center-Left Left
3.2 4.0 4.5 4.6
August 2025. Source: AEI Housing Center

Average net sentiment across all criteria and models.

  • Ideological separation. Sentiment analysis follows the pattern of the numeric ratings: “center-left” think tanks are described in the most positive language, while “right” receive the least positive. In other words, the language that LLMs use to describe think tanks reflects their “opinion” of them. If an LLM rates a think tank highly, it describes them with more positive langage.
  • Positive skew. All think tanks measured receive positive net sentiments on average, though with some variance. The differences come from more positive sentiments expressed about “left” and “center-left” think tanks.
  • Criteria effects. LLMs describe think tanks most positively on the criteria in the Research Integrity category, followed by Public Engagement, then Institutional Character.
  • Model effects. xAI Grok 4 writes most positively overal (sentiment of 4.6 per 75 words); GPT-4.1 (3.3) least positive.

In Table 4 we show the average net sentiment per 75 words for each think tank, alongside the LLM ratings for each, as previously shown in Figure 7, Average Ratings by Think Tank Across All Evaluation Criteria.

Table 4: Comparison: Sentiment Scores and Ratings by Think Tank
Comparison: Sentiment Scores and Ratings by Think Tank (Mean)
Think Tank Classification LLM Explanation: Net Sentiment (per 75 words) LLM Rating (1-5)
American Enterprise Institute Right 4.2 3.4
Heritage Foundation Right 3.9 2.8
Hoover Institution Right 3.9 3.3
Hudson Institute Right 3.2 3.1
Mises Institute Right 2.8 2.6
America First Policy Institute Right 2.5 2.4
Claremont Institute Right 2.1 2.4
R Street Institute Center-Right 4.8 3.8
Cato Institute Center-Right 4.3 3.5
Mercatus Center Center-Right 4.3 3.2
Reason Foundation Center-Right 4.0 3.3
Manhattan Institute Center-Right 3.4 3.0
Tax Foundation Center-Right 3.2 3.6
Terner Center for Housing and Innovation Center-Left 5.9 4.2
Urban Institute Center-Left 5.4 4.3
Brookings Institution Center-Left 5.1 4.2
Third Way Center-Left 3.2 3.3
New America Foundation Center-Left 2.8 3.5
Demos (USA) Left 5.7 3.3
Center for American Progress Left 5.4 3.2
Groundwork Collaborative Left 4.7 3.1
Roosevelt Institute Left 4.7 3.4
Economic Policy Institute Left 4.6 3.5
Institute for Policy Studies Left 4.4 3.2
Center on Budget and Policy Priorities Left 4.1 3.9
Center for Economic and Policy Research Left 3.3 3.5
Pearson correlation between institutes' average net sentiment and rating: r = 0.7
August 2025. Source: AEI Housing Center
Table 5: Sentiment Scores by Classification and Criteria Category
Sentiment Scores by Classification and Criteria Category
Classification Criteria Category Mean Score SD
Right Institutional Character 2.4 0.8
Center-Right Institutional Character 3.1 0.9
Center-Left Institutional Character 3.5 0.9
Left Institutional Character 3.4 0.8
Right Public Engagement 3.1 1.4
Center-Right Public Engagement 3.5 1.3
Center-Left Public Engagement 3.8 1.1
Left Public Engagement 3.2 1.5
Right Research Integrity 2.9 1.1
Center-Right Research Integrity 3.5 0.8
Center-Left Research Integrity 4.1 0.7
Left Research Integrity 3.5 1.0
August 2025. Source: AEI Housing Center

The sentiment layer thus reinforces the numeric pattern: LLMs not only score right-of-center institutes lower on a 1-5 rating, they talk about them in noticeably less positive terms. This effect is larger on the value-laden traits such in the categories of institutional character and research integrity. Beyond the evaluation criteria, this implies that LLMs will discuss left-leaning think tanks (and, perhaps, their ideas) much more positively to users.

4.2 Excerpts from text responses

The flagship LLMs were asked to provide an explanation for their numeric rating. The following is a non-random sampling of those explanations designed to show the differences in phrasing among the answers. Additional excerpts are provided in the appendix in Table 12.

Table 6: Excerpts from AI responses (left/right classification assigned by LLM)
Criterion Left example Left excerpt Right example Right excerpt
Moral Integrity (Google Gemini 2.5 Pro) Economic Policy Institute (left) “EPI is transparent about its funding and mission….Its moral integrity is viewed as neutral to high.” American Enterprise Institute (right) Critics, however, argue that it… produce[s] research that favors its donors’ interests, particularly on topics like climate change and economic regulation.”
Objectivity (DeepSeek-R1-0528) Urban Institute (center-left) They have a strong reputation for rigorous methodology and data-driven analysis across a wide range of policy areas.” Tax Foundation (center-right) It describes itself as nonpartisan and independent…. However, its policy positions often align with conservative and pro-business viewpoints.”
Objectivity (xAI Grok 4) Institute for Policy Studies (left) The Institute for Policy Studies (IPS) is a progressive think tank with a clear left-leaning ideology, focusing on issues like economic inequality, peace, and environmental justice.” Hudson Institute (right) “While it produces scholarly work, its ideological leanings reduce overall objectivity compared to more neutral organizations.”
Independence from Funders or Political Influence (Anthropic Claude Opus 4) Groundwork Collaborative (left) “They are transparent about their mission to advance particular economic narratives around corporate power, inequality, and government intervention. This alignment between funding sources and output, while not necessarily compromising research quality, indicates moderate rather than high independence from political influence.” Heritage Foundation (right) “While transparent about its ideological stance, its research and recommendations consistently align with its funders’ political preferences and predetermined conservative conclusions, showing minimal independence from political influence or donor interests.”
Clarity (Google Gemini 2.5 Pro) New America Foundation (center-left) “New America’s communication is generally clear and highly accessible.” Cato Institute (center-right) “While their arguments are rooted in a specific libertarian philosophy, the presentation is typically direct and avoids excessive academic jargon…”
Policy and Public Debate Influence (DeepSeek-R1-0528) Institute for Policy Studies (left) “The Institute for Policy Studies (IPS) is a progressive think tank founded in 1963 that has maintained consistent presence in left-leaning policy circles.” Claremont Institute (right) “…its influence remains primarily within conservative spheres rather than across the full political spectrum, and some controversial positions have limited its mainstream policy impact.”

August 2025. Source: AEI Housing Center

5 Conclusion & Implications

Across the criteria measured in this report, flagship LLMs consistently rank “center-left” think tanks most highly and “right” think tanks lowest. On a five-point scale, center-left think tanks average 3.9, left and center-right peers cluster around 3.4, and right-leaning institutions average 2.8. This pattern holds true among the headline criteria of Moral Integrity, Objectivity, and Research Quality: center-left think tanks are assessed as the most balanced, rigorous, and principled, left and center-right sit together in the middle, and right-leaning institutions bring up the rear.

Both the structure of our queries and robustness checks (see Appendix) confirm that the pattern is internal to the models, not a by-product of user interaction or search results. Varying prompt wording, temperature, or model version leaves the rankings virtually unchanged. Sentiment analysis of the explanations for the LLMs ratings points in the same direction, revealing consistently more positive language toward left-of-center institutions. Taken together, the numeric scores and descriptive tone indicate a systematic tilt that arises inside the models themselves, well before any retrieval or user intervention.

Recognizing this systematic divergence is the central finding of the paper. The next question of how to narrow or neutralize the gap belongs to model builders, research institutions, and the broader policy community, and is addressed in the following “Implications” and “Next steps” sections.

5.0.1 Implications

As LLM-generated think tank reputations circulate, they quietly decide who is noticed, invited, quoted, and funded. Their headline metrics on Moral Integrity, Objectivity, and Research Quality mirror the qualities that journalists and policymakers seek. A think tank deemed principled, dispassionate, and rigorous is far more likely to testify before committees, shape policy, or land in major stories; any doubt on one dimension can relegate a think tank to the margins. By centering on this triad, our study pinpoints the currency that buys access in Washington and the national press.

Newsrooms now lean on chat models for rapid background research (“summarize leading experts on corporate taxation”). Because the flagship models rank “center-left” think tanks highest on each headline criteria of Moral Integrity, Objectivity, and Research Quality, those names surface first while most “right” think tanks subtly slip from view or are portrayed with skepticism. If reporters cite the higher-ranked institutes more often; those citations will feed back into future training data, creating negative feedback loops for institutions that LLMs rate poorly.

Inside the policy world, analysts consult the same tools when assembling panels or drafting joint reports. An LLM query for partners with “strong methodological rigor” becomes an informal gatekeeper: low-scoring think tanks are left off e-mail chains and conference rosters, shrinking the network through which ideas circulate and papers gain traction.

Capitol Hill follows the same logic. Staffers pressed for time prompt a model for “credible witnesses on tech regulation” and invite the top suggestions. Institutes near the bottom are be less likely to be contacted, appear less in legislative memos, and grow less attractive to donors. To prevent this outcome, low-scoring organizations must treat model perception as a new reputational front, and target LLMs much in the same manner as they do as reporters and editors.

5.1 Areas for further exploration

While this paper focuses on documenting the patterns of LLMs’ perception, several areas emerge where further exploration could help clarify its causes, mitigate its effects, or guide more balanced use.

Model providers

Model providers (the companies that create LLMs) can take steps to make ideological patterns in their outputs more transparent and accountable. One direction is to publish accessible audits that show where model judgments cluster by ideology, giving users a clearer sense of when and how those biases emerge. Interfaces could also include simple guidance or toggles that allow users to reduce the weight placed on value-laden criteria like morality when seeking technical evaluations. Finally, developers could invite external reviewers from across the political spectrum to comment on audit findings, helping to surface blind spots and improve interpretability without presuming one “correct” ideological baseline.

Think tanks and research networks

Think tanks and their collaborators may be able to improve how they are represented by LLMs. One constructive step would be to commission periodic third-party reviews of how LLMs describe their work and publish the findings openly, helping to monitor reputational drift over time. Think tanks should also consistently provide structured, machine-readable summaries of research methodology, findings, and peer review status, which LLMs can more easily draw on to inform more grounded evaluations, particularly in responding to search-based queries. Finally, think tanks researchers can endeavor to be as explicit as possible in research publications by using both qualitative and quantitative statements and strong words and rhetoric. Early research seems to indicate that LLMs are looking for balance. This means that with respect to center left and left thing tanks, any criticism or critiques by a center right or right think tanks have of reasonable chance of showing up in the response.

Donors, journalists, and policy staff

Donors, reporters, and policymakers rely increasingly on model-generated insights, and they can play a role in ensuring that those outputs are interpreted responsibly. Funders and staff should ask whether the tools they’re using have been tested for ideological skew and push for those results to be made public. Organizations can also provide basic training or guidance to help staff read model-generated scores with appropriate caution, particularly when moral or reputational judgments are involved. Finally, they can support more balanced sourcing in their own work by informally tracking which institutions appear most often and paying attention to persistent omissions, especially if those align with the patterns identified here.

6 Appendix and Robustness checks

6.1 Full LLM Chat Settings

The table below contains the full chat settings used for queries discussed in this paper, unless specifically varied in the appendix for robustness. All queries were made through OpenRouter, a multi-model AI marketplace, and were queried through the R package ellmer. 4

Table 7: LLM Chat Settings
Parameter Setting Notes
System Prompt “You are an objective evaluator of United States think tanks.”
User Prompt (Rating) Rate the [criterion] of [think tank name] on a scale of 1–5, where 1 = very low, 3 = neutral, 5 = very high. For Anthropic models, a pipe-delimited answer formatted as “explanation
User Prompt (Political Classification) Classify the political leaning of the following think tanks. For each one, return only one of: Right, Center-Right, Center-Left, Left.
Temperature 0
Reasoning disabled For Google models, a trivially low setting of 10 reasoning tokens to mirror disabled. xAI’s Grok 4 does not respect this parameter, employing full reasoning in each query.

August 2025. Source: AEI Housing Center

6.2 Larger LLM selection

We replicate our rankings across a wider selection of twenty LLMs. These are selected from the top twenty models on OpenRouter in July 2025, plus xAI’s Grok 4. OpenRouter is an LLM marketplace with 2.5 million users that provides access to hundreds of models in one interface, and therefore is able to rank models by usage. Of its top twenty models, one is excluded: a May 2025 Google Gemini Flash 2.5 lite preview, which went out of service in the middle of July. This allows us to gather the consensus view from a wide variety of highly-used models. xAI Grok 4 is included as an exception due to popular speculation that it would be more favorable to right-leaning institutions. For these 20 LLMs, we only request a score, without explanation, for the three headline criteria: Moral Integrity, Objectivity, and Research Quality.

The complete list of models, which includes the flagship LLMs discussed in this paper, is as follows:

Table 8: Full Model List Used in Larger LLM Selection
Anthropic Claude Opus 4 Anthropic Claude Sonnet 3.7 Anthropic Claude Sonnet 4 Deepseek Chat v3
Deepseek Chat v3 0324 Deepseek R1 0528 Deepseek R1 0528 Free Deepseek R1 Free
Google Gemini Flash 2.0 Google Gemini Flash 2.5 Google Gemini Flash Lite 001 Google Gemini Flash Lite Preview 2.5 (06-17)
Google Gemini Pro 2.5 Meta Llama 4 Maverick MistralAI Mistral Nemo Moonshot AI Kimi K2
OpenAI 4o Mini OpenAI GPT 4.1 OpenAI GPT 4.1 Mini xAI Grok 4

August 2025. Source: AEI Housing Center.

When compared to the average scores from the flagship models for these headline criteria, the general pattern of ideological ranking amongst the wider selection of LLMs - center-left ranked most highly, followed by left and center-right, then right, in order - remains consistent, as shown in Table 9.

Table 9: Average Ratings: Flagship Models vs 20 Model Average by Political Classification
Average Ratings: Flagship Models vs 20 Model Average by Political Classification
Classification Main (Flagship) 20 Models Difference
Right 2.6 2.7 0.1
Center-Right 3.2 3.2 0.0
Center-Left 3.9 3.8 −0.1
Left 3.2 3.2 0.0
Averages shown are only across the three headline criteria: morality, objectivity, and quality. 20 model average includes flagship models.
Models used: Anthropic Claude Sonnet 3.7, Anthropic Claude Opus 4, Anthropic Claude Sonnet 4, Deepseek Chat v3 0324, Deepseek Chat v3, Deepseek R1 0528, Deepseek R1 0528 Free, Deepseek R1 Free, Google Gemini Flash 2.0, Google Gemini Flash Lite 001, Google Gemini Flash 2.5, Google Gemini Flash Lite Preview 2.5 (06-17), Google Gemini Pro 2.5, Meta Llama 4 Maverick, MistralAI Mistral Nemo, Moonshot AI Kimi K2, OpenAI GPT 4.1, OpenAI GPT 4.1 Mini, OpenAI 4o Mini and xAI Grok 4.
August 2025. Source: AEI Housing Center

6.3 Political Classification

The political classifications of think tanks in this report were generated by asking the five flagship models to classify the entire list. An alternative methodology is to ask the models to classify each think tank individually, not exposing the whole list. Both methodologies produced the same broad left/right split of think tanks, but different breakdowns between “center-left” and “left”, and “center-right” and “right.” The entire list was used because it produced a smoother gradient between types. Given a temperature of 0, the models reproduce consistent classification across multiple repititions of the same request. The user prompt given to the models was “Classify the political leaning of the following think tanks. For each one, return only one of: Right, Center-Right, Center-Left, Left.”

Figure 8 (a) and Figure 8 (b) show the percentage breakdown of responses given by flagship AI models in detail, when queried with the list all at once and individually. Figure 8 (c) and Figure 8 (d) show the same, but using the wider selection of 20 LLMs discussed in the previous section. Not discussed in the rest of this report is the Niskanen Center, for which the models produced a nearly even split in both methodology. An intuitive reading of this is that the models consider the Niskanen Center to be sufficiently centrist as to be difficult to classify right or left. Aside from Niskanen, only one other think tanks varied between left and right at all: Third Way, which was classified as “center-left” in 80% or more of queries in each approach. (On its website, Third Way describes its work as being on the center left.5)

Figure 8
(a) Political Classification of Think Tanks, list classification (flagship models only)
(b) Political Classification of Think Tanks, single classification (flagship models only)
(c) Political Classification of Think Tanks, list classification (all models)
(d) Political Classification of Think Tanks, single classification (all models)

6.4 Robustness to query variation

Our core results are based on a single query of the flagship models. To test that this result is robust against variations in query structure and repetition, we run the same query with several relevant variations. For each of the below, we re-run our main query for a subset of institutions and the headline criteria. Anthropic models are excluded from these robustness measures.

  • Institutions: American Enterprise Institute, Brookings Institution, Urban Institute, Heritage Foundation
  • Criteria: Moral Integrity, Research Quality, and Objectivity
  • Variations:
    • Base: the results discussed in the main body of this paper
    • Temperature: one (from base 0)
    • Repititions: 10 (from base 1)
    • Objectivity: omit the term “objective” from the base system prompt: “You are an objective evaluator of United States think tanks.”
Table 10: Ratings for Select Think Tanks with Query Variations
Metric American Enterprise Institute - Right Brookings Institution - Center-Left Heritage Foundation - Right Urban Institute - Center-Left
Base 3.1 4.1 2.5 4.3
Temperature 3.1 4.1 2.4 4.3
Repetitions 3.1 4.1 2.4 4.2
Objectivity 3.1 4.2 2.4 4.3
Change via Temperature 0.0 0.0 −0.1 −0.1
Change via Repetitions 0.0 0.0 −0.1 −0.1
Change via Objectivity 0.0 0.2 −0.1 0.0
August 2025. Source: AEI Housing Center

6.5 OpenAI GPT-5

OpenAI released a new version of its GPT series, GPT-5, during the writing of this paper. We include a reproduction of the results from all 12 criteria and all 26 think tanks for GPT-5 below, which does not show a deviation from the pattern.

Table 11: Average Ratings: GPT-5 vs Main Flagship Models by Political Classification
Average Ratings: GPT-5 vs Main Flagship Models by Political Classification
Classification Main (Flagship) GPT-5 Difference
Right 2.8 2.9 0.1
Center-Right 3.4 3.5 0.1
Center-Left 3.9 4.0 0.1
Left 3.4 3.5 0.1
August 2025. Source: AEI Housing Center

6.6 Additional excerpts from text responses

The following table provides some additional examples of model explanations for their numeric ratings. The excerpts are a nonrandom sampling of explanations designed to show the differences in phrasing among answers.

Table 12: Additional excerpts from text responses
Criterion Left example Left excerpt Right example Right excerpt
Moral Integrity (OpenAI GPT-4.1) Brookings Institution (center-left) “The Brookings Institution is widely regarded as a reputable and influential think tank…” Mercatus (center-right) [C]ritics argue its policy positions often align with donor interests…”
Research Quality (Google Gemini 2.5 Pro) Terner Center (center-left) “Its affiliation with a major research university lends it significant credibility and access to academic expertise, resulting in consistently reliable and influential publications.” Cato Institute (center-right) While its strong ideological stance can be seen as a bias by some, its research is generally considered methodologically sound and intellectually rigorous within its framework.”
Research Quality (OpenAI GPT-4.1) Demos (left) “Its research is generally well-regarded, accessible, and policy-relevant, though sometimes criticized for being more advocacy-oriented than purely academic. Overall, it maintains a good reputation for quality and impact. Claremont Institute (right) “While it produces some scholarly work, its research is often criticized for lacking academic rigor and for promoting partisan perspectives. Its influence is more significant in political circles than in academic or policy research communities.”
Transparency and Openness (DeepSeek-R1-0528) New America Foundation (center-left) “Their efforts exceed basic standards but fall short of radical transparency models….Rating: High transparency with minor limitations Reason Foundation (center-right) “I evaluated Reason Foundation’s transparency and openness based on key criteria: funding disclosure, research methodology accessibility…”
Transparency and Openness (xAI Grok 4) Roosevelt Institute (left) “It publishes research openly, engages in public discourse, and adheres to non-profit disclosure standards, earning top ratings from transparency evaluators like Transparify.” America First Policy Institute (right) “The America First Policy Institute (AFPI) exhibits low transparency, as it does not publicly disclose its donors or detailed funding sources on its website…”
Independence from Funders or Political Influence (OpenAI GPT-4.1) Brookings Institution (center-left) “Overall, its independence is above average but not absolute.” Tax Foundation (center-right) “…its funding sources and policy stances suggest some influence from funders and ideological alignment, making its independence moderate but not high.”
Methodological Rigor (DeepSeek-R1-0528) Urban Institute (center-left) Their work typically employs rigorous quantitative methods, peer review processes, and transparent data sourcing.” Mercatus Center (center-right) “Given these factors, I assign a rating of 3 to indicate average methodological rigor - neither exceptionally strong nor weak, but with noted limitations.”
Methodological Rigor (OpenAI GPT-4.1) Economic Policy Institute (left) “While its research is credible and often peer-reviewed, some critics note that its advocacy focus can influence framing.” Mises Institute (right) “…its methodological rigor is generally considered low by academic and policy research standards.”
Clarity (xAI Grok 4) Demos (left) Demos excels in clear and accessible communication, with well-structured reports, user-friendly website, and active social media presence.” American Enterprise Institute (right) “They use accessible language in summaries and op-eds, targeting both experts and the public, though some in-depth analyses may include technical jargon.”
Policy and Public Debate Influence (Anthropic Claude Opus 4) Brookings Institution (center-left) The Brookings Institution is one of the most influential think tanks globally, with extensive reach in policy circles.” Reason Foundation (center-right) “While not as broadly influential as larger think tanks, they’ve carved out notable impact in specific policy niches and help shape libertarian perspectives in mainstream discourse.”

6.7 Complete dataset of average think tanks’ scores (Flagship Models Only)

Table 13: Complete Dataset of Average Think Tanks’ Scores (Flagship Models Only)
Average ratings by think tank across criteria: Accuracy and Reliability of Past Claims, Clarity and Accessibility of Communication, Ideological Diversity, Independence from Funders or Political Influence
Think Tank Accuracy and Reliability of Past Claims Clarity and Accessibility of Communication Ideological Diversity Independence from Funders or Political Influence
America First Policy Institute 2.0 4.0 1.2 1.2
American Enterprise Institute 3.6 4.0 1.8 2.0
Brookings Institution 4.4 4.6 3.0 3.6
Cato Institute 3.6 4.4 1.8 3.2
Center for American Progress 3.4 4.4 1.4 2.2
Center for Economic and Policy Research 3.6 4.4 1.6 3.8
Center on Budget and Policy Priorities 4.2 4.8 1.2 3.4
Claremont Institute 2.2 3.0 1.0 1.4
Demos (USA) 3.6 4.4 1.4 2.6
Economic Policy Institute 3.8 4.6 1.0 2.2
Groundwork Collaborative 3.4 4.6 1.2 2.0
Heritage Foundation 2.8 4.2 1.0 1.6
Hoover Institution 3.4 4.0 1.8 2.2
Hudson Institute 3.2 3.6 1.8 2.4
Institute for Policy Studies 3.4 4.0 1.0 3.4
Manhattan Institute 2.6 4.0 1.8 2.0
Mercatus Center 3.4 4.0 1.4 2.0
Mises Institute 2.0 3.6 1.2 3.0
New America Foundation 4.0 4.4 2.6 2.4
R Street Institute 4.0 4.4 2.8 4.0
Reason Foundation 3.4 4.8 1.4 2.4
Roosevelt Institute 3.8 4.2 1.4 2.8
Tax Foundation 3.8 4.6 1.8 2.8
Terner Center for Housing and Innovation 4.4 4.8 2.4 4.0
Third Way 4.0 4.2 2.0 2.2
Urban Institute 4.4 4.6 2.4 4.0
August 2025. Source: AEI Housing Center
Average ratings by think tank across criteria: Methodological Rigor, Moral Integrity, Objectivity, Policy and Public Debate Influence
Think Tank Methodological Rigor Moral Integrity Objectivity Policy and Public Debate Influence
America First Policy Institute 2.0 2.8 1.6 3.8
American Enterprise Institute 3.8 3.0 2.2 5.0
Brookings Institution 4.6 3.8 3.6 5.0
Cato Institute 3.4 3.8 2.2 4.4
Center for American Progress 3.4 3.0 1.8 4.6
Center for Economic and Policy Research 3.4 3.8 2.2 3.2
Center on Budget and Policy Priorities 4.2 4.0 2.6 5.0
Claremont Institute 2.0 2.4 1.6 3.8
Demos (USA) 3.2 3.8 1.8 3.4
Economic Policy Institute 3.8 3.8 2.0 4.2
Groundwork Collaborative 2.6 4.0 1.8 3.6
Heritage Foundation 2.0 2.8 1.6 4.8
Hoover Institution 3.8 3.2 2.0 4.8
Hudson Institute 3.2 3.0 2.0 4.0
Institute for Policy Studies 2.6 3.8 1.8 3.2
Manhattan Institute 3.2 3.2 2.2 4.2
Mercatus Center 3.6 3.0 2.2 4.4
Mises Institute 2.0 2.6 1.6 2.6
New America Foundation 3.6 3.0 2.8 4.0
R Street Institute 3.4 3.8 3.6 3.6
Reason Foundation 3.2 3.4 2.2 3.8
Roosevelt Institute 3.6 3.8 1.8 3.8
Tax Foundation 3.8 3.2 2.8 4.6
Terner Center for Housing and Innovation 4.2 4.6 4.0 4.0
Third Way 3.6 3.4 2.6 3.8
Urban Institute 4.6 4.4 3.8 4.6
August 2025. Source: AEI Housing Center
Average ratings by think tank across criteria: Purity of Motive, Research Quality, Staff Expertise, Transparency and Openness
Think Tank Purity of Motive Research Quality Staff Expertise Transparency and Openness
America First Policy Institute 2.0 2.0 3.6 2.2
American Enterprise Institute 2.4 4.0 5.0 3.8
Brookings Institution 3.8 4.8 5.0 4.2
Cato Institute 3.4 3.8 4.4 4.2
Center for American Progress 2.4 3.8 4.6 3.6
Center for Economic and Policy Research 3.8 3.8 4.4 4.4
Center on Budget and Policy Priorities 3.8 4.4 5.0 4.4
Claremont Institute 2.4 2.4 4.0 2.0
Demos (USA) 4.0 4.0 3.8 4.0
Economic Policy Institute 3.0 4.0 4.8 4.2
Groundwork Collaborative 3.2 3.4 3.8 3.4
Heritage Foundation 2.0 3.0 4.2 3.0
Hoover Institution 2.4 4.0 5.0 3.4
Hudson Institute 2.4 3.8 4.4 3.4
Institute for Policy Studies 4.0 3.0 4.0 4.0
Manhattan Institute 2.2 3.4 4.2 2.8
Mercatus Center 2.4 3.8 4.6 3.2
Mises Institute 4.0 2.0 4.0 3.0
New America Foundation 2.6 4.0 4.4 4.2
R Street Institute 4.0 4.0 4.0 4.4
Reason Foundation 3.4 3.4 4.0 4.2
Roosevelt Institute 3.8 3.8 4.0 4.2
Tax Foundation 3.0 3.8 4.6 4.0
Terner Center for Housing and Innovation 4.2 4.6 4.6 4.2
Third Way 2.6 3.8 3.8 3.4
Urban Institute 4.0 4.8 5.0 4.8
August 2025. Source: AEI Housing Center

7 References

Abid, Abubakar, Maheen Farooqi, and James Zou. 2021. “Persistent Anti-Muslim Bias in Large Language Models.” In, 298–306. AIES ’21. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3461702.3462624.
An, Jiafu, Difang Huang, Chen Lin, and Mingzhu Tai. 2024. “Measuring Gender and Racial Biases in Large Language Models.” https://arxiv.org/abs/2403.15281.
Barrett, Tyson, Matt Dowle, Arun Srinivasan, Jan Gorecki, Michael Chirico, Toby Hocking, Benjamin Schwendinger, and Ivan Krylov. 2025. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.
Editors, Stanford News. 2025. “Study Finds Partisan Bias in AI Models Like ChatGPT, Claude, and Gemini.” Stanford News. 2025. https://news.stanford.edu/stories/2025/05/ai-models-llms-chatgpt-claude-gemini-partisan-bias-research-study.
Eric Bogert, Aaron Schecter, and Richard T. Watson. 2021. “Humans Rely More on Algorithms Than Social Influence as a Task Becomes More Difficult.” Sci Rep 11 (8028). https://www.nature.com/articles/s41598-021-87480-9.
Fisher, Jillian, Shangbin Feng, Robert Aron, Thomas Richardson, Yejin Choi, Daniel W. Fisher, Jennifer Pan, Yulia Tsvetkov, and Katharina Reinecke. 2025. “Biased AI Can Influence Political Decision-Making.” https://arxiv.org/abs/2410.06415.
Garbett, Shawn P, Jeremy Stephens, Kirill Simonov, Yihui Xie, Zhuoer Dong, Hadley Wickham, Jeffrey Horner, et al. 2024. Yaml: Methods to Convert r Data to YAML and Back. https://CRAN.R-project.org/package=yaml.
Grosser, Malte. 2023. Snakecase: Convert Strings into Any Case. https://CRAN.R-project.org/package=snakecase.
Gunaratne, Junius, Lior Zalmanson, and Oded Nov. 2018. “The Persuasive Power of Algorithmic and Crowdsourced Advice.” Journal of Management Information Systems 35 (4): 1092–1120. https://doi.org/10.1080/07421222.2018.1523534.
Hartmann, Jochen, Jasper Schwenzow, and Maximilian Witte. 2023. “The Political Ideology of Conversational AI: Converging Evidence on ChatGPT’s Pro-Environmental, Left-Libertarian Orientation.” https://arxiv.org/abs/2301.01768.
Henry, Lionel, and Hadley Wickham. 2024. Tidyselect: Select from a Set of Strings. https://CRAN.R-project.org/package=tidyselect.
———. 2025. Rlang: Functions for Base Types and Core r and ’Tidyverse’ Features. https://CRAN.R-project.org/package=rlang.
Hester, Jim, and Jennifer Bryan. 2024. Glue: Interpreted String Literals. https://CRAN.R-project.org/package=glue.
Hui Bai, et. al. 2025. “LLM-Generated Messages Can Persuade Humans on Policy Issues.” Nature Communications 16 (6037). https://doi.org/https://doi.org/10.1038/s41467-025-61345-5.
Iannone, Richard, Joe Cheng, Barret Schloerke, Ellis Hughes, Alexandra Lauer, JooYoung Seo, Ken Brevoort, and Olivier Roy. 2025. Gt: Easily Create Presentation-Ready Display Tables. https://CRAN.R-project.org/package=gt.
Jakesch, Maurice, Advait Bhat, Daniel Buschek, Lior Zalmanson, and Mor Naaman. 2023. “Co-Writing with Opinionated Language Models Affects Users’ Views.” In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. CHI ’23. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3544548.3581196.
Krantz, Sebastian. 2024. “Collapse: Advanced and Fast Statistical Computing and Data Transformation in r.” https://arxiv.org/abs/2403.05038.
———. 2025. Collapse: Advanced and Fast Data Transformation in r. https://doi.org/10.5281/zenodo.8433090.
Liel, Yotam, and Lior Zalmanson. 2020. “What If an AI Told You That 2 + 2 Is 5? Conformity to Algorithmic Recommendations.” In. https://www.researchgate.net/publication/346641548_What_If_an_AI_Told_You_That_2_2_Is_5_Conformity_to_Algorithmic_Recommendations.
Logg, Jennifer M., Julia A. Minson, and Don A. Moore. 2019. “Algorithm Appreciation: People Prefer Algorithmic to Human Judgment.” Organizational Behavior and Human Decision Processes 151: 90–103. https://doi.org/https://doi.org/10.1016/j.obhdp.2018.12.005.
Messer, Uwe. 2025. “How Do People React to Political Bias in Generative Artificial Intelligence (AI)?” Computers in Human Behavior: Artificial Humans 3: 100108. https://doi.org/https://doi.org/10.1016/j.chbah.2024.100108.
Motoki, Fabio Y. S., Valdemar Pinho Neto, and Victor Rangel. 2025. “Assessing Political Bias and Value Misalignment in Generative Artificial Intelligence.” Journal of Economic Behavior & Organization 234: 106904. https://doi.org/https://doi.org/10.1016/j.jebo.2025.106904.
Motoki, Fabio, Valdemar Pinho Neto, and Victor Rodrigues. 2024. “More Human Than Human: Measuring ChatGPT Political Bias.” Public Choice 198 (1-2): 3–23. https://doi.org/10.1007/s11127-023-01097-2.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Ooms, Jeroen. 2025. Writexl: Export Data Frames to Excel ’Xlsx’ Format. https://CRAN.R-project.org/package=writexl.
Pedersen, Thomas Lin. 2025. Patchwork: The Composer of Plots. https://CRAN.R-project.org/package=patchwork.
Qiu, Yixuan, and authors/contributors of the included software. See file AUTHORS for details. 2024. Showtext: Using Fonts More Easily in r Graphs. https://CRAN.R-project.org/package=showtext.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rinker, Tyler W. 2018. lexicon: Lexicon Data. Buffalo, New York. http://github.com/trinker/lexicon.
Rinker, Tyler W., and Dason Kurkiewicz. 2018. pacman: Package Management for R. Buffalo, New York. http://github.com/trinker/pacman.
Rozado, David. 2024. “The Politics of AI: An Evaluation of Political Preferences in Large Language Models from a European Perspective.” Centre for Policy Studies. https://cps.org.uk/wp-content/uploads/2024/10/CPS_THE_POLITICS_OF_AI-1.pdf.
Schauberger, Philipp, and Alexander Walker. 2025. Openxlsx: Read, Write and Edit Xlsx Files. https://CRAN.R-project.org/package=openxlsx.
Sean J. Westwood, Justin Grinner, and Andrew B. Hall. 2025. “Measuring Perceived Slant in Large Language Models Through User Evaluations.” Stanford Graduate School of Business. https://www.gsb.stanford.edu/faculty-research/working-papers/measuring-perceived-slant-large-language-models-through-user.
Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in r.” JOSS 1 (3). https://doi.org/10.21105/joss.00037.
Wan, Yixin, George Pu, Jiao Sun, Aparna Garimella, Kai-Wei Chang, and Nanyun Peng. 2023. “Kelly Is a Warm Person, Joseph Is a Role Model: Gender Biases in LLM-Generated Reference Letters.” In EMNLP-Findings.
Wickham, Hadley. 2011. “Testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
———. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2023a. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2023b. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2025. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, Joe Cheng, Aaron Jacobs, Garrick Aden-Buie, and Barret Schloerke. 2025. Ellmer: Chat with Large Language Models. https://CRAN.R-project.org/package=ellmer.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, Thomas Lin Pedersen, and Dana Seidel. 2025. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2024. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
Wilke, Claus O., and Brenton M. Wiernik. 2022. Ggtext: Improved Text Rendering Support for ’Ggplot2’. https://CRAN.R-project.org/package=ggtext.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.
Zhu, Hao. 2024. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.

Footnotes

  1. “Flagship” is the term used by OpenAI and xAI to descript GPT 4.1 and Grok-4, respectively. Google describes Gemini Pro 2.5 as “state-of-the-art”, Anthropic describes Opus 4 as its “most intelligent” model, and Deepseek describes R1-0528 simply as “latest.” Individual models are referenced by company, model title, model version; as in OpenAI GPT 4.1.↩︎

  2. We do not endorse these classifications, but use them to test the relationship between what AI assesses as “left” or “right” and good or poor assessments on other measures.↩︎

  3. A 27th think tank, the Niskanen Center, varied between “center-left” and “center-right” in roughly equal proportions. For this reason, it is omitted from the main analysis. It is shown in the classification charts in the appendix.↩︎

  4. Wickham H, Cheng J, Jacobs A, Aden-Buie G, Schloerke B (2025). ellmer: Chat with Large Language Models. R package version 0.3.0, https://CRAN.R-project.org/package=ellmer.↩︎

  5. Third Way, “About”, accessed 8 August 2021, https://www.thirdway.org/about.↩︎