July 2, 2026
July Newsletter
We skipped June; we've been busy. Here's everything since May.
In this issue:
- What We Led — MIRA in Ireland, core-satellite in New York
- What We Built — A Resilient Data Future, GREP, Lettuce Compute, SciOS Compute, SciOS Graph, and the MCP ecosystem into PRSM
- What We Wrote — The Core-Satellite Model
- Where We're Going — DWeb Camp July 07–13, a DOE open source keynote August 4
- What We've Seen
- IOSP 2026 — Leiden, October 12–15
▶What We Led
MIRA 2026 Workshop
MIRA is a shared schema for research artifacts. Questions, claims, evidence, sources, and the relationships between them get one common structure that any platform can inherit, rather than each tool maintaining its own. In June we ran a 23-person MIRA workshop with Matt Akamatsu and Discourse Graphs in Ireland. The output (including the landing page) is being uploaded to GitHub as each item is buttoned up. The schema itself is there, alongside an extractor that turns papers into MIRA graphs, an extension that adds MIRA directives to MyST Markdown, and prototypes for exchanging results and requests between labs and for tracing the impact of scientific investment.
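To make the idea of one common structure concrete, here is a minimal sketch of a typed graph holding questions, claims, evidence, and sources. It is not the MIRA schema itself: the class, the field layout, and the relation names (`supports`, `addresses`, `cites`) are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field

# Node kinds named in the text; everything else here is an illustrative assumption.
NODE_KINDS = {"question", "claim", "evidence", "source"}


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> (kind, text)
    edges: list = field(default_factory=list)  # (src id, relation, dst id)

    def add_node(self, node_id: str, kind: str, text: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = (kind, text)

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Refuse edges that point at nodes the graph does not hold.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before relating them")
        self.edges.append((src, relation, dst))

    def incoming(self, dst: str, relation: str) -> list:
        """Ids of nodes that point at `dst` with the given relation."""
        return [s for s, r, d in self.edges if d == dst and r == relation]


g = Graph()
g.add_node("q1", "question", "Does X improve Y?")
g.add_node("c1", "claim", "X improves Y in condition Z.")
g.add_node("e1", "evidence", "Figure 2 shows a gain under Z.")
g.add_node("s1", "source", "Example et al., placeholder paper")
g.relate("c1", "addresses", "q1")   # hypothetical relation name
g.relate("e1", "supports", "c1")    # hypothetical relation name
g.relate("e1", "cites", "s1")       # hypothetical relation name

print(g.incoming("c1", "supports"))
```

Because every tool would read and write the same node kinds and relations, a question raised in one platform can be answered by evidence recorded in another.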
The schema is already in use twice more in this issue. GREP is testing MIRA object extraction from any PDF, and SciOS Graph runs collaborative workshops on it; both are under What We Built.
The continuing work will be presented at IOSP 2026. Expect a trickle of updates over the summer. Get involved by joining the Modular Research lab.
Sustainable Open Source Funding Infrastructure: A Core & Satellite Model for OSS Funding in the Age of AI
We also ran a 23-person workshop in New York City around the core-satellite model, our commons framework for open source and open science (more under What We Wrote). Yes, also exactly 23 people; we didn't plan that. The group is producing at least one pilot core by IOSP 2026. Get involved by joining the Funding Open lab.
▶What We Built
A Resilient Data Future — graph.scios.tech
Our Resilient Data Futures narrative argues that research data loss is architectural, and that the fix is what it calls Tier 3 infrastructure: protocol-level distribution in which redundancy arises as a byproduct of use rather than depending on any single organization. graph.scios.tech is a working implementation. Scientific records live on IPFS under content identifiers (CIDs), signed by their authors' decentralized identifiers (DIDs), and every link carries both a location and a content hash, so references don't rot and attribution can't be stripped. The whole stack runs on four small servers for $24 a month, and is easily replicated and scaled. Reach out if you'd like to help scale it.
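The point about links carrying both a location and a content hash can be sketched in a few lines. This is an illustration only, not the graph.scios.tech code: real IPFS CIDs use multihash encoding instead of a bare hex digest, DID signatures are left out, and the `Link` shape and example URL are assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A reference that says where to fetch and what bytes to expect."""
    location: str      # where to look first (placeholder URL below)
    content_hash: str  # hex SHA-256 of the referenced bytes


def make_link(location: str, payload: bytes) -> Link:
    return Link(location, hashlib.sha256(payload).hexdigest())


def verify(link: Link, fetched: bytes) -> bool:
    """True only if the fetched bytes are the ones the link was made for."""
    return hashlib.sha256(fetched).hexdigest() == link.content_hash


record = b'{"claim": "example record"}'
link = make_link("https://example.org/records/1", record)

print(verify(link, record))                 # True
print(verify(link, record + b" tampered"))  # False
```

If the location goes dark, any other copy of the bytes can stand in for it, because the hash (not the host) decides whether a copy is the right one.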
We're expanding it in a workshop at IOSP 2026. Get involved by joining the Resilient Data Futures lab.
GREP — The Great Research Extraction Project
GREP ports the existing literature into frontier infrastructure. In production today, it extracts software mentions from PDFs in any domain as objects and relationships, and classifies how each piece of software figured in the work (used, created, deposited, or merely mentioned). A voting ensemble of five DeBERTa-based models does the extraction at 85% F1 on our benchmark, and 92% on span identification across a fifteen-domain test set. None of the five is sacred; each slot in the ensemble takes a replacement the day something better exists. The whole thing needs about 32 GB of memory and 45 GB of disk, no GPU, which is what lets volunteer hardware do the work (see Lettuce Compute, below). Expect a lighter (and slightly less accurate) version in the coming weeks.
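For readers unfamiliar with voting ensembles, the sketch below shows a plain majority vote over five classifiers' labels for one software mention. The newsletter does not describe GREP's actual voting rule, so the `min_agree` threshold and the function name are assumptions; only the four labels come from the text.

```python
from collections import Counter
from typing import Optional

LABELS = ("used", "created", "deposited", "mentioned")


def majority_vote(votes: list, min_agree: int = 3) -> Optional[str]:
    """Return the label at least `min_agree` models chose, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


# Five model outputs for one mention (made-up values).
print(majority_vote(["used", "used", "created", "used", "mentioned"]))     # used
print(majority_vote(["used", "used", "created", "created", "mentioned"]))  # None
```

Because the vote only looks at labels, any one of the five models can be swapped out without touching the rest, which is what makes the replaceable slots practical.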
We're actively testing extensions of GREP into open science monitoring and MIRA object extraction. The monitoring work currently detects data availability statements and then checks whether data an author claims is open actually resolves. Across a corpus of roughly 5,700 papers, 44% carried a statement, 26% claimed the data was openly available, and 16% pointed at data we could confirm was public, which is consistent with the claims made in our Resilient Data Futures narrative. The MIRA work turns any PDF into MIRA objects, which you can try at mira-extraction.vercel.app.
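As a rough worked example, the percentages above translate into approximate paper counts as follows. This assumes all three percentages share the whole corpus as their denominator and treats "roughly 5,700" as exactly 5,700, so the counts are estimates and not reported figures.

```python
CORPUS = 5700  # "roughly 5,700" in the text, so every count below is approximate

# Percentages of the whole corpus, as read from the paragraph above.
percent = {"has_statement": 44, "claims_open": 26, "confirmed_public": 16}

counts = {stage: round(CORPUS * p / 100) for stage, p in percent.items()}
for stage, n in counts.items():
    print(f"{stage}: about {n} papers")

# Of the papers claiming open data, what share could be confirmed as public?
confirmed_given_claimed = percent["confirmed_public"] / percent["claims_open"]
print(f"confirmed among open claims: {confirmed_given_claimed:.0%}")
```

Under those assumptions, a little over three in five open-data claims pointed at data that could be confirmed as public.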
Get involved by joining the Modular Research lab.
Lettuce Compute — distributed computing, entering beta
Lettuce Compute is a permissionless distributed computing infrastructure for science. Researchers run a head server and define computations; volunteers run a small client, attach to any head they choose, select a leaf, and crunch on work units. Results are validated through redundancy, and contribution credit is cryptographically signed. There is no approval step and no whitelist; a volunteer generates a key and attaches. The scheduler matches work to machines that can actually run it, whether the task ships as a native binary, a container, or WebAssembly.
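Validation through redundancy can be pictured as sending the same work unit to several volunteers and accepting a result only when enough of them agree. The sketch below uses a simple quorum rule; Lettuce Compute's actual validator is not described here, so the `quorum` parameter, the tie handling, and the function names are assumptions, and signature checking is left out.

```python
import hashlib
from collections import Counter
from typing import Optional


def digest(result: bytes) -> str:
    return hashlib.sha256(result).hexdigest()


def validate(results: dict, quorum: int = 2) -> Optional[bytes]:
    """`results` maps volunteer id -> returned bytes for one work unit.

    Returns the agreed result, or None if no single answer reaches the
    quorum or if the top two answers are tied.
    """
    if not results:
        return None
    ranked = Counter(digest(r) for r in results.values()).most_common(2)
    winner, votes = ranked[0]
    if votes < quorum:
        return None
    if len(ranked) == 2 and ranked[1][1] == votes:
        return None  # tie between two answers: send the unit out again
    return next(r for r in results.values() if digest(r) == winner)


print(validate({"vol-a": b"42", "vol-b": b"42", "vol-c": b"41"}))  # b'42'
print(validate({"vol-a": b"1", "vol-b": b"2"}))                    # None
```

A unit that comes back as None would simply be reissued to more volunteers until one answer clears the quorum.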
Well over 6,000 exaflops of processing power sits idle in the world's existing hardware. Lettuce Compute is how we connect it to scientific work, and we consider it core infrastructure for a reimagined scientific compact. It has spent the last month in alpha with several critical testers (to whom we are eternally grateful); beta opens this month.
Get involved by joining any SciOS Lab.
SciOS Compute — compute.scios.tech
SciOS Compute is Lettuce Compute with the setup done for you. Describe your computation and we stand up the project in under an hour, with distribution, validation, and volunteer onboarding handled. GREP already runs on it. It launches in earnest during the Lettuce Compute beta.
Get involved by joining the Resilient Data Futures lab.
SciOS Graph — workshop.scios.tech
SciOS Graph is a collaborative tool for building shared understanding, running on MIRA and open for anyone to use. It comes in two modes. Mini-MIRA condenses the schema to four node types so a room of newcomers can start thinking in graphs immediately; Full MIRA carries the complete schema for researchers who want the whole medium. We built it for live workshops, where participants add questions, thoughts, and connections from their own devices while a shared projector view updates in real time.
Get involved by joining the Modular Research lab.
PRSM — the MCP ecosystem in one graph
PRSM now crawls the MCP registries and assembles them into a single deduplicated graph. Seven registries feed it (the Official registry, Smithery, Glama, PulseMCP, mcp.so, Nerq, and BioContextAI for biomedicine). As of this writing it holds over 58,000 servers, 74,000 tools, and 26,000 confidence-scored "wrap" edges, each one recording that a specific MCP server lets an AI use a specific dataset, service, or piece of software.
The wrap edges make reverse lookup possible. Pick a dataset and ask what an AI can plug into today. NASA's open APIs are wrapped by 1,186 servers; SQLite by 735, PostgreSQL by 559, arXiv by 198, PubMed's E-utilities by 126, Crossref by 49, OpenAlex by 48.
Run the same question across the NumFOCUS-sponsored projects and the answer gets thin. pandas has 9 servers, SymPy 9, Matplotlib 8, NumPy 6, scikit-learn 2, and SciPy 1; Astropy, xarray, PyMC, seaborn, and GeoPandas sit at zero. (These are confirmed wraps, where the server declares the package as a dependency and names it, so read them as floors; many wrapper repositories haven't been analyzed yet.) The zeros are widely used scientific tools that nobody has wrapped in an MCP server. If you're looking for a way to help research software, that is a very good place to start.
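The reverse lookup described above amounts to indexing wrap edges by their target instead of by their server. Here is a minimal sketch under stated assumptions: the `WrapEdge` fields, the confidence threshold, and the sample edges are all made up for illustration and do not reflect PRSM's data model or API.

```python
from collections import defaultdict
from typing import NamedTuple


class WrapEdge(NamedTuple):
    server: str        # MCP server identifier (placeholder names below)
    target: str        # dataset, service, or software the server wraps
    confidence: float  # score in [0, 1]; the scale is an assumption


def reverse_index(edges, min_confidence: float = 0.5) -> dict:
    """Map each wrapped target to the set of servers that wrap it."""
    index = defaultdict(set)
    for edge in edges:
        if edge.confidence >= min_confidence:
            index[edge.target].add(edge.server)
    return dict(index)


edges = [
    WrapEdge("server-a", "arXiv", 0.9),
    WrapEdge("server-b", "arXiv", 0.4),   # below threshold, not counted
    WrapEdge("server-c", "SQLite", 0.8),
]
index = reverse_index(edges)

print(sorted(index.get("arXiv", set())))  # ['server-a']
print(len(index.get("Astropy", set())))   # 0
```

A target missing from the index is the interesting case: it marks a tool with no confirmed wrapper yet, which is where a new MCP server would help most.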
The graph reaches specialist registries too. It carries 62 biomedical servers from BioContextAI (UniProt, Ensembl, PubChem, Open Targets, single-cell tools) and about a dozen Bioconductor community servers pulled straight from GitHub. Know a registry we're missing? There's a suggest-a-source form on the site.
All of it runs on PRSM's existing data layer, the graph that already links publications, repositories, packages, dependencies (declared and hidden), and contributors. Explore it, no login required: prsm.network/demo/mcp.
Get involved by joining the VOWELS lab.
▶What We Wrote
The Core-Satellite Model
We published The Core-Satellite Model, a commons framework for open source and open science built on human judgment. A core is a group of people who scope themselves within a domain and wrap around its artifacts. They decide what is canonical, hold the knowledge agents cannot read, and direct the agents that do the work. Satellites are the experimental artifacts that orbit them, independent but discoverable. As AI takes over more of the execution of science, expert judgment becomes the scarce resource, and cores are how a commons organizes it.
The New York workshop above is the first group putting the model into practice.
▶Where We're Going
DWeb Camp. Ellie is organizing the Anti-Authoritarian Stack Tent, and Jon is hosting two sessions. The first, A Seat at the Table (Friday, July 10 at 9:30am), covers the Open Source Endowment, a live experiment in community-owned funding for open infrastructure. The second, Break This Stack (Saturday, July 11 from 9:30 to 11:00), is a working session that decomposes a real paper into a sovereign, composable scientific record on ATProto and IPFS, then invites the room to break it at five identified failure points.
The 2nd Open Source Software Summit, August 4, Albuquerque. The U.S. DOE Integrated Energy Systems Office is gathering the open source projects it sponsors for a day on contributor engagement, maintaining open source in the age of AI coding tools, and collaboration among projects. Jon is giving a keynote. Agenda and details.
▶What We've Seen
Ellie was in Vancouver this spring for AtmosphereConf, where a workshop explored AT Protocol, the protocol underneath Bluesky, as a home for scholarly work; the keynote described research as a continuous, federated, computable graph of knowledge, and the demos ran discourse graphs and contribution attribution over decentralized identity. The Continuous Science Foundation's working group on modular peer review has met monthly all spring, asking how review and credit can attach to data, methods, code, and figures independently. The Knowledge Graph Conference drew more than a thousand people to Cornell Tech in May, including a symposium track on evidence graphs derived from the literature. People are converging.
Germany's DFG opened a funding line to safeguard endangered data repositories, citing roughly 200 repository closures over the past 25 years, more than half since 2018. A study in Nature Genetics traced the single points of failure in centralized biomedical databases (cyberattacks, funding cuts, political disruption) and held up ELIXIR, the 25-node federation behind Europe's life science data, as the working alternative. EOSC, Europe's open science cloud, added 14 nodes in April and now federates 28.
ICLR, one of the big machine learning conferences, collected just over 76,000 peer reviews for its 2026 cycle, and the AI-text detection firm Pangram Labs flagged about a fifth of them as fully AI-generated. Kubernetes shipped a contributor policy in June that requires AI use to be disclosed, bars AI as a co-author, and promises reviewers they are talking to humans. Jazzband, the collective that maintained 84 Python packages serving 150 million monthly downloads, shut down in March; its founder named AI-generated spam contributions among the causes. In every one of these stories, the scarce resource is a person with the judgment to evaluate the work.
The UN held Open Source Week in New York in June (2,600 participants from more than 120 countries), and UNESCO launched its Open Science Platform there, built on CERN's open repository framework InvenioRDM. We were there, and the appetite for radical change is even stronger in person than it reads in the announcements. Renaissance Philanthropy opened a $20 million fund for open source in the life sciences. Germany's Sovereign Tech Agency began paying open source maintainers a stipend to represent their projects in standards bodies. Wiley booked $49 million in AI licensing revenue this fiscal year, and arXiv and Semantic Scholar both gained MCP servers. Publishers and protocol builders are working out machine access to the scientific record right now, one licensing contract at a time on one path, one open server at a time on the other.
▶IOSP 2026 — Leiden, October 12–15
IOSP 2026 runs October 12, 13, and 15 at the Poortgebouw, University of Leiden, with a field trip to the National Open Science Festival in Delft on the 14th. Most of what's in this issue lands in Leiden. MIRA and the core-satellite pilots present there, the resilient-data stack expands in an on-site workshop, and GREP, Lettuce, and PRSM will be on the floor along with many other open science tools from across the ecosystem!
Register interest (subscribers get priority if we're oversubscribed again), submit to the showcase, or sponsor. IOSP is free to attend, and every sponsorship dollar goes to travel grants that get people there.
That's it for July. See you at DWeb.
— Jon & Ellie, SciOS