AI Drug Discovery Breakthrough


From plant genomics to antibody design and drug delivery… how drug AI explores the “unknown world”

There are three key points to take away from today’s post.

1) “AI that gets closer to the answer,” such as protein structure prediction (the AlphaFold family), becomes the starting point of drug development,

2) it then extends to drug delivery (getting the drug to its target) and antibody design (molecular-level binding), so that “success probability is engineered,”

3) and when you run biobig data + a prediction engine + an experimental feedback loop at a scale of over 40 billion records, starting from plant data, it opens a path for exploring even unknown areas.

Below, I’ll reorganize the flow of the original interview in a news-style format and walk through “how drug AI actually works,” step by step.


1) The “probability game” of drug development: Why AI came in

Drug development isn’t simply “finding the right substance.” It’s a continuous chain of probabilities intertwined with proteins, compounds, interactions, and delivery pathways.

If you summarize it like a news headline in one line:

AI entered drug development not as a tool for solving a single ‘one-and-only’ answer problem like in Go, but as a tool that expands candidates (exploration), narrows probabilities (convergence), and then verifies them through experiments.

  • The world of proteins/compounds: an unknown space that humans can’t directly see
  • 100% accuracy is impossible: prediction increases the likelihood of being correct, but it’s not a guarantee of certainty
  • Nevertheless, a “computable engine” exists: an approach like AlphaGo—learning patterns and converging—is possible

Here, I’ll also connect the core SEO keywords from the article naturally.

Drug AI that tackles unknown areas, genomics-based prediction, protein structure prediction, biobig data—AI drug development ultimately moves as one connected whole.


2) The starting point of “predictions that land”: Protein structure prediction (AlphaFold family)

The core point that keeps coming up in the interview is this:

To design antibodies, proteins, and their binding, you must know the ‘structure’

  • A protein’s function is tied to its 3D structure (the folded shape)
  • Predicting structures with tools like AlphaFold raises the probability that “a binding-capable form is possible”
  • But there’s no guarantee of “100% correctness”

So, protein structure prediction serves as the initial bridge of drug AI.

But it doesn’t end there; the next steps become even more complex.


3) The next chapter of prediction: Antibody design + PPI (protein-protein interactions)

A particularly interesting part in the original source is where it says that “antibody design is also possible with AI.”

Translated into this flow, it goes like this.

  • First determine the structure of the antigen (a target protein present on the surface of cancer cells, etc.)
  • Design the antibody (a customized binding partner) that will bind to that antigen
  • Here, binding isn’t just simple contact; the key is the structure/pocket of interactions such as PPI

The point here is that even though the immune system is often treated as a separate topic, the molecular-level mechanism of the antigen–antibody response connects directly to drug delivery/targeting.

To summarize:

Structure prediction → increased binding likelihood → design that binds better to the target.
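
To make that flow concrete, here is a deliberately toy Python sketch of the pipeline shape: antigen in, ranked antibody candidates out. Every function is a stub with invented scores; this illustrates the structure of the process, not any real design method.

```python
# Toy sketch of the flow above: antigen structure in, ranked antibody candidates out.
# All functions are stubs with invented scores; this shows the shape of the
# pipeline, not an actual design method.
from typing import List, Tuple

def predict_antigen_structure(antigen_sequence: str) -> str:
    """Stand-in for a structure-prediction step (e.g. an AlphaFold-style model)."""
    return f"structure({antigen_sequence})"

def predict_binding(antigen_structure: str, antibody: str) -> float:
    """Fake PPI/pocket score in [0, 1); a deterministic placeholder, not real scoring."""
    return (sum(map(ord, antigen_structure + antibody)) % 100) / 100

def design_antibodies(antigen_sequence: str, library: List[str]) -> List[Tuple[str, float]]:
    structure = predict_antigen_structure(antigen_sequence)
    scored = [(ab, predict_binding(structure, ab)) for ab in library]
    return sorted(scored, key=lambda x: x[1], reverse=True)  # best predicted binders first

print(design_antibodies("TARGET_ON_CANCER_CELL_SURFACE", ["ab-001", "ab-002", "ab-003"]))
```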


4) The most realistic challenge in drug development: Drug delivery (delivery system design)

The original source strongly emphasizes that it’s not just about finding the drug—you also have to design the way to send it into the body.

A one-line news-style summary:

The success or failure of AI drug development depends on how much you can increase the ‘probability of reaching the target.’

  • Oral drugs: passage through the gastrointestinal tract (degradation/acid/barriers) + entry into the bloodstream + reaching the desired location
  • Molecular size constraints: if it’s too large, it’s disadvantaged in overcoming the stomach wall and absorption processes
  • No 100% guarantee: therefore, “delivery systems” must be designed

Representative examples mentioned in the original source include:

  • Capsule/lipid membranes: designed to withstand stomach acid and burst in the intestine
  • Lipid-structured membranes: optimize delivery conditions such as membrane structure, thickness, and stability
  • Targeting anticancer drugs: using a sensor that recognizes a special protein on the surface of cancer cells without touching normal cells
  • Duration of action: since frequent insulin injections are burdensome, design the release/transport profile so the effect lasts

This part is important because even if you get “target binding” right, if delivery fails, the effect disappears.

In other words, drug AI is evolving toward seeing both binding (target) and delivery (pathway) together.
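
As a small illustration of the “withstand stomach acid, release in the intestine” idea from the list above, here is a minimal sketch of a pH-triggered release rule. The threshold is a generic assumption for illustration, not one of the formulations discussed in the source.

```python
# Minimal sketch of a pH-triggered release rule: hold together in the acidic
# stomach, release once the more neutral intestine is reached.
# The threshold is a generic illustrative value, not a specific formulation.
RELEASE_PH_THRESHOLD = 5.5  # enteric-style coatings typically dissolve above roughly this pH

def releases_payload(environment_ph: float) -> bool:
    """True if the capsule should dissolve and release the drug."""
    return environment_ph >= RELEASE_PH_THRESHOLD

print(releases_payload(1.8))  # stomach (strongly acidic): False, capsule holds
print(releases_payload(6.5))  # small intestine (near neutral): True, drug is released
```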


5) The trap of error accumulation: Why problems arise when you stack multiple prediction engines

Here, the interview is quite candid.

It explains that “prediction engines help, but if you chain several engines together, errors increase.”

  • Error rate of structure prediction like AlphaFold (e.g., assuming 5%)
  • Error rate again in the next-step binding prediction (e.g., assuming 7%)
  • If these two stack, the overall error rate surges (the quick calculation below makes this concrete)
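
A quick sanity check, using the illustrative 5% and 7% figures, shows how fast chained errors grow:

```python
# Quick sanity check on how errors compound when prediction engines are chained.
# The 5% and 7% figures are the illustrative assumptions from the bullets above.
stage_error_rates = [0.05, 0.07]   # e.g. structure prediction, then binding prediction

p_all_stages_correct = 1.0
for err in stage_error_rates:
    p_all_stages_correct *= (1.0 - err)

overall_error = 1.0 - p_all_stages_correct
print(round(overall_error, 4))   # ~0.1165, i.e. close to the simple sum 5% + 7%
# Add a third engine with a 10% error rate and the overall error jumps past 20%.
```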

That’s why, academically speaking, there’s a tendency not to stack multiple stages too deeply by default.

Even so, from a “big-picture” perspective, other strategies become possible.

  • Focus on the ‘lines’ where endpoints (answer/effect) are already known
  • Use prediction to explore the middle steps
  • Use experiments to quickly validate only “meaningful pathways”

In simple terms:

Rather than turning everything into a precise deterministic model, quickly narrow down convincing candidate pathways and confirm them through experiments.


6) Plant genomics prediction engine: An approach to extract “unknown compounds” from genomes

This is the most distinctive part in the original source.

While most drug AI focuses on protein/molecule prediction, Infoboss puts its effort into plant genomics → deriving candidates for useful compounds.

Core idea: since plants have a “blueprint” (genome) that creates compounds, extract genomic segments related to substance synthesis and then make predictions.

  • First engine: plant genome input → list candidates of “which useful compounds” could be made
  • Limitations of existing screening
    • HPLC needs a target set in advance: you must first decide “which compounds to detect”
    • LC-MS can screen exhaustively, but it’s hard to pinpoint compounds precisely, so errors remain
  • Therefore, the goal is to accurately create a ‘list of unknown substances’

If you rephrase the terminology simply here:

It’s not that the plant itself is good for hypertension; it’s that the compounds inside the plant are what have the effect.

So the starting point is: “use a prediction engine to first generate a list of compounds.”
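
To show the shape of that “list first” step, here is a deliberately simplified sketch. The motif names and the motif-to-compound mapping are invented placeholders, not Infoboss’s actual engine or real gene annotations.

```python
# Deliberately simplified sketch of the "first engine": scan annotated genome
# segments for synthesis-related signals, then emit a candidate compound list.
# Motif names and the motif → compound-class mapping are invented placeholders.
from typing import Dict, List

SYNTHESIS_MOTIFS: Dict[str, str] = {
    "terpene_synthase_like": "terpenoid",
    "polyketide_synthase_like": "polyketide",
    "cytochrome_p450_cluster": "alkaloid-related compound",
}

def find_synthesis_segments(genome_annotation: List[str]) -> List[str]:
    """Keep only segments that look synthesis-related (placeholder logic)."""
    return [seg for seg in genome_annotation if seg in SYNTHESIS_MOTIFS]

def candidate_compound_list(genome_annotation: List[str]) -> List[str]:
    """Genome annotation in, candidate compound classes out: the 'list first' step."""
    return [SYNTHESIS_MOTIFS[seg] for seg in find_synthesis_segments(genome_annotation)]

print(candidate_compound_list(
    ["ribosomal_protein", "terpene_synthase_like", "cytochrome_p450_cluster"]
))  # -> ['terpenoid', 'alkaloid-related compound']
```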


7) Second engine: Narrow compound candidates again by “functional potential”

Once candidate compounds are generated, it’s not the end.

The next step is to again predict/filter whether that compound is truly likely to be useful.

  • Search existing databases + biobig data to confirm known associations
  • Structure-based functional prediction engine to score specific functional possibilities
  • Example: out of roughly 400 candidates, only one to a few have potential functions related to hypertension
  • If there’s a target protein, confirm binding likelihood again via structure prediction (e.g., AlphaFold)

This step is important because it repeatedly performs “exploration → convergence,” continually reducing the number of candidates.

In the ideal scenario, it goes from 400 → 20 → 10 → 5 → 1.
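
The funnel itself can be sketched in a few lines. The 400 starting candidates, the scores, and the thresholds below are random placeholders, not real predictions:

```python
# Toy version of the exploration → convergence funnel: start wide, apply
# successive predicted-score filters, and hand only the survivors to experiments.
# Scores and thresholds are random placeholders, not real predictions.
import random

random.seed(0)
candidates = [
    {"id": i, "functional_score": random.random(), "binding_score": random.random()}
    for i in range(400)
]

# Stage 1: structure-based functional prediction (keep only high-scoring candidates)
stage1 = [c for c in candidates if c["functional_score"] > 0.95]
# Stage 2: predicted binding to the target protein (e.g. via structure prediction)
stage2 = [c for c in stage1 if c["binding_score"] > 0.5]

print(len(candidates), "->", len(stage1), "->", len(stage2), "-> experiments")
```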


8) Feedback loop: When confirmed by experiments, the data enters training again

The most realistic part of drug AI is exactly this.

If it ended with predictions alone, it wouldn’t be real science—it would remain at the level of hypotheses.

  • Prediction stage: before experiments, it is uncertain whether a given prediction will turn out to be correct
  • Experiment confirmation: what turns out correct becomes “fact”
  • The results are then accumulated as data and reflected in engine training/improvement

However, this feedback isn’t always beneficial.

  • Due to biological characteristics, learning can have unintended negative effects
  • So version control (fine-grained control) is necessary

In the original source, the concept supporting this was described as a dedicated solution for managing training, data flow, and versions.
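
Put as a rough Python sketch, the loop plus versioning looks something like this; the engine, the experiment step, and the scores are invented stand-ins, not the actual solution referenced above.

```python
# Rough sketch of the prediction → experiment → retraining loop, with version
# control so a retrained engine can be rolled back if learning hurts performance.
# DummyEngine, run_experiment, and the score threshold are invented stand-ins.
class DummyEngine:
    def __init__(self, version: int = 0):
        self.version = version

    def predict(self, candidate: str) -> float:
        return 0.95 if "promising" in candidate else 0.3   # placeholder score

    def retrain(self, confirmed_results: list) -> "DummyEngine":
        return DummyEngine(self.version + 1)               # placeholder retraining step

def run_experiment(candidate: str) -> bool:
    return True                                            # stand-in for wet-lab validation

def feedback_cycle(engine: DummyEngine, candidates: list, versions: list) -> DummyEngine:
    shortlisted = [c for c in candidates if engine.predict(c) > 0.9]  # prediction stage
    confirmed = [(c, run_experiment(c)) for c in shortlisted]         # experiments turn predictions into facts
    new_engine = engine.retrain(confirmed)                            # confirmed results re-enter training
    versions.append(new_engine)                                       # keep every version in case retraining backfires
    return new_engine

versions = [DummyEngine()]
engine = feedback_cycle(versions[-1], ["promising-compound-A", "compound-B"], versions)
print("engine version:", engine.version)  # -> 1
```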


9) Scale of biobig data: What “40 billion records” tells you

The original source provides a fairly large data scale.

  • Collected biobig data: mentioned as exceeding 40 billion records
  • There may be duplicates (more than half is expected to be duplicated)
  • Also separately mentioned is a dataset of roughly 17.2 billion records

The reason these numbers matter isn’t just that they’re “big,” but that they become fuel for probability-based exploration that AI needs.

The data classification framework mentioned in the original source is also key.

  • Omics/system levels: layered across organisms / genomes / populations (e.g., ethnic groups) / proteins / metabolomics / spatial information, and more
  • So it isn’t a single “reductionist dataset”; the blueprint is constructed from multi-scale data

10) Why is data exploding, yet protein structure data is relatively scarce?

The original source compares production speeds of different data.

  • Genome base sequences: sequencing (decoding) technologies are well set up, so the growth rate is extremely fast
  • Mentions 80 petabase pairs or more based on NCBI raw data
  • Written out as text (converted into bytes), the size is even larger (a quick back-of-envelope follows this list)
  • Protein structures, meanwhile: there is no equivalent of sequencers (sequence-analysis equipment) for mass production, so new structures are described as arriving only “in a trickle”
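
A quick back-of-envelope on that byte conversion: the 80 peta base pairs figure is from the source, while the one-byte-per-base and per-base-quality assumptions below are generic, not a statement about how NCBI actually stores its data.

```python
# Back-of-envelope for the byte conversion: "80 peta base pairs" written as text.
# Assumptions: 1 ASCII byte per base; raw read formats (e.g. FASTQ) also carry
# roughly one quality character per base. Generic assumptions, not NCBI specifics.
base_pairs = 80e15

plain_text_bytes = base_pairs * 1      # one character per base
with_quality_bytes = base_pairs * 2    # base call + per-base quality score

print(f"plain text        ~{plain_text_bytes / 1e15:.0f} PB")    # ~80 PB
print(f"with quality data ~{with_quality_bytes / 1e15:.0f} PB")  # ~160 PB
```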

The implication is:

In AI drug development, even as data keeps increasing, what determines success is “which kind of data” increased.

Therefore, the prediction engine’s performance depends on the types of data available, and the areas where data is scarce must be built up with extra care.


11) Conclusion: The next step for drug AI is “engine + data + collaboration”

The final message was very realistic.

  • Not everyone can do it
  • Collaboration among researchers in each field is essential
  • They are expanding their network by hosting annual gatherings on topics such as the industrialization of plant resources

In other words, drug AI isn’t a single technology; it’s a multidisciplinary pipeline.

It’s a system that links data/prediction engines/experimental validation/operations (data flow management).


The most important takeaway I summarized: “something other news articles and YouTube videos don’t usually cover well”

The essence of drug AI isn’t “predicting the answer,” but “finding probabilistically meaningful pathways and repeating loops that turn them into reality through experiments.”

  • Predictions like AlphaFold are a starting point, but errors can accumulate in the binding/delivery stages
  • So instead of blindly chaining engines together, quickly validate only explainable lines (mechanistic connectivity)
  • Plant genomics prediction boosts exploration efficiency because it “generates the candidate compound list first”
  • When prediction results are confirmed by experiments, they feed back into training data and the feedback loop that improves the engine is key

If you remember this one sentence, then no matter what drug AI news comes next, you’ll start to see “where and how it increases success probability.”


Main content to convey (checklist)

  • Structure prediction (protein 3D structure) is the starting point for antibody/binding design
  • Antibody design (AI) connects to predicting antigen–antibody binding pockets/interactions
  • Drug delivery is the step that designs the “probability of reaching the target,” and if it fails, it’s meaningless
  • Error accumulation means you need a strategy to quickly narrow down candidate pathways instead of simply stacking engines in series
  • Plant-genomics-based engine generates unknown compound candidates and increases exploration efficiency
  • Biobig data (large-scale) combined with experiment confirmation is what actually improves performance
  • Ultimately, collaboration: data/models/experiments/operations must run together

< Summary >

Drug AI starts with protein structure prediction (AlphaFold family), increases the likelihood of antibody design and target binding, and engineers the probability that the drug reaches where it needs to via drug delivery.

It also explores “unknown substances” by inputting plant genomics data to generate candidate useful compounds, and uses biobig data and functional prediction engines to reduce candidate numbers from 400 to the level of a few.

Predictions aren’t 100%, but by quickly confirming through experiments centered on explainable pathways, the feedback loop feeds data back into training and improves performance.

The key is that it’s a repeated loop where probabilities are converged and then made real through experiments—not answer prediction—and that multidisciplinary collaboration is essential.



*Source: [ 티타임즈TV ]

– How does AI discover new drug substances from plants? (Park Jong-seon, CEO of Infoboss)

