● No-Code Revolution, Replit-Make Automation Boom
“Without a Single Line of Code,” Automating Work and Building Websites Is Now a Reality: A Practical Guide to Turning Work into ‘Money-Making Time’ with Replit and Make
This article includes the following:
1) How far non-developers can realistically automate work, summarized in a ‘real-world’ context
2) How to reduce online store customer service, order, payment, and notification processes to 5-10 seconds using Make (practical scenario)
3) Creating and deploying quotation and payment-management systems by “just talking” with Replit (producing a live URL)
4) Points where beginners fail in cost structure (especially Replit pricing) and safe operation methods
5) The most crucial point often overlooked in other videos/news: Automation is not just about technology, but a ‘bottleneck definition’ game
1) News Briefing: Core Message from Yoon Yongsung, CEO of ‘Yoon Automate’
[Core Point]
The current boom in automation is partly because of “improved technology,” but more fundamentally because the cost of automation versus labor has reversed, making ROI attainable for anyone.
[Person/Background]
Yoon Yongsung, originally not from a development background, began automating repetitive Excel tasks with VBA after joining a company in 2009 and realizing the monotony.
This automation accumulated to the level where computers substituted an 8-hour workday.
As a result, productivity and accuracy increased, leading to recognition as a “proficient worker,” aiding in career advancement and ultimately leading to a startup (company name: Yoon Automate).
[Changes After AI]
Before ChatGPT, there were clear limits for a solo entrepreneur, but after generative AI, development and production speed increased 3 to 4 times, translating into revenue growth and hiring.
It has become such that one would think, “Without AI, the business might have to close,” indicating a comprehensive redesign of work around AI.
2) Realistic Scope of Automation for Non-Developers
Highly Feasible Areas (Strongly Recommended)
Daily repetitive routine tasks like downloading/uploading/email sending/system logins/click repetition.
The reason is simple.
These require little cross-team agreement (policy/security/approval), and as the person most familiar with the task, the non-developer can design the automation most accurately.
Relatively Challenging Areas (Company Level)
Cross-department approval/security/policy tangled company-wide process automation.
One might build an MVP and hand it over to developers, but often “development alone” doesn’t solve it (authority/audit/security/liability).
3) Vibe Coding Tips for Non-Developers: It’s More About “Providing Context” than “Asking Good Questions”
Practical Tip 1 from the CEO: Explain Fluently for More than One Minute
When typing, crucial context is often omitted, but voice input (dictation) naturally includes inconveniences/situations/exceptions.
Explaining with eyes closed often results in more detailed communication, improving the quality of context.
Practical Tip 2 from the CEO: Express Frustrations Like “I’m Stuck on Python Installation”
Being honest about not knowing helps AI automatically adjust the difficulty level.
“I’ve never used Python” → Get step-by-step guidance for installation/environment setup.
4) Tool Trend Changes: Shifting Focus from Local Installation to Web-Based Platforms
1) Local Installation (Developer-Friendly)
Tools like Cursor are powerful, but installation/setup/error messages become barriers for non-developers.
2) Web-Based (Non-Developer Friendly)
Tools like Replit, Lovable, v0, Bolt allow users to get started just by logging in.
The advantage of being web-based is the access from PC, mobile, and tablet, allowing immediate modifications and deployment in the ‘workplace.’
5) Make Automation Case: Reducing the Shopping Cart Order→CS→Notification to 5-10 Seconds
[Situation]
Lectures/programs were sold on marketplaces (such as Gmarket), but refunds occurred whenever “immediate guidance after payment” wasn’t provided.
The realization was as follows:
Automation is done not ‘because it’s good,’ but when it has a direct impact on revenue/trust.
[Structure (Summary)]
1) Payment Occurrence → Real-time Signal Delivery with Make (Webhook, etc.)
2) Order Information Query → Internal Notification via Slack
3) Check Stock/Quantity (especially offline lecture capacity)
4) Branch Processing for Bank Transfer (Order Only) vs Payment Completion
5) Changing Shipping Completion Status per Product (API Utilization)
6) Automatic Dispatch of Notifications via KakaoTalk/SMS/Email (a minimal sketch of this flow follows below)
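For readers who want to see the shape of this flow outside Make, here is a minimal Python sketch of steps 2), 4), and 6), assuming the payment event arrives as a JSON webhook. The SLACK_WEBHOOK_URL, the payload shape, and send_kakao() are hypothetical placeholders for illustration, not a marketplace or Make API.

```python
# Minimal sketch of the order -> CS -> notification flow, assuming a JSON
# payment webhook. SLACK_WEBHOOK_URL, the payload fields, and send_kakao()
# are hypothetical placeholders, not a real marketplace API.
import requests
from flask import Flask, request

app = Flask(__name__)
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # hypothetical

def send_kakao(phone: str, message: str) -> None:
    """Placeholder: wire this to your KakaoTalk/SMS/email provider."""
    pass

@app.route("/payment-webhook", methods=["POST"])
def on_payment():
    order = request.get_json()  # assumed shape: {"status", "product", "buyer_phone"}
    # Step 2) Internal notification via a Slack incoming webhook
    requests.post(SLACK_WEBHOOK_URL, json={
        "text": f"New order: {order.get('product')} ({order.get('status')})"
    })
    # Step 4) Branch: bank transfer (order only) vs. payment completed
    if order.get("status") == "paid":
        # Step 6) Immediate guidance within seconds of payment
        send_kakao(order["buyer_phone"], "Payment confirmed. Here is your access guide.")
    else:
        send_kakao(order["buyer_phone"], "Order received. We will confirm your transfer shortly.")
    return {"ok": True}
```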
[Effect]
Guidance is sent to the customer 5-10 seconds post-payment, reducing refunds to nearly zero.
Can operate without hiring CS personnel.
Make costs about $9 per month, and even with 100 accumulated scenarios the ROI is overwhelming.
6) Replit Case: Creating Internal Work Systems “Without Writing Any Code”
[Why Replit?]
Where no-code/web builder tools previously focused on the ‘surface (front-end),’ Replit allows DB/authentication/deployment all in one place, making it a “real work system.”
[Actually Created: Quotation System]
1) Modification/management of quotation items on the web
2) PDF download button
3) Email/SMS dispatch button
4) Dispatch log/remarks record
[Important Point Here]
Sending a quotation might seem like a “one-minute task,” but context switching, proofreading, and typo checks easily consume 10 minutes.
Accumulated ‘trivial but important repetitions’ detract from team-wide productivity.
Ultimately, automation’s core is not just saving time but also preserving focus.
[Additionally Created: Automated Payment Collection]
Business account deposit occurrence → Detected on mobile → Automatically sent to internal site → Accumulation in payment details table.
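A minimal sketch of the last step of that chain, assuming the mobile deposit alert is forwarded as JSON to an internal endpoint; the /deposit route, payload shape, and SQLite table are illustrative assumptions, not the CEO’s actual system.

```python
# Sketch of the deposit-collection step: a forwarded bank-deposit alert is
# parsed and appended to a payments table. The /deposit endpoint and the
# payload shape are assumptions for illustration.
import sqlite3
from flask import Flask, request

app = Flask(__name__)

def record_deposit(payer: str, amount: int) -> None:
    con = sqlite3.connect("payments.db")
    con.execute("CREATE TABLE IF NOT EXISTS payments "
                "(payer TEXT, amount INTEGER, ts DATETIME DEFAULT CURRENT_TIMESTAMP)")
    con.execute("INSERT INTO payments (payer, amount) VALUES (?, ?)", (payer, amount))
    con.commit()
    con.close()

@app.route("/deposit", methods=["POST"])
def on_deposit():
    alert = request.get_json()  # e.g. {"payer": "Hong Gildong", "amount": 150000}
    record_deposit(alert["payer"], alert["amount"])
    return {"ok": True}
```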
7) Cost/Pricing Reality: Replit Can Be Risky When Approached as a “Hobby”
[Trap in the Pricing Structure]
Replit charges by chat (task) unit.
If you request a lot at once like “do this, do that, do everything,” the code change volume can increase significantly, resulting in a large charge.
[CEO’s Operating Tip: Brick Laying Strategy]
Make small requests: “Create a button” → “Add save function” → “Add delete function” in steps.
This way, the cost is controlled, and corrections are easier when the direction is off-course.
[Numerical Comparison from a Business Perspective]
The CEO spends around 2 million won per month on Replit, but evaluates it as far cheaper than hiring a senior developer (which he puts at over 10 million won per month).
This is a core aspect of how companies’ digital transformation strategies are changing these days.
8) (Important) The “Real Core”: Automation is 80% About Defining the Problem, Not the Technology
Yoon Yongsung’s consulting method was impressive.
By asking “What’s your biggest problem?” he finds that most companies don’t know the exact problem.
Then he continually asks ‘Why?’ to trace the root of the bottleneck.
Why this is important
Automation misapplied may increase “work for work’s sake.”
If the approval structure is the problem, adding an automation tool may create an automated approval nightmare.
This perspective also explains why the AI/data company Palantir is strong:
It’s not about the tool but about obsessively identifying “the real problem in your organization” to achieve results.
9) Global Economic/Industrial Implications: Three Changes Brought by the Popularization of Automation
1) Productivity Inflation (Intensifying Silent Competition)
As processing capacity increases with the same workforce, industry standard productivity rises.
This directly affects the competitiveness of companies, linking long-term to economic growth rates.
2) Scale of ‘Small Teams’ Expands
Areas previously requiring developers/designers/PMs can now see MVPs created by small teams with AI and tools.
This trend changes not only startups but also the new business pace of established companies.
3) Cost Structure Change: Shift from CAPEX (Large Setup Costs) to OPEX (Subscription/Usage)
The past model of “large upfront build + maintenance” shifts to “monthly subscription + quick changes.”
From a company’s standpoint, as the interest rate environment is high (capital cost is expensive), such OPEX-style automation becomes more attractive.
10) ‘Employee Automation Roadmap’ for Immediate Application from Today’s Content
Step 1. Identify 3 Repetitive Tasks
Tasks repeated daily/weekly + click/copy-paste + with error risk are the top priority.
Step 2. 1-Minute Explanation via Voice → Have AI Document the Procedure
Describe the sequence exactly as you perform it, and ask AI to “organize it into a step-by-step checklist” for instant structuring.
Step 3. Automate ‘Notification/Summary/Delivery’ with Make First
Instead of starting with large automation, accumulate “small wins” like a three-line email summary sent to Slack, and let the results compound.
Step 4. Use Replit When ‘Work Systems’ Are Needed
For moments when a “web app” is necessary, such as in quotation/payment management/customer response/internal application forms, Replit’s strengths grow significantly.
Step 5. Cost Control Principle
Particularly for Replit, make small requests, confirm the result → proceed with the next request.
This method simultaneously manages cost/quality/direction.
11) The Core SEO Economic Keywords Subtly Embedded in This Article
The current trend aligns with major economic keywords such as corporate productivity, digital transformation, interest rate environment, economic growth rate, and global supply chain restructuring, with a high likelihood of continued growth.
12) Conclusion: “Automation Is Not Development, but a Habit of Eliminating the Bottlenecks in Your Work”
The case of Yoon Yongsung is compelling not because of grand AI, but because it’s about eliminating ‘small bottlenecks’ like email summarization, quotation dispatch, and post-payment notifications to reclaim time.
And this reclaimed time ultimately links to revenue/performance/promotion/startup success.
< Summary >
Non-developers can sufficiently automate repetitive tasks.
Make excels in automating workflows like order/payment/notification/customer service, enabling processing in 5–10 seconds.
Replit allows verbal creation and deployment of web apps (such as quotations/payment management), accelerating work system development.
Replit’s chat-based pricing means creating in “small pieces” can control costs and direction.
The most crucial aspect is not the automation technology itself but the ability to thoroughly explore ‘why it is difficult.’
[Related Articles…]
- 2025 Work Automation Trends: Make, Zapier, and Agents Transforming Organizational Productivity
- Vibe Coding with Replit: Non-Developer Web App Production and Cost Management Checklist
*Source: [ 티타임즈TV ]
– “Isn’t It Easy?” Making a Website with Replit Without a Single Line of Code (Yoon Yongsung, CEO of Yoon Automate)
● Billionaires Shun LASIK, Gen Z Premature Presbyopia Panic, Progressive Lens System Failure
“Why Do the Rich Prefer Glasses Over Eye Surgery?” The Real Secrets of Wealthy Families’ Eye Care + The Twist in the 2030 Early Presbyopia Debate
This article contains four key insights.
1) The most realistic reasons why global billionaires and wealthy individuals avoid LASIK/LASEK
2) Why the common notion “watching TV up close worsens eyesight” is only half true
3) The blind spots in news about the rise of early presbyopia among people in their 20s and 30s (reinterpreted based on a 2025 study)
4) Structural reasons why progressive lenses are less prevalent in Korea and potential solutions
1) News Briefing: The Secret Behind the “Non-Surgical Vision Improvement” Used by Over 90% in Developed Countries
Core Point: The conclusion of the video is actually simple.
“When addressing presbyopia/vision discomfort, options like surgery and eye drops are less safe and practical compared to ‘glasses (especially progressive lenses).’”
Progressive lenses, in particular, are structured so that the focal zones for far distance (top), intermediate distance (middle), and near distance (bottom)
are connected gradually, minimizing the need to switch to separate reading glasses.
2) Why Do the Wealthy and Rich Families Avoid Vision Correction Surgery? (A Summary of the Video’s Logic in One Sentence)
Conclusion: “Not because surgery is dangerous, but because there is no need to make irreversible interventions.”
The persuasive points from the video are as follows.
(1) LASIK/LASEK are basically ‘shaving surgeries’
These involve reshaping the cornea to change refraction, which creates a psychological barrier due to the difficulty in reverting problems if they occur.
(2) The optimal period (20s-30s) may already have passed
There is a generally preferred age range for LASIK/LASEK, and wealthy individuals or executives may have been more conservative in deciding during that period.
(3) The historical conservatism of “university hospitals”
During the early spread of vision correction surgery centered around private clinics, university hospitals had a more cautious view.
(4) Presbyopia ultimately leads many back to glasses
Even after undergoing vision correction surgery, aging can bring near vision discomfort (presbyopia), making the use of glasses inevitable.
3) What Differentiates the Eye Care Habits of Wealthy Families?
The key point here is not “special care through spending,” but differing rhythm and priorities in management.
(1) Regular check-ups as a default
Children may not realize they’re having vision issues, which can lead to missing the progression of myopia.
(2) Setting up an environment to encourage ‘seeing afar’
Encouraging the use of larger screens like tablets/TVs instead of smartphones to maintain a viewing distance.
Managing brightness (not too dark or too bright) simultaneously.
(3) Actively considering options like dream lenses (orthokeratology) if necessary
However, this isn’t a “mandatory action,” but rather a step to be taken based on medical judgment alongside regular check-ups.
4) “Does Watching TV Up Close Harm Vision?” → The Fact Is Not About “TV” but “Duration of Close-Up Tasks”
Key Distinction in the Video:
Watching TV up close doesn’t “directly” harm eyesight; this is more of a myth.
Instead, the real risk factor is:
Prolonged task duration at a close distance of around 30cm
Consequently, it’s not about the TV but
long-term usage of smartphones/reading/close-up screen tasks that increase myopia risk.
Interestingly, the causation is explained as follows.
“It may not be that watching TV up close worsened their eyesight, but rather that their poor eyesight led them to sit closer to the TV.”
5) Why Underestimating Myopia is Dangerous: “Good Vision” is Different From “Healthy Eyes”
This part is analogous to ‘risk management’ in economic news.
Even if the surface result (vision) looks good, the foundational health (eye structure) still needs separate management.
Core Point: Myopia is not simply an “increase in lens power” issue but a structural change as the eyeball elongates.
Even if you correct the front (cornea) with LASIK/LASEK,
the myopia-prone condition (genetics + environment) remains,
increasing the risk of retina/optic nerve issues as you age.
In summary:
LASIK/LASEK = A solution for comfortable viewing
Myopia management = Reducing long-term eye risks
6) The Twist in the Rise of Early Presbyopia Among the 2030s: “It May Not Be Presbyopia But Symptoms That Feel Like It”
This was one of the sharpest points in the video.
2025 Portugal Study Highlight:
Even if people in their 20s and 30s report “presbyopia symptoms,”
it might not be actual presbyopia, but rather discomfort from overuse of close-up vision (like smartphones) that feels like presbyopia.
The mention that domestic statistics on ‘presbyopia’ are not clearly captured within the disease management system further underscores this.
Thus, while media headlines may scream “2030s Presbyopia Surge,”
the reality could involve factors like fatigue, functional strain, confusion with hyperopia, or overuse of near vision.
7) Comparison of Three Presbyopia Solutions: Surgery vs. Eye Drops vs. Glasses (Centered on Video Conclusions)
(1) Surgery
Presbyopia is an issue with the ‘lens (zoom function),’ whereas LASIK/LASEK intervenes in the ‘cornea,’ making the mechanism different.
Some methods (like monovision) present adaptation challenges (dizziness/confusion), emphasizing the difficulty of reversing them.
Artificial lenses are more reasonable in cases like cataracts, where replacement is “already needed,” but applying them to healthy lenses is controversial.
(2) Eye Drops
Reducing pupil size for a pinhole effect that allows “temporarily better vision” (mention of pilocarpine-based solutions).
However, concerns are raised about inflated advertising given the lack of long-term data on durability, making it look more like a short-lived ‘flash effect.’
(3) Glasses (Especially Progressive Lenses)
Summed up as the safest and simplest method.
The problem is that an effective outcome requires a “proper fitting and adaptation system.”
8) Why are Progressive Lenses Less Common in Korea? It’s a ‘System Issue,’ Not a ‘Product Issue’
This part is entirely about structure.
(1) In the US, the examination/prescription system is compartmentalized
There is a dedicated system for vision exams (over 40 minutes) and prescriptions, leading to a natural adoption of progressives.
(2) In Korea, “It’s Hard to Allocate Time for Exams Due to Fee Structures.”
With eye clinics focused on disease-centered care, it’s challenging to ensure precise refractive exams or consultations.
(3) Progressive lenses require ‘education+fitting,’ but this process tends to be skipped
Progressive lenses require adaptation guides for peripheral distortion, changes in gaze movement (not just eye movement, but head turning), visual habits when descending stairs, etc.
Without this, merely “putting them on” leads to the perception of “progressives are inconvenient” due to dizziness and discomfort.
9) Clarifying “Compressed Lenses”: It’s About ‘Refractive Index (Material) Selection,’ Not ‘Compression’
The term “2x or 3x compression” commonly used in optical shops exists to aid consumer understanding,
but it more accurately means using a higher refractive index material to make lenses thinner at the same prescription.
Importantly:
Thinner isn’t automatically better
Overly thin lenses can result in visual distortion/discomfort; balancing according to personal prescription and lifestyle is key.
10) “Negative Vision” is a Scientifically Strange Expression (but There is a Reason People Use It)
The vision chart values (0.1, 0.5, 1.0, 2.0) are a quantitative indicator based on angular resolution,
but the conventional use of “minus” for anything below 0.1 is a misuse of the scale.
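For reference, decimal acuity is simply the reciprocal of the minimum angle of resolution (MAR, in arcminutes), so every value on the chart is a positive number and there is no scientifically “negative” vision:

```latex
\text{decimal acuity} = \frac{1}{\mathrm{MAR}\ (\text{arcmin})}
\quad\Rightarrow\quad
1.0 \leftrightarrow 1',\qquad 0.5 \leftrightarrow 2',\qquad 0.1 \leftrightarrow 10'
```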
11) The ‘Decisive Factor’ in Determining Vision: “Lack of Outdoor Activities” is Scarier Than Genetics
Genetic factors are certainly significant,
and data suggests a high likelihood of myopia in children based on parental myopia status.
However, the “lever more important than genetics,” as emphasized in the video, is environment, especially outdoor activities.
Exposure to daylight stimulates dopamine secretion,
which slows the elongation process of the eyeball, reducing myopia progression risk.
To sum up:
Genetics = Baseline risk
Outdoor activities/close-up habits = Execution variables changing the actual outcome
12) The Most Important Points Often Missed by Other News/YouTube (Reassessed from My Perspective)
Point A. The Reason the Wealthy Wear Glasses is a ‘Reversible Strategy,’ Not ‘Mistrust in Technology’.
Interpreting the wealthy’s choice as simply “LASIK is dangerous” is only part of the story.
The real strategy is avoiding ‘irreversible decisions’ and retaining options for later.
This resonates with asset management principles.
During periods of high volatility, some people prioritize ‘recoverability’ over profits.
Point B. Korea’s Progressive Lens Issue is a ‘Testing, Fitting, and Adaptation Infrastructure Problem,’ Not a Lens Performance Issue.
The misconception that progressives are “inconvenient products” may stem from market structure.
Lack of examination time results in consumer failures and market shrinkage.
This also connects with productivity issues in broader medical/healthcare services.
Point C. ‘2030 Presbyopia’ is a Subject Prone to Fear Marketing.
Presbyopia, as a term, elicits emotional pushback (“Are you calling me old?”), which makes it prone to clickbait.
However, the real issue might be function fatigue/lifestyle habits/accommodation issues, making premature self-diagnosis risky.
Point D. Myopia is ‘Long-term Risk Management,’ Not Just “Inconvenience from Glasses”.
Viewing myopia purely from a correction standpoint can result in discontinuing regular check-ups.
Stopping check-ups increases the risk of long-term issues (retina/optic nerve) being overlooked.
13) (Economic/Tech Perspective) Why This Issue is Growing in the Market: Healthcare is Moving Towards ‘Personalization’
The significance of this shift lies in the way eyeglass/lens industries are transitioning from “off-the-shelf sales” to
personal tailored eyewear + data-driven examinations + lifestyle consulting.
From a global economic standpoint,
given aging populations + declining working-age population + increasing healthcare costs,
it’s a typical pattern for the demand to shift from “surgery/treatment” to “prevention/management/assistive devices.”
With the AI trend as well,
areas like automated vision examinations, facial scan-based fittings, and lens recommendations based on individual usage patterns are likely to expand rapidly.
This article connects to the following economic SEO keywords:
interest rates, inflation, economic recession, US GDP, exchange rates
< Summary >
The core reason the wealthy choose glasses over surgery is not because “surgery is dangerous,” but more about “avoiding irreversible decisions as a form of risk management.”
The key factor in vision deterioration is ‘duration of close-up tasks,’ not the distance from the TV.
The rise in early presbyopia among people in their 20s and 30s may actually be ‘symptoms resembling presbyopia’ due to overuse of close-up vision, rather than true presbyopia (as noted in a 2025 study).
The reason progressive lenses are less prevalent in Korea is due to a lack of testing, fitting, and adaptation education systems, rather than product performance.
Even if LASIK improves vision, myopia management doesn’t end there; regular check-ups remain essential because of the long-term risks from structural changes in the eye.
[Related Articles…]
- How AI Transition is Changing the Structure of the Healthcare Industry (Focusing on Examination/Customization/Prevention)
- Impact of Exchange Rate Fluctuations on Global Consumer Goods and Medical Device Prices (Outlook for 2026)
*Source: [ 지식인사이드 ]
– “Over 90% of People in Developed Countries Already Use It”: How Eyes Get Noticeably Brighter Without Surgery
● Meta Unleashes Post-LLM Shockwave, Semantic AI Crushes Token Models, Real-Time Costs Collapse
Meta FAIR Unveils ‘Beyond LLM’: VL-JEPA Predicts ‘Semantics’ Instead of Words, Changing the Game
Today’s article contains these core points.
① Why “token (word) generation”-centric multimodal AI is structurally inefficient
② How Meta FAIR’s (Yann LeCun’s team) VL-JEPA bypasses this through “semantic embedding”
③ Why learning efficiency has significantly increased under the same conditions (fewer parameters but better performance)
④ How ‘selective decoding’ reduces costs in real-time video/wearables/robots
⑤ Why one model can solve classification, search, VQA, and world modeling without generating text
1) News in one line: Shift from “Bigger LLMs” to “AI that Directly Predicts Semantics”
VL-JEPA (Vision-Language Joint Embedding Predictive Architecture) by Meta FAIR is a new multimodal architecture designed to predict the ‘semantic vector (embedding)’ of a correct answer by observing images/videos, without generating the “correct sentence” token by token.
This approach moves the AI’s focus away from the competition centered on generative AI, towards environments where real-time understanding, low latency, and low computational cost are crucial, such as smart glasses, robotics, monitoring, and navigation.
2) Why are existing Vision-Language Models (VLM) inefficient even when they “get the answer right”?
2-1. Problem ①: Token models see “different expressions for the same answer” as different
For example, in response to “What happens when you lower the switch?”,
“The light goes off / The room becomes dark / The lamp turns off” are almost identical in meaning.
However, token-based generation changes the learning target if the sentences differ superficially.
Ultimately, the model spends too much learning budget on ‘expression diversity’ instead of ‘meaning’, structurally undermining learning efficiency in the long run.
2-2. Problem ②: Latency and cost are critical in real-time applications
Video or sensor streams keep flowing, but token generation completes meaning one character (token) at a time.
This means you can’t know “what the model understood” until decoding is finished.
This structure increases latency and computational costs (including cloud costs) in environments where real-time decision-making is crucial.
3) Core Idea of VL-JEPA: Predicting “Semantic Embedding” Instead of “Words”
3-1. Understanding the structure through 4 blocks (a minimal code sketch follows the list)
(1) Visual Encoder
Compresses received image/video frames into visual embeddings.
Uses V-JEPA 2 (about 304 million parameter vision transformer, self-supervised), kept frozen during training.
(2) Predictor (core)
Predicts the “correct semantic embedding” by taking visual embeddings and text questions (prompts).
Although initialized from an LLM (e.g., the Llama 3 series), it is designed for full attention across all tokens rather than sequential token generation with causal masking, maximizing “vision-text interaction”.
(3) Y Encoder
Encodes answer text (correct sentence) into ‘correct semantic embedding’ during training.
The crucial point is a target designed to capture “meaning,” not “expression”.
(4) Y Decoder (secondary)
Minimal involvement in training.
Primarily used to convert embeddings into sentences during inference when human-readable text is needed.
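To make the composition concrete, here is a minimal PyTorch-style sketch of how the four blocks could fit together at training time, based only on the description above; the module interfaces, names, and shapes are assumptions for illustration, not Meta’s released code.

```python
# Illustrative composition of the four VL-JEPA blocks as described above.
# The submodule interfaces are assumptions, not Meta's implementation.
import torch
import torch.nn as nn

class VLJEPA(nn.Module):
    def __init__(self, visual_encoder: nn.Module, predictor: nn.Module,
                 y_encoder: nn.Module):
        super().__init__()
        self.visual_encoder = visual_encoder.eval()  # (1) V-JEPA 2, kept frozen
        for p in self.visual_encoder.parameters():
            p.requires_grad = False
        self.predictor = predictor                   # (2) LLM-initialized, trainable
        self.y_encoder = y_encoder                   # (3) encodes answer text to target embedding

    def forward(self, frames, question_tokens, answer_tokens):
        with torch.no_grad():
            v = self.visual_encoder(frames)            # visual embeddings
        z_pred = self.predictor(v, question_tokens)    # predicted semantic embedding
        z_target = self.y_encoder(answer_tokens)       # "correct meaning" target
        return z_pred, z_target                        # (4) Y decoder only needed at inference
```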
3-2. Different learning objective: Loss in embedding space, not token loss
VL-JEPA is trained to align its prediction closely with the correct embedding while contrastively pushing it away from answers with different meanings.
As a result, the semantic space becomes more structurally organized:
similar answers naturally cluster together, while different answers stay stably separated.
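A hedged sketch of such an objective: an InfoNCE-style contrastive loss that pulls each prediction toward its own target embedding and pushes it away from the other answers in the batch. The paper’s exact loss may differ in its details.

```python
# InfoNCE-style objective matching the description: align z_pred with its
# own target, contrast against other answers in the batch. The actual loss
# in the paper may differ.
import torch
import torch.nn.functional as F

def embedding_loss(z_pred: torch.Tensor, z_target: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    z_pred = F.normalize(z_pred, dim=-1)        # (B, D)
    z_target = F.normalize(z_target, dim=-1)    # (B, D)
    logits = z_pred @ z_target.T / temperature  # similarity of every pred to every target
    labels = torch.arange(z_pred.size(0), device=z_pred.device)
    return F.cross_entropy(logits, labels)      # diagonal entries are the correct pairs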
4) Why the results demonstrate a ‘structural advantage’: Controlled comparisons show learning efficiency divergence
4-1. “Almost everything identical” was set, changing only the prediction target for comparison
The researchers ensured a fair comparison.
Identical settings were used for vision encoder, resolution, frame rate, data mix, batch size, and training steps,
changing only “what to predict (tokens vs. embeddings)”.
Token-based baseline: predicts tokens using an approximately 1 billion parameter language model
VL-JEPA: predicts embeddings using an approximately 500 million parameter predictor (fewer training parameters)
4-2. Similar at first, but VL-JEPA grows faster and longer as data accumulates
Similarities were seen initially (about 500,000 samples), but the gap widened as learning progressed.
At 5 million samples, the video captioning metrics showed VL-JEPA around 14.7 vs. the token model around 7.1,
with VL-JEPA maintaining higher top-5 classification accuracy (e.g., 35% vs. 27%).
The key takeaway is that this is due to the structural difference between “meaning” as the learning target instead of words, not just a “tuning trick”.
The implication is that this efficiency gap could enlarge as data accumulates in the long term.
5) The truly intimidating point in real-time video: ‘Selective Decoding’
5-1. Decoding (sentence generation) is expensive → Only do it when necessary
VL-JEPA continuously produces an embedding stream, allowing text conversion only during “meaning-changing intervals”.
If there’s little semantic change, sentences need not be generated every second.
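A minimal sketch of that trigger logic: decode only when the embedding stream drifts past a change threshold. The 0.2 threshold and the decode() callback are illustrative assumptions, not values from the paper.

```python
# Selective decoding sketch: convert embeddings to text only when the
# semantic stream changes enough. Threshold and decode() are illustrative.
import torch
import torch.nn.functional as F

def selective_decode(embedding_stream, decode, threshold: float = 0.2):
    last_decoded = None
    for z in embedding_stream:  # one embedding per time step
        changed = (last_decoded is None or
                   1 - F.cosine_similarity(z, last_decoded, dim=0) > threshold)
        if changed:
            yield decode(z)     # pay the decoding cost only here
            last_decoded = z
```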
5-2. Experiment results: Maintain similar performance while reducing decoding frequency by about 2.85 times
For procedural videos (Ego-Exo4D, averaging 6 minutes, with approximately 143 action annotations),
they compared periodic decoding (e.g., every second) with segment-based decoding based on embedding changes.
VL-JEPA achieved a similar level of performance while decoding about once every 2.85 seconds instead of every second.
This indicates a possibility to significantly reduce text generation costs while maintaining “understanding/tracking”.
This is directly applicable in optimizing cloud costs, on-device AI, and edge computing,
potentially altering cost structures for services where “decoding costs are bottlenecks”.
6) One Model for Classification, Search, and VQA: ‘Semantic Space’ Becomes the Common Interface
6-1. Open Vocabulary Classification
Convert label candidates into embeddings and choose the label closest to the prediction embedding.
Eliminates the need for separate classification heads, simplifying product deployment.
6-2. Text-to-Video Retrieval
Convert text queries into embeddings and rank them based on similarity to video embeddings.
Search quality becomes a contest of “semantic space alignment”, where VL-JEPA targets a head-on win.
6-3. Discriminative VQA
Convert candidate answers into embeddings and select the closest one.
This offers design potential to reduce hallucination rather than generating plausible sentences.
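The pattern shared by 6-1 and 6-3 fits in a few lines: embed the candidates and return the one nearest to the prediction. A hedged sketch, with embed_text() standing in for the Y encoder:

```python
# Shared pattern behind open-vocabulary classification and discriminative
# VQA: embed all candidates, return the one closest to the prediction.
# embed_text() is a stand-in for the Y encoder.
import torch
import torch.nn.functional as F

def pick_nearest(z_pred: torch.Tensor, candidates: list[str], embed_text) -> str:
    cand_emb = F.normalize(torch.stack([embed_text(c) for c in candidates]), dim=-1)
    sims = cand_emb @ F.normalize(z_pred, dim=-1)  # cosine similarity per candidate
    return candidates[int(sims.argmax())]
```

Text-to-video retrieval (6-2) is the same computation with the roles reversed: rank video embeddings by similarity to the query embedding.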
7) Benchmark Highlight: Catching Up to “Specialized Model-Level” with ‘One General Model’
Evaluated broadly across 8 video classification and 8 text-to-video retrieval datasets,
VL-JEPA Base (about 1.6 billion parameters, trained on around 2 billion samples) achieved results that on average matched, and sometimes surpassed, baselines trained on ultra-large volumes (tens of billions of samples).
It further rose after supervised fine-tuning (SFT), showing intervals approaching specialized systems separately tuned for each dataset.
8) Message from “World Modeling” Experiments: Match Changes Instead of Explaining in Words
In a task selecting an action that created the change between an initial image and final image from four candidate videos,
VL-JEPA SFT achieved a 65.7% accuracy, setting the SOTA record.
The crucial interpretation here is this:
Rather than mediating through “captioning the world into sentences → reasoning over those sentences”,
modeling changes directly in latent semantic space can be stronger.
This signals a big directional shift long-term in areas like robot manipulation, physical causality reasoning, and task planning.
9) Five Truly Important Points, Often Overlooked in the News, From This Approach
9-1. “LLM hasn’t been discarded but its role has been repositioned”
Though commonly perceived as a “replacement for LLMs”, in reality language remains an output option, and the core computation finishes in the semantic space.
Language remains the ‘interface’, while the ‘engine’ transitions to semantic prediction.
9-2. Directly tackling the cost structure bottleneck of ‘generation (decoding)’
In many contemporary AI services, the token-generation stage is where the most cost leaks.
VL-JEPA changes the design to “generate only when necessary”, offering a built-in path to operational cost reduction.
9-3. If “semantic space alignment” is achieved, multitasking becomes much simpler in practice
Instead of attaching separate heads for classification/search/VQA, they’re unified under the single common standard of embedding similarity.
This offers advantages in the speed of feature addition, maintenance, and deployment (including edge).
9-4. The learning efficiency gap “grows more critical with more data”
In today’s macroeconomic environment, AI investments prioritize “profitability/efficiency” alongside “growth potential”.
In a reality where data and computation aren’t limitless, designing with structurally higher sample efficiency boosts eventual success rates.
9-5. Areas where LLM excels (tool use/agent/deep reasoning) remain → Hybrid likely the answer
The paper also acknowledges that deep logical reasoning, tool use, and agent planning favor token-based models.
The realistic market endpoint will likely be a combination of “LLM (planning/language) + JEPA (perception/real-time semantics)”.
This fusion can reshape corporate AI adoption, productivity enhancement, and AI semiconductor demand structures (decoding vs. encoding/embedding computation proportions).
10) Checkpoints from Economic and Industrial Perspectives (For Immediate Use in Investment/Strategy)
① Expanding Demand for Real-Time AI
Demand is increasing for “continuously seeing and understanding” in smart glasses, industrial safety monitoring, retail store analytics, and autonomous driving assistance, where token generation was previously too costly.
② Optimization of AI Infrastructure Costs Equals Competitive Edge
In high-interest environments and periods of increased volatility, AI infrastructure cost (inference cost) is a more direct competitive edge than sheer performance.
③ Quality of Semiconductor/Cloud Demand Can Change
As focus shifts from token decoding to embedding prediction, vision encoding, and streaming processing optimization, the workload mix changes.
④ Enterprise AI Shifts from “Chatbots” to “Systems that Understand the Field”
In sectors like manufacturing, logistics, security, and healthcare, video/sensor/event data is more fundamental than text.
For this market, the next step for multimodal systems isn’t ‘explanation’ but ‘situation understanding.’
⑤ Global Supply Chain Perspective: Incentive to Move Onto Device/Edge Increases
If decoding can be reduced to cut computation, the economic justification for edge placement (on-device processing) improves.
This is particularly strong in sectors with large needs for data sovereignty/security/latency requirements.
< Summary >
VL-JEPA predicts semantic embeddings instead of generating tokens, structurally reducing wasted learning on expression diversity and real-time decoding costs in multimodal AI architecture.
In controlled comparisons, it surpassed the token baseline on video performance and learning efficiency with fewer training parameters, and embedding-stream-based ‘selective decoding’ showed potential to cut real-time decoding frequency by about 2.85 times.
Connecting classification, search, VQA, and world modeling with a single semantic space suggests that a future “LLM (planning) + JEPA (perception)” hybrid may become an industry standard.
[Related Articles…]
- Why AI Infrastructure Cost is Crucial for Corporate Competitiveness (Focus on Inference Cost)
- Signals of Demand Shift from ‘Decoding’ to ‘Real-Time Multimodal’ in AI Semiconductor Market
*Source: [ AI Revolution ]
– They Just Built a New Form of AI, and It’s Better Than LLMs


