Siva Rama Krishna
Bengaluru, Karnataka, India
1K followers
500+ connections
Activity
-
Siva Rama Krishna Reddy reposted this: Let's share the good news... we got her. Thanks for all your support, friends. Original appeal: MISSING: Anvi Verma, a 14-year-old 9th-standard student, has been missing since 6:00 am today (10th Jan 2024) from Jeevan Bhima Nagar police station limits. She is 5.3 ft tall, of fair complexion, and fluent in Kannada, Hindi, and English. If you have any lead, please contact her father on +91 94483 13178. Still not found; please share in bigger groups for reach.
-
Siva Rama Krishna Reddy shared this: AI/ML Hardware & Compiler Research Engineer - https://rb.gy/jcitgo
#Qualcomm is hiring for the #Graphics Systems team in Bangalore/Hyderabad. Here's your chance to work with the "Top Guns" helping define the dominant GPU in today's mobile/smartphone market. Our power-efficient GPU solution is fundamental to enabling exciting new markets like VR, IoT, AI, drones, and autonomous driving. Shariq H. | Sharad Raj | Ramachandra C Nanjegowda (Ram) | Shashi Bhushan Singh | Sukumar Srikanth | Subhomoy Bhattacharyya
Below are the details of the open positions; kindly apply, refer, share, or send mail to gayithri@qti.qualcomm.com, or use the links below to apply.
Graphics Performance Engineer - https://shorturl.at/DfoC1
Graphics Compiler Performance Engineer - https://shorturl.at/q1PHk
Graphics Compiler Test Engineer - https://shorturl.at/S1Gjf
Graphics Systems / Application Engineer - https://shorturl.at/hFiaR
Graphics Power Performance Architect - https://shorturl.at/gY2Cb
-
Siva Rama Krishna Reddy shared this: You don't want to miss TVMCon 2023! From March 15 to March 17, experience state-of-the-art deep learning compilation and optimization, with a range of tutorials, research talks, case studies, and industry presentations 🖥 We can't wait to see you there! Click below to sign up today: https://lnkd.in/gBSAJnWz #apachetvm #tvmcon2023
-
Siva Rama Krishna Reddy shared this: I'm honored to be speaking at @ApacheTVM #TVMCon2023 organized by @OctoML. Sign up today for TVMCon 2023 if you want to hear more.
-
Siva Rama Krishna Reddy shared this: Siva Rama Krishna Reddy demonstrates Qualcomm's recent progress on on-device training with Apache TVM! Check out this video covering how LeNet-5 and MobileNetV1 are trained on an Adreno mobile GPU using TVM. Congrats! https://lnkd.in/g-AMsev
-
Siva Rama Krishna Reddy shared this: The GPU Team, India is hiring. Please see the details below and send resumes directly to the specified email. Ajit Rao, Krishnaiah Gummidipudi, Shariq H., Sreyas Kurumanghat, Siva Rama Krishna Reddy, Himanshu Govil, Kalyan Kumar Bhiravabhatla. #snapdragon #Qualcomm #hiringengineers
-
Siva Rama Krishna Reddy shared this: https://lnkd.in/fR5jFiA
-
Siva Rama Krishna Reddy liked this: The MLSys'26 program is live! Check out the accepted papers: https://lnkd.in/eSFVu_pZ This year marks several exciting firsts:
• 28 industry-track papers bridging MLSys research and real-world deployment
• Our inaugural competition track featuring AWS Trainium, Google Graph Scheduling, and NVIDIA FlashInfer AI Kernel contests
Early registration deadline: April 1 — don't miss it! See you in Seattle this May 🌲
-
Siva Rama Krishna Reddy liked this: 📢 We are super excited to announce that Tianqi Chen from NVIDIA and Carnegie Mellon University will give a presentation on building ML systems at the CODAI Workshop 2026! When? Next Wednesday, January 28th in Krakow, collocated with HiPEAC 2026! 🎙️ Talk title: "Building ML Systems Foundations at the age of AI" ✔️ Register today: https://lnkd.in/dMFZraDi 📑 Full program: https://lnkd.in/dMFZraDi #CODAI2026 #EdgeAI #MachineLearning #AICompilers #Optimization #TinyML #AI
-
Siva Rama Krishna Reddy liked this: vLLM heavily uses the torch.compile pattern matcher to land custom fusion optimizations like RMSNorm + quant, QK norm + RoPE, and more. The pattern matcher is one of the most important tools in torch.compile: it lets the compiler recognize a specific FX subgraph and rewrite it into a fused, faster replacement. In my recent post, I explain this from first principles: what the pattern matcher is doing, how register_replacement works, how matches are validated, and why auto_functionalized matters for your custom ops. If you're curious about vLLM's fusion optimizations, want to understand how torch.compile works under the hood, or are learning ML compilers, this post should be useful. Blog: https://lnkd.in/gdSDbbQZ
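The post refers to Inductor's internal pattern matcher; as a rough illustration of the same match-and-rewrite idea, here is a runnable sketch using the public torch.fx.subgraph_rewriter API, a simpler cousin of that machinery. The fused_rmsnorm function is a hypothetical stand-in for a fused custom kernel, not a vLLM API.

```python
import torch
from torch.fx import symbolic_trace, subgraph_rewriter, wrap

# Hypothetical stand-in for a fused custom kernel; fx.wrap keeps it
# as a single call node instead of tracing through its body.
def fused_rmsnorm(x, weight):
    norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6)
    return norm * weight

wrap("fused_rmsnorm")

class Model(torch.nn.Module):
    def forward(self, x, weight):
        # Unfused RMSNorm-then-scale: the subgraph we want rewritten.
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6)
        return norm * weight

# Pattern: the FX subgraph shape to search for.
def pattern(x, weight):
    norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6)
    return norm * weight

# Replacement: a single fused call.
def replacement(x, weight):
    return fused_rmsnorm(x, weight)

gm = symbolic_trace(Model())
matches = subgraph_rewriter.replace_pattern(gm, pattern, replacement)
print(f"rewrote {len(matches)} subgraph(s)")
print(gm.code)  # forward now contains a single fused_rmsnorm call
```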
-
Siva Rama Krishna Reddy liked this: Right before the holiday break, Jared Roesch and I presented an intro to CUDA Tile, a major addition to the CUDA platform, to the GPUMode community. If you missed it, it's on YouTube right now! https://lnkd.in/eUiM7SzC We included detailed examples of the Python DSL, but also a preview of our work to expose the tile-based programming model introduced with CUDA Tile in C++ and Rust (with some code snippets!). Thanks Mark Saroufim for inviting us.
-
Siva Rama Krishna Reddy liked this: Tax Cricket. Build IITs. The ₹15,000 Crore Trade-Off India Is Quietly Making.
India runs the world's richest cricket league. It also taxes scientific research at 18%. That contrast is no longer philosophical; it's mathematical. An IISc professor recently pointed out something uncomfortable: a 40% tax on IPL profits over 3 years could raise ₹15,000 crore. That money could build multiple new IITs or fully remove GST on research equipment.
✅ The numbers behind the debate:
1. IPL valuation (2024): ₹1.37 lakh crore
2. BCCI annual IPL profit: ₹8,200–8,900 crore
3. BCCI tax paid on IPL profits: ₹0
4. GST on lab equipment for IITs, IISc, AIIMS: 18%
5. Annual GST paid by research institutions: ₹1,600–1,800 crore
India taxes microscopes. It exempts sixes.
✅ Where does ₹15,000 crore come from? Over a 3-year IPL cycle:
1. BCCI IPL profits: ₹24,000–26,000 crore
2. 40% tax on that alone: ₹10,000–10,600 crore
3. Add franchise profits, stadium revenues, and ancillary commercial income, and ₹13,000–15,000 crore becomes entirely defensible.
This is not a theory. It's arithmetic.
✅ What ₹15,000 crore can actually build:
1. IIT capacity. Cost per world-class IIT: ₹2,800–3,000 crore, so ₹15,000 crore funds 5 fully built IITs, or 7–10 IITs with co-funding. Each IIT over 10–15 years: 8,000 engineers, 1,200 PhDs, 200+ startups, and an 8–12x GDP multiplier.
2. Research acceleration. India's R&D spend is 0.64% of GDP against a global benchmark of 3–5%; ₹15,000 crore adds a 17% jump to public R&D across semiconductors, AI and quantum, biotech and pharma, and climate-resilient agriculture. This is where every developed economy invested before becoming rich.
3. Remove GST on research equipment. The current GST drain is ₹1,600–1,800 crore/yr, so ₹15,000 crore covers 3 full years of GST relief; labs instantly buy 18% more equipment, and brain drain slows because infrastructure finally matches ambition.
✅ The tax paradox no one wants to touch: IPL/BCCI pay 0%; scientific equipment carries 18% GST; film producers face a 58% effective rate; global sports leagues pay 22–25%. India is the only major economy where the richest sports league pays zero tax while publicly funded research pays indirect tax.
✅ The real reason this persists: IPL viewers: 560 million. IIT students: 92,000. One wins elections.
✅ Core questions:
1. Would cricket collapse if IPL profits were taxed like every other industry? No.
2. Would India's innovation capacity change meaningfully if research stopped being taxed? Absolutely.
3. This is not about punishing cricket. It's about stopping the taxation of knowledge creation.
✅ Let me share #Rajspectives:
1. A nation that taxes labs but exempts leagues is choosing consumption over compounding.
2. ₹15,000 crore won't bankrupt cricket, but it could materially strengthen India's scientific spine.
3. Every developed economy taxes entertainment and subsidises research. India is doing the reverse.
4. The real subsidy isn't to cricket; it's to political comfort.
India doesn't lack money. It lacks allocation courage.
#india #taxes #cricket #sports #research #economy
-
Siva Rama Krishna Reddy liked this: View my verified achievement from The Linux Foundation. The credential "Speaker: gRPConf India 2025" was issued by The Linux Foundation to Eswar Rajan Subramanian.
-
Siva Rama Krishna Reddy liked this: 🌟 Automation, AI & Yoga Therapy — Why Human Intervention Still Matters 🌟
Automation, whether in software testing, operations, or routine workflows, was created to solve one major challenge: the inefficiency and errors that arise from repetitive manual work. Automated scripts reduce monotony, speed up execution, and increase consistency. In the same way, standardized yoga protocols help beginners and the general public practice safely on their own. These structured sequences guide people through a systematic routine without needing expert supervision every day. But here's the reality we often overlook:
🔍 Automation is powerful, but not perfect. Even the most beautifully designed automated framework requires:
✔️ Manual review of logs
✔️ Debugging unexpected failures
✔️ Handling scenarios that cannot be automated
✔️ Human judgment to certify the final outcome
Why? Because AI and automation only work within the boundaries of what they were designed for. If a use case isn't modeled correctly, or if the real-world scenario deviates even slightly, automation will miss it. Machines cannot fully interpret context, nuance, or dynamic interactions, not yet.
🧘‍♂️ Yoga therapy follows the same principle. Standard yoga modules are helpful for beginners, fitness enthusiasts, and people with simple, isolated health concerns. But what about those with multiple, intertwined conditions: high BP + diabetes, thyroid issues + insomnia, anxiety + digestive disturbances, chronic pain + hypertension? For them, a generic yoga protocol will not work. Just as a test script fails when real data becomes complex, standard yoga routines fail when the human body presents layered challenges.
This is where a yoga therapist becomes essential. A therapist can:
✔️ Assess the client's individual conditions
✔️ Modify the protocol step by step
✔️ Observe how the body and mind respond
✔️ Tweak breathing ratios, intensity, and sequence
✔️ Provide personalized corrections and therapeutic monitoring
Exactly like an experienced engineer tuning an automation suite to handle edge cases and real-world complexity.
🌱 The truth is simple: automation will fail when scenarios become complex. AI will fail when context is missing. Protocols will fail when individuality is ignored. And in all these moments, ✨ human expertise becomes the deciding factor. ✨ Engineers refine automation. Yoga therapists refine the sadhana. Personalization and human insight complete what machines and protocols cannot.
Pic: Receiving a certificate from Guruji O.P. Tiwari, Chairman, Kaivalyadham S.M.Y.M. Samiti, Lonavala.
-
Siva Rama Krishna Reddy liked this: Excited to share something we have been cooking over the past few months that ships in #cuteDSL's latest release, thanks to great collaborations with many. It brings host-overhead optimization (10–40µs down to ~2µs in hot loops), streamlined PyTorch interop, and much more portable deployment.
Python kernel DSLs have the potential to greatly improve productivity in writing GPU kernels. However, they also come with challenges:
- Robustness: many kernels make assumptions about input constraints (shapes being equal, addresses aligned); we need to give users good error messages when assumptions are wrong, instead of a segfault.
- Host efficiency: GPUs run so fast that the host (CPU) can become the bottleneck, and every microsecond counts; achieving this in a Python DSL is an extra challenge.
- Framework interop: directly pass in torch.Tensors, and work with torch environment streams and device guards.
- Portability: Python is great for development, but there are needs beyond it; we need to ship to automotive, robotics, and more, many of which need C++/Rust, or runtimes like the XLA runtime in JAX.
#cuteDSL's latest release comes with major features we have been working on to address these challenges. Behind the scenes, we leverage the Apache tvm-ffi open ABI to expose kernels under a stable, zero-cost ABI convention. All attribute checks compile down to non-observable overheads, and kernel host overheads drop from 10–40µs to ~2µs in hot loops. Importantly, there is no more need to convert torch.Tensor into special intermediate tensors: pass torch.Tensors directly and the kernel works out of the box. We can also export the compiled module to an object file that gets bundled into applications without Python dependencies (C++/Rust), or into any framework (e.g., JAX/XLA) platform that supports the ABI. Check out the documents to learn more: https://lnkd.in/e87iJi-H
Experience & Education
-
Qualcomm
Publications
-
Harnessing Qualcomm Adreno GPU for Generative AI: Open-Source Approach
Qualcomm Developer Blog
-
How to run DeepSeek models on Windows on Snapdragon – Llama.cpp and MLC-LLM tutorial
Qualcomm Developer Blog
-
What's Good About TensorFlow 2.0
Open Source For You
TensorFlow 2.0 focuses on simplicity and ease of use. It has been strengthened with updates like eager execution and intuitive higher-level APIs, accompanied by flexible model building. It is platform-agnostic, makes APIs more consistent, and removes redundant ones.
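As a quick illustration of the eager-by-default behavior and tf.function graph staging the article describes (a minimal sketch against the public TF 2.x API; values are illustrative):

```python
import tensorflow as tf

# Eager execution is the TF 2.x default: ops run immediately,
# with no Session or placeholder boilerplate.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.reduce_sum(x))  # tf.Tensor(10.0, shape=(), dtype=float32)

# tf.function stages the Python function into a graph for performance
# while keeping the eager-style calling convention.
@tf.function
def scaled_sum(t, scale):
    return tf.reduce_sum(t) * scale

print(scaled_sum(x, tf.constant(2.0)))  # tf.Tensor(20.0, ...)
```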
-
ONNX: Open Neural Network Exchange
EFY Group / Open Source For You
ONNX, or Open Neural Network Exchange (onnx.ai), is a community project created by Facebook and Microsoft. It is intended to provide interoperability within the AI tools community. ONNX unlocks the framework dependency of AI models by introducing a common representation for any model, which allows easy conversion of a model from one framework to another.
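A minimal sketch of the interchange workflow described above, assuming PyTorch and the onnx package are installed; the toy model and the file name "toy.onnx" are illustrative:

```python
import torch
import torch.nn as nn
import onnx

# A toy PyTorch model to demonstrate framework-to-framework exchange.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
dummy = torch.randn(1, 4)

# Export into the common ONNX representation...
torch.onnx.export(model, dummy, "toy.onnx")

# ...then any ONNX-capable framework or runtime can consume the file.
loaded = onnx.load("toy.onnx")
onnx.checker.check_model(loaded)
print([i.name for i in loaded.graph.input])
```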
-
Tensor Virtual Machine - An Open Deep Learning Compiler Stack
EFY Group / Open Source For You
The TVM stack began as a research project at the SAMPL group of the Paul G. Allen School of Computer Science & Engineering, University of Washington. The project is now driven by an open-source community involving multiple industry and academic institutions. Refer to the project home page at tvm.ai and the GitHub repository at https://github.com/dmlc/tvm
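A minimal compile-and-run sketch using the classic Relay API this article covers (newer TVM releases are migrating to Relax, so treat this as version-dependent; shapes and values are illustrative):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Build a tiny Relay function: relu(x + 1).
x = relay.var("x", shape=(1, 4), dtype="float32")
body = relay.nn.relu(relay.add(x, relay.const(1.0, "float32")))
mod = tvm.IRModule.from_expr(relay.Function([x], body))

# Compile for the local CPU and run through the graph executor.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

dev = tvm.cpu(0)
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("x", np.array([[-2.0, -0.5, 0.0, 3.0]], dtype="float32"))
rt.run()
print(rt.get_output(0).numpy())  # [[0.  0.5 1.  4. ]]
```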
Patents
-
Mechanism for the delivery of computing as a utility for different domains over the internet
Issued US WO 2010073259 A2
Organizations
-
Apache Software Foundation
Committer
– Present. Apache Committer for TVM (Tensor Virtual Machine), a deep learning compiler: contributor and maintainer of the TensorFlow frontend in TVM, contributor and maintainer of the Golang runtime bindings for TVM, and contributor of various operators across TOPI, Relay, and the frontends.
-
Deep Machine Learning Community
Committer
– Present. Active contributor to the Deep Machine Learning Community (https://github.com/srkreddy1238) and committer on the deep learning compiler project TVM (https://github.com/dmlc/tvm). The Tensor Virtual Machine stack began as a research project at the SAMPL (System, Architecture, Machine learning and Programming Language) group of the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and is now driven by an open-source community involving multiple industry and academic institutions.
Recommendations received
1 person has recommended Siva Rama Krishna
Explore more posts
-
Sai Manikanta
topmate.io • 34K followers
Part 2:
3. Silicon Savvy. Why it stands out: based in Hyderabad, they explicitly list services such as RTL design, design verification, DFT, physical design, and synthesis & STA. The focus on verification and RTL design aligns well with your interest (AI-powered ASIC verification & digital design). What this means for you: this might be one of the more accessible companies if you want a role firmly in digital design/verification rather than mixed-signal. Investigate team size, digital-block types (interfaces, accelerators, etc.), and design-node maturity.
4. Spectrum Digital Info Private Limited. Why it stands out: located in Hyderabad, offering VLSI services from "specifications to GDS sign-off, across process nodes from 350nm to 7nm". They list domain experience in high-speed interfaces (PCIe, DDR, SerDes), which are important digital design areas. What this means for you: this company could give you opportunities in digital design, especially high-speed interfaces, which fits well with your ASIC verification and digital design interest. Ask about the proportion of digital vs analog work, node level, the kinds of digital blocks (I/O, interface, accelerator), and verification responsibilities.
5. Cyient Ltd. Why it stands out: listed among the top semiconductor companies in Hyderabad offering custom chip design, layout, and testing. As a larger engineering firm, it may offer broader exposure and structured teams. What this means for you: a good option if you want a more structured company with a larger team and defined roles. It might be less niche than pure-play chip houses, but may provide good experience and a career path in digital/ASIC design.
✅ How to pick which to target and what to ask. Since your goal is to build expertise in digital design + ASIC verification (with an AI-hardware angle) aligned with your roadmap, here are the key criteria and questions:
- Digital-design centricity: ensure the role is digital RTL design/verification, not just layout, mixed-signal, or analog-heavy work. Ask: "What percentage of the team is digital vs analog?"
- Verification exposure: since you're interested in verification (AI-powered testbenches, etc.), ask: "Will I get to work on RTL + verification + coverage? What verification methodology is used (SystemVerilog, UVM)?"
- Advanced nodes or strategic blocks: check whether they work on relevant nodes (e.g., below 28nm) or on high-speed interfaces, accelerators, SoCs, or AI hardware. Example: Spectrum and MosChip work on high-speed digital/ASIC.
- Learning and growth path: ask how juniors are developed, what the typical project progression is, whether you can move from simple blocks to more complex SoCs/accelerators, and whether there are opportunities in AI hardware, EDA, or verification.
- Company size and culture fit: smaller design houses give broader exposure but perhaps less structured growth; larger ones may be more stable but more compartmentalised. Decide what you prefer.
-
James Lee
Wiwynn • 1K followers
Study Group Notes 7: 9.3.2 Routing Study
"Once the signals have been categorized and the initial timings have been determined, a list of potential interconnect topologies for each signal group must be determined. This requires significant collaboration with the layout engineer and is developed through a layout study. The layout and design engineers should work together to determine the optimum part placement, part pin-out, and all possible physical interconnect solutions. The layout study will produce a layout solution space, which lists all the possible interconnect topology options, including line lengths, widths, and spacing. Extensive simulations during the sensitivity analysis will be used to limit the layout solution space and produce a final solution that will meet all timing and signal quality specifications. During the sensitivity analysis, each of these topologies would be simulated and compared; the best solution would be implemented in the final design."
📌 My note: high-speed signals are not always the hardest; the real trouble often comes from the "slow signals you underestimated." I often remind engineers that the real troublemakers are not PCIe, UPI, or USB 3.0; those high-speed differential signals are actually predictable and well defined. It's the "slow" multi-drop, single-ended buses like I²C that require extra attention.
Five common bus topologies (Figure 9.15 in the book): (a) point-to-point, (b) heavy point-to-point, (c) daisy chain, (d) T topology, (e) star topology.
📌 Key engineering concepts behind these five topologies:
1. Stubs and signal reflection: any branch becomes a stub, and stubs cause reflections, especially when the rise time becomes comparable to the stub delay.
2. Impedance discontinuity: the more complex the topology and the more branches, the harder it is to keep the impedance consistent. More branches = more discontinuities.
3. Symmetry: star topology works only if loads and trace lengths are symmetrical.
4. Capacitive loading: multiple loads and long buses lower the effective impedance and slow down signal edges.
5. Why the routing study matters: all of these effects are either amplified or eliminated at the routing-study stage. Layout can either solve the problem or create it.
📌 Conclusion: a routing study is not about drawing lines, but about finding the optimal solution. Understand topology, loads, stubs, and discontinuities, and your design will operate effortlessly.
(This article was revised with AI assistance.)
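A back-of-the-envelope check for point 1 (stub delay vs. rise time), sketched in Python; the rise time, propagation delay, and the tr/6 threshold are assumed rule-of-thumb numbers for illustration, not values from the book:

```python
# Rule-of-thumb check: a stub starts to matter when its one-way delay
# approaches the signal rise time. All numbers below are illustrative.
RISE_TIME_NS = 1.0           # driver 10-90% rise time (assumed)
PROP_DELAY_NS_PER_CM = 0.07  # ~70 ps/cm, typical FR-4 stripline

def stub_is_risky(stub_len_cm, rise_ns=RISE_TIME_NS, ratio=1 / 6):
    """Flag a stub whose delay exceeds ~1/6 of the rise time."""
    stub_delay_ns = stub_len_cm * PROP_DELAY_NS_PER_CM
    return stub_delay_ns > rise_ns * ratio

for length_cm in (0.5, 2.0, 5.0):
    verdict = "risky" if stub_is_risky(length_cm) else "ok"
    print(f"{length_cm:4.1f} cm stub -> {verdict}")
```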
-
Andreas Olofsson
Zero ASIC • 21K followers
Once upon a time, every chip vendor had their own compiler(s). I once worked for a company that had 4 different architectures and 4+1 different compilers! Now we have GCC and LLVM for CPUs. Personally, I am going to do everything I can to make sure we finally get a proper "GCC for FPGAs". Well... standard open-source FPGA tooling just took a small but important step forward! Alexandre Singer has completed his awesome summer project at Zero ASIC, enabling production-grade OpenSTA-based static timing analysis on post-routed netlists exported from VPR. Check out his blog post explaining how it was done and why this is a big deal. https://lnkd.in/eugBNStN
-
Chandra Sekhar Mallela
10K followers
1/2 (second part in comments): As HW systems architects, we should invariably be familiar with the firmware (FW, usually in layers 1 and 2 of the OSI model) and middleware (MW, usually layer 3 and above, but not the application layer that uses these functions) functions that interact with the underlying hardware, so that the SW/HW interaction is efficient and the underlying hardware is 100% utilized. One set of low-level MW functions I really appreciate is libibverbs (InfiniBand verbs for RDMA), which have stood the test of time and are used by the higher-level MW functions interfacing with applications. Some of those higher-level MWs are NCCL (NVIDIA Collective Communications Library), OpenMPI (message passing for the distributed-memory model among servers in the DC/cloud), and OpenSHMEM (shared-memory model among servers in the DC/cloud). A few points worth your attention, though this is known wisdom for people breathing RDMA day in and day out (trust me, I know a few 😀):
RDMA's libibverbs has one-way functions, RDMA write and read (for the shared-memory model), and two-way functions, RDMA send and receive (for the distributed-memory model). As the names suggest, one-way functions ensure that the receiving server is not involved in the data transfer, whereas two-way functions involve both the transmitting and the receiving host. It gets more interesting when we observe the alternate names: one-way functions are called zero-copy and two-way functions single-copy. That should keep us HW systems architects up at night asking where this single copy comes from, when RDMA is broadly supposed to achieve zero-copy transfers 😁.
It turns out that in the two-way functions (RDMA send and receive), the transfer goes: sender application buffer → kernel buffer (stack processing is bypassed and done in the NIC) → NIC HW buffer → network → receiver NIC HW buffer → kernel buffer → application buffer. The single copy (one DRAM location to another) that the two-way functions refer to is the application-buffer-to-kernel-buffer hop on the sending side, or the kernel-buffer-to-application-buffer hop on the receiving side. Avoid it, and the operation reduces to a one-way function (RDMA write and read). The well-known optimization of the RDMA send and receive verbs is exactly this: bringing two-way closer to one-way by transferring directly from the application buffer in DRAM to the NIC's HW buffer.
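A purely illustrative Python model of the copy counting described above; no real RDMA API (libibverbs or pyverbs) is used, and the buffer-hop lists simply encode the paths the post spells out:

```python
# Purely illustrative: each hop is (location, counts_as_dram_copy).
# DMA into the NIC is not a host DRAM-to-DRAM copy, so it is False.
TWO_WAY_SEND = [            # RDMA send/receive ("single-copy")
    ("app buffer", False),
    ("kernel buffer", True),   # the one DRAM-to-DRAM copy per side
    ("NIC HW buffer", False),  # NIC DMA, not a host copy
]
ONE_WAY_WRITE = [           # RDMA write/read ("zero-copy")
    ("app buffer", False),
    ("NIC HW buffer", False),  # NIC DMAs straight from app memory
]

def dram_copies(path):
    """Count host DRAM-to-DRAM copies along a buffer path."""
    return sum(1 for _, is_copy in path if is_copy)

print("send/receive copies per side:", dram_copies(TWO_WAY_SEND))   # 1
print("write/read copies per side:  ", dram_copies(ONE_WAY_WRITE))  # 0
```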
-
Neeraj Mishra
Birla Institute of Technology… • 30K followers
🎯 Part 3: Compensation Done Right – When Theory Meets Tapeout
"Every pole wants to win. Compensation teaches them teamwork."
Designing analog circuits at scale isn't about "knowing" compensation; it's about applying it smartly across different architectures, technologies, and PVT corners. Let's decode each dimension of smart compensation.
🔍 1. Where do poles come from? Before you compensate anything, ask: what are your dominant capacitors? Who's contributing slow dynamics? Where's the gain drop-off happening? 🧠 Example: in a two-stage opamp, the first pole is at the output of the first stage and the second pole is at the load of the output stage. If you ignore that, you'll get oscillations even with 50° of phase margin on paper.
📐 2. Design checklist before applying compensation:
✅ Estimate open-loop gain and phase
✅ Identify all poles/zeros up to 3 GHz
✅ Run small-signal AC + transient sims
✅ Consider loading (pads, switches, next stages)
✅ Ask: "Can my compensation survive corners + mismatch?"
📎 Golden rule: always close the loop in simulation before you add compensation. Uncompensated behavior is your compass.
🛠️ 3. Miller compensation design tips: use a feedback capacitor (Cm) between the output and the intermediate node, and size Cm so that the first pole sits well below the second pole. Add a series (nulling) resistor to move the RHP zero back into the left half-plane. 📏 Rule of thumb: sizing Cm against the 1/gm of the second stage yields a decent 45–60° phase margin.
🔀 4. Nested Miller for 3-stage designs: 3-stage opamps often suffer from an extra pole, inter-stage loading, and long tail swing paths. 🧩 Solution: nest two compensation caps, one between stage 2 and stage 1, another from stage 3 to stage 1, and add gm boosters or buffers to manage parasitics. 📌 Best used in pipeline ADC opamps and high-gain regulators with PVT swings.
⚡ 5. Feedforward is great, but use it with caution. Feedforward is tempting for slew boost and fast settling in switched-cap circuits, but you lose DC gain and it can increase peaking in the frequency response. 💡 Smart trick: use common-mode feedforward; it helps in differential architectures without hurting the differential-mode gain.
🧱 6. Ahuja (cascode) buffers are clean fixes. When layout symmetry and parasitic matching are critical (think high-resolution SAR ADC drivers), Ahuja compensation saves the day: ✔️ maintains high gain ✔️ avoids the RHP zero ✔️ keeps layout predictable. 📍 Tip: place the buffer carefully in layout to minimize mismatch paths.
🔁 7. Simulate your stability, and do it right. Too many designers skip phase-margin checks in corners, ignore load-dependent poles, and only simulate at TT with no mismatch corners. 🔍 Must check: AC response, phase margin vs. load-cap sweep, and Monte Carlo on RC extraction.
🚧 Common pitfalls: ❌ overcompensating → slow response ❌ forgetting layout parasitics → wrong phase margin ❌ relying only on AC sims → the transient may tell a different story
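To make point 3 concrete, here is a hedged back-of-the-envelope sketch of the two-pole phase-margin arithmetic for a Miller-compensated two-stage opamp; all device values are assumed for illustration, and the model deliberately ignores zeros and higher-order poles:

```python
import math

# Assumed, illustrative device values for a two-stage opamp.
gm1 = 1e-4   # first-stage transconductance (S)
gm2 = 1e-3   # second-stage transconductance (S)
Cm  = 2e-12  # Miller compensation cap (F)
CL  = 5e-12  # load cap (F)

gbw = gm1 / Cm  # unity-gain bandwidth (rad/s), dominant-pole model
p2  = gm2 / CL  # non-dominant pole at the output (rad/s)

# Two-pole approximation: PM ~= 90 deg - atan(GBW / p2).
pm = 90 - math.degrees(math.atan(gbw / p2))

print(f"GBW = {gbw / (2 * math.pi) / 1e6:.1f} MHz")
print(f"p2  = {p2 / (2 * math.pi) / 1e6:.1f} MHz")
print(f"PM ~= {pm:.0f} deg (ignores zeros and higher poles)")
```

With these numbers p2 sits about 4x above the GBW, giving roughly 76° of margin, consistent with the "first pole well below second pole" guidance above.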
-
Ben Cohen
VhdlCohen Publishing • 9K followers
SV-Perplexity PRO/labs: traffic light controller study (link below). Requirements were given for clarification, SVA writing, RTL and TB generation, and a complete report. I did not simulate, but the task took about one hour (labs mode is slow). I found the exercise useful: if I were a design/verification engineer, it would clarify the understanding of the requirements and provide a good baseline for the total task, including documentation. https://lnkd.in/gnthhGGJ SystemVerilog.us
-
Dr. Sanjay Ahuja
Cionlabs • 25K followers
The RISC-V Revolution: How Open-Source Chip Architecture is Democratizing IoT Hardware in India
For decades, the semiconductor industry has operated on a simple, restrictive premise: if you wanted to build a custom chip, you needed a billion-dollar budget and a license from a handful of foreign architects. The Instruction Set Architecture (ISA), the fundamental language of a processor, was a locked door. That door has been blown off its hinges by RISC-V.
For senior executives in India's electronics and IoT space, this is not just a technical footnote. It is a fundamental shift in the balance of power, and it represents the single greatest opportunity to move from being assemblers of foreign technology to creators of indigenous intellectual property. At Cionlabs, we have been watching this revolution unfold in real time. The year 2026 is shaping up to be the moment the Indian RISC-V ecosystem moves from promise to production.
What is RISC-V and why should a business leader care? RISC-V is an open-standard ISA, meaning it is free for anyone to use, modify, and build upon without paying royalties to a single company like Arm or Intel. For a business leader, the implications are threefold:
1. Freedom from vendor lock-in: you are no longer chained to the roadmap of a foreign supplier. If you own the RISC-V IP, you control your product's destiny.
2. Radical cost reduction: by eliminating licensing fees and leveraging open-source cores, the barrier to creating custom silicon (SoCs) drops dramatically.
3. True customization (domain-specific architecture): you can design a chip that is perfect for your specific application, be it a low-power sensor or an AI camera, rather than forcing your product to fit a generic, off-the-shelf chip.
The Indian Express: DIR-V and the Sovereign Silicon Drive. The Indian government and research institutions have placed a massive bet on RISC-V. The Digital India RISC-V (DIR-V) initiative is the cornerstone of this strategy, aiming to make India a global producer of open-source hardware. The vision, articulated by leaders like IIT Madras Director Prof. V. Kamakoti, is clear: by leveraging RISC-V, Indian startups can develop efficient, domain-specific System-on-Chips (SoCs) for AI, IoT, and high-performance computing, fueling the "Make in India" and "Digital India" missions. This isn't just academic theory; the ecosystem is maturing rapidly. At the recent VLSI Design Conference 2026 in Pune, C-DAC showcased live demonstrations of their indigenous VEGA processors powering real IoT applications, alongside the ARIES development boards. The building blocks for "Made in India" chips are here.
The Tipping Point: VIHAAN-I and the Aheesa Breakthrough. Theory became reality in February 2026, when Indian fabless startup Aheesa Digital Innovations announced the tape-out of VIHAAN-I, contd... Read the complete article: https://lnkd.in/gVNmwgiX
-
Shivraj Dharne
HCLTech • 16K followers
Why do we have to worry about setup and hold times in flip-flops?
Setup and hold times are critical timing parameters in flip-flops because they ensure reliable and predictable data capture. Violating these constraints can lead to incorrect data being latched, causing functional errors in digital circuits. Here's a clear breakdown:
🔧 What are setup and hold times?
• Setup time (Tsetup): the minimum time before the clock edge that the data input (D) must be stable.
• Hold time (Thold): the minimum time after the clock edge that the data input must remain stable.
🔍 Why do we worry about them?
1. To ensure reliable data capture. Flip-flops sample the data at the rising or falling clock edge. If the data changes too close to the clock edge, the flip-flop may enter a metastable state (undefined or oscillating output) or capture incorrect data, and the error can propagate through the rest of the system.
2. To prevent metastability. When setup or hold times are violated, the internal circuitry may not resolve cleanly to a '1' or '0'. Metastability can persist for unpredictable durations, leading to random system behavior.
3. Timing closure in design. In digital design (ASIC/FPGA), static timing analysis (STA) checks all paths against the setup and hold constraints. Violations can prevent the chip from working at the desired clock frequency.
🧠 Real-world analogy: taking a photograph. Setup time: the subject must be still before you press the shutter. Hold time: the subject must remain still immediately after the shutter clicks. If they move during either, you get a blurry photo; the flip-flop's output is like that blurry photo: unusable.
In summary: setup time means the data must be stable before the clock edge; if violated, the data may not be captured correctly. Hold time means the data must be stable after the clock edge; if violated, it may cause metastability or wrong data. We worry about setup and hold times to guarantee proper functionality, avoid timing violations, and ensure robust digital system performance.
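The checks STA performs for point 3 reduce to simple slack arithmetic; here is a sketch with assumed, illustrative numbers (real STA uses separate min/max delay corners for setup and hold, which this toy version glosses over):

```python
# Setup/hold slack arithmetic for one flop-to-flop path.
# All numbers are assumed, illustrative values in nanoseconds.
T_CLK   = 2.00  # clock period
T_CLK_Q = 0.30  # launch flop clock-to-Q delay
T_COMB  = 1.20  # combinational logic delay
T_SETUP = 0.15  # capture flop setup time
T_HOLD  = 0.10  # capture flop hold time
T_SKEW  = 0.05  # clock skew (capture clock arrives late)

# Setup check: data launched this cycle must settle T_SETUP before
# the *next* capture edge.
setup_slack = (T_CLK + T_SKEW) - (T_CLK_Q + T_COMB + T_SETUP)

# Hold check: the new data must not overwrite the capture flop until
# T_HOLD after the *same* capture edge.
hold_slack = (T_CLK_Q + T_COMB) - (T_HOLD + T_SKEW)

print(f"setup slack = {setup_slack:+.2f} ns")  # negative => violation
print(f"hold slack  = {hold_slack:+.2f} ns")   # negative => violation
```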
-
Real Intent
3K followers
Short highlights video on five metrics for high testability using DFT static sign-off. Real Intent's Kanad Chakraborty presents:
- Comprehensive, fine-grained rules for async resets, clocks & data connectivity
- Applying multiple constraint sets & rules to cover all targeted test modes
- Grouping violations by root cause
To learn about Real Intent Meridian DFT, visit: https://lnkd.in/gM-MKsut
-
Leo Joseph
NnN Net Solutions • 103 followers
Paper title: Dominant Block Guided Optimal Cache Size Estimation to Maximize IPC of Embedded Software
Authors: Rajendra Patel and Arvind Rajawat, Maulana Azad National Institute of Technology, India
Abstract: Embedded system software is highly constrained from the performance, memory footprint, energy consumption, and implementation cost points of view. It is always desirable to obtain better instructions per cycle (IPC), and the instruction cache is a major contributor to improving IPC. Cache memories are realized on the same chip where the processor is running, which considerably increases the system cost as well. Hence, a trade-off must be maintained between cache size and the performance improvement offered. The number of cache lines and the cache line size are important parameters in cache design, and the design space for caches is quite large. Executing a given application with different cache sizes on an instruction set simulator (ISS) to figure out the optimal cache size is time-consuming. In this paper, a technique is proposed to identify the number of cache lines and the cache line size for the L1 instruction cache that will offer the best, or nearly best, IPC. The cache size is derived, at a higher abstraction level, from basic-block analysis in the Low Level Virtual Machine (LLVM) environment. The cache size estimated from the LLVM environment is cross-validated by simulating the set of benchmark applications with different cache sizes in SimpleScalar's out-of-order simulator. The proposed method appears superior in estimation accuracy and/or estimation time compared to the existing methods for estimating the optimal cache size parameters (cache line size, number of cache lines).
Keywords: Optimal Cache Size, Embedded Software, Design Space Exploration, Performance Estimation, Dominant Block
Volume URL: https://lnkd.in/gYQvF6Kr
PDF URL: https://lnkd.in/g49FFtQJ
#Audio #AC97 #controller #Embedded #system #FPGA #MicroBlaze #Power #consumption #System #on #Chip #SoC #OpenCores #OpenRISC #researchpapers #cfp #researchers #phdstudent #education #learning #online #researchscholar #journalpaper #submission #journalsubmission #engineeringexcellence #techcommunity #devops #agilemethodology
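To see why the brute-force ISS sweep the paper tries to avoid is expensive, here is a toy direct-mapped instruction-cache sweep in Python; the trace is synthetic and the sizes arbitrary, so it only illustrates the shape of the search problem, not the paper's estimation method:

```python
import random

random.seed(0)
# Synthetic "instruction trace": word-aligned addresses in a 4 KiB range.
trace = [random.choice(range(0, 4096, 4)) for _ in range(20000)]

def hit_rate(trace, n_lines, line_size=32):
    """Hit rate of a direct-mapped cache with n_lines lines."""
    lines = [None] * n_lines  # stored tag per line, None = empty
    hits = 0
    for addr in trace:
        block = addr // line_size
        tag, index = divmod(block, n_lines)
        if lines[index] == tag:
            hits += 1
        else:
            lines[index] = tag  # miss: fill the line
    return hits / len(trace)

# The sweep the paper replaces with static basic-block analysis:
for n in (4, 8, 16, 32, 64, 128):
    print(f"{n:4d} lines: hit rate = {hit_rate(trace, n):.3f}")
```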
-
Shashank Sharma
NVIDIA • 3K followers
After Himalayan efforts over two years, the AMDGPU usermode queue code has finally been merged into the mainline Linux kernel. A usermode queue is a new experimental framework that allows a userspace driver (like a Mesa GL or Vulkan driver) or a privileged graphics application to submit GPU workload packets directly to the GPU firmware, bypassing the kernel CS IOCTL and the DRM GPU scheduler. This method of workload submission is probably going to be the new normal soon. Some of the basic patches from the series: https://lnkd.in/e3h_bvgv https://lnkd.in/e_2GpH_c Phoronix coverage (one of a few) of the pull request: https://lnkd.in/eZz6UkeP Equal credit to co-author Arvind Yadav and enabler Srinath Rao for this achievement.
-
Palash Khandale
Arm • 4K followers
🧩 RTL Partitioning – Why the Control Path Stays Hardcoded
When partitioning RTL into config, control, and data paths, we often preach scalability through parameterization. But here's a counterintuitive truth: 👉 the control path is rarely parameterized. And that's not bad practice; it's deliberate design.
🧠 Let's decode why:
1. Debug simplicity wins. A reg [1:0] is easier to trace than reg [(WIDTH*2)-1:0]. In control logic, human readability often trumps abstraction.
2. Control evolves, data doesn't. While data paths (algorithms, packets) are fixed early and highly reusable, control logic (FSMs, IRQs, masks) changes frequently, sometimes up to tapeout.
3. Time-to-signoff pressure. Parameterized control adds debug friction; teams favor hardcoded control to ensure faster closure on LEC, CDC, and simulations.
4. Security and compatibility. Control tweaks may mask features for SW compatibility or silicon security, and parameterization here can risk exposing unintended behavior.
🔍 RTL partitioning heuristic: config path → SFRs, APB/AHB register maps; control path → FSMs, IRQs, bandwidth throttles; data path → processing logic, FIFO paths, algorithm pipes. And while data paths are parameterized for reuse, control paths are fixed to stay flexible.
💡 Smart designers don't over-engineer what needs to change. They make room for evolution: partition RTL with intent.
💬 Not sure how this connects to your work? Book a free 1:1 call and let's discuss your exact RTL design situation: 👉 https://lnkd.in/gVs3JkQb
📣 Want deeper clarity on real-world RTL roles? I'm conducting a live webinar where we decode what your manager actually expects, why tools and flows dominate early careers, and how to align with industry reality and stand out. 👉 Join here (100% of proceeds go to charity): https://lnkd.in/g5DYhhNw
#ASICDecoded #RTLDesign #RTLPartitioning #SoC #VLSI #Semiconductor #CareerClarity #FPGA #DesignMindset #TapeoutTales
-
Sanjay Adhikari
Embedkari Systems(OPC) Pvt… • 34K followers
Today we are starting a new STM32 bare-metal batch. Premium community members will also learn the TI Cortex-M4 board in parallel during weekends. A few members are joining Linux application development from the 26th. Both classes are at 8:45 PM, so they suit working professionals as well. Many members join the 8 AM IST C classes on Monday, Wednesday, and Thursday, and a few participate in embedded C experiments at 7:15 PM. The C, firmware, and Linux foundations help with further embedded software learning, such as RTOS. Experienced folks join the embedded C++ classes as well. A few Ultra and Magnum kit members are working on Linux kernel development; we are using Yocto for this activity. Last weekend, we discussed power electronics with Prof. Divya. Have a nice day! #embedkari #embedded #opentowork
-
Francis Benistant
Technology Modeling Automation • 3K followers
#LLM #CodeGeneration #ML #AI #NLP #SoftwareEngineering #DeepLearning #Transformer #CodeLLM #NL2Code #SurveyPaper #CodeSynthesis #PromptEngineering #ReinforcementLearning #DataCuration #ModelArchitecture #BenchmarkEvaluation
The survey provides a systematic literature review of LLMs for code generation, focusing on the natural-language-to-code (NL2Code) task. The authors introduce a taxonomy to categorize recent developments, covering data curation, model architectures, training techniques, evaluation methods, and real-world applications. They examine the evolution from earlier rule-based approaches to modern Transformer-based LLMs, highlighting the remarkable performance improvements achieved in recent years. The paper presents empirical comparisons using benchmarks like HumanEval, MBPP, and BigCodeBench to demonstrate the progressive enhancements in LLM capabilities for code generation; for instance, Pass@1 performance on HumanEval has improved from 3.6% (PaLM 8B) to 95.1% (LDB). The authors also identify critical challenges and opportunities regarding the gap between academic research and practical development.
Key innovations and contributions:
· A systematic taxonomy categorizing the complete lifecycle of code LLMs
· Comprehensive review of data curation and synthesis techniques for code generation
· Analysis of model architectures (encoder-decoder vs. decoder-only) for code generation
· Examination of instruction-tuning approaches (full-parameter vs. parameter-efficient)
· Review of reinforcement learning with feedback methods for code-quality improvement
· Exploration of prompting techniques for iterative and self-improving code generation
· Investigation of repository-level and retrieval-augmented code generation
· Discussion of autonomous coding agents powered by LLMs
The survey addresses several research issues, including the scarcity of high-quality data for training code LLMs, the challenges of evaluating code-generation capabilities in real-world scenarios, and the gap between academic benchmarks and practical development needs. It also examines the ethical implications and environmental impacts of using LLMs for code generation. Despite its comprehensive nature, the survey has some limitations: it primarily focuses on English-language papers and may not fully capture research published in other languages or regions, and, given the rapidly evolving field, some very recent developments may not be fully covered despite the authors' efforts to include papers up to 2024. The paper establishes a framework for understanding and categorizing research in code LLMs, which can guide future theoretical developments and practical applications, and it highlights real-world applications such as GitHub Copilot, CodeGeeX, and Amazon CodeWhisperer, demonstrating the practical significance of this research area. https://lnkd.in/gAV6Gpe3
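Since the survey leans on Pass@k numbers, here is the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021); the sample counts in the example are illustrative:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: given n samples per problem of which c pass,
    estimate the probability that at least one of k samples passes."""
    if n - c < k:
        return 1.0  # not enough failures to fill k draws: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 30 of them correct.
for k in (1, 10, 100):
    print(f"pass@{k} = {pass_at_k(200, 30, k):.3f}")
```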
-
Mary Bennion
Arm • 3K followers
Finally!! ExecuTorch 1.0 is here, bringing a unified PyTorch workflow to billions of Arm-based edge devices. 🎉 For those building at the edge, that means: 🚀 faster, simpler development and deployment 📱 greater reach for apps and workloads ⚡ higher performance and efficiency across Arm CPUs, GPUs, and NPUs. Check out our blog for details on the Arm KleidiAI, CMSIS-NN, and TOSA integrations in ExecuTorch, making it easier than ever for developers to bring high-performance AI to life 🙌🏼
-
Prakash Rashinkar
3K followers
*** Cross-die links introduce skew, coherence issues, and latency that don’t appear in monolithic SoCs. *** Chapter 5 of my book on chiplets details die-to-die interconnects (UCIe, BoW, AIB, and others) and the hooks needed to validate them. Now available on Amazon in hardcover, paperback, and Kindle. Links in the comments.