Publications

paper · 2026

177 - JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation

Zhenyu Bi, Yang Li, Meng Lu, Swastik Roy, Gaurav Srivastava, Xuan Wang, Morteza Ziyadi · Underline Science Inc.

While small language models (SLMs) have shown promise on various reasoning tasks, their ability to judge the correctness of answers remains unclear compared to large language models (LLMs). Prior work on LLM-as-a-judge frameworks typically relies on comparing candidate answers against ground-truth labels or other candidate answers using predefined metrics like entailment. However, this approach is inherently indirect and difficult to fully automate, offering limited support for fine-grained and scalable evaluation of reasoning outputs. In this work, we propose JudgeBoard, a novel evaluation pipeline that directly queries models to assess the correctness of candidate answers without requiring extra answer comparisons. We focus on two core reasoning domains: mathematical reasoning and science/commonsense reasoning, and construct task-specific evaluation leaderboards using both accuracy-based ranking and an Elo-based rating system across five benchmark datasets, enabling consistent model comparison as judges rather than comparators. To improve judgment performance in lightweight models, we propose MAJ (Multi-Agent Judging), a novel multi-agent evaluation framework that leverages multiple interacting SLMs with distinct reasoning profiles to approximate LLM-level judgment accuracy through collaborative deliberation. Experimental results reveal a significant performance gap between SLMs and LLMs in isolated judging tasks. However, our MAJ framework substantially improves the reliability and consistency of SLMs. On the MATH dataset, MAJ using smaller-sized models as backbones performs comparatively well or even better than their larger-sized counterparts. Our findings highlight that multi-agent SLM systems can potentially match or exceed LLM performance in judgment tasks, with implications for scalable and efficient assessment.

DOI SourceCitations: placeholder

paper · 2026

JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation

Zhenyu Bi, Gaurav Srivastava, Yang Li, Swastik Roy, Meng Lu, Morteza Ziyadi, Xuan Wang · Proceedings of the AAAI Conference on Artificial Intelligence

While small language models (SLMs) have shown promise on various reasoning tasks, their ability to judge the correctness of answers remains unclear compared to large language models (LLMs). Prior work on LLM-as-a-judge frameworks typically relies on comparing candidate answers against ground-truth labels or other candidate answers using predefined metrics like entailment. However, this approach is inherently indirect and difficult to fully automate, offering limited support for fine-grained and scalable evaluation of reasoning outputs. In this work, we propose JudgeBoard, a novel evaluation pipeline that directly queries models to assess the correctness of candidate answers without requiring extra answer comparisons. We focus on two core reasoning domains: mathematical reasoning and science/commonsense reasoning, and construct task-specific evaluation leaderboards using both accuracy-based ranking and an Elo-based rating system across five benchmark datasets, enabling consistent model comparison as judges rather than comparators. To improve judgment performance in lightweight models, we propose MAJ (Multi-Agent Judging), a novel multi-agent evaluation framework that leverages multiple interacting SLMs with distinct reasoning profiles to approximate LLM-level judgment accuracy through collaborative deliberation. Experimental results reveal a significant performance gap between SLMs and LLMs in isolated judging tasks. However, our MAJ framework substantially improves the reliability and consistency of SLMs. On the MATH dataset, MAJ using smaller-sized models as backbones performs comparatively well or even better than their larger-sized counterparts. Our findings highlight that multi-agent SLM systems can potentially match or exceed LLM performance in judgment tasks, with implications for scalable and efficient assessment.

DOI SourceCitations: placeholder

paper · 2026

JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation

Zhenyu Bi, Yang Li, Meng Lu, Swastik Roy, Gaurav Srivastava, Xuan Wang, Morteza Ziyadi · Underline Science Inc.

While small language models (SLMs) have shown promise on various reasoning tasks, their ability to judge the correctness of answers remains unclear compared to large language models (LLMs). Prior work on LLM-as-a-judge frameworks typically relies on comparing candidate answers against ground-truth labels or other candidate answers using predefined metrics (like entailment). However, this approach is inherently indirect and difficult to fully automate, offering limited support for fine-grained and scalable evaluation of reasoning outputs. In this work, we propose JudgeBoard, a novel evaluation pipeline that directly queries models to assess the correctness of candidate answers without relying on gold-standard labels. We focus on two core reasoning domains, math and science/commonsense reasoning, and construct task-specific evaluation leaderboards using both accuracy ranking and an Elo-based rating system across five benchmark datasets, enabling consistent model comparison as judges rather than comparators. To improve judgment performance in lightweight models, we propose MAJ (Multi-Agent Judging), a novel Elo-based multi-agent evaluation framework that leverages multiple interacting SLMs to approximate LLM-level judgment accuracy. Experimental results show a clear performance gap between SLMs and LLMs in isolated judging tasks. However, our MAJ framework substantially improves the reliability and consistency of the SLMs. On the MATH dataset, MAJ framework using smaller-sized models as backbones could perform comparatively well or even better than their larger-sized counterparts. Our findings highlight that multi-agent SLM systems can potentially match or exceed LLM performance in judgment tasks, with implications for scalable assessment using SLMs.

DOI SourceCitations: placeholder

preprint · 2025

BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models

Gaurav Srivastava, Aafiya Hussain, Zhenyu Bi, Swastik Roy, Priya Pitre, Meng Lu, Morteza Ziyadi, Xuan Wang · ArXiv.org

Evaluating language models fairly is increasingly difficult as static benchmarks risk contamination by training data, obscuring whether models truly reason or recall. We introduce BeyondBench, an evaluation framework using algorithmic problem generation to create mathematically grounded problems on the fly, ensuring each test remains uncontaminated. Our framework covers 44 algorithmic tasks with 117 variations across three difficulty levels: the Easy Suite (29 tasks) for arithmetic and statistics, the Medium Suite (5 tasks, 49 variations) for sequence patterns and reasoning, and the Hard Suite (10 tasks, 68 variations) for NP-complete and constraint satisfaction problems. Each task draws from a space exceeding 10^15 unique instances, with deterministically verified solutions. We evaluated 101 language models (85 open-source, 16 closed-source), spanning 0.5B to 141B parameters and multiple quantization schemes, using three-fold evaluation for robustness. Results reveal consistent reasoning deficiencies, with performance degrading sharply as complexity increases. In Hard Suite evaluations, Gemini-2.5-pro, Llama-3.3-70B, and Qwen2.5-72B achieved accuracies of 56.21%, 27.16%, and 33.37% respectively. Performance drops significantly without tool usage, with GPT-5, GPT-5-mini, and GPT-5-nano showing declines of 16.81%, 15.86%, and 43.95% in overall accuracy. Contamination resistance rests on three guarantees: (i) the problem space vastly exceeds any static dataset, (ii) every instance has a deterministically verifiable solution, and (iii) isomorphic transformations yield semantically equivalent but syntactically novel problems. BeyondBench redefines reasoning evaluation via genuine algorithmic problem-solving. Our leaderboard is at https://ctrl-gaurav.github.io/BeyondBench/, Python package at https://pypi.org/project/beyondbench/, and codebase at https://github.com/ctrl-gaurav/BeyondBench.

DOI SourceCitations: placeholder

preprint · 2025

JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation

Zhenyu Bi, Gaurav Srivastava, Yang Li, Meng Lu, Swastik Roy, Morteza Ziyadi, Xuan Wang · ArXiv.org

While small language models (SLMs) have shown promise on various reasoning tasks, their ability to judge the correctness of answers remains unclear compared to large language models (LLMs). Prior work on LLM-as-a-judge frameworks typically relies on comparing candidate answers against ground-truth labels or other candidate answers using predefined metrics like entailment. However, this approach is inherently indirect and difficult to fully automate, offering limited support for fine-grained and scalable evaluation of reasoning outputs. In this work, we propose JudgeBoard, a novel evaluation pipeline that directly queries models to assess the correctness of candidate answers without requiring extra answer comparisons. We focus on two core reasoning domains: mathematical reasoning and science/commonsense reasoning, and construct task-specific evaluation leaderboards using both accuracy-based ranking and an Elo-based rating system across five benchmark datasets, enabling consistent model comparison as judges rather than comparators. To improve judgment performance in lightweight models, we propose MAJ (Multi-Agent Judging), a novel multi-agent evaluation framework that leverages multiple interacting SLMs with distinct reasoning profiles to approximate LLM-level judgment accuracy through collaborative deliberation. Experimental results reveal a significant performance gap between SLMs and LLMs in isolated judging tasks. However, our MAJ framework substantially improves the reliability and consistency of SLMs. On the MATH dataset, MAJ using smaller-sized models as backbones performs comparatively well or even better than their larger-sized counterparts. Our findings highlight that multi-agent SLM systems can potentially match or exceed LLM performance in judgment tasks, with implications for scalable and efficient assessment.

DOI SourceCitations: placeholder

preprint · 2025

OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning

Zhenyu Bi, Meng Lu, Yang Li, Swastik Roy, Weijie Guan, Morteza Ziyadi, Xuan Wang · ArXiv.org

Large Language Models (LLMs) have shown remarkable reasoning capabilities in mathematical and scientific tasks. To enhance complex reasoning, multi-agent systems have been proposed to harness the collective intelligence of LLM agents. However, existing collaboration structures are either predefined or rely on majority voting or round-table debates, which can suppress correct but less dominant agent contributions. Recent approaches model multi-agent systems as graph networks but optimize purely for agent performance, neglecting the quality of interactions. We hypothesize that effective agent communication is crucial for multi-agent reasoning and that debating quality plays a significant role. To address this, we propose $\ours$, a multi-agent verbal reinforcement learning algorithm that dynamically constructs and refines multi-agent collaboration structures. Our method defines action spaces and a feedback mechanism that evaluates communication robustness and coherence throughout the debate. The final decision is achieved through a majority vote over all the agents. We assess $\ours$ on various reasoning tasks, including mathematical reasoning, creative writing, scientific reasoning, and numerical sorting. Results demonstrate that our approach significantly outperforms single-agent prompting methods and state-of-the-art multi-agent frameworks on diverse tasks.

DOI SourceCitations: placeholder

workshop · 2025

OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning

Zhenyu Bi, Meng Lu, Yang Li, Swastik Roy, Weijie Guan, Morteza Ziyadi, Xuan Wang · Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Zhenyu Bi, Meng Lu, Yang Li, Swastik Roy, Weijie Guan, Morteza Ziyadi, Xuan Wang. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics. 2025.

DOI SourceCitations: placeholder

preprint · 2025

SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning

Salman Rahman, Sruthi Gorantla, Arpit Gupta, Swastik Roy, Nanyun Peng, Yang Liu · ArXiv.org

Process reward models (PRMs) that provide dense, step-level feedback have shown promise for reinforcement learning, yet their adoption remains limited by the need for expensive step-level annotations or ground truth references. We propose SPARK: a three-stage framework where in the first stage a generator model produces diverse solutions and a verifier model evaluates them using parallel scaling (self-consistency) and sequential scaling (meta-critique). In the second stage, we use these verification outputs as synthetic training data to fine-tune generative process reward models, which subsequently serve as reward signals during training. We show that aggregating multiple independent verifications at the step level produces training data for process reward models that surpass ground-truth outcome supervision, achieving 67.5 F1 on ProcessBench (a benchmark for identifying erroneous steps in mathematical reasoning) compared to 66.4 for reference-guided training and 61.9 for GPT-4o. In the final stage, we apply our generative PRM with chain-of-thought verification (PRM-CoT) as the reward model in RL experiments on mathematical reasoning, and introduce format constraints to prevent reward hacking. Using Qwen2.5-Math-7B, we achieve 47.4% average accuracy across six mathematical reasoning benchmarks, outperforming ground-truth-based RLVR (43.9%). Our work enables reference-free RL training that exceeds ground-truth methods, opening new possibilities for domains lacking verifiable answers or accessible ground truth.

DOI SourceCitations: placeholder

preprint · 2025

The Amazon Nova Family of Models: Technical Report and Model Card

Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, Alan He, Alessandra Cervone, Alex Loeb, Alex Zhang, Alexander Fu, Alexander Lisnichenko, Alexander Zhipa, Alexandros Potamianos, Ali Kebarighotbi, Aliakbar Daronkolaei, Alok Parmesh, Amanjot Kaur Samra, Ameen Khan, Amer Rez, Amir Saffari, Amit Agarwalla, Amit Jhindal, Amith Mamidala, Ammar Asmro, Amulya Ballakur, Anand Mishra, Anand Sridharan, Anastasiia Dubinina, Andre Lenz, Andreas Doerr, Andrew Keating, Andrew Leaver, Andrew Smith, Andrew Wirth, Andy Davey, Andy Rosenbaum, Andy Sohn, Angela Chan, Aniket Chakrabarti, Anil Ramakrishna, Anirban Roy, Anita Iyer, Anjali Narayan-Chen, Ankith Yennu, Anna Dabrowska, Anna Gawlowska, Anna Rumshisky, Anna Turek, Anoop Deoras, Anton Bezruchkin, Anup Prasad, Anupam Dewan, Anwith Kiran, Apoorv Gupta, Aram Galstyan, Aravind Manoharan, Arijit Biswas, Arindam Mandal, Arpit Gupta, Arsamkhan Pathan, Arun Nagarajan, Arushan Rajasekaram, Arvind Sundararajan, Ashwin Ganesan, Ashwin Swaminathan, Athanasios Mouchtaris, Audrey Champeau, Avik Ray, Ayush Jaiswal, Ayush Sharma, Bailey Keefer, Balamurugan Muthiah, Beatriz Leon-Millan, Ben Koopman, Ben Li, Benjamin Biggs, Benjamin Ott, Bhanu Vinzamuri, Bharath Venkatesh, Bhavana Ganesh, Bhoomit Vasani, Bill Byrne, Bill Hsu, Bincheng Wang, Blake King, Blazej Gorny, Bo Feng, Bo Zheng, Bodhisattwa Paul, Bofan Sun, Bofeng Luo, Bowen Chen, Bowen Xie, Boya Yu, Brendan Jugan, Brett Panosh, Brian Collins, Brian Thompson, Can Karakus, Can Liu, Carl Lambrecht, Carly Lin, Carolyn Wang, Carrie Yuan, Casey Loyda, Cezary Walczak, Chalapathi Choppa, Chandana Satya Prakash, Chankrisna Richy Meas, Charith Peris, Charles Recaido, Charlie Xu, Charul Sharma, Chase Kernan, Chayut Thanapirom, Chengwei Su, Chenhao Xu, Chenhao Yin, Chentao Ye, Chenyang Tao, Chethan Parameshwara, Ching-Yun Chang, Chong Li, Chris Hench, Chris Tran, Christophe Dupuy, Christopher Davis, Christopher DiPersio, Christos Christodoulopoulos, Christy Li, Chun Chen, Claudio Delli Bovi, Clement Chung, Cole Hawkins, Connor Harris, Corey Ropell, Cynthia He, DK Joo, Dae Yon Hwang, Dan Rosen, Daniel Elkind, Daniel Pressel, Daniel Zhang, Danielle Kimball, Daniil Sorokin, Dave Goodell, Davide Modolo, Dawei Zhu, Deepikaa Suresh, Deepti Ragha, Denis Filimonov, Denis Foo Kune, Denis Romasanta Rodriguez, Devamanyu Hazarika, Dhananjay Ram, Dhawal Parkar, Dhawal Patel, Dhwanil Desai, Dinesh Singh Rajput, Disha Sule, Diwakar Singh, Dmitriy Genzel, Dolly Goldenberg, Dongyi He, Dumitru Hanciu, Dushan Tharmal, Dzmitry Siankovich, Edi Cikovic, Edwin Abraham, Ekraam Sabir, Elliott Olson, Emmett Steven, Emre Barut, Eric Jackson, Ethan Wu, Evelyn Chen, Ezhilan Mahalingam, Fabian Triefenbach, Fan Yang, Fangyu Liu, Fanzi Wu, Faraz Tavakoli, Farhad Khozeimeh, Feiyang Niu, Felix Hieber, Feng Li, Firat Elbey, Florian Krebs, Florian Saupe, Florian Sprünken, Frank Fan, Furqan Khan, Gabriela De Vincenzo, Gagandeep Kang, George Ding, George He, George Yeung, Ghada Qaddoumi, Giannis Karamanolakis, Goeric Huybrechts, Gokul Maddali, Gonzalo Iglesias, Gordon McShane, Gozde Sahin, Guangtai Huang, Gukyeong Kwon, Gunnar A. Sigurdsson, Gurpreet Chadha, Gururaj Kosuru, Hagen Fuerstenau, Hah Hah, Haja Maideen, Hajime Hosokawa, Han Liu, Han-Kai Hsu, Hann Wang, Hao Li, Hao Yang, Haofeng Zhu, Haozheng Fan, Harman Singh, Harshavardhan Kaluvala, Hashim Saeed, He Xie, Helian Feng, Hendrix Luo, Hengzhi Pei, Henrik Nielsen, Hesam Ilati, Himanshu Patel, Hongshan Li, Hongzhou Lin, Hussain Raza, Ian Cullinan, Imre Kiss, Inbarasan Thangamani, Indrayani Fadnavis, Ionut Teodor Sorodoc, Irem Ertuerk, Iryna Yemialyanava, Ishan Soni, Ismail Jelal, Ivan Tse, Jack FitzGerald, Jack Zhao, Jackson Rothgeb, Jacky Lee, Jake Jung, Jakub Debski, Jakub Tomczak, James Jeun, James Sanders, Jason Crowley, Jay Lee, Jayakrishna Anvesh Paidy, Jayant Tiwari, Jean Farmer, Jeff Solinsky, Jenna Lau, Jeremy Savareese, Jerzy Zagorski, Ji Dai, null Jiacheng, null Gu, Jiahui Li, null Jian, null Zheng, Jianhua Lu, Jianhua Wang, Jiawei Dai, Jiawei Mo, Jiaxi Xu, Jie Liang, Jie Yang, Jim Logan, Jimit Majmudar, Jing Liu, Jinghong Miao, Jingru Yi, Jingyang Jin, Jiun-Yu Kao, Jixuan Wang, Jiyang Wang, Joe Pemberton, Joel Carlson, Joey Blundell, John Chin-Jew, John He, Jonathan Ho, Jonathan Hueser, Jonathan Lunt, Jooyoung Lee, Joshua Tan, Joyjit Chatterjee, Judith Gaspers, Jue Wang, Jun Fang, Jun Tang, Jun Wan, Jun Wu, Junlei Wang, Junyi Shi, Justin Chiu, Justin Satriano, Justin Yee, Jwala Dhamala, Jyoti Bansal, Kai Zhen, Kai-Wei Chang, Kaixiang Lin, Kalyan Raman, Kanthashree Mysore Sathyendra, Karabo Moroe, Karan Bhandarkar, Karan Kothari, Karolina Owczarzak, Karthick Gopalswamy, Karthick Ravi, Karthik Ramakrishnan, Karthika Arumugam, Kartik Mehta, Katarzyna Konczalska, Kavya Ravikumar, Ke Tran, Kechen Qin, Kelin Li, Kelvin Li, Ketan Kulkarni, Kevin Angelo Rodrigues, Keyur Patel, Khadige Abboud, Kiana Hajebi, Klaus Reiter, Kris Schultz, Krishna Anisetty, Krishna Kotnana, Kristen Li, Kruthi Channamallikarjuna, Krzysztof Jakubczyk, Kuba Pierewoj, Kunal Pal, Kunwar Srivastav, Kyle Bannerman, Lahari Poddar, Lakshmi Prasad, Larry Tseng, Laxmikant Naik, Leena Chennuru Vankadara, Lenon Minorics, Leo Liu, Leonard Lausen, Leonardo F. R. Ribeiro, Li Zhang, Lili Gehorsam, Ling Qi, Lisa Bauer, Lori Knapp, Lu Zeng, Lucas Tong, Lulu Wong, Luoxin Chen, Maciej Rudnicki, Mahdi Namazifar, Mahesh Jaliminche, Maira Ladeira Tanke, Manasi Gupta, Mandeep Ahlawat, Mani Khanuja, Mani Sundaram, Marcin Leyk, Mariusz Momotko, Markus Boese, Markus Dreyer, Markus Mueller, Mason Fu, Mateusz Górski, Mateusz Mastalerczyk, Matias Mora, Matt Johnson, Matt Scott, Matthew Wen, Max Barysau, Maya Boumerdassi, Maya Krishnan, Mayank Gupta, Mayank Hirani, Mayank Kulkarni, Meganathan Narayanasamy, Melanie Bradford, Melanie Gens, Melissa Burke, Meng Jin, Miao Chen, Michael Denkowski, Michael Heymel, Michael Krestyaninov, Michal Obirek, Michalina Wichorowska, Michał Miotk, Milosz Watroba, Mingyi Hong, Mingzhi Yu, Miranda Liu, Mohamed Gouda, Mohammad El-Shabani, Mohammad Ghavamzadeh, Mohit Bansal, Morteza Ziyadi, Nan Xia, Nathan Susanj, Nav Bhasin, Neha Goswami, Nehal Belgamwar, Nicolas Anastassacos, Nicolas Bergeron, Nidhi Jain, Nihal Jain, Niharika Chopparapu, Nik Xu, Nikko Strom, Nikolaos Malandrakis, Nimisha Mishra, Ninad Parkhi, Ninareh Mehrabi, Nishita Sant, Nishtha Gupta, Nitesh Sekhar, Nithin Rajeev, Nithish Raja Chidambaram, Nitish Dhar, Noor Bhagwagar, Noy Konforty, Omar Babu, Omid Razavi, Orchid Majumder, Osama Dar, Oscar Hsu, Pablo Kvitca, Pallavi Pandey, Parker Seegmiller, Patrick Lange, Paul Ferraro, Payal Motwani, Pegah Kharazmi, Pei Wang, Pengfei Liu, Peter Bradtke, Peter Götz, Peter Zhou, Pichao Wang, Piotr Poskart, Pooja Sonawane, Pradeep Natarajan, Pradyun Ramadorai, Pralam Shah, Prasad Nirantar, Prasanthi Chavali, Prashan Wanigasekara, Prashant Saraf, Prashun Dey, Pratyush Pant, Prerak Pradhan, Preyaa Patel, Priyanka Dadlani, Prudhvee Narasimha Sadha, Qi Dong, Qian Hu, null Qiaozi, null Gao, Qing Liu, Quinn Lam, Quynh Do, R. Manmatha, Rachel Willis, Rafael Liu, Rafal Ellert, Rafal Kalinski, Rafi Al Attrach, Ragha Prasad, Ragini Prasad, Raguvir Kunani, Rahul Gupta, Rahul Sharma, Rahul Tewari, Rajaganesh Baskaran, Rajan Singh, Rajiv Gupta, Rajiv Reddy, Rajshekhar Das, Rakesh Chada, Rakesh Vaideeswaran Mahesh, Ram Chandrasekaran, Ramesh Nallapati, Ran Xue, Rashmi Gangadharaiah, Ravi Rachakonda, Renxian Zhang, Rexhina Blloshmi, Rishabh Agrawal, Robert Enyedi, Robert Lowe, Robik Shrestha, Robinson Piramuthu, Rohail Asad, Rohan Khanna, Rohan Mukherjee, Rohit Mittal, Rohit Prasad, Rohith Mysore Vijaya Kumar, Ron Diamant, Ruchita Gupta, Ruiwen Li, Ruoying Li, Rushabh Fegade, Ruxu Zhang, Ryan Arbow, Ryan Chen, Ryan Gabbard, Ryan Hoium, Ryan King, Sabarishkumar Iyer, Sachal Malick, Sahar Movaghati, Sai Balakavi, Sai Jakka, Sai Kashyap Paruvelli, Sai Muralidhar Jayanthi, Saicharan Shriram Mujumdar, Sainyam Kapoor, Sajjad Beygi, Saket Dingliwal, Saleh Soltan, Sam Ricklin, Sam Tucker, Sameer Sinha, Samridhi Choudhary, Samson Tan, Samuel Broscheit, Samuel Schulter, Sanchit Agarwal, Sandeep Atluri, Sander Valstar, Sanjana Shankar, Sanyukta Sanyukta, Sarthak Khanna, Sarvpriye Khetrapal, Satish Janakiraman, Saumil Shah, Saurabh Akolkar, Saurabh Giri, Saurabh Khandelwal, Saurabh Pawar, Saurabh Sahu, Sean Huang, Sejun Ra, Senthilkumar Gopal, Sergei Dobroshinsky, Shadi Saba, Shamik Roy, Shamit Lal, Shankar Ananthakrishnan, Sharon Li, Shashwat Srijan, Shekhar Bhide, Sheng Long Tang, Sheng Zha, Shereen Oraby, Sherif Mostafa, Shiqi Li, Shishir Bharathi, Shivam Prakash, Shiyuan Huang, Shreya Yembarwar, Shreyas Pansare, Shreyas Subramanian, Shrijeet Joshi, Shuai Liu, Shuai Tang, Shubham Chandak, Shubham Garg, Shubham Katiyar, Shubham Mehta, Shubham Srivastav, Shuo Yang, Siddalingesha D S, Siddharth Choudhary, Siddharth Singh Senger, Simon Babb, Sina Moeini, Siqi Deng, Siva Loganathan, Slawomir Domagala, Sneha Narkar, Sneha Wadhwa, Songyang Zhang, Songyao Jiang, Sony Trenous, Soumajyoti Sarkar, Soumya Saha, Sourabh Reddy, Sourav Dokania, Spurthideepika Sandiri, Spyros Matsoukas, Sravan Bodapati, Sri Harsha Reddy Wdaru, Sridevi Yagati Venkateshdatta, Srikanth Ronanki, Srinivasan R Veeravanallur, Sriram Venkatapathy, Sriramprabhu Sankaraguru, Sruthi Gorantla, Sruthi Karuturi, Stefan Schroedl, Subendhu Rongali, Subhasis Kundu, Suhaila Shakiah, Sukriti Tiwari, Sumit Bharti, Sumita Sami, Sumith Mathew, Sunny Yu, Sunwoo Kim, Suraj Bajirao Malode, Susana Cumplido Riel, Swapnil Palod, Swastik Roy, Syed Furqhan, Tagyoung Chung, Takuma Yoshitani, Taojiannan Yang, Tejaswi Chillakura, Tejwant Bajwa, Temi Lajumoke, Thanh Tran, Thomas Gueudre, Thomas Jung, Tianhui Li, Tim Seemman, Timothy Leffel, Tingting Xiang, Tirth Patel, Tobias Domhan, Tobias Falke, Toby Guo, Tom Li, Tomasz Horszczaruk, Tomasz Jedynak, Tushar Kulkarni, Tyst Marin, Tytus Metrycki, Tzu-Yen Wang, Umang Jain, Upendra Singh, Utkarsh Chirimar, Vaibhav Gupta, Vanshil Shah, Varad Deshpande, Varad Gunjal, Varsha Srikeshava, Varsha Vivek, Varun Bharadwaj, Varun Gangal, Varun Kumar, Venkatesh Elango, Vicente Ordonez, Victor Soto, Vignesh Radhakrishnan, Vihang Patel, Vikram Singh, Vinay Varma Kolanuvada, Vinayshekhar Bannihatti Kumar, Vincent Auvray, Vincent Cartillier, Vincent Ponzo, Violet Peng, Vishal Khandelwal, Vishal Naik, Vishvesh Sahasrabudhe, Vitaliy Korolev, Vivek Gokuladas, Vivek Madan, Vivek Subramanian, Volkan Cevher, Vrinda Gupta, Wael Hamza, Wei Zhang, Weitong Ruan, Weiwei Cheng, Wen Zhang, Wenbo Zhao, Wenyan Yao, Wenzhuo Ouyang, Wesley Dashner, William Campbell, William Lin, Willian Martin, Wyatt Pearson, Xiang Jiang, Xiangxing Lu, Xiangyang Shi, Xianwen Peng, Xiaofeng Gao, Xiaoge Jiang, Xiaohan Fei, Xiaohui Wang, Xiaozhou Joey Zhou, Xin Feng, Xinyan Zhao, Xinyao Wang, Xinyu Li, Xu Zhang, Xuan Wang, Xuandi Fu, Xueling Yuan, Xuning Wang, Yadunandana Rao, Yair Tavizon, Yan Rossiytsev, Yanbei Chen, Yang Liu, Yang Zou, Yangsook Park, Yannick Versley, Yanyan Zhang, Yash Patel, Yen-Cheng Lu, Yi Pan, null Yi-Hsiang, null Lai, Yichen Hu, Yida Wang, Yiheng Zhou, Yilin Xiang, Ying Shi, Ying Wang, Yishai Galatzer, Yongxin Wang, Yorick Shen, Yuchen Sun, Yudi Purwatama, null Yue, null Wu, Yue Gu, Yuechun Wang, Yujun Zeng, Yuncong Chen, Yunke Zhou, Yusheng Xie, Yvon Guy, Zbigniew Ambrozinski, Zhaowei Cai, Zhen Zhang, Zheng Wang, Zhenghui Jin, Zhewei Zhao, Zhiheng Li, Zhiheng Luo, Zhikang Zhang, Zhilin Fang, Zhiqi Bu, Zhiyuan Wang, Zhizhong Li, Zijian Wang, null Zimeng, Zishi Li · ArXiv.org

We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

DOI SourceCitations: placeholder

workshop · 2014

Some Studies on Utilization of Human Body Energy with Hybrid Piezoelectric and Thermoelectric Effect

Suhit Datta, Nihit Kumar Singh, Swastik Roy, Nitai Pal · Proceedings of The 2014 International Conference on Control, Instrumentation, Energy and Communication (CIEC)

This paper describes some study on the utilization of body energy with hybrid piezoelectric and thermoelectric effect, generated from normal human interactions. Electrical energy is generated by the use of piezoelectric, thermoelectric and vibration energy converters. The energy extracted is stored in a rechargeable lithium ion battery via a charger which is able to capture or transfer the energy to battery. Lead titanium zirconate is used as piezoelectric material as it has high curie point and coercive voltage. Keywords—piezoelectric; thermoelectric; temperature; energy harvesting; generation

DOI SourceCitations: placeholder