[13] viXra:2601.0134 [pdf] submitted on 2026-01-28 23:04:12
Authors: Taeho Jo
Comments: 6 Pages.
In this research, we propose string vector-based KNN variants and apply them to keyword extraction. The initial KNN version was previously modified into a string vector-based version, and keyword extraction was mapped into a binary classification task so that it could be applied. We consider three KNN variants, originally defined for numerical vectors: one where the selected nearest neighbors are discriminated by their similarities, one where the attributes are discriminated by their correlations with the categories, and one where the training examples are discriminated by their credits. The three KNN variants, as well as the initial KNN version, are modified into string vector-based versions as approaches to keyword extraction. The goal of this research is to improve keyword extraction performance through these modifications.
Category: Artificial Intelligence
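The first variant named in the abstract above, where the selected nearest neighbors are discriminated by their similarities, can be illustrated on plain numerical vectors. The following is a minimal sketch only, not the author's string vector-based method: the cosine similarity measure, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numerical vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_knn_predict(train, query, k=3):
    """Classify `query` by similarity-weighted voting among its k nearest neighbors.

    `train` is a list of (vector, label) pairs (hypothetical layout).
    Each selected neighbor votes with a weight equal to its similarity,
    so closer neighbors count more than distant ones.
    """
    scored = sorted(
        ((cosine_similarity(vec, query), label) for vec, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy binary task: each candidate word is either a keyword or not.
train = [
    ([1.0, 0.0], "keyword"),
    ([0.9, 0.1], "keyword"),
    ([0.0, 1.0], "non-keyword"),
    ([0.1, 0.9], "non-keyword"),
]
prediction = weighted_knn_predict(train, [1.0, 0.05], k=3)
```

In this sketch a plain majority vote is recovered by replacing the similarity weight with a constant 1.0.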
[12] viXra:2601.0117 [pdf] submitted on 2026-01-27 00:34:30
Authors: Taeho Jo
Comments: 6 Pages.
In this research, we propose table-based KNN variants and apply them to keyword extraction. The initial KNN version was previously modified into a table-based version and applied by mapping keyword extraction into a binary classification task. We consider three KNN variants, originally defined for numerical vectors: one where the selected nearest neighbors are discriminated by their similarities, one where the attributes are discriminated by their correlations with the categories, and one where the training examples are discriminated by their credits. The three KNN variants, as well as the initial KNN version, are modified into table-based versions. The goal of this research is to improve keyword extraction performance through these modifications.
Category: Artificial Intelligence
[11] viXra:2601.0084 [pdf] submitted on 2026-01-22 00:24:56
Authors: Taeho Jo
Comments: 6 Pages. (Note by viXra Admin: Further repetition will not be accepted and please submit article written with AI assistance to ai.viXra.org)
In this research, we propose graph-based AHC variants and apply them to word clustering. The initial AHC version, which clusters graphs, was previously proposed as an approach to word clustering. We consider three AHC variants: one where clustering proceeds in the bottom-up direction with a similarity threshold, one where merges of more than two pairs are allowed, and one where clusters are merged based on their radius. We modify the three AHC variants, as well as the initial AHC version, into graph-based versions. The goal of this research is to improve clustering performance through these modifications.
Category: Artificial Intelligence
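The first AHC variant in the abstract above, bottom-up clustering that stops at a similarity threshold, can be sketched on numerical points. This is a hedged illustration rather than the author's graph-based algorithm: the distance-derived similarity, the average-link rule, and all names here are assumptions.

```python
import math


def similarity(a, b):
    """Map Euclidean distance into a similarity in (0, 1] (an assumed choice)."""
    return 1.0 / (1.0 + math.dist(a, b))


def cluster_similarity(c1, c2):
    """Average-link similarity: mean pairwise similarity across two clusters."""
    total = sum(similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def ahc_with_threshold(points, threshold):
    """Merge the most similar pair of clusters until none reaches `threshold`."""
    clusters = [[p] for p in points]  # start with one singleton cluster per point
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cluster_similarity(clusters[i], clusters[j])
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
clusters = ahc_with_threshold(points, threshold=0.3)
```

With a threshold in place of a fixed target cluster count, the number of clusters falls out of the data rather than being chosen in advance.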
[10] viXra:2601.0081 [pdf] submitted on 2026-01-20 13:12:38
Authors: Cornel Badea
Comments: 7 Pages.
Recent advancements in Hierarchical Reasoning Models (HRM) have demonstrated strong capabilities in complex algorithmic and abstract reasoning tasks by mimicking multi-timescale cognitive processes. In this work, we extend this architecture to medical image captioning, introducing specific ImageHRM variants. Furthermore, we explore a radical simplification of this paradigm: the Tiny Recursive Model (TRM). Challenging the necessity of complex dual-loop biological hierarchies, TRM employs a single "tiny" network (7M parameters) that recurses deeply to achieve superior generalization. We introduce ImageTRM, which adapts this "Less is More" philosophy to vision-language tasks. Our experiments on ROCOv2 show that while the Triple-Loop FuseLIP ImageHRM achieves state-of-the-art results, the tiny ImageTRM with a Swin backbone surprisingly outperforms it, demonstrating that deep recursive reasoning with high-quality visual features can surpass larger, more complex architectures.
Category: Artificial Intelligence
[9] viXra:2601.0077 [pdf] submitted on 2026-01-20 22:38:23
Authors: Mahdi Rezapour
Comments: 8 Pages. (Note by viXra Admin: Please submit article written with AI assistance to ai.viXra.org)
This study examines user engagement with online video content using a multi-task learning approach. We combine viewing histories, basic user attributes, and content datasets from several public sources to predict both the proportion of a video watched and whether a user skips a video. The two tasks are learned jointly, using a shared representation with separate outputs for regression and classification. Several common multi-task architectures are evaluated and compared under the same experimental setup, including Multi-gate Mixture-of-Experts (MMoE), Progressive Layered Extraction (PLE), and cross-stitch networks. Results on a held-out test set show that watch ratio can be predicted with reasonable accuracy, while skip prediction remains challenging and only marginally better than random guessing. Differences between model architectures are small, suggesting that data size and label definition might have a stronger influence on performance than model choice. These findings highlight the difficulty of modeling discrete engagement outcomes from noisy behavioral data and point to the importance of careful label construction in future work. In particular, this study highlights the difficulty of skip prediction, which is likely due to the subjective choice of the skip threshold.
Category: Artificial Intelligence
[8] viXra:2601.0076 [pdf] submitted on 2026-01-19 21:07:20
Authors: Thuy Thu Nguyen
Comments: 45 Pages. (Note by viXra Admin: Author name is required in the article; please submit article written with AI assistance to ai.viXra.org)
The reliability and performance of machine learning (ML) systems in production depend critically on data engineering decisions made throughout the pipeline lifecycle. This comprehensive technical review synthesizes findings from 434 peer-reviewed publications spanning 2018–2026 to quantify how upstream data collection, mid-stream preprocessing and feature engineering, and downstream versioning and monitoring decisions impact ML outcomes. We examine production systems across cybersecurity, healthcare, finance, and cloud-native platforms, analyzing technical frameworks including Apache Kafka, Kubeflow, MLflow, and emerging feature stores. Our analysis reveals that data quality issues account for 60–80% of ML system failures in production, with data engineering decisions influencing model accuracy by up to 40 percentage points. We identify critical decision points across the pipeline, quantify their impacts through empirical evidence, and provide actionable frameworks for practitioners. Key findings include: (1) streaming architectures reduce latency by 10–100× while maintaining accuracy within 25% of batch systems; (2) automated data validation catches 70–90% of quality issues before model training; (3) feature stores reduce feature engineering time by 50–70% while improving consistency; and (4) comprehensive lineage tracking enables 35× faster debugging of production failures. This review establishes data-centric AI as essential for reliable ML systems and identifies critical gaps in cost-benefit analysis, cross-domain generalization, and standardized impact metrics.
Category: Artificial Intelligence
[7] viXra:2601.0053 [pdf] submitted on 2026-01-13 21:20:39
Authors: Taeho Jo
Comments: 6 Pages.
In this research, we propose table-based KNN variants as approaches to word categorization. The initial KNN version, which receives a table as its input data, was previously proposed as a tool for this task. We consider three KNN variants: one where the selected nearest neighbors are discriminated by their similarities with a novice example, one where the attributes are discriminated by their correlations with the target outputs, and one where the training examples are discriminated by their credits. We modify the three KNN variants as well as the initial version of the KNN algorithm. The goal of this research is to improve classification performance through these modifications.
Category: Artificial Intelligence
[6] viXra:2601.0048 [pdf] submitted on 2026-01-12 13:52:08
Authors: Sarah Makarem
Comments: 7 Pages.
PictoLens is a novel gaze-based interaction technique for exploring layered data visualizations through progressive disclosure. The system uses real-time gaze data to implement a point-and-click interaction model. Through intuitive gestures such as ‘Gaze and Fixate’ and ‘Gaze and Lean In,’ users can seamlessly interact with three representations of the data: an AI-generated pictograph, a scatter-plot visualization, and an annotated scatter-plot visualization. This hands-free and voice-free interaction technique addresses key challenges of traditional data exploration, such as long dwell times and the Midas Touch problem. PictoLens uses intuitive metaphors from everyday gestures: the gaze serves as a pointer, moving the visualization lens. Fixating the gaze at a point on the pictograph unlocks a finer data representation, while leaning forward reveals the most granular, detailed visualization layer with annotations. We present PictoLens’ design and implementation to demonstrate its potential as an immersive analytics tool and interaction technique.
Category: Artificial Intelligence
[5] viXra:2601.0045 [pdf] submitted on 2026-01-12 02:01:23
Authors: Ekaghni Mukherjee
Comments: 21 Pages. (Note by viXra Admin: Please submit article written with AI assistance to ai.viXra.org)
Large language models have demonstrated remarkable capabilities across diverse natural language tasks, yet controlling their output characteristics remains challenging. We present HelpSteer Transformer, an attribute-conditioned language model architecture designed for training on the HelpSteer dataset. The model incorporates modern architectural innovations including Rotary Position Embeddings (RoPE), SwiGLU activation functions, and RMSNorm, enabling fine-grained control over five response attributes: helpfulness, correctness, coherence, complexity, and verbosity. The model contains approximately 60 million parameters across eight transformer layers and is designed for efficient scaling while maintaining high-quality text generation. An explicit attribute conditioning mechanism integrates user preferences directly into the generation process, enabling dynamic control of outputs without requiring separate fine-tuning for different attribute combinations. Architectural analysis and preliminary experiments indicate competitive performance relative to larger baseline models, while maintaining lower computational cost. This work highlights the effectiveness of architectural conditioning for controllable and efficient language model design.
Category: Artificial Intelligence
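Two of the components named in the abstract above, RMSNorm and the SwiGLU gating, have short standard definitions that can be written out in plain Python. The sketch below follows those general definitions and is not taken from the HelpSteer Transformer code: the function names, the `eps` value, and the example inputs are assumptions, and the learned linear projections that normally feed SwiGLU are omitted.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then apply a per-feature gain.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    """
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * g for v, g in zip(x, gain)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, value):
    """Elementwise SwiGLU gating: silu(gate) * value.

    In a transformer feed-forward block, `gate` and `value` would come from
    two separate linear projections of the same input (not shown here).
    """
    return [silu(g) * v for g, v in zip(gate, value)]


normalized = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
gated = swiglu_gate([0.0, 2.0], [5.0, 1.0])
```

After `rms_norm` with unit gain, the root mean square of the output is approximately 1, which is the property the normalization is meant to enforce.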
[4] viXra:2601.0042 [pdf] submitted on 2026-01-12 00:39:19
Authors: Taeho Jo
Comments: 6 Pages.
In this research, we propose three KNN variants which consider the feature similarity, as approaches to word categorization. The initial version of the KNN algorithm which does so was previously proposed as a tool for this task. We consider three KNN variants: one which discriminates its selected nearest neighbors by their distances, another which discriminates the attributes by their correlations with the target outputs, and a third which discriminates the training examples by their credits. The feature similarity is applied to the three KNN variants as well as to the initial version. Applying the feature similarity to the KNN variants is intended to improve classification performance.
Category: Artificial Intelligence
[3] viXra:2601.0038 [pdf] submitted on 2026-01-10 02:04:57
Authors: Friedrich Sösemann
Comments: 5 Pages. In German
From the minimal ontology of relational hierarchies, information, knowledge, and intelligence, as well as their measures, are derived. The following conclusions are drawn: 1. Identical perception of subjects is not necessary for the truth of knowledge. 2. Abstraction can lead to subjective randomness and isolated elements of knowledge. 3. Knowledge networks are more effective, and therefore more intelligent, than sets of knowledge.
Category: Artificial Intelligence
[2] viXra:2601.0034 [pdf] replaced on 2026-01-30 05:42:29
Authors: Satish Gajawada
Comments: 12 Pages.
This article is a collection of five Excellent Artificial Intelligence (EAI) articles. The first article defines the new field of Excellent Artificial Intelligence (EAI) and designs version 1 of a new algorithm, Artificial Intelligence Researcher Algorithm version 1 (AIRAv1). A new algorithm titled Teacher Brother Sister Father Mother Friend Artificial Intelligence Algorithm (TBSFMFAIA) is proposed in the second article. Kindness Love Satisfaction Peace Excellence Money Happiness Respect Intelligence Health Artificial Intelligence Algorithm (KLSPEMHRIHAIA) is the novel and unique algorithm invented in the third article. A unique algorithm titled Prabhakar Gajawada Bhagyamma Gajawada Satish Gajawada Artificial Intelligence Algorithm (PGBGSGAIA) is proposed in the fourth article. Cricket Match Runs Algorithm (CMRA), Rice Bags Sales Algorithm (RBSA), English Language Sentence Algorithm (ELSA), and Object Swarm Optimization Algorithm (OSOA) are four novel Swarm Intelligence algorithms designed in the fifth article.
Category: Artificial Intelligence
[1] viXra:2601.0021 [pdf] submitted on 2026-01-05 20:34:11
Authors: Gabriel H. Eisenkraemer, Fernando G. Moraes, Leonardo L. de Oliveira, Everton Carara
Comments: 75 Pages.
We describe a lightweight RISC-V ISA extension for AES and SM4 block ciphers. Sixteen instructions (and a subkey load) are required to implement an AES round with the extension, instead of 80 without. An SM4 step (quarter-round) has 6.5 arithmetic instructions, a similar reduction. Perhaps even more importantly, the ISA extension helps to eliminate slow, secret-dependent table lookups and to protect against cache timing side-channel attacks. Having only one S-box, the extension has a minimal hardware size and is well suited for ultra-low power applications. AES and SM4 implementations using the ISA extension also have a much-reduced software footprint. The AES and SM4 instances can share the same datapaths but are independent in the sense that a chip designer can implement SM4 without AES and vice versa. Full AES and SM4 assembler listings, HDL source code for the instructions' combinatorial logic, and C code for emulation are provided to the community under a permissive open source license. The implementation contains depth- and size-optimized joint AES and SM4 S-Box logic based on the Boyar-Peralta construction with a shared non-linear middle layer, demonstrating additional avenues for logic optimization. The instruction logic has been experimentally integrated into the single-cycle execution path of the "Pluto" RV32 core and has been tested on an FPGA system.
Category: Artificial Intelligence