Rational Functions, Finance, Time Series Data

Augmenting Neural Cognition - Forecasting with Embedded Rational Functions

Author

Richard Goodman


Abstract

Conventional neural network architectures rely on universal, generic activation functions, often overlooking the potential benefits of embedding domain-specific knowledge directly into neuron behavior. This paper introduces a novel architectural paradigm where standard activations are replaced by a flexible framework of rational functions. Our design features a unique two-stage process, wherein a learnable rational pre-conditioner dynamically prepares the input for a specialized secondary activation. These secondary activations can be trained or, crucially, predefined to approximate complex, fundamental dynamics drawn from disparate scientific fields. We validate this approach on a highly non-stationary financial time-series, demonstrating superior stability and predictive accuracy. Results show that embedding specific functional priors not only enhances performance but also opens a pathway toward designing heterogeneous networks with functionally specialized neurons, representing a new step in augmenting model cognition with structured, expert knowledge.


View Related Publications

GitHub Repo : https://github.com/Apoth3osis-ai/embedded_rational_function

Research Gate: https://www.researchgate.net/publication/392716443_Augmenting_Neural_Cognition_Forecasting_with_Embedded_Rational_Functions


1. Introduction

Deep learning has established itself as a cornerstone of modern artificial intelligence, with neural networks demonstrating remarkable capabilities as universal function approximators. However, this universality often comes at the cost of efficiency and insight. Standard architectures, equipped with generic activation functions like the Rectified Linear Unit (ReLU) or its variants, learn from a "blank slate." They are tasked with discovering underlying principles from data alone, ignoring the vast repository of human-derived mathematical and scientific knowledge about the system being modeled.

This work challenges that paradigm. We posit that for many complex systems, particularly those in finance, engineering, and the natural sciences, performance can be dramatically enhanced by embedding established domain knowledge directly into the network's cognitive machinery. Instead of relying on a neuron to learn a fundamental physical law or financial principle from scratch, we can provide it with a "functional prior"—a mathematical starting point that encapsulates that knowledge.

To achieve this, we introduce an architecture centered on a novel Mixed Rational Dense layer. This layer replaces standard activations with highly flexible rational functions (ratios of polynomials). Its key innovation is a dual-pathway system that allows neurons to be either fully trainable, discovering their own behavior, or predefined to approximate specific, complex functions. We demonstrate that by pre-defining neurons to emulate dynamics from fields as diverse as quantitative finance and statistical physics, we can construct models with superior stability and predictive power.

Our contributions are threefold:

We introduce a novel and flexible neural layer architecture based on a two-stage rational function transformation.

We demonstrate through rigorous experimentation on a challenging, high-frequency financial time-series that embedding domain-specific functional priors significantly improves model performance over tabula rasa approaches.

We propose a new direction for neural network design, moving from homogeneous structures toward engineered, heterogeneous networks of functionally specialized neurons.

This paper will first review related work in activation functions, then detail the proposed methodology and architecture. We will subsequently describe our experimental setup, present the results, and conclude with a discussion of the broader implications, applications, and future directions of this research.

2. Related Work

The choice of activation function is a critical element of neural network design. The popularization of the Rectified Linear Unit (ReLU) and its variants (e.g., Leaky ReLU, ELU) was a pivotal step, mitigating the vanishing gradient problem that plagued earlier sigmoidal and hyperbolic tangent functions. More recent innovations, such as Swish and GELU, offer smoother, non-monotonic alternatives whose richer functional forms can lead to improved performance.

A parallel stream of research has explored learnable activation functions. Parametric ReLU (PReLU), for instance, allows the slope of the negative part to be learned during training. This work extends that concept significantly, moving from learning a single parameter to learning the entire functional form of the activation via the coefficients of a rational function.

Rational functions themselves are well-established in approximation theory for their ability to model complex functions, including those with singularities or sharp transitions, more efficiently than polynomials alone. Their use as activation functions in deep learning, however, has been limited, largely due to concerns about training stability. Our work addresses these stability concerns directly through architectural design and presents a framework for their systematic application. Finally, the concept of a "prior" is central to Bayesian inference. Our use of "functional priors" is an analogous concept translated into a deep learning framework, constraining the hypothesis space of the model not with a probability distribution, but with a specific functional form.

3. Methodology: The Embedded Rational Function Architecture

Our approach is centered on replacing the static, predefined activation function of a standard neuron with a dynamic, expressive rational function. This provides the foundation for embedding complex mathematical behaviors.

3.1. Rational Functions as Activation Units

A rational function R(x) is a ratio of two polynomials, P(x) and Q(x):

$$R(x) = \frac{P(x)}{Q(x)} = \frac{\sum_{i=0}^{n} a_i x^i}{\sum_{j=0}^{m} b_j x^j}$$

For our work, we use low-degree rational functions (typically n=m=2) to balance expressiveness with computational tractability. This form is significantly more flexible than standard activations, capable of approximating a wider variety of shapes, including non-monotonic curves and functions with asymptotes, which may be crucial for modeling real-world phenomena.
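As a concrete reference, the sketch below evaluates a degree n = m = 2 rational function in NumPy. The coefficient values are placeholders chosen for illustration, not coefficients taken from any trained model in this work.

```python
import numpy as np

def rational(x, a, b):
    """Evaluate R(x) = P(x)/Q(x) with coefficient vectors a (numerator) and
    b (denominator), ordered from the constant term upward."""
    num = sum(a_i * x**i for i, a_i in enumerate(a))
    den = sum(b_j * x**j for j, b_j in enumerate(b))
    return num / den

# Degree n = m = 2, as used in this work; coefficients are illustrative placeholders.
a = np.array([0.0, 1.0, 0.5])    # P(x) = 0 + 1*x + 0.5*x^2
b = np.array([1.0, 0.0, 0.25])   # Q(x) = 1 + 0*x + 0.25*x^2
print(rational(np.linspace(-3.0, 3.0, 7), a, b))
```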

3.2. The MixedRationalDense Layer

The core of our architecture is the MixedRationalDense layer. It processes its inputs through a unique two-stage mechanism designed to decouple feature learning from the application of the final activation behavior.

Stage 1: Learnable Pre-conditioner: For each neuron, the weighted sum of inputs from the previous layer is not computed directly. Instead, each individual input-neuron connection is passed through a dedicated, trainable rational function. The outputs of these functions are then summed to produce an intermediate value, z. This stage acts as a highly adaptive filter, allowing the network to learn the optimal way to combine and transform input features before the primary activation is applied.

Stage 2: Specialized Activation: The intermediate value z is then passed through a secondary rational function. This function can be one of two types:

Trainable: the function's coefficients are learned during training, allowing the neuron to discover its own activation behavior from the data.

Predefined: the coefficients are fixed in advance to approximate a specific target function (a functional prior), embedding known dynamics directly into the neuron's behavior.

This hybrid design permits the construction of heterogeneous layers where some neurons are specialized for known tasks while others maintain the flexibility to adapt to unknown patterns in the data.
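To make the two-stage mechanism concrete, the following is a minimal, illustrative Keras layer sketch, assuming a TensorFlow backend; it is not the repository's implementation. The class name, the shape convention for prior coefficients (a pair of (units, degree + 1) arrays), and the simplification of freezing an entire layer on a single prior are assumptions made for brevity.

```python
import tensorflow as tf

def _const_init(values):
    """Hypothetical helper: initializer that returns the given coefficient array."""
    values = tf.constant(values, dtype=tf.float32)
    return lambda shape, dtype=None: tf.cast(values, dtype or tf.float32)

class MixedRationalDenseSketch(tf.keras.layers.Layer):
    """Illustrative two-stage layer: per-connection trainable rational
    pre-conditioners (Stage 1) feeding a per-neuron secondary rational
    activation that is either trainable or frozen to a prior (Stage 2)."""

    def __init__(self, units, degree=2, prior=None, eps=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.units, self.degree, self.eps, self.prior = units, degree, eps, prior

    def build(self, input_shape):
        d, k = int(input_shape[-1]), self.degree + 1
        # Stage 1: one trainable rational function per input-neuron connection.
        self.p1 = self.add_weight(name="p1", shape=(d, self.units, k),
                                  initializer="glorot_uniform")
        self.q1 = self.add_weight(name="q1", shape=(d, self.units, k),
                                  initializer="ones")
        # Stage 2: one secondary rational function per neuron; frozen when a
        # predefined prior (numerator, denominator coefficient arrays) is supplied.
        trainable = self.prior is None
        p_init = "glorot_uniform" if trainable else _const_init(self.prior[0])
        q_init = "ones" if trainable else _const_init(self.prior[1])
        self.p2 = self.add_weight(name="p2", shape=(self.units, k),
                                  initializer=p_init, trainable=trainable)
        self.q2 = self.add_weight(name="q2", shape=(self.units, k),
                                  initializer=q_init, trainable=trainable)

    def _rational(self, x, p, q):
        powers = tf.stack([x ** i for i in range(self.degree + 1)], axis=-1)
        num = tf.reduce_sum(powers * p, axis=-1)
        den = tf.reduce_sum(powers * q, axis=-1) + self.eps  # epsilon guard (Section 3.4)
        return num / den

    def call(self, x):
        # Stage 1: apply per-connection rationals, then sum over inputs to obtain z.
        z = tf.reduce_sum(self._rational(x[..., tf.newaxis], self.p1, self.q1), axis=-2)
        # Stage 2: per-neuron secondary rational activation applied to z.
        return self._rational(z, self.p2, self.q2)
```

Under this sketch, passing prior=None recovers a fully trainable layer, while supplying fixed coefficient arrays corresponds to the homogeneous-prior configurations explored in Section 4.3.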

3.3. Functional Priors for Neuron Specialization

The ability to pre-define neuron behavior is the cornerstone of our methodology. By selecting appropriate functional priors, we can infuse the network with domain knowledge. In our experiments, we drew from several fields to construct our library of priors:

Standard Activation Approximations: Rational functions that mimic the behavior of common activations like tanh and softplus. These serve to validate that our framework can replicate existing successful architectures.

Quantitative Finance Models: Functions that approximate phenomena observed in financial markets. This includes rational approximations of volatility decay models (e.g., from GARCH processes) or term structures used in interest rate modeling.

Physics and Engineering Principles: Functions describing fundamental physical processes. Examples include approximations of the Lorentz factor from special relativity, which models behaviors at extreme velocities, and statistical distributions like Fermi-Dirac, which govern particle states in quantum systems. The inclusion of these priors tests the hypothesis that complex systems, even financial ones, may exhibit dynamics analogous to those in physics.
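One simple way to obtain coefficients for such priors is to fit a low-degree rational function to the target curve by linearized least squares. The sketch below, assuming NumPy and a softplus target, illustrates the idea; it is not the exact fitting procedure or coefficient set used in this work.

```python
import numpy as np

def fit_rational_prior(f, n=2, m=2, lo=-4.0, hi=4.0, samples=400):
    """Fit R(x) = P(x)/Q(x) to a target function f by linearized least squares:
    fix b0 = 1 and solve P(x) - y*(b1*x + ... + bm*x^m) ≈ y on sampled points."""
    x = np.linspace(lo, hi, samples)
    y = f(x)
    # Columns: x^i for numerator coefficients a_0..a_n, then -y*x^j for b_1..b_m.
    A = np.column_stack([x**i for i in range(n + 1)] +
                        [-y * x**j for j in range(1, m + 1)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    a = coeffs[:n + 1]
    b = np.concatenate(([1.0], coeffs[n + 1:]))
    return a, b

# Example: a softplus-like prior (illustrative fitting range and target).
softplus = lambda x: np.log1p(np.exp(x))
a, b = fit_rational_prior(softplus)
```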

3.4. Numerical Stability

Training a network of rational functions requires careful management of numerical stability. To prevent division by zero and exploding gradients, we employ two primary safeguards:

The coefficients of all predefined functions are pre-scaled to a normalized range.

During the forward pass, a small epsilon (10⁻⁶) is added to the denominator of any rational function evaluation, ensuring it never equals zero.
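A minimal sketch of these two safeguards, assuming NumPy and the constant-first coefficient ordering used earlier; the normalization choice shown (dividing numerator and denominator by a shared factor, which leaves R(x) unchanged while bounding coefficient magnitudes) is one reasonable interpretation of the pre-scaling step, not a confirmed detail of the implementation.

```python
import numpy as np

EPS = 1e-6  # added to every denominator evaluation

def prescale(a, b):
    """Scale both coefficient vectors by a shared factor so the largest
    magnitude is 1; R(x) is unchanged but weights stay in a normalized range."""
    s = max(np.max(np.abs(a)), np.max(np.abs(b)))
    return a / s, b / s

def safe_rational(x, a, b):
    num = np.polyval(a[::-1], x)        # coefficients are stored constant-first
    den = np.polyval(b[::-1], x) + EPS  # epsilon keeps the denominator away from zero
    return num / den
```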

4. Experimental Setup

To validate our approach, we designed a series of experiments on a challenging, real-world forecasting task.

4.1. Dataset

We utilized a high-frequency financial time-series dataset: one-minute resolution open-high-low-close (OHLC) price data for the EUR/USD currency pair. This dataset is characterized by high noise, non-stationarity, and complex, multi-scale dynamics. It was intentionally chosen to stress-test the stability of our architecture and to evaluate whether embedding functional priors can provide a regularizing effect in a volatile data environment. The task was to predict the high price of the pair 10 minutes into the future.

4.2. Feature Engineering

Standard time-series features were engineered from the data. For each timestep, the model was provided with the OHLC values from the previous 20 minutes (20 lag steps), as well as derived features such as intraday volatility and price momentum.
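A sketch of this feature and target construction, assuming a pandas DataFrame of one-minute bars with columns open, high, low, and close; the exact definitions of the volatility and momentum features in the original pipeline are not specified, so the ones below are illustrative stand-ins.

```python
import pandas as pd

def build_features(df, lags=20, horizon=10):
    """Build 20-step OHLC lags, simple volatility/momentum features, and the
    target (high price 10 minutes ahead). Column names are illustrative."""
    feats = {}
    for lag in range(1, lags + 1):
        for col in ["open", "high", "low", "close"]:
            feats[f"{col}_lag{lag}"] = df[col].shift(lag)
    feats["volatility"] = (df["high"] - df["low"]).rolling(lags).std()
    feats["momentum"] = df["close"] - df["close"].shift(lags)
    X = pd.DataFrame(feats, index=df.index)
    y = df["high"].shift(-horizon)          # high price `horizon` minutes ahead
    valid = X.notna().all(axis=1) & y.notna()
    return X[valid], y[valid]
```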

4.3. Experimental Scenarios

We conducted a series of experiments to compare different network configurations. Each model shared the same core hyperparameters (4 hidden layers of 160 neurons each) but differed in the assignment of activation functions, as summarized below (a configuration sketch follows the list):

trainable_only: A baseline model where all neurons in all layers were trainable rational functions, containing no predefined knowledge.

random_mix: A baseline where 50% of neurons were randomly assigned a functional prior from our library, and 50% were trainable.

Homogeneous Priors: A set of models where all neurons in the network were assigned the same predefined functional prior (e.g., one model with only garch_exp neurons, another with only lorentz_factor neurons, etc.).
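These scenarios can be summarized in a small configuration mapping, sketched below. The prior names mix identifiers that appear in this paper (garch_exp, lorentz_factor, softplus_like, mish_like) with hypothetical ones (tanh_like, fermi_dirac), and the dictionary keys and fields are assumptions rather than the repository's configuration format.

```python
# Hypothetical configuration sketch for the scenarios in Section 4.3.
PRIOR_LIBRARY = [
    "tanh_like", "softplus_like", "mish_like",      # smooth activation approximations
    "garch_exp", "lorentz_factor", "fermi_dirac",   # finance / physics priors
]

SCENARIOS = {
    "trainable_only": {"prior": None,     "fraction_fixed": 0.0},
    "random_mix":     {"prior": "random", "fraction_fixed": 0.5},
    # One homogeneous-prior model per entry in the library:
    **{name: {"prior": name, "fraction_fixed": 1.0} for name in PRIOR_LIBRARY},
}
```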

4.4. Training and Evaluation

Models were trained using the Adam optimizer with a learning rate of 10⁻⁴ and a Mean Squared Error (MSE) loss function. We employed early stopping with a patience of 10 epochs to prevent overfitting. Final model performance was evaluated on a held-out test set using MSE, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
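A training and evaluation routine matching this description might look like the sketch below (TensorFlow/Keras assumed). The epoch cap, verbosity, and best-weight restoration are placeholders not stated in the text, and the model and data splits are supplied by the caller.

```python
import numpy as np
import tensorflow as tf

def train_and_evaluate(model, X_train, y_train, X_val, y_val, X_test, y_test):
    """Compile, train with early stopping (patience 10), and report test metrics."""
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mse", metrics=["mae"])
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                                  restore_best_weights=True)
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=200, callbacks=[early_stop], verbose=0)
    mse, mae = model.evaluate(X_test, y_test, verbose=0)
    return {"MSE": mse, "MAE": mae, "RMSE": float(np.sqrt(mse))}
```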

5. Results and Discussion

The results of our experiments strongly support the hypothesis that embedding functional priors enhances model performance.

5.1. Performance of Functional Priors

We observed significant variation in performance across the models with different homogeneous priors. Models equipped with certain priors consistently and substantially outperformed the baselines. An illustrative subset of results is shown in Table 1.

Table 1: Illustrative comparison of RMSE for different scenarios on the test set. Models with embedded priors (e.g., softplus_like) demonstrate an order-of-magnitude improvement over the trainable_only baseline.

Notably, the best-performing models (softplus_like, mish_like) utilized priors that were smooth and non-negative, which may be particularly well-suited for predicting a positive-valued quantity like price. The dramatic underperformance of the trainable_only baseline suggests that in a high-noise environment, unconstrained rational functions struggle to converge to a stable and effective solution without the guidance of a prior.

5.2. Implications of Neuron Specialization

The success of the homogeneous prior models indicates that a network's global behavior can be effectively guided by specializing the function of its constituent parts. This stands in contrast to conventional networks where all neurons are functionally identical. By selecting a prior that aligns with the general characteristics of the problem domain, we provide a powerful inductive bias that accelerates and improves the learning process. The fact that priors from disparate domains (both finance and physics) were found to be effective suggests that the utility of this approach lies in matching the mathematical properties of a function to the problem, rather than a strict adherence to its original domain.

6. Applications and Future Work

The findings of this research open several promising avenues for both practical application and future investigation.

6.1. Potential Applications

Quantitative Finance: The most direct application. Designing specialized models for algorithmic trading, risk management, and derivatives pricing by embedding functions from stochastic calculus (e.g., Black-Scholes or Heston models) directly into the network.

Scientific and Engineering Simulation: Creating "physics-informed" neural networks that can learn to simulate complex systems (e.g., fluid dynamics, material stress) more efficiently by embedding known physical laws or conservation principles as priors.

Control Systems: Developing more robust controllers for robotic or industrial systems by embedding priors that respect known physical limits and system dynamics.

Medical Signal Processing: Designing networks for analyzing EEG or EKG signals with neurons specialized to detect known waveform morphologies or frequencies.

6.2. Future Work

This research lays the groundwork for several exciting future directions:

Automated Prior Selection: While we selected priors manually, future work could focus on developing meta-learning algorithms that automatically select or generate the optimal functional prior for each layer, or even each neuron, based on the data and task.

Designing Heterogeneous Networks: Moving beyond the homogeneous models tested here to intentionally design networks with a diverse ecosystem of specialized neurons. For example, a layer might contain some neurons for trend-following and others for mean-reversion, each with a corresponding prior.

Exploring the Library of Functions: Systematically expanding and testing the library of functional priors, drawing from a wider range of mathematical and scientific disciplines to create a comprehensive toolkit for AI architects.

7. Conclusion

This paper introduced a novel architectural paradigm for deep learning that moves beyond generic components and toward engineered, functionally specialized systems. By replacing standard activation functions with a flexible framework of trainable and predefined rational functions, we have demonstrated a powerful method for embedding domain-specific knowledge directly into a neural network. Our experiments on a challenging financial forecasting task show that this approach leads to dramatic improvements in model performance and stability.

The ability to craft networks with purpose-built neurons represents a fundamental shift in how we can approach complex problems. It is a step toward a future where AI systems are not just trained, but are meticulously designed, merging the pattern-recognition strengths of machine learning with the structured, principled knowledge of human science. This symbiosis is the core of our mission at Apoth3osis, and we believe this methodology will be instrumental in building the next generation of truly intelligent systems.

References

1 Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366.

2 Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10).

3 Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941.

4 Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

5 Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
