A smart secure virtual reality immersive application for alzheimer’s and dementia patients

A smart secure virtual reality immersive application for alzheimer’s and dementia patients

Our proposed solution is a system integrates VR, voice recognition, and AI to provide smart and secure care for individuals with AD and dementia. This innovative system addresses cognitive engagement, loneliness reduction, health monitoring, and independence promotion, significantly enhancing the overall well-being of senior citizens.

The requirements for the proposed system were gathered from domain experts, including doctors and clinicians specializing in neurology, dementia, and AD. Additionally, they are able to test and use the system to provide their feedback. Their guidance helped highlight the importance of included features, such as conversational emotional support, bilingual capabilities, and culturally relevant tourism experiences through the VR headset. This collaboration with domain experts enhances the relevance of our proposed new system that differentiates it from general-purpose VR applications.

An overview of the proposed system architecture

The system design is presented in Fig. 3, which was based on a structured therapy model developed in collaboration with healthcare domain experts and clinicians, tailored to meet the needs of Alzheimer’s patients. This model incorporates four key features: (1) intelligent emotional interaction through an AI companion, (2) cognitive challenge scaling based on the user’s observed performance, (3) bilingual capabilities, particularly the ability to interact in Arabic, and (4) a familiar and low-stress VR environment. The model was iteratively refined through continuous feedback and consultation with healthcare professionals. Their insights guided both the content structure and the flow of real-time interactions.

Fig. 3
figure 3

System workflow architecture.

The system includes a modular, three-tier client–server architecture that integrates immersive VR, adaptive AI dialogue, voice recognition, and secure patient data management into a unified therapeutic platform. As illustrated in Fig. 3, the architecture comprises:

  • Presentation layer: Delivered via the Oculus Quest 2 headset, this layer manages all patient-facing interaction, including gesture input, voice commands, and real-time VR rendering.

  • Application layer: This layer handles the execution of cognitive therapy modules, real-time AI dialogue, virtual tourism experiences, and personalization logic based on patient profiles. It also governs task difficulty adjustment, conversational flow, and scene selection.

  • Data layer: This backend component manages secure storage and retrieval of user profiles, therapy progress, system configurations, and interaction history. It employs end-to-end AES encryption, OpenSSL-based key rotation, and role-based access control to ensure compliance with healthcare data protection standards (HIPAA/GDPR).

Furthermore, the proposed architecture includes a patient profile that collects patient preferences such as language, emotional triggers, cultural background, and other personal preferences. This profile will update the VR environment and the conversational scripts accordingly, enhancing patient engagement and making the care more cognitively tailored.

The technical architecture is more than VR voice interaction; it introduces a dynamic feedback therapy loop. By analyzing the multimodal user input such as voice, task performance, and engagement, the AI plays a vital role in adjusting the decision making in real-time. An integrated lightweight layer supported by the customization of the Unity scripts supports this process by analyzing patient states and aligning them with the appropriate personalized therapy modules or dialogue patterns.

The subsections that follow describe each system component in detail: AI companion (4.2), VR and interaction design (4.3), voice recognition system (4.4), personalization logic (4.5), the security and privacy framework (4.6) and the system features (4.7).

AI and virtual companion framework

The heart of the system is a real-time AI companion, implemented using ChatGPT 3.5 Turbo, and tailored for dementia care and emotional support. The AI is embedded within the VR environment as a 3D avatar that can converse in both English and Arabic. It offers patients cognitive stimulation, social interaction, and companionship factors known to alleviate feelings of isolation and anxiety in Alzheimer’s care.

To align with clinical needs, we customized the AI using a structured instruction framework. This framework includes:

  • Bilingual response logic (Arabic and English)

  • Emotion-sensitive fallback prompts

  • Reminiscence templates (personal memory prompts, cultural themes)

  • Safety handling for confusion or repetition

A key innovation is the incorporation of real-time interaction adaptation. The AI companion dynamically adjusts its conversational depth and pacing based on the user’s speech pattern, response latency, and detected emotional cues (e.g., tone, silence). These cues are processed by a rule-based logic layer that classifies whether to continue, redirect, or simplify dialogue, enhancing therapeutic relevance and reducing cognitive burden.

This framework was refined in consultation with clinicians to ensure it promotes cognitive engagement without overwhelming the user. The integration of ChatGPT-3.5 Turbo was based on its performance capabilities in producing coherent, emotionally sensitive, and bilingual-adaptable responses in real time, all tailored for Alzheimer’s and dementia care. As opposed to rule-based systems, ChatGPT-3.5 Turbo allows more flexibility, as well as real-time responses and adaptations. However, there are some limitations, such as occasionally generating inappropriate responses or failing to verify clinical facts. To address these issues, we have implemented conversational boundaries through prompt engineering, fallback safety responses, and continuous clinician feedback loops. These techniques aim to keep the proposed system safe and ethically responsible.

VR module and interaction design

The VR component of the system serves as the primary user interface, delivering immersive, personalized therapy experiences through the Oculus Quest 2 headset shown in Fig. 4. Environments were developed using Unity 3D—Unity Real-Time Development Platform (version 2022.3.10f1, available at for application logic and Blender for realistic 3D modeling of culturally relevant and therapeutic settings, including natural landscapes, religious monuments, and familiar landmarks.

Fig. 4
figure 4

Conceptual design of the VR headset.

The VR interface is structured around three key modules, as illustrated in Fig. 4, which outlines the core virtual reality algorithm and device interactions:

  • Tracking Module: Comprised of two subcomponents, head tracking, which uses built-in headset sensors to monitor the user’s gaze and orientation, and touch tracking, which relies on the handheld Oculus controllers to detect finger and hand movements. These inputs enable users to interact naturally with the environment by pointing, grabbing, or selecting objects.

  • Interaction Module: Translates controller inputs into virtual actions, such as navigating the environment, opening menus, or activating therapy tasks. This ensures that physical gestures are mapped intuitively to system responses.

  • Rendering Module: Responsible for generating the visual output seen in the headset. It includes a data manager (which collects user interaction data), visual object definitions (for all 3D assets), and a renderer, which combines data and objects to produce real-time scenes tailored to the user’s profile and inputs.

The user begins each session in a customizable home environment, which acts as the central hub for launching therapy activities (Fig. 6). From here, the user navigates a floating, bilingual UI menu available in both English and Arabic, to access various modules and destinations (Figs. 7 and 8). Through the Virtual Tourism Module, patients can explore immersive locations grouped into categories such as nature, historical landmarks, or religious spaces. Upon selection, the system may prompt a guided meditation sequence or play calming ambient sounds to promote relaxation (Fig. 9).

This layered with responsive interaction design is engineered to reduce user stress, support memory recall, and increase cognitive engagement by fostering active exploration and agency in a low-pressure virtual setting.

Voice recognition integration

The voice recognition subsystem is facilitated with Audio and Video (AV) functions that enable natural and intuitive interaction between the patient and the system using both English and Arabic. This functionality is essential for ensuring accessibility and comfort for users with different linguistic backgrounds. The system utilizes two speech recognition platforms:

The workflow for speech interaction is illustrated in Fig. 5. The Oculus Voice SDK captures English voice input via the built-in microphone on the Oculus VR headset. This SDK integrates advanced Automatic Speech Recognition (ASR) capabilities, converting spoken language into text while performing preprocessing such as noise filtering and speech quality enhancement. Once transcribed, the text undergoes Natural Language Understanding (NLU) processes including grammatical parsing and intent recognition to determine whether the input is a conversational message or a functional system command.

Fig. 5
figure 5

Speech recognition workflow.

Similarly, Microsoft Azure Speech Services handles Arabic speech input using cloud-based ASR and Text-to-Speech (TTS) tools. Spoken Arabic is transcribed into text and then synthesized back into natural-sounding Arabic audio, enabling seamless communication between the patient and the AI companion.

To interpret the user’s intent, whether issuing a command (e.g., “take me to the beach”) or engaging in general conversation, the transcribed input is sent to ChatGPT 3.5 Turbo. A classification routine identifies the input as either a directive or a question. If the input is a command, the system performs the corresponding VR action (e.g., navigating to a location); if it is conversational, the AI companion responds accordingly using its emotion-aware dialogue framework.

We implemented a custom instruction framework to guide the AI companion’s behavior, ensuring that all interactions remain clinically safe, emotionally supportive, and consistent with the best practices for dementia care. Additionally, we considered the cybersecurity issues by implementing secure encrypted communication between the AI module and patient interface to protect sensitive dialogue and maintain compliance with privacy standards such as HIPAA and GDPR.

Personalized therapy engine

A core innovation of the system is its ability to adapt therapy content dynamically to the cognitive needs for each patient, emotional state, and personal history. This personalization is achieved through a modular therapy engine that combines initial profile-based customization with real-time performance-based adaptation.

Upon system onboarding, each patient’s therapeutic profile is built in collaboration with caregivers and clinicians. This profile includes preferred language (Arabic or English), cultural background, emotional triggers, and cognitive history. These preferences are used to tailor both the visual content (e.g., scenes, UI elements) and verbal interaction style of the AI companion.

This real-time adaptation is supported by lightweight decision rules embedded in the Unity application layer, which interfaces directly with the AI companion and voice recognition systems. Together, they ensure a seamless transition between content types, while maintaining emotional coherence and therapeutic intent.

By continuously aligning the therapy experience to the user’s abilities and preferences, the personalized therapy engine increases patient satisfaction, reduces frustration, and enhances the likelihood of sustained engagement over time.

Security and data protection framework

Given the sensitivity of personal and cognitive health data involved in dementia therapy, the system integrates multiple layers of security and privacy protection. These measures ensure that user data remains confidential, is securely processed, and complies with clinical data governance standards.

  • Data encryption: All data transmitted between the client device (VR headset) and the backend server is protected using Advanced Encryption Standard (AES) with 256-bit keys. This is implemented via OpenSSL, a widely adopted cryptographic library, to secure API calls and real-time session data. Both persistent storage and in-memory processing of personal and medical data use encryption-at-rest and encryption-in-transit techniques. Additionally, key rotation strategies are applied periodically to refresh encryption keys and reduce long-term exposure risk.

  • Authentication and access control: The system employs Two-Factor Authentication (2FA) to verify the identity of clinicians, administrators, and caregivers accessing the backend. Furthermore, Role-Based Access Control (RBAC) ensures that users only have access to the data and system functionalities necessary for their specific roles—effectively minimizing insider threats and enforcing the principle of least privilege.

  • Regulatory compliance: To meet clinical standards for patient data protection, the system was designed to align with major data privacy regulations including the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in the European Union. These guidelines shape how data is collected, stored, transmitted, and anonymized. Audit logging, consent tracking, and emergency override protocols are built into the system to support accountability and transparency during clinical use.

System features

The subsequent subsections provide an overview of the application’s core features and functionalities. Although the system employs existing technologies such as commercial AI models, Unity, and Firebase, the original contribution lies in the novel integration, personalization, and synchronization of these tools into a unified, secure, and a therapy system for Alzheimer’s and dementia patients care.

Functional requirements

The patients are the main actors and stakeholders of the system. The patient could include individuals with an existing Alzheimer’s diagnosis, senior citizens, or any individual who would like to undergo cognition testing. Linked to: RQ3, HP3.

The patient can interact with the UI within the virtual environment to visit relaxing virtual environments. These virtual Environments are categorized into Nature and Relaxing environments, religious monuments, and Countries. Linked to: RQ1, HP1 & RQ3, HP3.

The patient can navigate the home environment and move around through the use of the Oculus controllers, allowing them to explore and interact with the components of the home environment. Linked to: RQ2, HP2.

The patient can interact with the AI companion to ask questions or have conversations in either Arabic or English. Linked to: RQ1, HP1.

The patient can complete cognitive-enhancing therapy tasks in the virtual environment. These include memory games that target different cognitive functions and aim at improving them gradually. Linked to: RQ1, HP1 & RQ2, HP2 & RQ3, HP3.

The AI companion can be customized to visually and vocally replicate a loved one, including deceased family members, using pre-recorded voice data and facial reconstruction. This is designed to increase emotional comfort, reduce feelings of loneliness, and simulate familiar, comforting social interaction for the patient. This unique integration of AI and voice cloning offers a deeply personalized experience, bridging memory and emotion for Alzheimer’s patients. It extends the AI companion beyond a generic assistant, transforming it into a therapeutic memory anchor. Linked to: RQ2, HP2.

One key technical novelty of the proposed system is its ability to support an adaptive therapy engine which dynamically provides personalized exercises through the integration of rule-based logic along with machine learning algorithms. By tracking different metrics such as speech delays, reaction time, and performance scores, the AI adjusts the different interaction activities and sustains user engagement.

link

Leave a Reply

Your email address will not be published. Required fields are marked *