Final Outcomes: 3D Scenario and Results
Ultimately, this project was conducted to ascertain whether there is any discernible difference in user experience between diegetic and non-diegetic forms of visual signposting in first-person video games. That question is explored further down this page using the data acquired during main testing, but first comes a breakdown of the 3D scenario and its effectiveness during that testing.
3D Video Game Scenario
The 3D first-person video game scenario developed in Unreal Engine (Epic Games 2022a) was the all-important testing device for this project. The game at its core is relatively simple: a basic puzzle game that enabled me to implement diegetic and non-diegetic forms of visual signposting naturally into play. All assets in the scene were developed by myself using Autodesk Maya (Autodesk 2021), Substance 3D Painter (Adobe 2023), Photoshop (Adobe 2020) and Quixel Mixer (Epic Games 2022b), with the exception of the hand asset used in the diegetic level (bariacg 2018). This asset was sourced online, and my only input was to lengthen it slightly in Autodesk Maya, re-UV-map it after the lengthening, and finally create a very basic texture to maintain a consistent style with the rest of the scenario.
3D Scenario Breakdown
This section breaks down the 3D scenario and explores the features and assets present in the various stages/levels that appear within it.
Nexus
The nexus is the first area that players load into when launching the 3D scenario. It was used to complete three tasks related to preparing participants for the rest of the test ahead:
- Outline what to expect during the scenario
- Provide interactive sections for players to acclimate themselves with the scenario
- Provide level timings for participants to record their times for use in the questionnaire
Diegetic Signposting Level
The diegetic signposting level is one of the two levels featured in this scenario; the order in which participants experience the two levels is randomised. The starting area of the diegetic level features a simple puzzle that requires diegetic symbol and indication association to complete: players are expected to place the items indicated by diegetic symbols onto specific pressure pads to open the door out of the room. Participants can then exit through the open door into a corridor with two available paths, with the correct path indicated by a diegetic symbol in the form of a signpost. Following the correct path leads players to the complex diegetic puzzle. This puzzle is similar in concept to the previous one, as it also relies on symbol and indication association, however the method of placing items in the correct positions is different: players must move blocks along a surface to their intended positions, and their ability to interact with these blocks is indicated by diegetic handles on the blocks that raise the player's hand when hovered over. A full playthrough of the level can be seen in the video to the left, and a further breakdown of the mechanics and their development can be found in previous sections of the site.
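For anyone more interested in the underlying logic than the assets, the symbol and indication association at the heart of this puzzle reduces to a very small check. The sketch below is purely illustrative - the level itself was built in Unreal Engine, and every name and symbol here is a hypothetical placeholder rather than the actual implementation:

```python
# Minimal sketch of the diegetic simple puzzle's association logic: each pad
# expects the item bearing the diegetic symbol shown beside it, and the exit
# door opens only once every pad is satisfied. Placeholder names throughout.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PressurePad:
    expected_symbol: str                  # diegetic symbol displayed by the pad
    placed_symbol: Optional[str] = None   # symbol on the item currently placed

    def is_satisfied(self) -> bool:
        return self.placed_symbol == self.expected_symbol

def door_open(pads: List[PressurePad]) -> bool:
    # The exit door opens only when every pad holds its matching item.
    return all(pad.is_satisfied() for pad in pads)

pads = [PressurePad("cube"), PressurePad("sphere"), PressurePad("pyramid")]
pads[0].placed_symbol = "cube"
pads[1].placed_symbol = "sphere"
print(door_open(pads))            # False - one pad is still empty
pads[2].placed_symbol = "pyramid"
print(door_open(pads))            # True - all associations are correct
```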
Non-Diegetic Signposting Level
The non-diegetic signposting level is the other of the two levels featured in this scenario. The starting area of the non-diegetic level features a simple puzzle involving levers with associated non-diegetic symbols. The state of each symbol changes with the state of its lever handle, and the symbols have to match the non-diegetic symbols associated with a notice board in the room to open the door. The board's importance is demonstrated by non-diegetic indications in the form of floating pixels. After opening the door, participants are greeted by a post which, when looked at directly, reveals a non-diegetic symbol pointing out the direction of the next puzzle. By following this non-diegetic signpost, participants reach the complex puzzle. The complex puzzle uses the same lever system as the simple puzzle, however some of the lever handles are missing and require specific handles to be placed into their bases. Incorrect handles are denoted by a non-diegetic symbol that appears beneath the base if the wrong handle is held. Participants are required to open a specific door, through the same means as the first puzzle, to access the correct lever handle for the lever base connected to the exit door. A full playthrough of the level can be seen in the video to the left, and a further breakdown of the mechanics and their development can be found in previous sections of the site.
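The lever puzzles boil down to a similarly small piece of logic: each lever toggles its non-diegetic symbol, and the door checks the lever states against the pattern on the notice board. Again, this is only an illustrative sketch with placeholder names and symbol values, not the Unreal Engine implementation itself:

```python
# Illustrative sketch of the non-diegetic lever puzzle: pulling a lever toggles
# its symbol state, and the door opens once the states match the notice board.
# Placeholder names and values only - the real level was built in Unreal Engine.
NOTICE_BOARD = {"lever_1": "circle", "lever_2": "cross", "lever_3": "circle"}

lever_states = {name: "cross" for name in NOTICE_BOARD}   # starting states

def pull_lever(name: str) -> None:
    """Toggle the non-diegetic symbol shown above the lever."""
    lever_states[name] = "circle" if lever_states[name] == "cross" else "cross"

def exit_door_open() -> bool:
    """The exit opens only when every lever matches the notice board pattern."""
    return lever_states == NOTICE_BOARD

pull_lever("lever_1")
pull_lever("lever_3")
print(exit_door_open())   # True - the pattern now matches the notice board
```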
Reflection on the Effectiveness of the 3D Scenario
Understanding the effectiveness of the 3D scenario, with regards to both its functionality and its ability to acquire usable data, is key for me to learn from this experience and understand the project's potential limitations.
I believe the usage of the original 3D scenario during main testing went relatively well. The scenario ran smoothly on a variety of machines both at home and elsewhere. It contained a good deal of assets and polished mechanics prior to its distribution, and I had managed to keep the file size under 400MB when compressed. These things were important to me, as I needed to ensure the test was accessible regardless of the storage or hardware limitations that might be present on other people's machines. Technically the scenario seems to have performed well too - there were only a few reported technical issues from participants. Some participants noted minor glitches with equippable items, and only one participant reported a game-breaking error, again to do with equippable items. Given the variety of systems accessing this test, and considering their associated hardware differences and limitations, I feel that this number of technical issues is relatively light, especially as most of the issues encountered were minor in severity.
From a data collection perspective, I again believe the test functioned well, although there are some key areas of the scenario that I believe may have had some impact on the data gathered. On the positive side, the two testing levels were very similar in concept whilst still remaining distinct, which meant participants' data was not affected by repeating identical actions or puzzles across the differing forms of signposting. There were some issues related to this similarity too, however, stemming from the different times taken to complete the levels. In the data covered below there is a clear disparity in the time taken to complete each level, and I believe this could have affected the data. Whilst the effect is likely negligible, and is noted where relevant in the coverage of each user experience component, I felt it was nevertheless worth mentioning.
Ultimately, I feel the scenario performed well under the circumstances in which it was utilised. The methods of distribution varied greatly, and the vast majority of the issues encountered were minor. I cannot think of another way I could have tested for a phenomenon such as this, other than utilising a pre-existing game or testing scenario, which would not have allowed me to develop my knowledge and skills in the new and emerging industry software I have utilised throughout this project.
Main Test Results
Between the 15th of May and the 15th of June I conducted the main testing through a combination of remote distribution via Discord (Discord Inc 2022) and in-person distribution via data collection days hosted at the Confetti Institute of Creative Technologies. The main testing has left me with a wealth of good data that I have been able to utilise to form conclusive opinions related to the research question. Whilst the number of responses is decent, it did not quite reach my previous goal of 20-30. However, the data gathered will still be more than useful in aiding my, and the industry's, understanding of the comparative effects of diegetic and non-diegetic signposting on user experience in first-person games. As stated in previous sections, the main testing survey incorporated the in-game version of the Game Experience Questionnaire (GEQ) developed by Ijsselsteijn, De Kort and Poels (2013), alongside additional sections covering participant demographic information and personal opinions. Reviewed first will be the demographic make-up of the participants, followed by the pre-GEQ information, and finally the data gathered from the GEQ with regards to the seven components of user experience measured.
The files employed during main testing can be found here.
Main Test Results - Basic Participant Information
Participant Gender, Age, and Game Experience
Participant gender was skewed very similarly to the pilot test. I believe this is most likely down to the excess of males present in the channels of distribution used: the make-up of people on the data collection day was primarily male, and, given the number of males that responded, I would assume the same is true of the Discord (Discord Inc 2022) servers the test was distributed through. One participant identified as something other than a man or woman, and another chose not to list their gender, lending credence to my decision to make listing protected characteristics optional for completion of the survey. Participant age was also biased towards younger people, again likely because the age of participants at the data collection day and on the Discord servers was commonly low. Experience suffered from this bias too, with only 2 of the 17 participants stating they were not experienced with video games. Despite this, hours in game per week vary greatly between participants, and as such this will likely be the more interesting metric to compare against other data than participant experience.
Pre-GEQ Information
Similarly to the pilot test, the level participants interacted with first was split almost exactly down the middle. This was to be expected, as the level played first is decided 50/50 so as to avoid all players experiencing the same level first and potentially skewing results. Also similar to the pilot test is the time taken to complete the levels. Whilst I thought I might have addressed this by increasing the complexity of the complex puzzle in the non-diegetic level, it seems participants were still able to complete that level quicker despite the additions. Although this could potentially affect their ability to access levels of immersion, as noted by Brown and Cairns (2004), I feel it will not be an issue in the data presented below, as the ranges between the upper and lower quartiles are very similar for both levels despite the obvious increase in the time taken to complete Level A. There is also a clear outlier in the box and whisker plot for Level A: participant 5. This participant was one of the two who reported not being experienced with video games, alongside reporting low hours played in the past week, which is most likely the cause of this outlier.
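For transparency on how an outlier like this is identified, the sketch below shows the standard 1.5 x IQR rule that a box-and-whisker plot implies. The project's data work was carried out in Excel rather than code, and the completion times here are invented for illustration rather than taken from the real participant data:

```python
# Illustrative quartile/outlier check matching a box-and-whisker plot.
# These times (in seconds) are made-up examples, not the actual data.
import statistics

def iqr_bounds(times):
    q1, _, q3 = statistics.quantiles(times, n=4)   # lower/upper quartiles
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(times):
    low, high = iqr_bounds(times)
    return [t for t in times if t < low or t > high]

level_a_times = [310, 340, 355, 360, 372, 390, 410, 720]
print(iqr_bounds(level_a_times))   # whisker limits implied by the quartiles
print(outliers(level_a_times))     # [720] - flagged in the same way as participant 5
```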
Main Test Results - GEQ Responses
Covered now are the average participant responses for each component of the in-game GEQ (Game Experience Questionnaire) used in the Microsoft Forms (Microsoft Corporation 2016) survey. Each component is accompanied by a gallery with associated tables, along with a brief summary of my thoughts and any correlations observed.
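Before going through the components, it may help to show how the per-level component averages are derived from the individual GEQ items. The real scoring was done in Excel, and the in-game GEQ defines its own item-to-component mapping; the column names, mapping, and sample rows below are purely illustrative:

```python
# Hypothetical sketch of deriving per-component, per-level averages from the
# GEQ item responses. Column names, the item mapping, and the sample rows are
# placeholders - they do not reproduce the actual questionnaire or data.
import pandas as pd

COMPONENTS = {
    "Competence": ["item_2", "item_9"],
    "Flow": ["item_5", "item_10"],
    # ... the remaining five components would be mapped in the same way
}

responses = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "level": ["A", "B", "A", "B"],
    "item_2": [3, 4, 2, 3],
    "item_9": [2, 4, 1, 3],
    "item_5": [3, 3, 2, 4],
    "item_10": [2, 3, 1, 4],
})

# Each component score is the mean of its items for that participant/level.
for component, items in COMPONENTS.items():
    responses[component] = responses[items].mean(axis=1)

# Average component score per level, as shown in the charts above.
print(responses.groupby("level")[list(COMPONENTS)].mean())
```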
For reference, Level A is the level primarily incorporating diegetic signposting, and Level B is the level primarily incorporating non-diegetic signposting. Data can be viewed independently via the button to the left.
Competency Component Ratings
The average competency component ratings show a clear bias towards higher levels of competency in Level B (the non-diegetic level). The difference between the two is far more pronounced than it was in the pilot test, as is the time correlation. Time did not seem to produce much of a trend in the pilot test, whereas here time taken seemed to positively affect participant competency ratings in Level A and negatively affect them in Level B. The increase in competency with time taken for Level A is most likely down to players having more time to get used to the controls and mechanics found in the diegetic level. Where participants rated one level higher than the other, their preferred level matched the level they reported higher competency in.
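The time correlations referred to throughout these component summaries are simple linear trends between completion time and the average component rating. As an illustration only (the charts themselves were not produced in code, and the values below are invented), such a trend could be computed like so:

```python
# Illustrative time-correlation check (Python 3.10+): fit a linear trend
# between completion time and a component's average rating. Invented values.
import statistics

times_a    = [290, 350, 400, 430, 500]    # hypothetical Level A times (seconds)
competence = [2.0, 2.5, 2.5, 3.0, 3.5]    # hypothetical average competence ratings

trend = statistics.linear_regression(times_a, competence)
r = statistics.correlation(times_a, competence)
print(f"slope = {trend.slope:.4f} per second, r = {r:.2f}")
# A positive slope and r would mirror the Level A pattern described above,
# where competency ratings rise with time taken.
```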
Sensory and Imaginative Immersion Component Ratings
The average sensory and imaginative immersion ratings for both levels were, surprisingly, exactly the same, despite clear biases toward one level or the other being present in nearly every individual participant. Participants 3, 7, 8, and 12 showed very polarised averages between the levels, and all of these participants except participant 3 reported the level with their higher average as their preferred level. Time did not seem to produce much of a trend in Level A, and caused only a very minor decrease in average ratings in Level B. I personally would have expected more polarising views here with regards to immersion. Experience does seem to have played a minor role with inexperienced participants, as participants 5 and 17, who both reported not being experienced, both preferred the clearer non-diegetic level. This is similar to a phenomenon noted by Iacovides and others (2015), in which players with higher gaming expertise are able to access levels of immersion more easily than inexperienced players when diegetic game elements are present.
Flow Component Ratings
Participant average flow ratings fell pretty much in line with what I expected. The non-diegetic level had a slightly higher average flow, which makes sense considering user preferences and participants' ability to complete the level quicker. This is further evidenced by the time correlation with flow, where Level B flow scores decrease the longer participants took. Strangely, this is not the case with Level A, where a longer level time trends with a higher flow rating. This is likely similar to the phenomenon noted with sensory and imaginative immersion, in which participants who spend more time coming to understand the mechanics feel a greater sense of flow upon achieving that understanding.
Positive Affect Ratings
Average positive affect ratings were fairly biased in favour of Level B. Some participants reported balanced positive affect scores, however almost all still reported B as their preferred level. Time taken seemed to correlate with an increase in positive affect ratings, which was not something I had expected. Whilst the bias towards Level B and the positive correlation with time taken were both observed in the pilot test, the effects seem far more exaggerated in the main testing. Participants 1 and 7 each showed far higher scores for one level, and each ultimately chose that level as their preference; these choices seem to be linked to reward and frustration respectively. Comments can be seen in the raw data files or further on this page, beyond the coverage of the GEQ components.
Negative Affect Ratings
Negative affect averages favour Level A according to this chart, meaning participants felt more negative on average when playing Level A than Level B. The contrast in scores does seem to be exacerbated by participant 7's ratings, although even with these omitted Level A would still be the higher scoring of the two levels. Time played seems to have little effect on participants' average negative affect scores in either level, although a slight negative trend is visible for Level A. I am not sure why this is; I would have assumed that negative affect would grow with more time played, however, if anything, the opposite seems to be true. This could possibly be down to players garnering a better understanding of the levels as they play for longer, although there is no concrete evidence of this in participants' qualitative responses or other data.
Challenge Ratings
Almost every participant recorded either a higher mean challenge rating for Level A than Level B, or matching scores. The only participants to rate Level B as more challenging than A were participants 3 and 10, both of whom seemed to prefer that level, due to its responsiveness and completion time respectively. Time taken seems to correlate with higher average challenge ratings, which makes sense, as people who felt the test was challenging likely took longer to complete it. This effect seems more exaggerated for Level A than Level B.
Tension/Annoyance Ratings
Tension/annoyance average ratings were very low for both levels, with a marked increase in Level A. This score does seem to have been exacerbated by participants 7 and 14 for both levels. Participant 7 stated: "The colouring and shadows made it hard to tell the colours and shapes of the objects in level A, making it extremely frustrating to solve the objective." This was an issue with the machine I utilised on the data collection day; I failed to recognise that there was a problem with the monitor settings at the time. Whilst this does not seem to have had too much of an effect on other users, it clearly has for this participant. On the flip side, participant 14 stated (talking about Level A, judging by their listed level preference): "More immersion and thought". Participant 14 reported being experienced with games and plays a good number of hours a week, which likely harkens back to the idea of the role expertise plays in accessing states of immersion in diegetic scenarios (Iacovides et al. 2015). Time taken seemed to correlate strangely with tension scores, however I believe this is most likely down to outliers who reported high scores with low times, both of whom can be seen in each time correlation graph.
Main Test Results - Post-GEQ Information
Level preference amongst participants demonstrated a bias toward Level B. This is somewhat reflective of what was found in the pilot test, although the bias was slightly less pronounced here. Interestingly, expertise does not seem to have played much of a role in these level preferences: users with both high and low hours per week are mixed in terms of which level they prefer, and there does not seem to be any identifiable trend. This surprised me, as those who preferred Level A seemed to prefer its challenge, whereas those who preferred Level B seemed to prefer the clarity and responsiveness of the non-diegetic symbols and mechanics.
Main Test Results - Limitations, Conclusion, and Closing Thoughts
Main Test Results - Limitations
Looking back on the project, two key limitations come to mind that may have affected the integrity of the data attained.
The first key limitation in my mind is the small set of differences between the 3D video game scenario's levels. Both the simple and complex puzzles in each level utilise symbol and indication signposting to essentially the same degree in their respective forms of representation. However, there are key differences in the methods of interaction used by the puzzles, and I feel this impacted some users. The interactions used for the puzzles in the diegetic level are multi-faceted, as players have to pick up, drop, or push objects to achieve symbol and indication association. This is not the case in the non-diegetic level, where players only have to interact with each lever once to achieve the association; some levers require a handle to be inserted first, however this is only the case for two of the nine levers present in the level. These differing levels of effort required to interact, whilst not directly increasing complexity, did result in a longer completion time for the diegetic level. Whilst I do not believe the effect was major, this issue is indicated by the time correlation graphs for some components, alongside being mentioned by some participants in the comments related to their level preference.
The second core limitation to main testing in my mind is the lack of desired participant numbers. Whilst 17 participants have provided me with a good deal of information to review and deliberate on, I cannot help but feel that the data collected would have been more representative of the population as a whole had I managed to distribute the test better. Women have been noted to make up just under 50% of gamers in both Europe and America, and a sizeable portion of gamers are over the age of 25 (Jovanovic 2023). With this in mind, the issues related to my distribution become clearer, as I failed to garner many responses from the groups mentioned, and by extension failed to conduct research fully representative of the target audience. Having a larger sample size would also mean that the effects of biases or misunderstandings from any one participant have less of an effect on overall scores, providing a sort of margin of error (Budiu and Moran 2021).
Whilst the limitations covered here might have affected my findings somewhat, I do not feel these problems had a major impact on any of them. However, for the sake of other tests, impartiality, and transparency, it feels necessary to outline the areas in which the testing could have been improved or conducted better.
Main Test Results - Conclusion
The results attained through conducting this research project are extremely interesting to me. The data gathered supports findings I have observed through secondary and primary research in some places, and goes against others elsewhere.
Many of the GEQ component averages were very similar to what was observed during the pilot test. Many of the scores were lower, almost certainly owing to the reduction in the number of statements relating to each component. However, the biases in each component towards a certain level mostly stayed the same. There were exceptions to this: sensory and imaginative immersion scores were balanced this time, instead of favouring the diegetic level as they did in the pilot test, and flow now favours the non-diegetic level, which I had expected beforehand but which was not reflected in the pilot test. There are a few instances like this, however overall there seems to be little difference between the levels. Whilst the non-diegetic level is favoured in many of the rating averages, the differences are minute, with neither level ever scoring even a single point above or below the other. Some findings from secondary research can be linked back to the findings here too. Expertise can be seen to play a role in some of these ratings in both main and pilot testing, especially with elements like immersion and flow, supporting findings by Iacovides and others (2015) and Marre, Caroux and Sakdavong (2021). However, the difference in scores and feelings between immersion and flow seems to run counter to statements by Michailidis, Balaguer-Ballester and He (2018), who argue that flow and immersion are one and the same and, you would therefore assume, should score similarly to one another. As in the pilot test, this was not the case.
Ultimately, through the collection and review of these results, I have come to the conclusion that there is little difference in the comparative effects of diegetic and non-diegetic forms of visual signposting on user experience in first-person games. Whilst there is a very minor bias toward the non-diegetic signposting level in some component averages, and there are likely some elements present in either level that may have minorly affected participant scores, I am not confident that one form of visual signposting is objectively better for user experience than the other. This result was somewhat expected, especially when considering the results of other testing conducted by the likes of Iacovides and others (2015) and Marre, Caroux and Sakdavong (2021). However, it also shows that there is not yet a definitive answer to a question such as this in relation to games, and further demonstrates that current and future testing is necessary to ensure game design is continually approached with user experience in mind.
Main Test Results - Closing Thoughts
As I conclude this final section covering my MSc research project, I wanted to provide some closing thoughts regarding the project and the events related to it. Whilst there were definitely issues and limitations encountered throughout, the project has nevertheless been a place for me to grow and develop both academically and professionally. It has given me a great opportunity to utilise new versions of the industry-standard software I am likely to encounter when I begin working in the industry. Alongside this, the project has also improved my soft skills immensely: skills such as academic writing, file management, in-person research, working with Excel (Microsoft Corporation 2023), and so on. Whilst there were certainly issues, especially related to test distribution and some mechanic interactions, this research project has undoubtedly been a positive experience for me, and hopefully for the industry, considering my personal development and the findings of the research I have conducted.