Psikhologicheskie Issledovaniya • ISSN 2075-7999
peer-reviewed • open access journal


Degtyarenko I.A., Leonova A.B. Experimental elaboration of the complex approach to websites usability evaluation

Full text in Russian: Дегтяренко И.А., Леонова А.Б. Экспериментальная разработка комплексного подхода к оценке юзабилити интернет-сайтов
Lomonosov Moscow State University, Moscow, Russia


The set of principles and methodological instruments for complex evaluation of websites usability is substantiated. Different measures of user satisfaction and subjective comfort combined with objective indicators of task success, efficiency of the cognitive load distribution (evaluated based on eye movement data), and the degree of vegetative emotional tension were considered. The study demonstrated the applicability of this approach for differentiated assessment of usability in web-site engineering.

Keywords: usability, website, user satisfaction, work efficiency, subjective comfort, cognitive load, eye movements, vegetative emotional tension


[English translation is provided by the author of the article.]

The use of World Wide Web resources plays an increasingly important role in the lives of modern people. It covers virtually all of their main activities, including work, education, information search, communication, recreation and entertainment. Since the late 1980s, scientific and applied research has increasingly recognized the ergonomic qualities of a software product as a key factor in its commercial success, widespread market acceptance, and user satisfaction. These qualities have also been linked to the efficiency of solving various tasks in the Internet environment. Modern software quality models use the special term “usability” (literally, “fitness for use”) to denote the ergonomic qualities of a product [Kostin, 2011].

The classic definition of usability was introduced in the [ISO 9241-11:1998] standard: “usability is the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use[1].” This formulation highlights the need for complex methods and tools that assess software quality by the criteria of ease and effectiveness of use and that rely on behavioral, psychological and psychophysiological indicators.

Procedures of the websites’ usability assessment and drawbacks of the existing methods

Traditionally, usability evaluation relies on techniques such as expert evaluation and usability testing [Nielsen, 1993]. The latter is a special experimental technique: members of the product’s target audience use it to perform a number of tasks critical for the success of their work and other activities. In practice, the procedure mostly reduces to external observation aimed at identifying the key problems faced by the participants. When quantitative measures are used at all, they are usually (1) success rate (e.g. on the scale “performed the task without hints” – “performed the task with a hint” – “failure”) and (2) time on task. An example of a typical usability testing procedure used in a commercial setting may be found in [Golovach, 2005].

In terms of the usability definition given above, the first group of metrics measures effectiveness and the second measures efficiency of use. Particular characteristics of the user’s motor activity, such as the number of mouse clicks or the number of transitions between pages, can also be used to evaluate efficiency. Besides measures of task performance, however, a fundamentally important component of the efficiency of activity is its “internal cost”, defined by the cognitive, emotional and physiological effort applied by the subject [Leonova, 2007]. Such measures are almost never used in usability testing practice. Likewise, user satisfaction with a software product, an important indicator in its own right, is usually assessed with unstructured surveys, i.e. without standardized psychometric instruments. No validated or well-structured questionnaire instruments for assessing user satisfaction had previously been proposed in the Russian-language literature. Moreover, in international practice researchers often neglect standardized techniques even though more than a dozen of them exist [Hornbæk, 2006]. Hence, the development of more inclusive and orderly usability evaluation methods and procedures (in our case, specialized for websites) that can be integrated into a unified testing methodology is currently an especially important problem.

Rationale for a complex approach to the websites’ usability evaluation: objectives and hypotheses of the study

An original scheme of an experiment for complex assessment of websites’ usability was proposed and tested in this study. Besides the traditional usability metrics (success rates and time on task) it also included measures of:
– user satisfaction with the website assessed with the previously validated instrument “Assessment of User Satisfaction with a Website” (OPUS, see [Degtyarenko, Leonova, 2012]);
– “internal costs” of activity assessed using the (a) subjective comfort of current state, (b) efficiency of the strategies for cognitive load distribution, and (c) the usage of the subject’s psychophysiological resources.

The objective of the study was to experimentally justify the proposed complex approach to the assessment of websites' usability so that it can be applied in a usability testing setting. In parallel, the most diagnostically informative indicators were selected within each group of methods, based on how well they support valid and meaningful interpretation of the data.

To achieve this objective, a series of modeling experiments has been conducted. The efficiency of users’ activity was compared while performing identical tasks on two websites differing in their level of usability.

The main hypothesis was that indicators from the whole range of diagnostic techniques would be more favorable during work on the site with higher usability. These indicators include: (a) user satisfaction, (b) task performance, (c) subjective comfort of the current state and the efficiency with which cognitive and psychophysiological resources are actualized.


The experiment modeled the execution of several user tasks by the subjects on two websites of the same thematic category which differed in the degree of ergonomic quality of the interface. These tasks are typical during the process of preparing and conducting online surveys. Two websites freely available for Internet users were chosen: CreateSurvey and VirtualExS. These websites were chosen for the following reasons:
– these websites are used by a rather wide range of socionomic professionals: psychologists, sociologists, marketing and HR specialists, teachers etc.;
– the activity of conducting online surveys corresponded well to the content of educational programs of the university students majoring in psychology who were the subjects in our study;
– a preliminary expert evaluation conducted by the authors of this study has shown that the two websites vary in their usability; CreateSurvey website is generally characterized as having a simpler structure and more optimal visual design compared to a more information rich and somewhat “cluttered” VirtualExS;
– from a technological standpoint these sites are sufficiently reliable tools allowing their users to solve complex problems during sustained time periods; this allows for “ecologically valid” modeling of real work situations in the Internet environment.

All the experiment participants performed an identical set of tasks on each of the two websites. The data for the whole range of diagnostic techniques was gathered according to the prepared plan and procedure of the experiment (see below).


Twenty-six subjects took part in the study: psychology students of Lomonosov Moscow State University and Moscow State Linguistic University (16 females and 10 males, aged 19 to 22). All subjects had sufficient Internet experience (more than 2 years) and used the Internet at least weekly, in most cases daily.

Plan and scheme of the experiment

A within-subject (intra-individual) experimental design was used. Each subject performed an identical task set twice during an experimental session: first on one website (series 1) and then on the other (series 2). The order of the two websites was counterbalanced: half of the subjects used the CreateSurvey website first and VirtualExS second, and the other half worked in the reverse order. All participants took part in an introductory lesson prior to the experiment.
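The counterbalancing of site order can be sketched as follows. The article does not say how subjects were allocated to the two orders, so the simple alternation below and the numeric subject IDs are assumptions made only for illustration.

```python
from itertools import cycle

# Site names come from the article; subject IDs 1..26 are hypothetical.
ORDERS = [
    ("CreateSurvey", "VirtualExS"),
    ("VirtualExS", "CreateSurvey"),
]

def assign_orders(subject_ids):
    """Alternate the two site orders so that each order gets half of the subjects."""
    return {sid: order for sid, order in zip(subject_ids, cycle(ORDERS))}

plan = assign_orders(range(1, 27))  # 26 subjects, as in the study
first_sites = [order[0] for order in plan.values()]
print(first_sites.count("CreateSurvey"), first_sites.count("VirtualExS"))  # prints: 13 13
```

With an even number of subjects, alternation guarantees equal group sizes; a randomized allocation with the same constraint would serve equally well.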

Built-in diagnostic measurements were conducted before and after each experimental series. Those included filling out questionnaires to test self assessment of the participant’s state and user satisfaction, and also an electrocardiographic (ECG) measurement. In addition, a synchronized video recording of the subject’s motor activity and eye movements was carried out during the task execution. An overview of the experimental procedure including the contents and sequence of the diagnostic measurements is presented in Fig. 1.

Fig. 1. Overview of the procedure and the contents of the main experiment stages.
Notes. Greek letter Delta (Δ 1 & Δ 2) denotes the change in the corresponding group of indicators during the corresponding experimental series.

Experimental stand

The experiment was technically implemented using a stand including two computers (one for the subject and one for the experimenter), remote eye tracker and a device for ECG recording and spectral analysis (see Fig. 2). The subject’s and experimenter’s computers ran Windows 7 and Windows XP operating systems, Google Chrome web browser, Techsmith Morae screen capturing software, TimeLeft stopwatch and TeamViewer software used for remote control from the experimenter’s PC.

A high-speed EyeLink 1000 eye tracker manufactured by SR Research (sampling rate 500 Hz, without head fixation) was used for remote recording of the participant’s eye movements. A VNS-Spektr device from Neurosoft was used for the registration and spectral analysis of ECG. The recording was made from 3 leads on the subject’s forearms over 3 minutes.

Fig. 2. General view of the experimental stand.
Notes. 1 – web camera for audio and video recording of the subject’s behavior, 2 – participant’s computer, 3 – EyeLink 1000 remote eye tracker, 4 – eye tracker controller, 5 – experimenter’s computer, 6 – VNS-Spektr vegetotester device used for ECG.

Introductory lesson

All subjects took part in introductory seminars in groups of 6 to 12 persons prior to the main part of the experiment. Basic steps for creating surveys in the Google Documents service were demonstrated and discussed. Then the students performed practice tasks similar to those presented during the experiment.

Experimental tasks

During the main part of the experiment a set of 7 tasks was presented to the subjects, which was identical for the two websites used:
1. Examine the functionality and structure of the system (introductory task).
2. Create a new survey in the system.
3. Add a welcome message which will be seen at the beginning of the survey.
4. Create a question with two answer choices, make it mandatory for the respondent to answer.
5. Add one more question which will be shown on a separate page of the survey.
6. Add 2 pictures to page 2 of the survey by uploading them from the Windows desktop.
7. Set up a logical rule so that the second page will be displayed or not displayed depending on the answer to the first question.

The success rate and time on task were recorded in the experimental protocol. The success rate was evaluated on a 4-point scale:
“4” – task accomplished without assistance in accordance with all specified criteria;
“3” – task accomplished in accordance with all specified criteria with use of instruction;
“2” – user performance met all specified criteria, but the subject did not report finishing the task before the time limit expired;
“1” – the subject did not finish the task.


Main stages of the experiment were executed in a completely computerized fashion. All the instructions, tasks and questionnaire materials were presented to the subjects in electronic form (using the LimeSurvey 1.90 web application). The introductory instruction briefly explained the experimental procedure, it did not include a special discussion of the scientific objectives of the study. After that, the subjects filled out questionnaires including individual biographic data.

Prior to the beginning of each experimental series, the experimenter remotely opened the proper website on the subject's computer. The participant then read the task aloud. When he finished reading, the experimenter started a stopwatch on the participant's screen and the participant began working on the task. He had to tell the experimenter when he considered the task accomplished. The experimenter assessed task completion against a set of criteria defined beforehand. If the user's performance satisfied all the criteria, work on the task was stopped and the time on task was recorded at that moment. If any criterion was not met, the experimenter stated why the task could not be considered accomplished.

The time for each task was limited. Two time intervals were defined for each task; their duration depended on an expert evaluation of the task difficulty. After the first interval expired, the participant was allowed to use an instruction describing the task completion step by step. After the second interval expired, the task was considered failed. In this case, the experimenter finished the task himself, briefly explaining his actions to the subject.
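The 4-point success scale and the procedure described above can be combined into a simple scoring rule. The sketch below is one possible reading of the protocol; the three boolean flags are hypothetical names introduced here, not fields of the authors' experimental protocol.

```python
def success_score(criteria_met: bool, reported_done: bool, used_instruction: bool) -> int:
    """Map the outcome of one task to the 4-point success scale from the article."""
    if not criteria_met:
        return 1  # the subject did not finish the task
    if not reported_done:
        return 2  # criteria met, but completion was not reported before the time limit
    if used_instruction:
        return 3  # accomplished with the step-by-step instruction
    return 4      # accomplished without assistance

print(success_score(criteria_met=True, reported_done=True, used_instruction=False))  # prints: 4
print(success_score(criteria_met=True, reported_done=True, used_instruction=True))   # prints: 3
```

The order of the checks matters: failure and unreported completion take precedence over whether the instruction was consulted.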

Questionnaire instruments used

To characterize the subjective attitude of the subject to working on a specific web site and self-assessment of his current state, the following indicators were measured using specialized psychometric questionnaires:
1. user satisfaction with a website – using the validated OPUS instrument developed by the study authors [Degtyarenko, Leonova, 2012] supplemented by an auxiliary scale “Overall satisfaction”;
2. ease of task execution on the website – using a unipolar scale ranging from 1 to 5 points, filled after execution of each task (except the first introductory task, after which the participants assessed their “first impression of the site” on the same scale);
3. subjective assessment of the current state – using well-known standardized instruments:
– “States scale” questionnaire [Leonova, Kapitsa, 2003];
– short form of the Spielberger's State Anxiety Inventory [Leonova, Naumova, 2009].

Diagnostic indicators

The processing of data gathered in the experiment started with the calculation of informative indicators for each of the techniques included in the diagnostic complex.

A. To evaluate the success of task performance, we calculated, independently for each website, the number of tasks performed with a success rate of at least 4, 3, and 2, respectively.

The following temporal characteristics were also used to characterize the work with each website: overall time for completion of all the tasks except the first (in minutes and seconds) and the time used to complete the first introductory task.
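The threshold counts described under A can be illustrated with a short sketch. The per-task scores below are invented for one hypothetical subject on one site; the article does not state whether the introductory task was included in these counts.

```python
def tasks_at_or_above(scores, threshold):
    """Count the tasks whose success score is at least `threshold`."""
    return sum(1 for score in scores if score >= threshold)

# Hypothetical success scores (1..4) for one subject on one website.
scores = [4, 4, 3, 2, 4, 1, 3]

counts = {threshold: tasks_at_or_above(scores, threshold) for threshold in (4, 3, 2)}
print(counts)  # prints: {4: 3, 3: 5, 2: 6}
```

Because the thresholds are nested, the counts can only grow as the threshold is lowered, which matches the ordering of the three success indicators.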

B. The level of user satisfaction with the work on a website was measured using the main scales of the OPUS instrument: (1) “efficiency”, (2) “ease of use”, (3) “usefulness”, (4) “emotional appeal”. Raw scores on these scales were converted to Z-values normalized against the unified standardization sample (see [Degtyarenko, Leonova, 2012]). Similarly, a factor score was calculated to summarize the scores on the items of the auxiliary scale “Overall satisfaction”.

Average scores for all tasks on each website were also calculated using the “Ease of task execution” scale.
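The conversion of scale scores to Z-values can be sketched under the assumption that an ordinary standard score was used, i.e. the raw score minus the mean of the standardization sample, divided by its standard deviation. The norm values in the example are hypothetical; the actual OPUS norms are given in [Degtyarenko, Leonova, 2012].

```python
def to_z(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Standard score of `raw_score` relative to a standardization sample."""
    if norm_sd <= 0:
        raise ValueError("the standard deviation of the norm sample must be positive")
    return (raw_score - norm_mean) / norm_sd

# Hypothetical norms for one scale: mean 20, standard deviation 4.
print(to_z(18, norm_mean=20, norm_sd=4))  # prints: -0.5
```

On this reading, a negative Z-value in the results means that a site scored below the average of the standardization sample on that scale.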

C. Subjective assessment of the current state was measured as the magnitude of the shift in commonly used questionnaire indicators from the beginning to the end of work with a given website:
– “States scale” questionnaire – the difference in values of the “Index of Subjective Comfort” (ISC) was used;
– Spielberger's State Anxiety Inventory – the difference in values of the summary indicator “State Anxiety” (SA) was used.

D. The effectiveness of cognitive load distribution was assessed using mean values of several oculomotor activity parameters most commonly discussed in the specialized literature [Velichkovskii et al., 2010; Ahlstrom, Friedman-Berg, 2006; Grootjen et al., 2007]: (1) fixation duration, (2) frequency of fixations, (3) pupil diameter, (4) blink frequency, and (5) peak saccadic velocity. The values were averaged over the complete duration of work with each website.
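Three of the oculomotor indicators listed under D (mean fixation duration, fixations per minute, blinks per minute) can be derived from a list of detected events, as in the sketch below. The input format is an assumption made for illustration and does not reproduce the export format of the eye tracker or the authors' processing pipeline.

```python
def oculomotor_summary(fixation_durations_ms, n_blinks, session_seconds):
    """Average fixation duration and per-minute event rates over one session."""
    if not fixation_durations_ms or session_seconds <= 0:
        raise ValueError("need at least one fixation and a positive session length")
    minutes = session_seconds / 60.0
    return {
        "mean_fixation_ms": sum(fixation_durations_ms) / len(fixation_durations_ms),
        "fixations_per_min": len(fixation_durations_ms) / minutes,
        "blinks_per_min": n_blinks / minutes,
    }

# Hypothetical two-minute recording with four fixations and two blinks.
summary = oculomotor_summary([200, 300, 250, 250], n_blinks=2, session_seconds=120)
print(summary)
```

Normalizing counts by session length is what makes the rates comparable between sites, since the time spent on each site differed.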

E. The level of psychophysiological costs and signs of imbalance in the cardiovascular activity was evaluated using the standard set of indicators, obtained in ECG spectral analysis [Dmitrieva, Glazachev, 2000; Heart rate variability, 1996]. A 4 factor model previously developed by the study authors [Leonova, 2011] was used for integrated analysis of the initial data. This model included the following factors: (1) vegetative tension, (2) sympathetic mobilization, (3) parasympathetic imbalance, and (4) heart rate variability. From a qualitative point of view, these indicators characterize the dominant type of emotional and autonomic responses of the subjects to a situation.

Data analysis

The calculated values of all the discussed diagnostic indicators were compiled in a data matrix for further statistical analysis in the SPSS for Windows 19.0 software package. Wilcoxon’s T-test for related samples was used to identify significant differences between the indicator values obtained during work with different websites. In certain situations, it was supplemented with correlation (using Spearman’s rho) and cluster analysis.
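For readers unfamiliar with the paired comparison used here, the following sketch computes the Wilcoxon signed-rank statistic T and a normal approximation for it. It is an illustration only, not the computation performed by SPSS: zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied to the variance. The paired values are invented. In practice a library routine such as `scipy.stats.wilcoxon` would normally be preferred.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (T, z): T is the smaller signed-rank sum, z its normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    t = min(w_plus, w_minus)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # variance without tie correction
    return t, (t - mean_t) / sd_t

# Hypothetical paired scores of eight subjects on two sites.
site_a = [4, 5, 3, 6, 4, 5, 2, 6]
site_b = [2, 3, 3, 1, 3, 2, 3, 2]
t_stat, z = wilcoxon_signed_rank(site_a, site_b)
print(t_stat, round(z, 2))
```

A small T (one sign dominating the ranked differences) yields a strongly negative z, which is what indicates a systematic difference between the paired measurements.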

Results and discussion

Table 1 summarizes the comparison between work with the CreateSurvey and VirtualExS websites across the whole range of techniques used. Wilcoxon’s T-test revealed a large number of statistically significant differences between the two websites in nearly all groups of diagnostic indicators.

Table 1
Mean values of diagnostic indicators during execution of experimental tasks on different websites and the significance of differences between them

Indicators                            CreateSurvey   VirtualExS   T/p
                                      (mean)         (mean)

“Assessment of User Satisfaction with a Website” (OPUS) instrument scales
  Effectiveness                       0.24           –0.63        4.08***
  Ease of use                         –0.33          –1.75        7.42***
  Usefulness                          0.07           –0.64        5.71***
  Emotional appeal                    –0.11          –0.73        3.58***

Auxiliary indicators of subjective satisfaction
  “Overall satisfaction” scale        –0.13          –0.95        4.53***
  Ease of the task execution          3.90           2.90         6.08***
  First impression of the site        3.58           3.50         0.35

Success rates and time on tasks
  Tasks with success rate = 4         4.08           1.92         7.25***
  Tasks with success rate ≥ 3         5.46           3.73         5.55***
  Tasks with success rate ≥ 2         5.77           4.15         6.34***
  Overall time                        18.26          28.29        –8.87***
  Time on the introductory task       1.35           1.45         –1.31º

Changes in the subjective comfort of current state
  Index of subjective comfort (ISC)   +0.54          –2.38        2.72**
  State anxiety (SA)                  –0.50          +0.38        –0.90

Oculomotor activity characteristics
  Mean fixation duration (ms)         262            254          2.37*
  Fixations per minute                180.5          175.2        1.02
  Blinks per minute                   13.3           12.1         2.44*
  Mean pupil diameter                 276            261          1.73º
  Peak saccadic velocity              255            248          0.72

Changes in the structure of vegetative tension indicators
  Vegetative tension                  –0.64          –1.29        1.28
  Sympathetic mobilization            –0.02          –0.121       0.37
  Parasympathetic imbalance           –0.11          +0.019       –0.48
  Heart rate variability              +0.02          +0.235       –0.95
  Heart rate                          –5.96          –8.62        1.20

Notes. *** p ≤ 0.001; ** p ≤ 0.01; * p ≤ 0.05; º p ≤ 0.1.

Below is the detailed analysis of the differences found. For more compact presentation the more usable website CreateSurvey will be designated as the U-site and the less usable VirtualExS as the NU-site.


Differences in user satisfaction values for the work with different websites

The more usable U-site scored higher than the less usable NU-site on nearly all user satisfaction metrics, at a very high level of significance (p < 0.001). The differences held for all of the OPUS instrument scales. The data for the auxiliary scales “Overall satisfaction” and “Ease of task execution” supported this result: the U-site scored higher on both (p < 0.001). This indicates that the psychometric instruments used in the study are suitable for a sufficiently clear and differentiated assessment of various aspects of websites’ usability. The only exception was the “First impression of the site” scale, rated immediately after the first, introductory task, on which the two sites did not differ significantly. This suggests that the subjective assessment of user satisfaction forms gradually as the user’s experience with the site expands.

Differences in success rates and temporal characteristics of task execution

Most performance and temporal metrics of task execution also differed significantly between the two websites (p < 0.001). The number of successfully performed tasks and the quality and speed of their execution were higher for the U-site. The only indicator with a smaller difference was the time spent on the introductory task. Even here, however, there was a tendency toward shorter times on the U-site (p < 0.1), perhaps due to its more compact design and lower information density compared to the NU-site.

Differences in the dynamics of subjective assessments of current state

The changes in self-assessed current state over the whole period of work with each website differed significantly for the “Index of subjective comfort” (ISC, p < 0.01). The two websites differed not only in the magnitude of these changes but also in their direction. After work with the U-site, the ISC score increased relative to baseline and remained in the range of optimal values. In contrast, it dropped sharply during work with the NU-site and shifted into the zone of markedly unfavorable values. No significant differences were found in the dynamics of the State Anxiety index (SA) between the two websites. However, the absolute value of state anxiety grew after task execution on the NU-site and shifted toward the range of higher values.

Differences in oculomotor activity indicators

Significant differences were found in a number of oculomotor activity indicators recorded during task execution on the two different websites. Mean fixation duration and blink frequency (p < 0.05) were higher during work with the U-site. Mean pupil diameter also tended to increase in this case (p < 0.1).

According to the literature, the growth of these indicators may be associated with higher levels of cognitive load, but it does not necessarily indicate that the strategies used for performing mental acts were less efficient [Velichkovskii et al., 2010; Ahlstrom, Friedman-Berg, 2006; Grootjen et al., 2007]. In particular, longer fixations (300 ms or more) are usually associated with an increase in pupil diameter and are preceded and followed by relatively small amplitude saccades. According to B.M.Velichkovskii’s opinion [2006], this pattern of execution of long fixations reflects the actualization of focal information processing, which is aimed at goal-oriented identification of objects and events. In contrast, the prevalence of shorter fixations suggests a more important role of processes of attention spatial distribution in the field of perception, and less orderly strategies of visual search.

Following this interpretation, we can discuss the meaning of the specific patterns of oculomotor indicators recorded during work with the two websites. Work with the U-site is characterized by the dominance of focal information processing, aimed at the orderly selection of zones in the visual field for subsequent cognitive analysis of the chosen user interface fragments. Orderly strategies of this kind are problematic during work with the NU-site because of the need for frequent attention switching to search for relevant information in a noisy visual field.

The differences in blink frequency, which is higher during work with the U-site (p < 0.05), deserve a separate discussion. The literature contains conflicting opinions on the interpretation of this indicator. On the one hand, J.B.Brookings et al. (cited according to [Di Nocera et al., 2007]) report a negative correlation between blink frequency and cognitive load. On the other hand, K.Takahashi and colleagues [Takahashi et al., 2000] argue that blink frequency grows with the level of cognitive load. Higher blink frequency can also be attributed to fatigue, or interpreted as an indicator of a more goal-oriented realization of sensory activities. In the latter case, the act of blinking finalizes the execution of a single integral phase of action and prepares the eyes for the beginning of the next one [Kahneman, 1973]. The development of pronounced fatigue in our study was unlikely because the experiment was relatively short. We can therefore assume that the increase in blink frequency during work with the U-site indicates well-structured transitions from one phase of cognitive action to the next and more efficient strategies for the distribution of cognitive load compared to the NU-site.

Differences in types of vegetative emotional responses to task execution on two different websites

The analysis of data averaged for the entire sample did not reveal significant differences between websites on any of the generalized indicators of vegetative tension (see Table 1). The reason for this could be the high individual variability in the psychophysiological mechanisms, which is often mentioned in the specialized literature [Dmitrieva, Glazachev, 2000; Leonova, 2011; Thayer, Lane, 2000]. To account for the influence of this factor a quick cluster analysis was performed on the entire sample. Two subgroups of subjects were defined as a result. Clusters # 1 and # 2 differed by the type of vegetative emotional response to the situation.

The psychophysiological status of cluster # 1 (18 people) is characterized by a higher level of sympathetic mobilization without evidence of parasympathetic imbalance. Members of cluster # 2 (8 people) typically had lower levels of overall vegetative tension, but tended to show pronounced signs of parasympathetic imbalance. The test for significant differences in the generalized cardiac performance indicators during work with the two websites was carried out independently for each cluster of subjects (using the Wilcoxon T test). Characteristic patterns of changes in vegetative emotional response to the situation were identified, and they were different for the two clusters.
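The split of the sample into two subgroups can be illustrated with a minimal two-cluster k-means sketch, on the assumption that the "quick cluster analysis" mentioned above refers to a k-means-style procedure. The factor scores below are invented, the initialization (first k points as centers) is deliberately naive, and the code is not a reproduction of the SPSS algorithm or of the authors' input variables.

```python
def kmeans(points, k=2, iterations=20):
    """Plain k-means: returns (labels, centers) after a fixed number of iterations."""
    centers = [list(p) for p in points[:k]]  # naive deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest center (squared distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical (sympathetic mobilization, parasympathetic imbalance) factor scores.
subjects = [(1.0, -0.2), (-0.8, 1.1), (0.9, 0.0), (1.2, -0.1), (-1.0, 0.9), (0.8, -0.3)]
labels, centers = kmeans(subjects, k=2)
print(labels)  # prints: [0, 1, 0, 0, 1, 0]
```

Once subjects are labeled, the paired site comparison can be repeated within each cluster, which is how the subgroup-specific response patterns described in the text were examined.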

Cluster # 1 (see Fig. 3). An increase in sympathetic mobilization (p < 0.05), which indicated a more pronounced influence of central mechanisms on physiological resource activation, was characteristic for this subgroup during work with the U-site. The heart rate variability also decreased (p < 0.01), indicating a more stable functioning of the autonomic mechanisms of activity regulation. In contrast, when subjects used the NU-site, sympathetic mobilization tended to decrease (p < 0.1) with a simultaneous increase in heart rate variability (p < 0.05). In other words, during work with the U-site the activation of psychophysiological resources was performed according to an optimal scenario close to the state of “operational tension”[2], while work with the NU-site caused signs of imbalanced resource mobilization characteristic for the states of increased emotional tension of the “impulsive type”.

Fig. 3. Cluster # 1 (N=18). Shifts in the values of vegetative emotional response indicators during task execution on each of the two websites.

Cluster # 2 (see Fig. 4). Signs of parasympathetic imbalance and heart rate variability tended to decrease (p < 0.1) in representatives of this subgroup during work with the U-site, which indicated the optimal state of “operational tension”. At the same time, these indicators significantly increased during work with the NU-site (p < 0.05). Higher parasympathetic imbalance led to sluggishness and constrained reactions. This type of ineffective vegetative response to the situation can be described as an “inhibitory type” of high emotional tension.

Fig. 4. Cluster # 2 (N=8). Shifts in the values of vegetative emotional response indicators during task execution on each of the two websites.

Comparison of the data on individualized vegetative emotional response types shows that in both subgroups of subjects the work with a more usable website was accompanied by more adequate mechanisms of psychophysiological resource activation, consistent with the optimal state of “operational tension”. Task execution on a less usable website led to destructive changes in the course of regulatory processes, i.e. a gradual increase in vegetative emotional tension. Individual differences between subjects manifested as different types of response (“impulsive” or “inhibitory”) to difficulties during work on a website. However, regardless of the response type, activity on the less usable site was more taxing in terms of internal resources. This fact is an important consequence of the reduced efficiency of users’ activity [Leonova, 2007].


Results of the experimental study demonstrate the systematic character of differences across the whole range of metrics recorded during execution of identical tasks on two websites with varying levels of usability. According to the logic of the developed integrated approach these differences correspond to the following criteria necessary for a differentiated websites' usability assessment:

1. User satisfaction with their activity on websites (motivational component) measured using the main scales of OPUS instrument supplemented by auxiliary scales “Overall satisfaction” and “Ease of task execution”. All these indicators were significantly higher for the more usable website.

2. Effectiveness and productivity (performance component) assessed according to the objective indicators of success (number of completed tasks and quality of task execution) and time on tasks. Work with the more usable website was almost 40% more successful on average (with a 53% higher number of tasks completed without the use of instructions). Time on task was reduced by more than 35%.

3. “Internal cost of activity” (resource component), defined by:
– subjective assessment of current state. The subjects experienced a consistently high level of comfort of the current state during the whole period of task execution on the more usable website;
– objective indicators of the cognitive load distribution efficiency. According to the remote eye tracking data, task execution on the more usable website was carried out using goal-oriented and streamlined strategies of information processing, as opposed to the dominating strategies of free visual search during work with the less usable website;
– psychophysiological indicators defining the type of vegetative emotional response to the situation and generalizing the results of the ECG spectral analysis. Task execution on the more usable website was accompanied by appropriate physiological resource mobilization and the formation of optimal state of “operational tension”. Destructive states of emotional tension (of impulsive or inhibitory type) developed during work with the less usable website.
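The percentages quoted in item 2 can be related to the means in Table 1. The article does not state how they were derived; one reading that reproduces them, shown below as an assumption, expresses each difference relative to the larger of the two means (the U-site mean for task counts, the NU-site mean for overall time). The overall time values are treated here as plain decimal numbers.

```python
# Means from Table 1 as (CreateSurvey, VirtualExS).
success = {"=4": (4.08, 1.92), ">=3": (5.46, 3.73), ">=2": (5.77, 4.15)}
overall_time = (18.26, 28.29)

def shortfall(u_site, nu_site):
    """Percentage by which the NU-site mean falls below the U-site mean."""
    return 100 * (u_site - nu_site) / u_site

gaps = {name: shortfall(u, nu) for name, (u, nu) in success.items()}
mean_gap = sum(gaps.values()) / len(gaps)
time_saving = 100 * (overall_time[1] - overall_time[0]) / overall_time[1]

print({name: round(value, 1) for name, value in gaps.items()})
print(round(mean_gap, 1), round(time_saving, 1))
```

Under this reading the three success indicators give gaps of roughly 53%, 32% and 28%, averaging just under 40%, and the time difference is a little over 35%. If the overall times are instead read as minutes and seconds, the time difference still comes out slightly above 35%.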

These results support the constructiveness of the complex approach to the assessment of website usability and indicate prospects for its further development. The approach makes it possible to go beyond the narrowly specialized techniques and procedures used in usability engineering practice, which measure only a limited set of success and overall user satisfaction indicators. It integrates a wide range of parameters that simultaneously describe user activity at different levels, including both the productivity of task execution and the adequacy of the “internal cost of activity”, which is one of the main criteria of its efficiency.

The combination of subjective and objective methods of data collection and analysis allows us to operationalize the concept of the structure of mental and executive actions. This, in turn, reveals “deficit zones” in users' experience of working with different websites, which is, in our opinion, the main focus of software product usability assessment. The rapid development of modern technologies and the emergence of new compact measurement devices offer new opportunities to create more sophisticated tools that enable simultaneous registration of objective and subjective indicators and easy access to real-time quantitative and qualitative analysis of multivariate data. Thus, the results of our study will be useful for further in-depth scientific research and for the development of tools that can be applied in practice-oriented usability assessment projects.

The study was supported by the Russian Foundation for Basic Research, project 11-06-00463-a.

Translated by I.Degtyarenko

Cyrillic letters are transliterated according to BSI standards. Titles are given in the authors' translation.

Ahlstrom U., Friedman-Berg F.J. Using eye movement activity as a correlate of cognitive workload. International Journal of Industrial Ergonomics, 2006, 36(7), 623–636.

Degtyarenko I.A., Leonova A.B. Score user satisfaction with the work of a website. Natsional'nyi psikhologicheskii zhurnal, 2012, No. 1, 95–103. (in Russian)

Di Nocera F., Camilli M., Terenzi M.A. Random glance at the flight deck: Pilots’ scanning strategies and the real-time assessment of mental workload. Journal of Cognitive Engineering and Decision Making, 2007, 1(3), 271–285.

Dmitrieva N.V., Glazachev O.S. Individual health and polyparametric diagnostics of human functional state. Moscow: Gorizont, 2000. (in Russian)

Golovach V.V. Usability testing on the cheap. Moscow, 2005. (in Russian)

Grootjen M., Neerincx M.A., van Weert J.C.M., Truong K.P. Measuring cognitive task load on a naval ship: Implications of a real world environment. In: Augmented Cognition. Pittsburgh, PA: HCII, 2007. pp. 147–156.

Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. Circulation, 1996, 93(5), 1043–1065.

Hornbæk K. Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human-Computer Studies, 2006, 64(2), 79–102.

ISO 9241-11:1998, Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability.

Kahneman D. Attention and effort. Englewood Cliffs, NJ: Prentice-Hall, 1973.

Leonova A.B. Human reliability and stress management technologies in modern occupations. Vestnik Moskovskogo universiteta. Ser. 14, Psikhologiya, 2007, No. 2, 76–85. (in Russian)

Leonova A.B. In: Psychology of self-regulation in the XXI century. Moscow, Saint Petersburg: Nestor-Istoriya, 2011. pp. 354–375. (in Russian)

Leonova A.B., Kapitsa M.S. In: Workshop on engineering psychology and ergonomics. Moscow: Akademiya, 2003. pp. 134–166. (in Russian)

Leonova A.B., Naumova N.N. In: Proceedings of the Academy of Medical Sciences Scientific Council on Experimental and Applied Physiology (Whole Issue). Moscow, 2009. (in Russian)

Naenko N.I. Mental Tension. Moscow: Mosk. gos. universitet, 1976. (in Russian)

Nielsen J. Usability engineering. San Francisco: Morgan Kaufmann, 1993.

Takahashi K., Nakayama M., Shimizu Y. The response of eye-movement and pupil size to audio instruction while viewing a moving target. In: Proceedings of the 2000 symposium on Eye tracking research and applications (ETRA '00). New York, NY: ACM, 2000. pp. 131–138.

Velichkovskii B.B., Zlokazova T.A., Kapitsa M.S. The efficiency of interruption handling in free and forced switching tasks. Eksperimental'naya psikhologiya, 2010, No. 2, 45–47. (in Russian)

Velichkovskii B.M. Cognitive science: Foundations of the psychology of cognition. Moscow: Smysl: Akademiya, 2006. (in Russian)

Thayer J.F., Lane R.D. A model of neurovisceral integration in emotion regulation and dysregulation. Journal of Affective Disorders, 2000, 61(3), 201–216.


[1] It should be noted that the term “usability” is largely synonymous with “ergonomic quality”, the only difference being that its application emphasizes the assessment of mental effort and ease of use of a certain tool (a software product in our case) and not only the appearance and aesthetic design of the equipment.

[2] Here and below, the different types of mental tension are described according to the terminology used in the classical work by N.I. Naenko [1976].

Received 12 November 2011. Date of publication: 25 April 2012.

About authors

Degtyarenko Ivan A. Ph.D. Student, Department of Psychology of Work and Engineering Psychology, Faculty of Psychology, Lomonosov Moscow State University, ul. Mokhovaya, 11, str. 9, 125009 Moscow, Russia.

Leonova Anna B. Ph.D., Professor, Head, Laboratory of Labour Psychology, Faculty of Psychology, Lomonosov Moscow State University, ul. Mokhovaya, 11, str. 9, 125009 Moscow, Russia.

Suggested citation

Degtyarenko I.A., Leonova A.B. Experimental elaboration of the complex approach to websites usability evaluation. Psikhologicheskie Issledovaniya, 2012, No. 2(22), p. 6.

