Drizzled Public API Documentation

sync0sync.cc
1 /*****************************************************************************
2 
3 Copyright (C) 1995, 2010, Innobase Oy. All Rights Reserved.
4 Copyright (C) 2008, Google Inc.
5 
6 Portions of this file contain modifications contributed and copyrighted by
7 Google, Inc. Those modifications are gratefully acknowledged and are described
8 briefly in the InnoDB documentation. The contributions by Google are
9 incorporated with their permission, and subject to the conditions contained in
10 the file COPYING.Google.
11 
12 This program is free software; you can redistribute it and/or modify it under
13 the terms of the GNU General Public License as published by the Free Software
14 Foundation; version 2 of the License.
15 
16 This program is distributed in the hope that it will be useful, but WITHOUT
17 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
18 FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
19 
20 You should have received a copy of the GNU General Public License along with
21 this program; if not, write to the Free Software Foundation, Inc., 51 Franklin
22 St, Fifth Floor, Boston, MA 02110-1301 USA
23 
24 *****************************************************************************/
25 
26 /**************************************************/
33 #include "sync0sync.h"
34 #ifdef UNIV_NONINL
35 #include "sync0sync.ic"
36 #endif
37 
38 #include "sync0rw.h"
39 #include "buf0buf.h"
40 #include "srv0srv.h"
41 #include "buf0types.h"
42 #include "os0sync.h" /* for HAVE_ATOMIC_BUILTINS */
43 #ifdef UNIV_SYNC_DEBUG
44 # include "srv0start.h" /* srv_is_being_started */
45 #endif /* UNIV_SYNC_DEBUG */
46 
47 /*
48  REASONS FOR IMPLEMENTING THE SPIN LOCK MUTEX
49  ============================================
50 
51 Semaphore operations in operating systems are slow: Solaris on a 1993 Sparc
52 takes 3 microseconds (us) for a lock-unlock pair and Windows NT on a 1995
53 Pentium takes 20 microseconds for a lock-unlock pair. Therefore, we have to
54 implement our own efficient spin lock mutex. Future operating systems may
55 provide efficient spin locks, but we cannot count on that.
56 
57 Another reason for implementing a spin lock is that on multiprocessor systems
58 it can be more efficient for a processor to run a loop waiting for the
59 semaphore to be released than to switch to a different thread. A thread switch
60 takes 25 us on both platforms mentioned above. See Gray and Reuter's book
61 Transaction processing for background.
62 
63 How long should the spin loop last before suspending the thread? On a
64 uniprocessor, spinning does not help at all, because if the thread owning the
65 mutex is not executing, it cannot be released. Spinning actually wastes
66 resources.
67 
68 On a multiprocessor, we do not know if the thread owning the mutex is
69 executing or not. Thus it would make sense to spin as long as the operation
70 guarded by the mutex would typically last assuming that the thread is
71 executing. If the mutex is not released by that time, we may assume that the
72 thread owning the mutex is not executing and suspend the waiting thread.
73 
74 A typical operation (where no i/o is involved) guarded by a mutex or a read-write
75 lock may last 1 - 20 us on the current Pentium platform. The longest
76 operations are the binary searches on an index node.
77 
78 We conclude that the best choice is to set the spin time at 20 us. Then the
79 system should work well on a multiprocessor. On a uniprocessor we have to
80 make sure that thread switches due to mutex collisions are not frequent,
81 i.e., they do not happen every 100 us or so, because that wastes too many
82 resources. If the thread switches are not frequent, the 20 us wasted in the
83 spin loop is not too much.
84 
85 Empirical studies on the effect of spin time should be done for different
86 platforms.
87 
88 
89  IMPLEMENTATION OF THE MUTEX
90  ===========================
91 
92 For background, see Curt Schimmel's book on Unix implementation on modern
93 architectures. The key points in the implementation are atomicity and
94 serialization of memory accesses. The test-and-set instruction (XCHG in
95 Pentium) must be atomic. As new processors may have weak memory models, also
96 serialization of memory references may be necessary. The successor of Pentium,
97 P6, has at least one mode where the memory model is weak. As far as we know,
98 in Pentium all memory accesses are serialized in the program order and we do
99 not have to worry about the memory model. On other processors there are
100 special machine instructions called a fence, memory barrier, or storage
101 barrier (STBAR in Sparc), which can be used to serialize the memory accesses
102 to happen in program order relative to the fence instruction.
103 
104 Leslie Lamport has devised a "bakery algorithm" to implement a mutex without
105 the atomic test-and-set, but his algorithm should be modified for weak memory
106 models. We do not use Lamport's algorithm, because we guess it is slower than
107 the atomic test-and-set.
108 
109 Our mutex implementation works as follows: First, we perform the atomic
110 test-and-set instruction on the memory word. If the test returns zero, we
111 know we got the lock first. If the test returns nonzero, some other thread
112 was quicker and got the lock: then we spin in a loop reading the memory word,
113 waiting for it to become zero. It is wise to just read the word in the loop, not
114 perform numerous test-and-set instructions, because they generate memory
115 traffic between the cache and the main memory. The read loop can just access
116 the cache, saving bus bandwidth.
117 
118 If we cannot acquire the mutex lock in the specified time, we reserve a cell
119 in the wait array and set the waiters byte in the mutex to 1. To avoid a race
120 condition, after setting the waiters byte and before suspending the waiting
121 thread, we still have to check that the mutex is reserved, because it may
122 have happened that the thread which was holding the mutex has just released
123 it and did not see the waiters byte set to 1, a case which would lead the
124 other thread to an infinite wait.
125 
126 LEMMA 1: After a thread resets the event of a mutex (or rw_lock), some
127 =======
128 thread will eventually call os_event_set() on that particular event.
129 Thus no infinite wait is possible in this case.
130 
131 Proof: After making the reservation the thread sets the waiters field in the
132 mutex to 1. Then it checks that the mutex is still reserved by some thread,
133 or it reserves the mutex for itself. In any case, some thread (which may
134 also be some earlier thread, not necessarily the one currently holding the mutex)
135 will set the waiters field to 0 in mutex_exit, and then call
136 os_event_set() with the mutex as an argument.
137 Q.E.D.
138 
139 LEMMA 2: If an os_event_set() call is made after some thread has called
140 =======
141 the os_event_reset() and before it starts to wait on that event, the call
142 will not be lost to the second thread. This is true even if there is an
143 intervening call to os_event_reset() by another thread.
144 Thus no infinite wait is possible in this case.
145 
146 Proof (non-windows platforms): os_event_reset() returns a monotonically
147 increasing value of signal_count. This value is increased at every
148 call of os_event_set(). If thread A has called os_event_reset() followed
149 by thread B calling os_event_set() and then some other thread C calling
150 os_event_reset(), the is_set flag of the event will be set to FALSE;
151 but now if thread A calls os_event_wait_low() with the signal_count
152 value returned from the earlier call of os_event_reset(), it will
153 return immediately without waiting.
154 Q.E.D.
155 
156 Proof (windows): If there is a writer thread which is forced to wait for
157 the lock, it may be able to set the state of rw_lock to RW_LOCK_WAIT_EX.
158 The design of rw_lock ensures that there is one and only one thread
159 that is able to change the state to RW_LOCK_WAIT_EX and this thread is
160 guaranteed to acquire the lock after it is released by the current
161 holders and before any other waiter gets the lock.
162 On Windows this thread waits on a separate event, i.e., wait_ex_event.
163 Since only one thread can wait on this event there is no chance
164 of this event getting reset before the writer starts to wait on it.
165 Therefore, this thread is guaranteed to catch the os_event_set()
166 signalled unconditionally at the release of the lock.
167 Q.E.D. */
168 
169 /* Number of spin waits on mutexes: for performance monitoring */
170 
173 static ib_int64_t mutex_spin_round_count = 0;
176 static ib_int64_t mutex_spin_wait_count = 0;
179 static ib_int64_t mutex_os_wait_count = 0;
182 UNIV_INTERN ib_int64_t mutex_exit_count = 0;
183 
187 
189 UNIV_INTERN ibool sync_initialized = FALSE;
190 
192 typedef struct sync_level_struct sync_priority_t;
194 typedef struct sync_thread_struct sync_thread_t;
195 
196 #ifdef UNIV_SYNC_DEBUG
197 
200 UNIV_INTERN sync_thread_t* sync_thread_level_arrays;
201 
203 UNIV_INTERN mutex_t sync_thread_mutex;
204 
205 # ifdef UNIV_PFS_MUTEX
206 UNIV_INTERN mysql_pfs_key_t sync_thread_mutex_key;
207 # endif /* UNIV_PFS_MUTEX */
208 #endif /* UNIV_SYNC_DEBUG */
209 
211 UNIV_INTERN ut_list_base_node_t mutex_list;
212 
215 
216 #ifdef UNIV_PFS_MUTEX
217 UNIV_INTERN mysql_pfs_key_t mutex_list_mutex_key;
218 #endif /* UNIV_PFS_MUTEX */
219 
220 #ifdef UNIV_SYNC_DEBUG
221 
222 UNIV_INTERN ibool sync_order_checks_on = FALSE;
223 #endif /* UNIV_SYNC_DEBUG */
224 
230 };
231 
233 #define SYNC_THREAD_N_LEVELS 10000
234 
237  void* latch;
239  ulint level;
240 };
241 
242 /******************************************************************/
247 UNIV_INTERN
248 void
250 /*==============*/
251  mutex_t* mutex,
252 #ifdef UNIV_DEBUG
253  const char* cmutex_name,
254 # ifdef UNIV_SYNC_DEBUG
255  ulint level,
256 # endif /* UNIV_SYNC_DEBUG */
257 #endif /* UNIV_DEBUG */
258  const char* cfile_name,
259  ulint cline)
260 {
261 #if defined(HAVE_ATOMIC_BUILTINS)
262  mutex_reset_lock_word(mutex);
263 #else
265  mutex->lock_word = 0;
266 #endif
267  mutex->event = os_event_create(NULL);
268  mutex_set_waiters(mutex, 0);
269 #ifdef UNIV_DEBUG
270  mutex->magic_n = MUTEX_MAGIC_N;
271 #endif /* UNIV_DEBUG */
272 #ifdef UNIV_SYNC_DEBUG
273  mutex->line = 0;
274  mutex->file_name = "not yet reserved";
275  mutex->level = level;
276 #endif /* UNIV_SYNC_DEBUG */
277  mutex->cfile_name = cfile_name;
278  mutex->cline = cline;
279  mutex->count_os_wait = 0;
280 #ifdef UNIV_DEBUG
281  mutex->cmutex_name= cmutex_name;
282  mutex->count_using= 0;
283  mutex->mutex_type= 0;
284  mutex->lspent_time= 0;
285  mutex->lmax_spent_time= 0;
286  mutex->count_spin_loop= 0;
287  mutex->count_spin_rounds= 0;
288  mutex->count_os_yield= 0;
289 #endif /* UNIV_DEBUG */
290 
291  /* Check that lock_word is aligned; this is important on Intel */
292  ut_ad(((ulint)(&(mutex->lock_word))) % 4 == 0);
293 
294  /* NOTE! The very first mutexes are not put to the mutex list */
295 
296  if ((mutex == &mutex_list_mutex)
297 #ifdef UNIV_SYNC_DEBUG
298  || (mutex == &sync_thread_mutex)
299 #endif /* UNIV_SYNC_DEBUG */
300  ) {
301 
302  return;
303  }
304 
305  mutex_enter(&mutex_list_mutex);
306 
308  || UT_LIST_GET_FIRST(mutex_list)->magic_n == MUTEX_MAGIC_N);
309 
310  UT_LIST_ADD_FIRST(list, mutex_list, mutex);
311 
312  mutex_exit(&mutex_list_mutex);
313 }
314 
315 /******************************************************************/
320 UNIV_INTERN
321 void
323 /*============*/
324  mutex_t* mutex)
325 {
326  ut_ad(mutex_validate(mutex));
327  ut_a(mutex_get_lock_word(mutex) == 0);
328  ut_a(mutex_get_waiters(mutex) == 0);
329 
330 #ifdef UNIV_MEM_DEBUG
331  if (mutex == &mem_hash_mutex) {
333  ut_ad(UT_LIST_GET_FIRST(mutex_list) == &mem_hash_mutex);
334  UT_LIST_REMOVE(list, mutex_list, mutex);
335  goto func_exit;
336  }
337 #endif /* UNIV_MEM_DEBUG */
338 
339  if (mutex != &mutex_list_mutex
340 #ifdef UNIV_SYNC_DEBUG
341  && mutex != &sync_thread_mutex
342 #endif /* UNIV_SYNC_DEBUG */
343  ) {
344 
345  mutex_enter(&mutex_list_mutex);
346 
347  ut_ad(!UT_LIST_GET_PREV(list, mutex)
348  || UT_LIST_GET_PREV(list, mutex)->magic_n
349  == MUTEX_MAGIC_N);
350  ut_ad(!UT_LIST_GET_NEXT(list, mutex)
351  || UT_LIST_GET_NEXT(list, mutex)->magic_n
352  == MUTEX_MAGIC_N);
353 
354  UT_LIST_REMOVE(list, mutex_list, mutex);
355 
356  mutex_exit(&mutex_list_mutex);
357  }
358 
359  os_event_free(mutex->event);
360 #ifdef UNIV_MEM_DEBUG
361 func_exit:
362 #endif /* UNIV_MEM_DEBUG */
363 #if !defined(HAVE_ATOMIC_BUILTINS)
365 #endif
366  /* If we free the mutex protecting the mutex list (freeing is
367  not necessary), we have to reset the magic number AFTER removing
368  it from the list. */
369 #ifdef UNIV_DEBUG
370  mutex->magic_n = 0;
371 #endif /* UNIV_DEBUG */
372 }
373 
374 /********************************************************************/
379 UNIV_INTERN
380 ulint
382 /*====================*/
383  mutex_t* mutex,
384  const char* /*file_name __attribute__((unused))*/,
387  ulint /*line __attribute__((unused))*/)
389 {
390  ut_ad(mutex_validate(mutex));
391 
392  if (!mutex_test_and_set(mutex)) {
393 
394  ut_d(mutex->thread_id = os_thread_get_curr_id());
395 #ifdef UNIV_SYNC_DEBUG
396  mutex_set_debug_info(mutex, file_name, line);
397 #endif
398 
399  return(0); /* Succeeded! */
400  }
401 
402  return(1);
403 }
404 
405 #ifdef UNIV_DEBUG
406 /******************************************************************/
409 UNIV_INTERN
410 ibool
411 mutex_validate(
412 /*===========*/
413  const mutex_t* mutex)
414 {
415  ut_a(mutex);
416  ut_a(mutex->magic_n == MUTEX_MAGIC_N);
417 
418  return(TRUE);
419 }
420 
421 /******************************************************************/
425 UNIV_INTERN
426 ibool
427 mutex_own(
428 /*======*/
429  const mutex_t* mutex)
430 {
431  ut_ad(mutex_validate(mutex));
432 
433  return(mutex_get_lock_word(mutex) == 1
434  && os_thread_eq(mutex->thread_id, os_thread_get_curr_id()));
435 }
436 #endif /* UNIV_DEBUG */
437 
438 /******************************************************************/
440 UNIV_INTERN
441 void
442 mutex_set_waiters(
443 /*==============*/
444  mutex_t* mutex,
445  ulint n)
446 {
447  volatile ulint* ptr; /* declared volatile to ensure that
448  the value is stored to memory */
449  ut_ad(mutex);
450 
451  ptr = &(mutex->waiters);
452 
453  *ptr = n; /* Here we assume that the write of a single
454  word in memory is atomic */
455 }
456 
457 /******************************************************************/
461 UNIV_INTERN
462 void
463 mutex_spin_wait(
464 /*============*/
465  mutex_t* mutex,
466  const char* file_name,
468  ulint line)
469 {
470  ulint index; /* index of the reserved wait cell */
471  ulint i; /* spin round count */
472 #ifdef UNIV_DEBUG
473  ib_int64_t lstart_time = 0, lfinish_time; /* for timing os_wait */
474  ulint ltime_diff;
475  ulint sec;
476  ulint ms;
477  uint timer_started = 0;
478 #endif /* UNIV_DEBUG */
479  ut_ad(mutex);
480 
481  /* This update is not thread safe, but we don't mind if the count
482  isn't exact. Moved out of the ifdef that follows because we are willing
483  to pay the cost of counting, as the data is valuable.
484  Count the number of calls to mutex_spin_wait. */
485  mutex_spin_wait_count++;
486 
487 mutex_loop:
488 
489  i = 0;
490 
491  /* Spin waiting for the lock word to become zero. Note that we do
492  not have to assume that the read access to the lock word is atomic,
493  as the actual locking is always committed with atomic test-and-set.
494  In reality, however, all processors probably have an atomic read of
495  a memory word. */
496 
497 spin_loop:
498  ut_d(mutex->count_spin_loop++);
499 
500  while (mutex_get_lock_word(mutex) != 0 && i < SYNC_SPIN_ROUNDS) {
501  if (srv_spin_wait_delay) {
502  ut_delay(ut_rnd_interval(0, srv_spin_wait_delay));
503  }
504 
505  i++;
506  }
507 
508  if (i == SYNC_SPIN_ROUNDS) {
509 #ifdef UNIV_DEBUG
510  mutex->count_os_yield++;
511 #ifndef UNIV_HOTBACKUP
512  if (timed_mutexes && timer_started == 0) {
513  ut_usectime(&sec, &ms);
514  lstart_time= (ib_int64_t)sec * 1000000 + ms;
515  timer_started = 1;
516  }
517 #endif /* UNIV_HOTBACKUP */
518 #endif /* UNIV_DEBUG */
519  os_thread_yield();
520  }
521 
522 #ifdef UNIV_SRV_PRINT_LATCH_WAITS
523  fprintf(stderr,
524  "Thread %lu spin wait mutex at %p"
525  " cfile %s cline %lu rnds %lu\n",
526  (ulong) os_thread_pf(os_thread_get_curr_id()), (void*) mutex,
527  mutex->cfile_name, (ulong) mutex->cline, (ulong) i);
528 #endif
529 
530  mutex_spin_round_count += i;
531 
532  ut_d(mutex->count_spin_rounds += i);
533 
534  if (mutex_test_and_set(mutex) == 0) {
535  /* Succeeded! */
536 
537  ut_d(mutex->thread_id = os_thread_get_curr_id());
538 #ifdef UNIV_SYNC_DEBUG
539  mutex_set_debug_info(mutex, file_name, line);
540 #endif
541 
542  goto finish_timing;
543  }
544 
545  /* We may end up with a situation where lock_word is 0 but the OS
546  fast mutex is still reserved. On FreeBSD the OS does not seem to
547  schedule a thread which is constantly calling pthread_mutex_trylock
548  (in mutex_test_and_set implementation). Then we could end up
549  spinning here indefinitely. The following 'i++' stops this infinite
550  spin. */
551 
552  i++;
553 
554  if (i < SYNC_SPIN_ROUNDS) {
555  goto spin_loop;
556  }
557 
558  sync_array_reserve_cell(sync_primary_wait_array, mutex,
559  SYNC_MUTEX, file_name, line, &index);
560 
561  /* The memory order of the array reservation and the change in the
562  waiters field is important: when we suspend a thread, we first
563  reserve the cell and then set waiters field to 1. When threads are
564  released in mutex_exit, the waiters field is first set to zero and
565  then the event is set to the signaled state. */
566 
567  mutex_set_waiters(mutex, 1);
568 
569  /* Try a few more times to reserve the mutex */
570  for (i = 0; i < 4; i++) {
571  if (mutex_test_and_set(mutex) == 0) {
572  /* Succeeded! Free the reserved wait cell */
573 
574  sync_array_free_cell(sync_primary_wait_array, index);
575 
576  ut_d(mutex->thread_id = os_thread_get_curr_id());
577 #ifdef UNIV_SYNC_DEBUG
578  mutex_set_debug_info(mutex, file_name, line);
579 #endif
580 
581 #ifdef UNIV_SRV_PRINT_LATCH_WAITS
582  fprintf(stderr, "Thread %lu spin wait succeeds at 2:"
583  " mutex at %p\n",
585  (void*) mutex);
586 #endif
587 
588  goto finish_timing;
589 
590  /* Note that in this case we leave the waiters field
591  set to 1. We cannot reset it to zero, as we do not
592  know if there are other waiters. */
593  }
594  }
595 
596  /* Now we know that there has been some thread holding the mutex
597  after the changes to the wait array and the waiters field were made.
598  Now there is no risk of infinite wait on the event. */
599 
600 #ifdef UNIV_SRV_PRINT_LATCH_WAITS
601  fprintf(stderr,
602  "Thread %lu OS wait mutex at %p cfile %s cline %lu rnds %lu\n",
603  (ulong) os_thread_pf(os_thread_get_curr_id()), (void*) mutex,
604  mutex->cfile_name, (ulong) mutex->cline, (ulong) i);
605 #endif
606 
607  mutex_os_wait_count++;
608 
609  mutex->count_os_wait++;
610 #ifdef UNIV_DEBUG
611  /* !!!!! Sometimes os_wait can be called without os_thread_yield */
612 #ifndef UNIV_HOTBACKUP
613  if (timed_mutexes == 1 && timer_started == 0) {
614  ut_usectime(&sec, &ms);
615  lstart_time= (ib_int64_t)sec * 1000000 + ms;
616  timer_started = 1;
617  }
618 #endif /* UNIV_HOTBACKUP */
619 #endif /* UNIV_DEBUG */
620 
621  sync_array_wait_event(sync_primary_wait_array, index);
622  goto mutex_loop;
623 
624 finish_timing:
625 #ifdef UNIV_DEBUG
626  if (timed_mutexes == 1 && timer_started==1) {
627  ut_usectime(&sec, &ms);
628  lfinish_time= (ib_int64_t)sec * 1000000 + ms;
629 
630  ltime_diff= (ulint) (lfinish_time - lstart_time);
631  mutex->lspent_time += ltime_diff;
632 
633  if (mutex->lmax_spent_time < ltime_diff) {
634  mutex->lmax_spent_time= ltime_diff;
635  }
636  }
637 #endif /* UNIV_DEBUG */
638  return;
639 }
640 
641 /******************************************************************/
643 UNIV_INTERN
644 void
645 mutex_signal_object(
646 /*================*/
647  mutex_t* mutex)
648 {
649  mutex_set_waiters(mutex, 0);
650 
651  /* The memory order of resetting the waiters field and
652  signaling the object is important. See LEMMA 1 above. */
653  os_event_set(mutex->event);
654  sync_array_object_signalled(sync_primary_wait_array);
655 }
656 
657 #ifdef UNIV_SYNC_DEBUG
658 /******************************************************************/
660 UNIV_INTERN
661 void
662 mutex_set_debug_info(
663 /*=================*/
664  mutex_t* mutex,
665  const char* file_name,
666  ulint line)
667 {
668  ut_ad(mutex);
669  ut_ad(file_name);
670 
671  sync_thread_add_level(mutex, mutex->level);
672 
673  mutex->file_name = file_name;
674  mutex->line = line;
675 }
676 
677 /******************************************************************/
679 UNIV_INTERN
680 void
681 mutex_get_debug_info(
682 /*=================*/
683  mutex_t* mutex,
684  const char** file_name,
685  ulint* line,
686  os_thread_id_t* thread_id)
688 {
689  ut_ad(mutex);
690 
691  *file_name = mutex->file_name;
692  *line = mutex->line;
693  *thread_id = mutex->thread_id;
694 }
695 
696 /******************************************************************/
698 static
699 void
700 mutex_list_print_info(
701 /*==================*/
702  FILE* file)
703 {
704  mutex_t* mutex;
705  const char* file_name;
706  ulint line;
707  os_thread_id_t thread_id;
708  ulint count = 0;
709 
710  fputs("----------\n"
711  "MUTEX INFO\n"
712  "----------\n", file);
713 
714  mutex_enter(&mutex_list_mutex);
715 
716  mutex = UT_LIST_GET_FIRST(mutex_list);
717 
718  while (mutex != NULL) {
719  count++;
720 
721  if (mutex_get_lock_word(mutex) != 0) {
722  mutex_get_debug_info(mutex, &file_name, &line,
723  &thread_id);
724  fprintf(file,
725  "Locked mutex: addr %p thread %ld"
726  " file %s line %ld\n",
727  (void*) mutex, os_thread_pf(thread_id),
728  file_name, line);
729  }
730 
731  mutex = UT_LIST_GET_NEXT(list, mutex);
732  }
733 
734  fprintf(file, "Total number of mutexes %ld\n", count);
735 
736  mutex_exit(&mutex_list_mutex);
737 }
738 
739 /******************************************************************/
742 UNIV_INTERN
743 ulint
744 mutex_n_reserved(void)
745 /*==================*/
746 {
747  mutex_t* mutex;
748  ulint count = 0;
749 
750  mutex_enter(&mutex_list_mutex);
751 
752  mutex = UT_LIST_GET_FIRST(mutex_list);
753 
754  while (mutex != NULL) {
755  if (mutex_get_lock_word(mutex) != 0) {
756 
757  count++;
758  }
759 
760  mutex = UT_LIST_GET_NEXT(list, mutex);
761  }
762 
763  mutex_exit(&mutex_list_mutex);
764 
765  ut_a(count >= 1);
766 
767  return(count - 1); /* Subtract one, because this function itself
768  was holding one mutex (mutex_list_mutex) */
769 }
770 
771 /******************************************************************/
775 UNIV_INTERN
776 ibool
777 sync_all_freed(void)
778 /*================*/
779 {
780  return(mutex_n_reserved() + rw_lock_n_locked() == 0);
781 }
782 
783 /******************************************************************/
786 static
788 sync_thread_level_arrays_get_nth(
789 /*=============================*/
790  ulint n)
791 {
792  ut_ad(n < OS_THREAD_MAX_N);
793 
794  return(sync_thread_level_arrays + n);
795 }
796 
797 /******************************************************************/
800 static
802 sync_thread_level_arrays_find_slot(void)
803 /*====================================*/
804 
805 {
806  sync_thread_t* slot;
807  os_thread_id_t id;
808  ulint i;
809 
810  id = os_thread_get_curr_id();
811 
812  for (i = 0; i < OS_THREAD_MAX_N; i++) {
813 
814  slot = sync_thread_level_arrays_get_nth(i);
815 
816  if (slot->levels && os_thread_eq(slot->id, id)) {
817 
818  return(slot);
819  }
820  }
821 
822  return(NULL);
823 }
824 
825 /******************************************************************/
828 static
830 sync_thread_level_arrays_find_free(void)
831 /*====================================*/
832 
833 {
834  sync_thread_t* slot;
835  ulint i;
836 
837  for (i = 0; i < OS_THREAD_MAX_N; i++) {
838 
839  slot = sync_thread_level_arrays_get_nth(i);
840 
841  if (slot->levels == NULL) {
842 
843  return(slot);
844  }
845  }
846 
847  return(NULL);
848 }
849 
850 /******************************************************************/
853 static
855 sync_thread_levels_get_nth(
856 /*=======================*/
857  sync_priority_t* arr,
859  ulint n)
860 {
861  ut_ad(n < SYNC_THREAD_N_LEVELS);
862 
863  return(arr + n);
864 }
865 
866 /******************************************************************/
870 static
871 ibool
872 sync_thread_levels_g(
873 /*=================*/
874  sync_priority_t* arr,
876  ulint limit,
877  ulint warn)
878 {
879  sync_priority_t* slot;
880  rw_lock_t* lock;
881  mutex_t* mutex;
882  ulint i;
883 
884  for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
885 
886  slot = sync_thread_levels_get_nth(arr, i);
887 
888  if (slot->latch != NULL) {
889  if (slot->level <= limit) {
890 
891  if (!warn) {
892 
893  return(FALSE);
894  }
895 
896  lock = slot->latch;
897  mutex = slot->latch;
898 
899  fprintf(stderr,
900  "InnoDB: sync levels should be"
901  " > %lu but a level is %lu\n",
902  (ulong) limit, (ulong) slot->level);
903 
904  if (mutex->magic_n == MUTEX_MAGIC_N) {
905  fprintf(stderr,
906  "Mutex created at %s %lu\n",
907  mutex->cfile_name,
908  (ulong) mutex->cline);
909 
910  if (mutex_get_lock_word(mutex) != 0) {
911  const char* file_name;
912  ulint line;
913  os_thread_id_t thread_id;
914 
915  mutex_get_debug_info(
916  mutex, &file_name,
917  &line, &thread_id);
918 
919  fprintf(stderr,
920  "InnoDB: Locked mutex:"
921  " addr %p thread %ld"
922  " file %s line %ld\n",
923  (void*) mutex,
924  os_thread_pf(
925  thread_id),
926  file_name,
927  (ulong) line);
928  } else {
929  fputs("Not locked\n", stderr);
930  }
931  } else {
932  rw_lock_print(lock);
933  }
934 
935  return(FALSE);
936  }
937  }
938  }
939 
940  return(TRUE);
941 }
942 
943 /******************************************************************/
946 static
947 ibool
948 sync_thread_levels_contain(
949 /*=======================*/
950  sync_priority_t* arr,
952  ulint level)
953 {
954  sync_priority_t* slot;
955  ulint i;
956 
957  for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
958 
959  slot = sync_thread_levels_get_nth(arr, i);
960 
961  if (slot->latch != NULL) {
962  if (slot->level == level) {
963 
964  return(TRUE);
965  }
966  }
967  }
968 
969  return(FALSE);
970 }
971 
972 /******************************************************************/
976 UNIV_INTERN
977 void*
978 sync_thread_levels_contains(
979 /*========================*/
980  ulint level)
982 {
983  sync_priority_t* arr;
984  sync_thread_t* thread_slot;
985  sync_priority_t* slot;
986  ulint i;
987 
988  if (!sync_order_checks_on) {
989 
990  return(NULL);
991  }
992 
993  mutex_enter(&sync_thread_mutex);
994 
995  thread_slot = sync_thread_level_arrays_find_slot();
996 
997  if (thread_slot == NULL) {
998 
999  mutex_exit(&sync_thread_mutex);
1000 
1001  return(NULL);
1002  }
1003 
1004  arr = thread_slot->levels;
1005 
1006  for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
1007 
1008  slot = sync_thread_levels_get_nth(arr, i);
1009 
1010  if (slot->latch != NULL && slot->level == level) {
1011 
1012  mutex_exit(&sync_thread_mutex);
1013  return(slot->latch);
1014  }
1015  }
1016 
1017  mutex_exit(&sync_thread_mutex);
1018 
1019  return(NULL);
1020 }
1021 
1022 /******************************************************************/
1025 UNIV_INTERN
1026 void*
1027 sync_thread_levels_nonempty_gen(
1028 /*============================*/
1029  ibool dict_mutex_allowed)
1033 {
1034  sync_priority_t* arr;
1035  sync_thread_t* thread_slot;
1036  sync_priority_t* slot;
1037  ulint i;
1038 
1039  if (!sync_order_checks_on) {
1040 
1041  return(NULL);
1042  }
1043 
1044  mutex_enter(&sync_thread_mutex);
1045 
1046  thread_slot = sync_thread_level_arrays_find_slot();
1047 
1048  if (thread_slot == NULL) {
1049 
1050  mutex_exit(&sync_thread_mutex);
1051 
1052  return(NULL);
1053  }
1054 
1055  arr = thread_slot->levels;
1056 
1057  for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
1058 
1059  slot = sync_thread_levels_get_nth(arr, i);
1060 
1061  if (slot->latch != NULL
1062  && (!dict_mutex_allowed
1063  || (slot->level != SYNC_DICT
1064  && slot->level != SYNC_DICT_OPERATION))) {
1065 
1066  mutex_exit(&sync_thread_mutex);
1067  ut_error;
1068 
1069  return(slot->latch);
1070  }
1071  }
1072 
1073  mutex_exit(&sync_thread_mutex);
1074 
1075  return(NULL);
1076 }
1077 
1078 /******************************************************************/
1081 UNIV_INTERN
1082 ibool
1083 sync_thread_levels_empty(void)
1084 /*==========================*/
1085 {
1086  return(sync_thread_levels_empty_gen(FALSE));
1087 }
1088 
1089 /******************************************************************/
1093 UNIV_INTERN
1094 void
1095 sync_thread_add_level(
1096 /*==================*/
1097  void* latch,
1098  ulint level)
1100 {
1101  sync_priority_t* array;
1102  sync_priority_t* slot;
1103  sync_thread_t* thread_slot;
1104  ulint i;
1105 
1106  if (!sync_order_checks_on) {
1107 
1108  return;
1109  }
1110 
1111  if ((latch == (void*)&sync_thread_mutex)
1112  || (latch == (void*)&mutex_list_mutex)
1113  || (latch == (void*)&rw_lock_debug_mutex)
1114  || (latch == (void*)&rw_lock_list_mutex)) {
1115 
1116  return;
1117  }
1118 
1119  if (level == SYNC_LEVEL_VARYING) {
1120 
1121  return;
1122  }
1123 
1124  mutex_enter(&sync_thread_mutex);
1125 
1126  thread_slot = sync_thread_level_arrays_find_slot();
1127 
1128  if (thread_slot == NULL) {
1129  /* We have to allocate the level array for a new thread */
1130  array = ut_malloc(sizeof(sync_priority_t) * SYNC_THREAD_N_LEVELS);
1131 
1132  thread_slot = sync_thread_level_arrays_find_free();
1133 
1134  thread_slot->id = os_thread_get_curr_id();
1135  thread_slot->levels = array;
1136 
1137  for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
1138 
1139  slot = sync_thread_levels_get_nth(array, i);
1140 
1141  slot->latch = NULL;
1142  }
1143  }
1144 
1145  array = thread_slot->levels;
1146 
1147  /* NOTE that there is a problem with _NODE and _LEAF levels: if the
1148  B-tree height changes, then a leaf can change to an internal node
1149  or the other way around. We do not know at present if this can cause
1150  unnecessary assertion failures below. */
1151 
1152  switch (level) {
1153  case SYNC_NO_ORDER_CHECK:
1154  case SYNC_EXTERN_STORAGE:
1155  case SYNC_TREE_NODE_FROM_HASH:
1156  /* Do no order checking */
1157  break;
1158  case SYNC_TRX_SYS_HEADER:
1159  if (srv_is_being_started) {
1160  /* This is violated during trx_sys_create_rsegs()
1161  when creating additional rollback segments when
1162  upgrading in innobase_start_or_create_for_mysql(). */
1163  break;
1164  }
1165  case SYNC_MEM_POOL:
1166  case SYNC_MEM_HASH:
1167  case SYNC_RECV:
1168  case SYNC_WORK_QUEUE:
1169  case SYNC_LOG:
1170  case SYNC_LOG_FLUSH_ORDER:
1171  case SYNC_THR_LOCAL:
1172  case SYNC_ANY_LATCH:
1173  case SYNC_FILE_FORMAT_TAG:
1174  case SYNC_DOUBLEWRITE:
1175  case SYNC_SEARCH_SYS:
1176  case SYNC_SEARCH_SYS_CONF:
1177  case SYNC_TRX_LOCK_HEAP:
1178  case SYNC_KERNEL:
1179  case SYNC_IBUF_BITMAP_MUTEX:
1180  case SYNC_RSEG:
1181  case SYNC_TRX_UNDO:
1182  case SYNC_PURGE_LATCH:
1183  case SYNC_PURGE_SYS:
1184  case SYNC_DICT_AUTOINC_MUTEX:
1185  case SYNC_DICT_OPERATION:
1186  case SYNC_DICT_HEADER:
1187  case SYNC_TRX_I_S_RWLOCK:
1188  case SYNC_TRX_I_S_LAST_READ:
1189  if (!sync_thread_levels_g(array, level, TRUE)) {
1190  fprintf(stderr,
1191  "InnoDB: sync_thread_levels_g(array, %lu)"
1192  " does not hold!\n", (ulong) level);
1193  ut_error;
1194  }
1195  break;
1196  case SYNC_BUF_FLUSH_LIST:
1197  case SYNC_BUF_POOL:
1198  /* We can have multiple mutexes of this type therefore we
1199  can only check whether the greater than condition holds. */
1200  if (!sync_thread_levels_g(array, level-1, TRUE)) {
1201  fprintf(stderr,
1202  "InnoDB: sync_thread_levels_g(array, %lu)"
1203  " does not hold!\n", level-1);
1204  ut_error;
1205  }
1206  break;
1207 
1208  case SYNC_BUF_BLOCK:
1209  /* Either the thread must own the buffer pool mutex
1210  (buf_pool->mutex), or it is allowed to latch only ONE
1211  buffer block (block->mutex or buf_pool->zip_mutex). */
1212  if (!sync_thread_levels_g(array, level, FALSE)) {
1213  ut_a(sync_thread_levels_g(array, level - 1, TRUE));
1214  ut_a(sync_thread_levels_contain(array, SYNC_BUF_POOL));
1215  }
1216  break;
1217  case SYNC_REC_LOCK:
1218  if (sync_thread_levels_contain(array, SYNC_KERNEL)) {
1219  ut_a(sync_thread_levels_g(array, SYNC_REC_LOCK - 1,
1220  TRUE));
1221  } else {
1222  ut_a(sync_thread_levels_g(array, SYNC_REC_LOCK, TRUE));
1223  }
1224  break;
1225  case SYNC_IBUF_BITMAP:
1226  /* Either the thread must own the master mutex to all
1227  the bitmap pages, or it is allowed to latch only ONE
1228  bitmap page. */
1229  if (sync_thread_levels_contain(array,
1230  SYNC_IBUF_BITMAP_MUTEX)) {
1231  ut_a(sync_thread_levels_g(array, SYNC_IBUF_BITMAP - 1,
1232  TRUE));
1233  } else {
1234  /* This is violated during trx_sys_create_rsegs()
1235  when creating additional rollback segments when
1236  upgrading in innobase_start_or_create_for_mysql(). */
1237  ut_a(srv_is_being_started
1238  || sync_thread_levels_g(array, SYNC_IBUF_BITMAP,
1239  TRUE));
1240  }
1241  break;
1242  case SYNC_FSP_PAGE:
1243  ut_a(sync_thread_levels_contain(array, SYNC_FSP));
1244  break;
1245  case SYNC_FSP:
1246  ut_a(sync_thread_levels_contain(array, SYNC_FSP)
1247  || sync_thread_levels_g(array, SYNC_FSP, TRUE));
1248  break;
1249  case SYNC_TRX_UNDO_PAGE:
1250  ut_a(sync_thread_levels_contain(array, SYNC_TRX_UNDO)
1251  || sync_thread_levels_contain(array, SYNC_RSEG)
1252  || sync_thread_levels_contain(array, SYNC_PURGE_SYS)
1253  || sync_thread_levels_g(array, SYNC_TRX_UNDO_PAGE, TRUE));
1254  break;
1255  case SYNC_RSEG_HEADER:
1256  ut_a(sync_thread_levels_contain(array, SYNC_RSEG));
1257  break;
1258  case SYNC_RSEG_HEADER_NEW:
1259  ut_a(sync_thread_levels_contain(array, SYNC_KERNEL)
1260  && sync_thread_levels_contain(array, SYNC_FSP_PAGE));
1261  break;
1262  case SYNC_TREE_NODE:
1263  ut_a(sync_thread_levels_contain(array, SYNC_INDEX_TREE)
1264  || sync_thread_levels_contain(array, SYNC_DICT_OPERATION)
1265  || sync_thread_levels_g(array, SYNC_TREE_NODE - 1, TRUE));
1266  break;
1267  case SYNC_TREE_NODE_NEW:
1268  ut_a(sync_thread_levels_contain(array, SYNC_FSP_PAGE)
1269  || sync_thread_levels_contain(array, SYNC_IBUF_MUTEX));
1270  break;
1271  case SYNC_INDEX_TREE:
1272  if (sync_thread_levels_contain(array, SYNC_IBUF_MUTEX)
1273  && sync_thread_levels_contain(array, SYNC_FSP)) {
1274  ut_a(sync_thread_levels_g(array, SYNC_FSP_PAGE - 1,
1275  TRUE));
1276  } else {
1277  ut_a(sync_thread_levels_g(array, SYNC_TREE_NODE - 1,
1278  TRUE));
1279  }
1280  break;
1281  case SYNC_IBUF_MUTEX:
1282  ut_a(sync_thread_levels_g(array, SYNC_FSP_PAGE - 1, TRUE));
1283  break;
1284  case SYNC_IBUF_PESS_INSERT_MUTEX:
1285  ut_a(sync_thread_levels_g(array, SYNC_FSP - 1, TRUE));
1286  ut_a(!sync_thread_levels_contain(array, SYNC_IBUF_MUTEX));
1287  break;
1288  case SYNC_IBUF_HEADER:
1289  ut_a(sync_thread_levels_g(array, SYNC_FSP - 1, TRUE));
1290  ut_a(!sync_thread_levels_contain(array, SYNC_IBUF_MUTEX));
1291  ut_a(!sync_thread_levels_contain(array,
1292  SYNC_IBUF_PESS_INSERT_MUTEX));
1293  break;
1294  case SYNC_DICT:
1295 #ifdef UNIV_DEBUG
1296  ut_a(buf_debug_prints
1297  || sync_thread_levels_g(array, SYNC_DICT, TRUE));
1298 #else /* UNIV_DEBUG */
1299  ut_a(sync_thread_levels_g(array, SYNC_DICT, TRUE));
1300 #endif /* UNIV_DEBUG */
1301  break;
1302  default:
1303  ut_error;
1304  }
1305 
1306  for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
1307 
1308  slot = sync_thread_levels_get_nth(array, i);
1309 
1310  if (slot->latch == NULL) {
1311  slot->latch = latch;
1312  slot->level = level;
1313 
1314  break;
1315  }
1316  }
1317 
1318  ut_a(i < SYNC_THREAD_N_LEVELS);
1319 
1320  mutex_exit(&sync_thread_mutex);
1321 }
1322 
1323 /******************************************************************/
1324 /** Removes a latch from the thread level array if it is found there.
1325 @return TRUE if found in the array; it is not an error if the latch is
1326 not found, as we presently are not able to determine the level for
1327 every latch reachable by the thread */
1328 UNIV_INTERN
1329 ibool
1330 sync_thread_reset_level(
1331 /*====================*/
1332  void* latch)
1333 {
1334  sync_priority_t* array;
1335  sync_priority_t* slot;
1336  sync_thread_t* thread_slot;
1337  ulint i;
1338 
1339  if (!sync_order_checks_on) {
1340 
1341  return(FALSE);
1342  }
1343 
1344  if ((latch == (void*)&sync_thread_mutex)
1345  || (latch == (void*)&mutex_list_mutex)
1346  || (latch == (void*)&rw_lock_debug_mutex)
1347  || (latch == (void*)&rw_lock_list_mutex)) {
1348 
1349  return(FALSE);
1350  }
1351 
1352  mutex_enter(&sync_thread_mutex);
1353 
1354  thread_slot = sync_thread_level_arrays_find_slot();
1355 
1356  if (thread_slot == NULL) {
1357 
1358  ut_error;
1359 
1360  mutex_exit(&sync_thread_mutex);
1361  return(FALSE);
1362  }
1363 
1364  array = thread_slot->levels;
1365 
1366  for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
1367 
1368  slot = sync_thread_levels_get_nth(array, i);
1369 
1370  if (slot->latch == latch) {
1371  slot->latch = NULL;
1372 
1373  mutex_exit(&sync_thread_mutex);
1374 
1375  return(TRUE);
1376  }
1377  }
1378 
1379  if (((mutex_t*) latch)->magic_n != MUTEX_MAGIC_N) {
1380  rw_lock_t* rw_lock;
1381 
1382  rw_lock = (rw_lock_t*) latch;
1383 
1384  if (rw_lock->level == SYNC_LEVEL_VARYING) {
1385  mutex_exit(&sync_thread_mutex);
1386 
1387  return(TRUE);
1388  }
1389  }
1390 
1391  ut_error;
1392 
1393  mutex_exit(&sync_thread_mutex);
1394 
1395  return(FALSE);
1396 }
1397 #endif /* UNIV_SYNC_DEBUG */
1398 
1399 /******************************************************************/
1400 /** Initializes the synchronization data structures. */
1401 UNIV_INTERN
1402 void
1403 sync_init(void)
1404 /*===========*/
1405 {
1406 #ifdef UNIV_SYNC_DEBUG
1407  sync_thread_t* thread_slot;
1408  ulint i;
1409 #endif /* UNIV_SYNC_DEBUG */
1410 
1411  ut_a(sync_initialized == FALSE);
1412 
1413  sync_initialized = TRUE;
1414 
1415  /* Create the primary system wait array which is protected by an OS
1416  mutex */
1417 
1418  sync_primary_wait_array = sync_array_create(OS_THREAD_MAX_N,
1419  SYNC_ARRAY_OS_MUTEX);
1420 #ifdef UNIV_SYNC_DEBUG
1421  /* Create the thread latch level array where the latch levels
1422  are stored for each OS thread */
1423 
1424  sync_thread_level_arrays = ut_malloc(OS_THREAD_MAX_N
1425  * sizeof(sync_thread_t));
1426  for (i = 0; i < OS_THREAD_MAX_N; i++) {
1427 
1428  thread_slot = sync_thread_level_arrays_get_nth(i);
1429  thread_slot->levels = NULL;
1430  }
1431 #endif /* UNIV_SYNC_DEBUG */
1432  /* Init the mutex list and create the mutex to protect it. */
1433 
1434  UT_LIST_INIT(mutex_list);
1435  mutex_create(mutex_list_mutex_key, &mutex_list_mutex,
1436  SYNC_NO_ORDER_CHECK);
1437 #ifdef UNIV_SYNC_DEBUG
1438  mutex_create(sync_thread_mutex_key, &sync_thread_mutex,
1439  SYNC_NO_ORDER_CHECK);
1440 #endif /* UNIV_SYNC_DEBUG */
1441 
1442  /* Init the rw-lock list and create the mutex to protect it. */
1443 
1444  UT_LIST_INIT(rw_lock_list);
1445  mutex_create(rw_lock_list_mutex_key, &rw_lock_list_mutex,
1446  SYNC_NO_ORDER_CHECK);
1447 
1448 #ifdef UNIV_SYNC_DEBUG
1449  mutex_create(rw_lock_debug_mutex_key, &rw_lock_debug_mutex,
1450  SYNC_NO_ORDER_CHECK);
1451 
1452  rw_lock_debug_event = os_event_create(NULL);
1453  rw_lock_debug_waiters = FALSE;
1454 #endif /* UNIV_SYNC_DEBUG */
1455 }
1456 
1457 /******************************************************************/
1458 /** Frees the resources in InnoDB's own synchronization data
1459 structures. Use os_sync_free() after calling this. */
1460 UNIV_INTERN
1461 void
1462 sync_close(void)
1463 /*===========*/
1464 {
1465  mutex_t* mutex;
1466 
1467  sync_array_free(sync_primary_wait_array);
1468 
1469  mutex = UT_LIST_GET_FIRST(mutex_list);
1470 
1471  while (mutex) {
1472 #ifdef UNIV_MEM_DEBUG
1473  if (mutex == &mem_hash_mutex) {
1474  mutex = UT_LIST_GET_NEXT(list, mutex);
1475  continue;
1476  }
1477 #endif /* UNIV_MEM_DEBUG */
1478  mutex_free(mutex);
1479  mutex = UT_LIST_GET_FIRST(mutex_list);
1480  }
1481 
1482  mutex_free(&mutex_list_mutex);
1483 #ifdef UNIV_SYNC_DEBUG
1484  mutex_free(&sync_thread_mutex);
1485 
1486  /* Switch latching order checks off in sync0sync.c */
1487  sync_order_checks_on = FALSE;
1488 #endif /* UNIV_SYNC_DEBUG */
1489 
1490  sync_initialized = FALSE;
1491 }
1492 
1493 /*******************************************************************/
1494 /** Prints wait info of the sync system. */
1495 UNIV_INTERN
1496 void
1497 sync_print_wait_info(
1498 /*=================*/
1499  FILE* file)
1500 {
1501 #ifdef UNIV_SYNC_DEBUG
1502  fprintf(file, "Mutex exits %llu, rws exits %llu, rwx exits %llu\n",
1503  mutex_exit_count, rw_s_exit_count, rw_x_exit_count);
1504 #endif
1505 
1506  fprintf(file,
1507  "Mutex spin waits %"PRId64", rounds %"PRId64", "
1508  "OS waits %"PRId64"\n"
1509  "RW-shared spins %"PRId64", rounds %"PRId64", OS waits %"PRId64";"
1510  " RW-excl spins %"PRId64", rounds %"PRId64", OS waits %"PRId64"\n",
1511  mutex_spin_wait_count,
1512  mutex_spin_round_count,
1513  mutex_os_wait_count,
1514  rw_s_spin_wait_count,
1515  rw_s_spin_round_count,
1516  rw_s_os_wait_count,
1517  rw_x_spin_wait_count,
1518  rw_x_spin_round_count,
1519  rw_x_os_wait_count);
1520 
1521  fprintf(file,
1522  "Spin rounds per wait: %.2f mutex, %.2f RW-shared, "
1523  "%.2f RW-excl\n",
1524  (double) mutex_spin_round_count /
1525  (mutex_spin_wait_count ? mutex_spin_wait_count : 1),
1526  (double) rw_s_spin_round_count /
1527  (rw_s_spin_wait_count ? rw_s_spin_wait_count : 1),
1528  (double) rw_x_spin_round_count /
1529  (rw_x_spin_wait_count ? rw_x_spin_wait_count : 1));
1530 }
1531 
1532 /*******************************************************************/
1533 /** Prints info of the sync system. */
1534 UNIV_INTERN
1535 void
1536 sync_print(
1537 /*=======*/
1538  FILE* file)
1539 {
1540 #ifdef UNIV_SYNC_DEBUG
1541  mutex_list_print_info(file);
1542 
1543  rw_lock_list_print_info(file);
1544 #endif /* UNIV_SYNC_DEBUG */
1545 
1546  sync_array_print_info(file, sync_primary_wait_array);
1547 
1548  sync_print_wait_info(file);
1549 }