ColorScape: A Creative Artificial Ecosystem Model of Communication and Collective Creativity in Global Participatory Science

by Guangyu Zou

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 7, 2012

Keywords: Global Participatory Science, Innovation, Agent-based Simulation, ColorScape

Copyright 2012 by Guangyu Zou

Approved by:
Levent Yilmaz, Chair, Associate Professor of Computer Science and Software Engineering
Drew Hamilton, Professor of Computer Science and Software Engineering
Wei Shinn Ku, Assistant Professor of Computer Science and Software Engineering

Abstract

With the increasing use of cyberinfrastructure and the popularity of e-Science initiatives, science is becoming truly globalized, reducing barriers to entry and enabling the formation of open and global networked innovation communities. Yet relatively little is known about the mechanisms that govern such globalized communities. Meanwhile, creative artificial ecosystem metaphors and interaction processes among communities have the potential to shed light on the effects of communication styles in the emergence of global knowledge communities. This study therefore explores how networks of scientific communities and epistemic cultures form and evolve, what network patterns emerge from different socio-technical communication theories, and the relationships among environmental constraints, community traits, and innovation performance and potential. Understanding scientific communities and their associated communication networks is key to understanding the dynamics of knowledge creation, as well as the formation and growth of scientific communities, to facilitate informed science and innovation policy-making. A benefit of this research is to offer federal agencies a computer-aided decision-making tool for evaluating investment decisions and policies.
To this end, an agent-based simulation model combining boundary processes and theories of communication is developed. The model is verified and validated with respect to empirical network data. Simulation results suggest that communities with highly connected clusters are likely to thrive if resource availability is low. As for the resource allocation strategy, key-area investment with technology transfer results in the highest variety. Exploration of the impact of socio-technical communication theories suggests that under low communication frequency, openness and receptivity lead to higher variety. In contrast, variety decreases with increasing receptivity under high communication frequency.

Acknowledgments

I would like to express special gratitude to my advisor, Dr. Levent Yilmaz, associate professor, Department of Computer Science and Software Engineering at Auburn University, for his instruction, guidance, encouragement, and patience in the completion of the research and dissertation. In particular, his suggestions, criticisms, and materials greatly contributed to this thesis.

Thanks also to my advisory committee members, Dr. Drew Hamilton and Dr. Wei Shinn Ku, and to the professors and staff members in the Department of Computer Science and Software Engineering at Auburn University for their kindness and help through these three years. Special thanks to the university reader, Prof. Shu-Wen Tzeng, who provided valuable comments on this dissertation.

Finally, sincere thanks to my wife Ying Zhao. She gave her greatest support and encouragement to help me succeed in finishing all the research work. Also, I thank my parents, who poured enormous effort into supporting my study during these years.

Table of Contents

Abstract
Acknowledgments
List of Illustrations
List of Tables
1 Introduction
1.1 Problem
1.2 Significance of the Problem
1.3 Strategy
1.4 Contributions
1.4.1 Contributions to Theory of Agent-Based Modeling
1.4.2 Contributions to Science and Innovation Policy Development
2 Literature Review
2.1 Characteristics of GPS Communities
2.1.1 Understanding GPS as a Communication System
2.1.2 Understanding GPS as a Creative Ecosystem
2.1.3 Boundary Processes in Knowledge Creation and Diffusion
2.1.4 Understanding GPS as a Complex Adaptive System
2.2 Environmental Constraints
2.3 Relation to Earlier Work
3 Design Concepts and Details
3.1 Overview
3.2 Process
3.3 Resource Allocation
3.4 Interaction within Community
3.4.1 Relationship between Maturity and Resources
3.4.2 Resources Consumed
3.5 Learning between Communities
3.5.1 Updating the Intensity of Communities' Influences
3.5.2 Updating the Maturity of a Community
3.5.3 Updating the Discipline of a Community
3.5.4 Updating the Resource of a Community
3.6 Innovation Process
3.6.1 Reorganization
3.6.2 Specialization
3.7 Grow and Fade
3.8 Heterogeneous Adaption
3.8.1 Initialization
4 Implementation of Simulation Model
4.1 Introduction to Repast
4.2 Implementation of Agents
4.3 Visual Snapshots of the Simulation View
5 Verification, Validation and Evaluation
5.1 Verification
5.1.1 Micro Verification
5.1.2 Macro Verification
5.2 Validation
5.2.1 Conceptual Validation
5.2.2 Micro Operational Validation
5.2.3 Macro Operational Validation
5.2.3.1 Emergence of Communities
5.2.3.2 Comparison with Institutions around Department of Energy
5.3 A Robust Evolutionary Framework for Validation
5.3.1 Design of the Validation Framework
5.3.2 Gene Encoding
5.3.3 Gene Decoding
5.3.4 Population Initialization
5.3.5 Repair to the Genes
5.3.6 The Fitness Function
5.3.7 Termination Condition
5.3.8 Crossover
5.3.9 Mutation
5.3.10 Selection
5.3.11 Equilibrium
5.3.12 Implementation
5.4 Comparison with Overlay Map
5.5 Comparison with the OBO Domain-Domain Data
5.6 Power Law
6 Simulation Results and Evaluation
6.1 Interaction Topologies
6.2 Measuring Innovation Potential and Performance
6.2.1 Innovation Metrics
6.2.1.1 Diversity
6.2.1.2 Resilience
6.2.2 Network Metrics
6.3 Simulation Results
6.3.1 Diversity vs. Carrying Capacity
6.3.2 Diversity vs. External Resource
6.3.3 Diversity vs. Reorganization
6.3.4 Diversity vs. Receptivity
6.3.5 Resilience of Different Network Topologies
6.3.6 Relationship between Diversity and Network Metrics
6.3.7 Sustainability, Resource Availability, and Connectedness
6.3.8 Disparity vs. Resource and Connectedness
6.4 Experiments on Resource Allocation Strategy
6.4.1 Design of Resources Allocation Strategies
6.4.1.1 Uniform Allocation
6.4.1.2 Proportional to Contribution
6.4.1.3 Proportional to Cluster Size
6.4.1.4 Proportional to Importance of Domains
6.4.1.5 Competitive Allocation
6.4.1.6 P2P Lending
6.4.1.7 Random Allocation
6.4.2 Network Pattern vs. Resource Allocation Strategy
6.4.3 Variety vs. Resource Allocation Strategy
7 Comparison of Communication Theories in Terms of Innovation Performance
7.1 Introduction
7.2 Homophily
7.2.1 Model Design
7.2.2 Validation
7.3 Structural Hole
7.3.1 Model Design
7.3.2 Validation
7.4 Preferential Attachment
7.4.1 Preferential Attachment Based on Resources
7.4.2 Preferential Attachment Based on Links
7.4.3 Validation
7.5 Balance Theory
7.5.1 Model Design
7.5.2 Validation
7.6 Exchange Theory
7.6.1 Model Design
7.6.2 Validation
7.6.2.1 Resource Accessibility
7.6.2.2 Law of N-Squared
7.6.2.3 Iron Law of Oligarchy
7.7 Experiments on Communication Theories
7.7.1 Variety vs. External Resource
7.7.2 Sustainability vs. Resource Availability
7.7.3 Sustainability vs. Receptivity
7.7.3.1 Variety vs. Receptivity
7.7.3.2 Innovation Potential
7.7.3.3 Knowledge Diffusion Efficiency
7.7.3.4 Network Patterns
8 Conclusions
8.1 Findings and Discussion
8.1.1 ColorScape: A General Purpose Model
8.1.2 Community's Traits vs. Diversity
8.1.3 Environmental Constraints vs. Diversity, Sustainability, and Resilience
8.1.4 Network Metrics vs. Variety
8.1.5 Allocation Strategies vs. Variety
8.1.6 Communication Strategies vs. Diversity, Sustainability, and Innovation Potential
8.2 Extensions
8.3 Limitation and Future Research
Bibliography

List of Illustrations

3.1 Snapshots of ColorScape Model
3.2 Network of Scientific Communities
3.3 The Activity Flow of the ColorScape Model
3.4 Triple Helix of University-Industry-Government Relations
3.5 Resource Allocation
3.6 Interaction within Communities
3.7 Learning Process
3.8 Flow Chart of the Community Learning Process
3.9 Updating Maturity during the Learning Process
3.10 Domain Update during the Learning Process
3.11 Flow Chart of Innovation
3.12 Updating the Domain during the Innovation Process
3.13 Specialization
4.1 Contexts and Projections
4.2 Class Diagram of Model
4.3 Snapshots of 2D Communication Context
4.4 Snapshots of Scale-free Communication Context
4.5 Snapshots of Dynamic Communication Context
5.1 Overview of Verification and Validation [92]
5.2 Growth and Formation of Community Clusters
5.3 Emergent Network Patterns
5.4 Comparison of Clustering Coefficient
5.5 Comparison of Communities Number and Average Degree
5.6 Validation Framework
5.7 Gene Encoding
5.8 Gene Example
5.9 Core/Periphery Ratio
5.10 Class Diagram of Validation Framework
5.11 Sequence Diagram of Validation Framework
5.12 Overlay Map [75]
5.13 Snapshot of the ColorScape Model against Overlay Map
5.14 OBO Domain-Domain Network
5.15 Snapshot of ColorScape Model against OBO
5.16 Clusters of the Network of ColorScape Model against OBO
5.17 Distribution of Resources in ColorScape Model
6.1 The Evaluation Framework
6.2 Diversity vs. Initial Community Numbers
6.3 Variety vs. Neighbor Size in 1D
6.4 Diversity vs. Resource Allocated Per Time
6.5 Diversity vs. Reorganization Tendency
6.6 Variety in Random and Random Group Network
6.7 Number of Active Communities
6.8 Comparison of Random and Random Group Network on Resilience
6.9 Variety vs. Density in Random and Random Group Network
6.10 Variety vs. Centrality in Random and Random Group Network
6.11 Species Diversity vs. Population Density in [40]
6.12 Success Rate vs. Resource
6.13 Disparity vs. Resource
6.14 Class Diagram of Resources Allocation
6.15 Flow Chart of Uniform Allocation
6.16 Flow Chart of Allocation Proportional to Contribution
6.17 Flow Chart of Allocation Proportional to Cluster
6.18 Flow Chart of Allocation Proportional to Importance of Domains
6.19 Flow Chart of Competitive Allocation
6.20 Flow Chart of P2P Lending
6.21 Flow Chart of Random Allocation
6.22 Strategy 1 vs. Strategy 2 vs. Strategy 3
6.23 Allocation Proportional to Importance of Domains
6.24 Patterns in Network Configuration Experiment
6.25 Competitive Allocation vs. P2P Lending
6.26 Variety vs. Allocation Strategies
7.1 Process of Communication using Homophily Theory
7.2 Communication Frequency vs. Similarity
7.3 Process of Communication using Structural Hole Theory
7.4 Resource vs. Effective Network Size under Structural Hole Theory
7.5 Communication Process of Preferential Attachment based on Resources
7.6 Communication Process of Preferential Attachment based on Links
7.7 Communities' Resources
7.8 Communities' Links
7.9 Process of Communication using Balance Theory
7.10 Influences Change with Dissimilarity
7.11 Relations under Balance Theory
7.12 P2P Collaborations
7.13 Resource Availability along with Closeness Centrality
7.14 Number of Target Communities vs. Population
7.15 Emergent Networks over Time
7.16 Variety vs. External Resources at Moderate Communication Frequency
7.17 Variety vs. External Resources at Low Communication Frequency
7.18 Sustainability vs. External Resources
7.19 Sustainability vs. Receptivity
7.20 Variety vs. Receptivity under Low Communication Frequency
7.21 Variety vs. Receptivity under High Communication Frequency
7.22 Innovation Potential
7.23 Knowledge Diffusion Efficiency
7.24 Networks Generated under Homophily and Exchange Theory
7.25 The Network Generated under Preferential Attachment based on Links Theory
7.26 Communities' Links
7.27 Networks under Balance, Structural Hole, and Preferential Attachment based on Resources Theory

List of Tables

2.1 Traditional Scientific Teams vs. Global Participatory Science
2.2 Selected Social Theories [61]
3.1 Initial Values of State Variables
5.1 Verification and Validation at Micro and Macro Level
5.2 Summary of the Integration Test for the Learning Process
5.3 Summary of Conceptual Validation of Each Subprocess
5.4 Gene Decoding
5.5 The Best Configuration against Overlay Map
5.6 Simulation Output vs. Overlay Map
5.7 The Best Configuration against OBO Data
5.8 Simulation vs. OBO Data
6.1 Resilience of Different Network Topologies
6.2 Success Rate and Disparity
6.3 Allocation Strategies
7.1 Illustration of Building Links based on Balance Theory
7.2 Experimental Parameters

Chapter 1
Introduction

1.1 Problem

Creativity is the production of novel and useful ideas by an individual or group of individuals working together [4]. Innovation is an extension of creativity: it is the successful implementation, adoption, and transfer of creative ideas, products, processes, or services [98]. Collective creativity emphasizes the collaboration and coordination of all members in a community rather than individual work. Scientific communities provide a concrete basis to facilitate scientific discovery and collective creativity. The study of scientific communities therefore helps us understand the dynamics of knowledge creation, as well as the formation and growth of these communities, and so facilitates informed science and innovation policy-making.

A scientific community consists of scientists, domain knowledge, and their relationships and interactions. It is normally divided into "sub-communities", each of which works on a particular field within science, and objectivity is expected to be achieved by the scientific method [105]. As access to and production of knowledge become increasingly transparent, the practice of science is now more open and global [113]: communication is carried by networks, and shared knowledge is documented in electronic media such as software and electronic documents. Cyberinfrastructure transcends geographical boundaries so that members around the world can collectively make contributions in the virtual scientific community.
Such virtual collaboratories include Open Source Science (OSS) communities such as OBO Foundry (Open Biomedical Ontologies) [86], NanoHUB (Simulation, Education, Technology for Nano Technology) [62], and NEES Grid (Network for Earthquake Engineering Cyberinfrastructure) [63]. They lead to an evolving collective knowledge base that is governed and maintained by community members without central authority. We call such communities of practice and epistemic cultures Global Participatory Science (GPS) communities [108].

GPS is based on a self-organizing network in which scientists work together not because they are asked to but because they desire it [93]. Social scientists have proposed several theories of communication to interpret the underlying mechanisms of forming such self-organized networks [61]. Therefore, it is important to explore the effects of different theories of communication on patterns of emerging networks and on innovation performance. Based on these observations, we focus on the following problems:

1. How do scientific communities' networks form and evolve, and what network patterns emerge from different socio-technical communication theories such as Cognitive Theories, Self-interest Theories, Exchange and Dependency, Homophily & Proximity, and Preferential Attachment [61]?

2. How do scientific communities respond to environmental changes such as funding and resource allocation across research areas? Since communities sustain themselves by adapting to changing environmental conditions while shaping their cognitive niches, how can we design innovation environments that influence the overall innovation potential and performance of the landscape of scientific communities?

3. What is the impact of scientific community traits (i.e., receptivity, flexibility, reorganization tendency) and environmental constraints (i.e., interaction topologies, maximum community number, level of external funding) on the innovation performance (e.g., diversity and resilience) of GPS?

4. Which metrics measure innovation performance and potential from the perspective of the science of networks and complex adaptive systems? What are the underlying interrelationships between communities' configuration parameters, network metrics, and diversity, resilience, and innovation?

1.2 Significance of the Problem

Globalization, driven by advances in computer and communication technology as well as by collective economic and political processes, brings dramatic changes in organizational forms and communication networks [61]. The key to such dramatic changes in the organizational landscape is the emergence of social communication networks among organizations. Furthermore, the underlying mechanisms of such social communication networks can be abstracted into several theories of communication. Therefore, it is important to study the theories that shape these communication networks.

Understanding scientific communities and their associated social communication networks is key to understanding the dynamics of knowledge creation, as well as the formation and growth of scientific communities, to facilitate informed science and innovation policy-making. Some Federal agencies, such as NIH and DOE, have begun to use social network analysis techniques to understand the process of innovation [91]. Lack of knowledge of science and innovation dynamics can lead to serious and unintended consequences [91]. For example, Federal encouragement of universities to transfer technologies to industry has resulted in universities paying more attention to near-term research than to long-term basic research. In addition, Shane [83] examines the effects of the Bayh-Dole Act in the United States on one aspect of technology commercialization, i.e., university patenting, and suggests that the Bayh-Dole Act provided incentives for universities to increase patenting. Studying the formation and behavior of scientific communities could avoid unnecessary duplication by predicting what final forms a community could evolve into. For instance, in [47], Kaiser presents a computational model to predict the emergence and development of scientific fields.

"Although the importance of investment in science, technology is understood, the rationale for specific scientific investment decisions lacks a strong theoretical and empirical basis" [91]. So, an interdisciplinary research theme called "the science of science policy" has recently emerged. This theme aims to provide a scientifically rigorous quantitative basis from which policy makers and researchers could assess the impacts and likely outputs, while improving the understanding of its dynamics [28]. It is critically important to develop the science of science policy because the U.S. Federal government's total R&D budget reached $139 billion in 2007, and it is essential to use such a significant amount of funding effectively so as to maximize social and economic benefits. Research funding could be structured to encourage the formation of new communities, as is currently occurring through the large Federal investment in the nanoscience [82] and synthetic biology [10] communities. Investment in innovation capacity is the key to higher productivity, higher wages, and higher economic growth [91]. Although more emphasis has been put on investment analysis, there is little understanding of how scientific communities respond to changes in funding within research areas. Understanding how communities of science evolve would have clear implications for investment decisions.
1.3 Strategy

Under globalization driven by advances in computer and communication technology, the flow of information through communication networks is independent of space and time, because people can share knowledge and make contributions simultaneously anywhere in the world [61]. Furthermore, the mechanisms for the emergence and evolution of communication networks can be abstracted into several communication theories. So, the first perspective on GPS is that of a global communication system.

In the communication network, communities affect and are influenced by peer communities through boundary processes, during which cooperation and competition occur. GPS communities operate in ways similar to ecosystems in that communities act to ensure their survival and success by accessing resources, creating knowledge, and remaining attractive within the social communication network in which they want to thrive [60]. So, the second perspective on GPS is that of a creative ecosystem.

Viewed as a communication system, GPS emphasizes the interconnections between communities. Viewed as an ecosystem, it focuses on the cooperation and competition that drive communities to form, develop, fade, and coevolve. These two properties can be combined under the framework of a complex adaptive system, because a complex system is a system composed of interconnected parts that as a whole exhibit one or more properties not obvious from the properties of the individual parts [99]. Meanwhile, agent-based modeling (ABM) is an ideal way to study complex systems because even a simple ABM can exhibit complex behavior patterns and provide valuable information about the dynamics of the real-world system that it emulates [12]. ABMs provide theoretical leverage where global patterns of interest are more than the aggregation of individual attributes, but at the same time, the emergent pattern cannot be understood without a bottom-up dynamic model of the micro foundations at the relational level [55].
The strategy adopted here is to explore how communities' innovation networks form and evolve under a specific communication theory using the complex adaptive systems perspective. Furthermore, the environment is designed to maximize innovation outputs based on the understanding of communities' responses to varying investment strategies. So the study is guided by network theory, boundary processes, and the theory of complex adaptive systems.

1.4 Contributions

1.4.1 Contributions to Theory of Agent-Based Modeling

Agent-based modeling provides theoretical leverage to explore complex systems where global patterns result from interactions of multiple agents. In a large-scale ABM, it is essential to standardize the communication among interacting agents, since agents may be developed by different programmers. Agent Communication Language (ACL) [31] is proposed by the Foundation for Intelligent Physical Agents (FIPA) as a standard language for agent communication. ACL mainly focuses on the structure of messages sent and received by agents, which includes four mandatory parameters: performative, sender, receiver, and content [31]. Because FIPA-ACL is a basic communication primitive based on speech-act theory, it does not provide any specific rules to guide how the communication is carried out. My research, which uses social communication theories as behavioral rules of agents, can advance existing ACL by providing a new layer above it. The new layer, named the communication protocol, defines how to choose communication targets, when communication happens, and when communication dissolves.

Agent-based models (ABMs) are often criticized for relying on informal, subjective, and qualitative validation procedures [27]. Because most ABMs are highly abstract and are built from the bottom up, their emergent behavior is often unpredictable.
Furthermore, ABMs are often developed for studying complex adaptive phenomena, which involve uncertainty and ambiguity in terms of their underlying behavioral mechanisms. Models that focus on human and social dynamics are especially prone to ambiguity and uncertainty. To gain empirical insight into such problems and to be able to generate behavior that mimics expected or theoretical scenarios, model development and refinement should be coupled with evaluation and validation.

The validation strategy used here is a Robust Generative Validation (RGV) [111] method that refers to the strategies used by scientists in generating and validating knowledge. The main steps of RGV consist of generating ensembles, initiating schema, evaluating schema, and transforming schema, where each ensemble refers to a single hypothesis space and each schema refers to the set of configurations corresponding to the ensemble. The model introduced in this dissertation is validated using a simplified RGV by replacing a network of ensembles with two independent experiments. These two experiments aim to compare simulation outputs against the overlay map and OBO Foundry, respectively.

1.4.2 Contributions to Science and Innovation Policy Development

In order to address the questions raised in Section 1.1, a multi-agent model is built, in which behavioral rules of agents can be varied based on various social communication theories including homophily, preferential attachment, structural hole, exchange, and balance theory.

Homophily theory has been identified by social scientists as an important mechanism that explains how communication networks are created, maintained, and reconstituted [61]. Homophily means that a community prefers to communicate with similar others and is highly influenced by similar peer communities. Similarity is thought to ease communication, foster trust and reciprocity, and improve diffusion of knowledge [15].
Structural holes are those knowledge spaces where communities are not connected, so that other communities may exploit them by investing their social capital to indirectly link two or more unconnected communities [61]. The community that fills the structural hole becomes a broker in relationship to others. Preferential attachment is a process in which a resource is distributed among individuals according to how much they already have, i.e., the rich get richer. Under suitable circumstances, preferential attachment can generate the power-law distributions [25] that exist in many social systems, for instance, in the number of papers published by authors and the citation index of papers. Exchange and dependency theories seek to address how communication emerges based on the distribution of information and resources across the members. Heider's balance theory [38] states: "my friend's friend is my friend; my friend's enemy is my enemy; my enemy's friend is my enemy; my enemy's enemy is my friend", which means that friends have similar attitudes toward a third object, while enemies have different opinions on it. Using homophily, preferential attachment, structural hole, exchange, and balance theory, we analyze the interaction between communities and compare them in terms of emergent network patterns and innovation metrics.

In addition, experiments using agent-based simulation have been conducted. Among these experiments, we examine six types of topologies (i.e., 1D grid, 2D grid, random network, random group network, scale-free network, and dynamic network) and observe the emergent patterns of communities and their interrelationship with innovation performance. Simulation results show that the scale-free network has the highest resilience compared with random and random group networks. In addition, the relationship between variety and density is a concave-like function, and the relationship between variety and centrality is similar.
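As a minimal sketch of the preferential attachment mechanism described above, the following Python snippet grows a network in which each newcomer links to one existing node chosen with probability proportional to that node's current degree. The function name `grow_network`, the two-node seed, and the one-link-per-newcomer rule are illustrative assumptions and are not taken from the ColorScape implementation.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Grow a network by degree-proportional (rich-get-richer) attachment.

    Hypothetical helper for illustration only; it is not part of the
    dissertation's model. Returns a Counter mapping node -> degree.
    """
    rng = random.Random(seed)
    degree = Counter({0: 1, 1: 1})  # assumed seed: a single edge between nodes 0 and 1
    for new_node in range(2, n_nodes):
        nodes = list(degree)
        weights = [degree[n] for n in nodes]
        # Choose the target with probability proportional to its current degree.
        target = rng.choices(nodes, weights=weights, k=1)[0]
        degree[target] += 1
        degree[new_node] += 1
    return degree

degrees = grow_network(2000)
```

Plotting a histogram of `degrees.values()` on log-log axes is one way to check whether a heavy-tailed degree distribution appears under these assumptions.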
Furthermore, policy-makers may encourage communities to build highly connected clusters if resource availability is low. As far as the resource allocation strategy is concerned, key area investment with technology transfer results in the highest variety. Where communities communicate with one another guided by structural hole, preferential attachment, or homophily theory, decision-makers may encourage communities to be open to influences from peers in order to foster innovation. In addition, under low communication frequency, openness and receptivity lead to higher variety. On the contrary, variety decreases with increasing receptivity under high communication frequency.

The rest of the dissertation is organized as follows. In Chapter 2, we present background on GPS from the perspective of communication system, ecosystem, and complex adaptive system. Chapter 3 introduces the design and formalization of the model, which embodies the mechanisms of boundary processes, homophily theory, and the HSB (Hue, Saturation, and Brightness) color model that is used to visualize emergent community landscapes. Chapter 4 describes the implementation of the model. Verification and validation are conducted in Chapter 5, where a novel generative validation method is devised to instill confidence in the operational behavior of the model. Chapter 6 delineates metrics and indicators used to measure network structure and innovation output, as well as evaluation using these metrics. Chapter 7 examines the impacts of socio-technical communication theories on innovation performance. Finally, in Chapter 8, we conclude by summarizing our findings and point out potential avenues of future research.

Chapter 2
Literature Review

The research conducted in the dissertation views global participatory science as a global communication system and a creative ecosystem. These two perspectives can be combined under the framework of a complex adaptive system.
2.1 Characteristics of GPS Communities

Recently, a number of virtual scientific collaboratories have emerged and continue to successfully bring together scientists from around the globe, not only to share and aggregate data but also to create new knowledge [93]. Such virtual collaboratories include Open Source Science (OSS) communities such as OBO Foundry (Open Biomedical Ontologies) [86], which is a form of GPS. So, we choose OSS communities as a research object to study, develop, and explore models of innovation in collective knowledge creation communities. OSS communities are immune to process loss through production blocking because all team members can contribute ideas simultaneously. In addition, OSS communities reduce cognitive failures and enhance the synergistic effects of group brainstorming by using electronic media to communicate, because access to the data is unrestricted by individual recall [108]. Furthermore, compared with traditional scientific teams, OSS communities have a distributed network structure and are more open and transparent due to their decentralized decision-making style. Besides OBO, such open science communities include NanoHUB (Simulation Education Technology for Nano Technology) [62] and NEES Grid (Network for Earthquake Engineering Cyberinfrastructure) [63]. Table 2.1 compares traditional scientific teams and open source communities.

Table 2.1: Traditional Scientific Teams vs. Global Participatory Science

Criteria | Additional Criteria | Traditional Scientific Teams | Open Source Communities
Distribution | Space | Co-located | Distributed
 | Time | Synchronous | Asynchronous + synchronous
 | Communication | Face to face | Virtual meeting
Organization | Structure | Hierarchical | Networked
 | Style | Team/Formal Group | Community/Market
Openness | Product Access | Push-driven | Pull-driven
 | Transparency | Complete product | Incomplete product
 | Integration of contributions | Pre-production decisions | Pre- and post-production review
Process | Decision-making | Closed/Centralized | Open/Decentralized
Mobility | Entry threshold | High | Low
 | Turnover rate | Low | High

In the invisible college [93], researchers complement each other by sharing equipment, ideas, knowledge, techniques, and tools. In other words, scientific curiosity and ambition are the driving forces for researchers to work together in an invisible college. These networks are neither pre-designed nor random. Rather, they organize and operate based on self-organizing processes, which are also the main focus of this research. With better understanding of such rules, policymakers could make better policy decisions in terms of improving innovation performance and investment efficiency.

2.1.1 Understanding GPS as a Communication System

Social network theory is often used to describe the structure of scientific communities, but little research has been conducted on the formation of networks of communities [64]. Communication networks and the organizational forms of the 21st century are undergoing rapid and dramatic changes [32]. There are many theories that focus on the role of social communication mechanisms in explaining the emergence and evolution of community networks. Table 2.2 summarizes selected social theories.
Table 2.2: Selected Social Theories [61]

Theories | Sub-Theories
Theories of Self-interest | Social Capital; Structural Holes; Transaction Costs
Mutual Self-Interest & Collective Action | Public Good Theory; Critical Mass Theory
Cognitive Theories | Semantic/Knowledge Networks; Cognitive Social Structures; Cognitive Consistency; Balance Theory
Contagion Theories | Social Information Processing; Social Learning Theory; Institutional Theory; Structural Theory of Action
Exchange and Dependency | Social Exchange Theory; Resource Dependency; Network Exchange
Homophily & Proximity | Social Comparison Theory; Social Identity; Physical Proximity; Electronic Proximity
Theories of Network Evolution | Organizational Ecology; NK(C)

Theories of self-interest explain how people make decisions based on their personal preferences and desires [20]. Theories of mutual interest and collective action focus on why outcomes produced by coordinated activity are unattainable by individual action [21]. Contagion theories examine how ideas, messages, and beliefs spread through some form of direct connection [18]. Cognitive theories address the role that knowledge and perception play in socio-technical communication networks [88]. Exchange and dependency theories seek to address how communication emerges based on the distribution of information and resources across the members [43]. Homophily and proximity theories explore the emergence of communication networks based on the similarities of network members' traits [15]. Theories of network evolution study the mechanisms of variation, selection, retention, and competition [95].

2.1.2 Understanding GPS as a Creative Ecosystem

Scientific communities behave in similar ways to an ecosystem in that there exist both competition and cooperation over the use of resources; that is, interacting species (i.e., communities) compete to gain resources from their environment to survive and grow, while also cooperating to develop symbiosis and to improve their chance for survival.
Artificial ecosystems have grown into a generalized evolutionary approach for creative discovery, as applications have been developed across different domains such as economics [6], ecology [59], and social science [29]. Arthur [6] extends the framework of economics from viewing economic activities within an equilibrium steady state to viewing them as continually changing, and constantly adapting and co-evolving within a dynamically changing environment. Mitchell [59] abstracts natural evolution at a high level into two phases: evolution using genetic operators (e.g., combination, mutation, etc.), and selection of descendants based on fitness. Epstein [29] studies the underlying rules and develops models of how the decentralized local interactions of heterogeneous autonomous social individuals could generate regularities observed in the real world.

We can summarize the characteristics of artificial ecosystems under eight basic concepts and processes [56]:

- The phenotype used in an artificial ecosystem forms the basis of an individual.
- A collection of individuals represents a species.
- Individuals are distributed and move within the environment.
- Individuals inhabit and interact within the environment.
- Individuals have abilities to modify and change the environment.
- Individuals have a scalar measure to represent success, i.e., health.
- Individuals undergo stages of development, i.e., a life cycle.
- An energy-metabolism cycle describes the resource cycle.

In relation to the dynamics of ecosystems, scientific communities exhibit the following characteristics:

- The domain of a scientific community is its phenotype, which is composed of norms, practices, and skills.
- Clusters of communities are comprised of epistemic cultures that correspond to species.
- Communities are distributed globally.
- An explicit model of environment (e.g., funding agencies) influences decisions of communities by altering the availability and distribution of resources.
- Scientific communities have the ability to change and modify their environment as a result of research and technology transfer.
- Communities have a scalar measure to represent success, i.e., fitness.
- Scientific communities undergo stages of coalescing, growth, stability, and renewal.
- Scientific communities adopt external funding and transfer human capital and knowledge into technology and products, which is similar to metabolism.

Based on the characteristics above, scientific communities can be viewed as a creative ecosystem.

2.1.3 Boundary Processes in Knowledge Creation and Diffusion

A boundary is something that indicates a limit or extent; the term often has negative connotations because it implies limitation and a lack of access. Boundaries exist between communities; for example, there are technical communication challenges when communities of psychology and computer science jointly hold a meeting. Unlike in traditional scientific teams, where boundaries are usually well defined due to officially sanctioned affiliation, the boundaries of open source communities are rather fluid because they engage in interdisciplinary research. On the other hand, boundaries are also important for a learning system, because they offer learning opportunities that are different from those within a community [96]. Within a community, competence and experience converge, since this convergence is the basis for a community to be stable. At boundaries, however, competence and experience tend to diverge, exposing communities to foreign competence [97]. Therefore, both strong core activities within a community and active boundary processes determine the learning and innovation potential of a social learning system [57].

The influence among communities is bidirectional, which means that each scientific community influences other communities by publishing papers and holding conferences. At the same time, they are also affected by peer communities.
Such processes are called boundary processes, which "arise from different enterprises; different ways of engaging with one another; different histories, repertoires, ways of communicating, and capabilities" [97]. Through boundary processes, communities with common interests that promote each other become closer and closer so that clusters emerge. Communities in a cluster share similarities in terms of discipline, norms, skills, and expertise, and strongly connect with each other. In addition, there are still interconnections among clusters, which are important for different ideas to diffuse, although such interconnections are not as strong as those inside clusters. Finally, the environment that communities inhabit is another noteworthy item. It is expected that environments influence the behavior of communities through investment policies [91].

In order to build bridges across boundaries, some communities may act as brokers between communities that are originally disconnected. Brokering enables boundary processes, through which knowledge diffuses. Concomitantly, communities that act as brokers are likely to thrive because they can benefit from different experiences and views [97]. This view is similar to the theory of structural holes in that communities invest social capital in a structural hole to gain profits, as a broker builds links between communities that were originally disconnected.

The effects of boundary processes on GPS differ in strength from those on traditional scientific teams, because the two have different decision-making styles. The decision-making style in traditional teams is centralized, which makes traditional teams less likely to be affected by boundary processes. On the other hand, GPS communities are more likely to be influenced by boundary processes since their decision-making style is decentralized.
In addition, GPS communities have a lower entry threshold than traditional ones, which causes higher mobility that in turn leads to a higher influence of boundary processes on GPS communities. Therefore, to deal with the different effects on GPS and scientific teams, the simulation model presented in the dissertation adjusts the receptivity (the ratio of the weights of neighbors to the weight of the community itself). In other words, the receptivity for GPS is larger than that for traditional scientific teams.

2.1.4 Understanding GPS as a Complex Adaptive System

A complex system is composed of interconnected parts that as a whole exhibit one or more properties (behavior among the possible properties) not obvious from the properties of the individual parts. In essence, complexity is concerned with emergence, that is, the process where the global behavior of systems results from the actions and interactions of agents [99].

The behaviors of scientists and scientific communities have the characteristics of complex adaptive systems. While scientists and scientific groups adapt their behavior to fit their changing environment, they also actively shape it to create cognitive niches to improve their resilience and success. The forming, dissolution, and maintenance of emergent collaboration structures in reaction to opportunities, resources, and environmental (e.g., science policy) interventions can be viewed as a dynamic ecosystem [108]. Therefore the formation of scientific domains, problem areas, and disciplines occurs in the context of a complex adaptive system [41]. Knowledge creation in GPS involves a large population of decentralized networked individuals and groups of scientists who interact with and influence each other to form aggregate emergent communities of interest around focal problem domains. As a complex adaptive system, a GPS has the following characteristics [108]:

1. Problem-solving behavior, as well as the emergence and co-evolution of communities, results from the activities and interactions of scientists.
2. Communities compete and cooperate to form and sustain cognitive niches and interact through boundary processes.
3. Scientists and knowledge have mobility across the boundaries of scientific communities. Mobility fosters innovation [25].
4. A GPS consists of many complex subsystems (e.g., scientific communities, academic institutions, R&D institutes).

Unlike in a traditional research project, where scientists are guided by a central authority, scientists in a GPS aim not only at advancing science but also at choosing problems based on their self-interests. During the scientific process, scientists generate new problems by solving existing problems, which in turn attracts more scientists with similar interests to participate. Thus a positive feedback loop forms. In [70], Pirolli illustrates how the dynamics of information foraging act as negative feedback when solutions become routine and less novel due to a diminishing rate of returns. Under such positive and negative feedback, scientists adapt their behavior to improve their fitness and success [108].

2.2 Environmental Constraints

The environmental constraints in our research refer to investments and policies that policy-makers can leverage to influence the activity of scientific communities and further foster innovation performance. One of the policy instruments available to policy-makers is the investment strategy in science and technology. The other is to foster the role of competition and cooperation in the promotion of discovery [91]. However, the impacts of these various policies on innovation are largely unknown. McCormack [56] states that the design of environments based on which creative behavior is expected to emerge is at least as important as the human capital which is expected to evolve within the environment.
The lack of knowledge about the impacts of policies can lead to unintended consequences [91]. For example, the goal of the College Opportunity and Affordability Act of 2007 [68] is to stabilize state higher education spending and decrease the cost of college. But there is an undesired consequence: the rate of growth of state higher education spending in the future is also reduced.

Although the importance of (public and private) investment in science is widely accepted, there is still a lack of theories and methodologies to evaluate the nature and distribution of investments. Reed et al. [76] propose a seven-step framework to help program managers develop a well-structured impact evaluation: 1. Identify scope, objective, and priorities; 2. Select the types of evaluation to be completed; 3. Select the aspects of deployment-induced changes to be evaluated; 4. Identify research questions and metrics; 5. Design the evaluation; 6. Conduct the evaluation; 7. Report and use the results and data. By studying the trends of the Federal R&D budget over the last half century, Knezo [49] concludes that Federal agencies allocate funding based on topical or field-specific priorities that have shifted over time.

The other policy instrument policy-makers can make use of is to foster competition and cooperation. Competition for resources leads to various kinds of alliances/mutual relationships and to the establishment of various symbiotic relationships [3]. Axelrod [8] undertakes a variety of simulations related to the iterated Prisoner's Dilemma, drawing conclusions based on these experiments about the relationship between selfishness in the short run versus cooperation in the long run. Axelrod also identifies ways in which groups form, adhere, oppose, or join other groups.
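To make the short-run versus long-run trade-off concrete, the sketch below plays an iterated Prisoner's Dilemma between two simple strategies. The payoff values (5, 3, 1, 0) are the conventional textbook choice and are assumed here; the strategy and function names are illustrative and are not drawn from [8].

```python
# Payoff to the first player for (my_move, other_move); 'C' = cooperate, 'D' = defect.
# Assumed conventional values: temptation 5, reward 3, punishment 1, sucker 0.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else 'C'

def always_defect(opponent_history):
    """Defect regardless of what the opponent did."""
    return 'D'

def play(strategy_a, strategy_b, rounds):
    """Return the total payoffs (a, b) after the given number of rounds."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's history
        b = strategy_b(moves_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

mutual = play(tit_for_tat, tit_for_tat, 10)
exploit = play(tit_for_tat, always_defect, 10)
```

Under these assumed payoffs, the defector gains in the first round but finishes ten rounds with a lower total than either of two mutually cooperating players, which illustrates the kind of contrast such simulations examine.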
Based on the understanding of how scientific communities respond to changes in funding across research areas, we can design an environment by making policies that guide communities to act in the ways agencies expect. The task is to determine the configuration of an environment given an expected behavior or output of communities. Since the relationship between the environment and communities' behavior may not be one-to-one, multiple environmental configurations may correspond to the same innovation outputs. The basic allocation decision is the choice of which items to fund in the plan, and what level of funding each should receive [104]. There are two contingency mechanisms for dealing with unexpected situations. One is to determine which community will be funded if more resources become available. The other is to determine which community will be sacrificed if total resources have to be reduced.

For designing the environment, there are at least two potential aspects that need to be explored further.

- Find out available investment strategies that can be used to compare effects on innovation performance or potential. For example, broad investment in all domains, key support for some specific domains, random allocation, and dynamically changing allocation based on contributions.
- Find out the investment strategy that maximizes innovation performance or innovation potential. For example, what investment strategy maximizes the diversity of communities?

One difference between traditional teams and GPS is that GPS may require fewer resources to create and maintain, because resources can be shared through collaboration in GPS. In addition, GPS is driven by the desire of scientists to do original and creative research [93], which in turn can reduce the cost of hiring scientists. Therefore, the difference between traditional teams and GPS is reflected in the maintenance resource model by adjusting the maximum/minimum resources needed.
In other words, traditional teams have higher maximum/minimum resources needed than GPS. So the model presented in this study handles both traditional teams and GPS by adjusting the parameters of the model.

2.3 Relation to Earlier Work

Earlier studies pertaining to the application of computational models to scientific discovery processes focus on simulating cognitive processes and re-enacting discoveries [48]. Specifically, computational modeling of concept formation is viewed as central to discovery and has a long history [44]. More recent and complex applications of computational models include artificial intelligence and machine learning techniques that view science from the perspective of problem solving [84]. Most of these techniques focus on mimicking the discovery processes employed by individual scientists. Yilmaz [109] develops an agent simulation model to examine the impact of culture and conflict management styles on collective creativity in open source innovation systems. How collectives govern and coordinate the actions of individuals to maximize innovation output is examined in [110] to better understand the emergence of collective creativity.

Besides computational models, there are also other methods to study scientific activities. The overlay map presented in [75] is a visualization technique intended to capture the reforms that most science and technology institutions are undergoing to transcend the traditional boundaries of disciplines. Visual analytics is a new field of research that focuses on how people interact with information to make decisions [24]. Such visualization techniques are also applied in other domains. Chemists use visualization to present a visual comparison of properties or states in two or more systems so as to present a visual prediction of properties or states in the future [33]. Gloor [36] introduces an alternative method of measuring the success of knowledge workers.
In [24], the visualization technique is used to address how public investment in science affects the lives of U.S. citizens. As an emerging tool, visualization techniques also have some limitations. The data for visualization has so far been limited to publication and patent data. In addition, in many cases it is not possible to pool cross-country data sets because the data is gathered in different ways. Furthermore, if there is no understanding of the underlying dynamics, the use of visualization does not advance metrics [24]. Therefore, it is reasonable to combine visualization techniques with computational models to better understand the underlying mechanisms and to better present results.

Although significant research has been conducted on scientific communities from the network perspective, simulation modeling of such communities is rare. In [34], Gilbert develops a model where citation patterns and growth of knowledge are simulated to exhibit empirical regularities observed in scientific communities. However, this study does not aim to consider social processes pertaining to enculturation and innovation. On the other hand, the simulation study presented in [112] views scientific discovery as a social process. However, it focuses on the interactions between individual scientists and therefore does not analyze the pattern of the network of communities formed by those scientists. In the context of innovation, the use of simulation of collective invention and innovation diffusion [22] revealed the significance of social network structure in knowledge creation and diffusion. Furthermore, in [56], McCormack uses the HSB model to represent artificial species so that similar species are shown with similar colors, which inspired the idea of using the HSB color model to vividly depict the states of scientific communities.
In order to analyze the inner dynamics, an organization is often divided into several interconnected components such as organizational structure, agent, and environment [26]. Recently, Simulating Knowledge Dynamics in Innovation Networks (SKIN) [85] has emerged as a tool to simulate knowledge dynamics in innovation networks, and has been applied to learning competence [35], university-industry links [2], and technology spillover [73].

Building on these earlier studies, the model introduced herein is:

1. a computational model that can provide not only qualitative but also quantitative analysis;
2. a model based on complex system theories, boundary processes, and communication network theories;
3. an adaptive learning system of communities that can change their behavior based on their fitness;
4. a model that uses communities as the unit of analysis to track the scientific impact of investments.

Chapter 3
Design Concepts and Details

3.1 Overview

Three major components are selected to represent the status of scientific communities: domain, maturity, and resources. Domain refers to discipline or task characteristics. Maturity is an attribute that indicates the scientific sophistication and degree of advancement in a specific domain. Resources that a community holds are vital to undertake scientific activities. In order to visually depict the states of communities, the HSB color model is used. Hue indicates the domain of a community, as it determines the basic color such as red (0°), green (120°), blue (240°), etc. Saturation represents maturity, as it serves as an indicator for the level of growth. Brightness corresponds to the level of resources. Figure 3.1 shows visual snapshots of our model with grid and network topology, respectively. Each cell represents a community whose color indicates its internal states. As the figure depicts, the community network pattern looks like a color landscape. Hence, the simulation model of GPS communities is named the ColorScape model.
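The mapping from community state to color can be sketched with Python's standard `colorsys` module, which calls the HSB model HSV. The function name `community_color` and the assumption that maturity and resources are normalized to [0, 1] are illustrative choices, not details taken from the model's implementation.

```python
import colorsys

def community_color(domain_deg, maturity, resources):
    """Map a community's state to an RGB triple via the HSB (HSV) model.

    domain_deg: hue angle in degrees (0 = red, 120 = green, 240 = blue)
    maturity:   saturation, assumed normalized to [0, 1]
    resources:  brightness, assumed normalized to [0, 1]
    """
    hue = (domain_deg % 360) / 360.0  # colorsys expects hue in [0, 1]
    r, g, b = colorsys.hsv_to_rgb(hue, maturity, resources)
    return tuple(round(c, 3) for c in (r, g, b))

mature_rich = community_color(0, 1.0, 1.0)   # fully saturated, bright red
young_poor = community_color(240, 0.2, 0.4)  # pale, dim blue
```

In this sketch, two communities in the same domain share a hue and differ only in how vivid (maturity) and how bright (resources) their cells appear.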
Additionally, Colorscape is generic, as it can represent both traditional scientific teams and GPS communities by adjusting the model's parameters.

Figure 3.2 represents the components of the simulation model and their relationships. In Figure 3.2, there are three kinds of relationships: interaction within a community, interaction between communities, and interaction between community and environment. These three interactions are the fundamental driving forces for the dynamics of scientific systems composed of interconnected scientific communities.

Figure 3.1: Snapshots of Colorscape Model

Figure 3.2: Network of Scientific Communities

3.2 Process

As shown in Figure 3.3, the process of our simulation model mainly consists of activities specified by six sub-processes: resource allocation, interaction within community, learning, innovation, growth, and fade. Resource allocation refers to the strategies used to distribute resources to communities. Interaction within community refers to scientific activities at the macro level, i.e., a community is driven by funding to improve its maturity. Learning and innovation between communities mimic the boundary processes of communities, i.e., communities affect and are influenced by peer communities. Growth is defined as the process through which communities expand their extent so as to increase their influence. Fade refers to the disappearance of a community due to loss of resources and attractiveness. These six sub-processes are discussed in detail in the following sections.

Figure 3.3: The Activity Flow of the ColorScape Model

3.3 Resource Allocation

As shown in Figure 3.4, the triple helix of University-Industry-Government [30] is a spiral model of innovation that captures multiple relationships in the process of knowledge capitalization. Governments provide subsidies and grants as resources.
Academia generates knowledge, licenses, and graduates as input to industry; industry generates products as input to innovation, as well as returns on capital and investment capital as input to the financial system.

Figure 3.4: Triple Helix of University-Industry-Government Relations

My research focuses on the relationship between government and academia, so Figure 3.4 is reduced to Figure 3.5. In the baseline model, the strategy for resource allocation is uniform allocation; that is, the total resources are distributed among all communities equally. The total amount of resources available for allocation is equal to the sum of the contributions of communities coupled with external funding. Contributions by communities are based on the premise that produced knowledge can be transferred to technology, which in turn results in economic growth.

Figure 3.5: Resource Allocation

The contribution provided by a community is moderated by the product of its maturity and resources. This is based on the hypothesis that communities with higher maturity and resources are expected to be more productive. This is expressed in Equation 3.1:

\[
R_t = \sum_{i=1}^{\#\text{communities}} \left( F_{i,t} + S_{i,t} \times B_{i,t} \right), \tag{3.1}
\]

where $R_t$ indicates the total resources available at time $t$. $F_{i,t}$ denotes the external funding allocated to community $i$ at time $t$. $S_{i,t}$ and $B_{i,t}$ indicate the maturity and resources of community $i$, respectively.

3.4 Interaction within Community

Interaction within the community refers to the scientific activities at the macro level, i.e., the community is driven by funding to improve its maturity. The interaction process is depicted in Figure 3.6.

Figure 3.6: Interaction within Communities

3.4.1 Relationship between Maturity and Resources

Riss et al. [78] develop a model of knowledge maturation that includes three phases: coalescing, maturing, and transformation.
In the coalescing phase, the effort required to improve the maturity of knowledge is high, because it is a period of exploration toward a solution for a new problem. As the problem and methodology become clear, maturity improves faster during the maturing phase as a result of the aggregation of knowledge of individual scientists. In the last stage, significantly higher effort is required to standardize knowledge artifacts to make them reusable [78] and to resolve conflicts among stakeholders, because tension builds up as maturity passes the threshold for the problem and method to be settled. Thus, a U-shaped trend between maturity and effort arises.

The relationship between resource and saturation is defined as follows:

\[
R_{m,t} =
\begin{cases}
\dfrac{R_{min} - R_{max}}{c}\, S_t + R_{max} & \text{if } S_t \le c \\[2ex]
\dfrac{R_{max} - R_{min}}{1 - c}\, S_t + \dfrac{R_{min} - c\,R_{max}}{1 - c} & \text{if } S_t > c
\end{cases}
\qquad 0 < c < 1 \tag{3.2}
\]

where $R_{m,t}$ is the resources needed to maintain the current saturation, $S_t$ is the current level of saturation, and $c$ is the critical value that divides the trend into two categories. $R_{max}$ and $R_{min}$ are the maximum and minimum resources needed, respectively, which are adjustable and depend on the type of community.

At each time step, a community receives resources via external funding. But not all available resources can be used to push forward the maturity of the community, i.e., only part of the resources helps advance maturity, because the subsequent learning and innovation processes also require resources. How much saturation the community can gain from these resources is determined by the following equation:

\[
S_{t+1} = S_t + \gamma_t \times (1 - S_t) \times R_{s,t}, \tag{3.3}
\]

where $S_{t+1}$ is the maturity of the community at time $t+1$. $R_{s,t}$ is the resources that can be used to increase maturity, which is a proportion of $(\text{TotalResource} - R_{m,t})$. The increase in saturation is adjusted by $\gamma_t$, which is an exponential decay function with gradually decreasing slope over time, because more effort is needed to sustain a community with increasing maturity.
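As an illustration of Equations 3.2 and 3.3, the following Python sketch evaluates the piecewise-linear maintenance cost and one maturity update. It is a sketch under stated assumptions, not the model's code: the function names are hypothetical, `c = 0.5` and `r_max = 0.9` follow the initial values in Table 3.1, `r_min = 0.1` is an assumed value (the table does not list one), and the adjusting parameter of Equation 3.4 is passed in as a plain number called `gamma`.

```python
def maintenance_resources(s, c=0.5, r_min=0.1, r_max=0.9):
    """Resources needed to maintain saturation s (Equation 3.2).

    r_min is an assumed illustrative value; c and r_max follow Table 3.1.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    if s <= c:
        # Falling branch: cost drops from r_max at s = 0 to r_min at s = c.
        return (r_min - r_max) / c * s + r_max
    # Rising branch: cost climbs from r_min at s = c back to r_max at s = 1.
    return (r_max - r_min) / (1 - c) * s + (r_min - c * r_max) / (1 - c)


def advance_saturation(s, gamma, r_s):
    """One maturity update (Equation 3.3); gamma is the adjusting parameter."""
    return s + gamma * (1 - s) * r_s


for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, round(maintenance_resources(s), 3))
print(advance_saturation(0.5, 0.5, 1.0))  # 0.75
```

Both branches meet at `s = c`, so the maintenance cost is continuous and reaches its minimum at the critical value.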
\[
\gamma_t = e^{-t_0/\tau}\,(S_{max} - S_{min}) + S_{min}, \tag{3.4}
\]

where $\gamma_t$ is an adjusting parameter to control the increment of saturation. $t_0$ is the time period during which saturation increases continuously. $\tau$ is a constant coefficient to control the slope of the function curve. $S_{max}$ and $S_{min}$ refer to the maximum and minimum increment of saturation, respectively. At the time when saturation just begins to increase, $\gamma_t$ is equal to $S_{max}$. As saturation increases, $\gamma_t$ gradually decreases toward $S_{min}$.

3.4.2 Resources Consumed

As maturity increases, it is necessary to consume resources because technology develops based on research and development costs. The resource level is updated according to Equation 3.5:

\[
B_{t+1} = B_t + R_t - R_{m,t} - R_{s,t}, \tag{3.5}
\]

where $B_{t+1}$ is the resource the community has at the end of this interaction process. $B_t$ denotes the resource the community already holds at the current time. $R_t$ is the new resource allocated to the community at the current time. $R_{m,t}$ is the resource needed to maintain the current state, which is based on Equation 3.2. $R_{s,t}$ is the resource needed to change maturity.

3.5 Learning between Communities

Learning between communities mimics the boundary processes between communities, i.e., communities affect and are influenced by peer communities. In the Colorscape model, based on the assumptions of the Homophily theory [61], a community is more likely to communicate with similar others and is highly influenced by similar peer communities. As depicted in Figure 3.7, when a scientist or domain knowledge in community 2 transfers to community 1, the scientists of community 1 who interact with and are influenced by the new knowledge are pulled toward the new direction.

Figure 3.8 depicts the process of learning, which mainly includes updating the neighbors, updating the weights of neighbors, and updating the discipline, maturity, and resources of the current community through boundary processes based on the homophily theory.

Figure 3.7: Learning Process

3.5.1 Updating the Intensity of Communities' Influences

The influences that communities exert or receive are reflected by their interaction frequency. Interaction frequency between communities is depicted by the weights associated with links in the evolving communication graph. According to the Homophily theory, the more similar communities are, the stronger the influences. So, the intensity of community $j$'s influence on community $i$ is defined as follows:

\[
W_{ji,t} =
\begin{cases}
W_{ji,t-1} + C_W \times I_{ji,t} \times (1 - W_{ji,t-1}) & \text{if } I_{ji,t} \ge 0 \\
W_{ji,t-1} + C_W \times I_{ji,t} \times W_{ji,t-1} & \text{otherwise}
\end{cases}
\tag{3.6}
\]

where $W_{ji,t}$ is the influence of neighbor $j$ at the current time. $C_W$ is a number between 0 and 1 and is inversely proportional to inertia (resistance to change in a community). $I_{ji,t}$ is the intensity of change in the influence, which is defined as:

\[
I_{ji,t} = (1 - D_{ji,t})^4 - (1 - \bar{D}_{i,t})^4, \tag{3.7}
\]

where $D_{ji,t}$ is the dissimilarity, which is equal to the distance between community $i$ and community $j$ in terms of current hue at time $t$ (Equation 3.8). $\bar{D}_{i,t}$ is the average distance between community $i$ and all of its neighbors at time $t$. This function grows much faster when the dissimilarity between $i$ and $j$ becomes smaller in comparison to the average dissimilarity, resulting in higher intensity $I_{ji,t}$. The dissimilarity between communities $i$ and $j$ is defined as follows:

Figure 3.8: Flow Chart of the Community Learning Process

\[
D_{ji,t} = \text{Dissimilarity}(H_{i,t}, H_{j,t}), \tag{3.8}
\]

where $H_{i,t}$ is the hue of community $i$ at time tick $t$, and $H_{j,t}$ is the hue of community $j$ at time tick $t$.

\[
\text{Dissimilarity}(x, y) =
\begin{cases}
\dfrac{|x - y|}{180} & \text{if } |x - y| \le 180 \\[2ex]
\dfrac{360 - |x - y|}{180} & \text{otherwise}
\end{cases}
\tag{3.9}
\]

3.5.2 Updating the Maturity of a Community

Learning among communities affects both saturation and discipline. Saturation refers to the maturity of the domain, that is, the state or quality of being fully grown or developed.
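Before turning to the maturity update, the influence-weight rules of Section 3.5.1 (Equations 3.6 to 3.9) can be made concrete with a short Python sketch. The function names and the example hues are illustrative assumptions; `c_w = 0.5` follows the initial value in Table 3.1.

```python
def dissimilarity(x, y):
    """Normalized circular distance between two hues in degrees (Equation 3.9)."""
    d = abs(x - y)
    return d / 180 if d <= 180 else (360 - d) / 180


def intensity(d_ji, d_avg):
    """Intensity of change in influence (Equation 3.7)."""
    return (1 - d_ji) ** 4 - (1 - d_avg) ** 4


def update_weight(w_prev, i_ji, c_w=0.5):
    """Update the influence weight of neighbor j on community i (Equation 3.6)."""
    if i_ji >= 0:
        return w_prev + c_w * i_ji * (1 - w_prev)  # move toward 1
    return w_prev + c_w * i_ji * w_prev            # move toward 0


# Hypothetical example: community i at hue 10 with two neighbors.
hue_i = 10
neighbor_hues = {"similar": 20, "distant": 190}
d = {name: dissimilarity(hue_i, h) for name, h in neighbor_hues.items()}
d_avg = sum(d.values()) / len(d)
weights = {name: update_weight(0.5, intensity(d[name], d_avg)) for name in d}
print(weights)
```

In this example the similar neighbor's weight rises above its starting value of 0.5 and the distant neighbor's weight falls below it, which is the homophily effect the equations are meant to capture.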
The reason for the change in saturation by learning is that scientists can borrow theories and methodologies from other domains to improve the skills and knowledge necessary to solve problems in their own domain. As shown in Figure 3.9, the circle refers to hue and the vector refers to saturation. The length of the vector indicates the strength of saturation: the longer the vector, the larger the saturation. Angles represent differences between communities in terms of their domains: the larger the angle, the more different the domains. $S_1$, $S_2$, and $S_3$ are the saturation of community 1, community 2, and community 3, respectively. $S_2$ and $S_3$ are in different domains from $S_1$, but both have effects on $S_1$, moderated by the angles $\alpha$ and $\beta$, respectively. So, the influence from $S_2$ is equal to $W_{2,1} \times S_2 \times \cos(\alpha)$. On the other hand, the influence from $S_3$ is $W_{3,1} \times S_3 \times \cos(\beta)$, where $\cos(\beta)$ is negative since $\beta$ is obtuse.

Figure 3.9: Updating Maturity during the Learning Process

The change of saturation during the learning process is the sum of influences from peers, which is defined by Equation 3.10:

\[
S_{i,t+1} = S_{i,t} + A_S \times \sum_{j=1}^{\#\text{neighbors}} W_{ji,t} \times S_{j,t} \times \cos(\theta_{ji,t}), \tag{3.10}
\]

where $S_{i,t+1}$ refers to the saturation of community $i$ at time $t+1$. $\theta_{ji,t}$ refers to the angle between the hues of communities $i$ and $j$. $A_S$ is a function of susceptibility defined in Equation 3.11:

\[
A_S = e^{-\lambda R}, \tag{3.11}
\]

where $R$ refers to the resources the community currently holds. $\lambda$ is a constant coefficient to control and calibrate the rate of change in susceptibility.

3.5.3 Updating the Discipline of a Community

Learning can lead the current community to change its hue, i.e., its discipline (specific norms, practices, and relevant skills), due to influences from neighbor communities. Concomitantly, the community itself is inclined to realize its own target norms, as shown in Figure 3.10, where the circle refers to hue and the vector refers to saturation. Angles between vectors represent differences between communities in terms of their domains.
The larger the angle is, the more different these domains are. $H^{current}_1$, $H^{current}_2$, and $H^{current}_3$ are the current hues of community 1, community 2, and community 3, respectively. $H^{current}_2$ and $H^{current}_3$ are in different domains from $H^{current}_1$, but both have effects on $H^{current}_1$, with angles of $\alpha$ and $\beta$, respectively. Additionally, $H^{current}_1$ is pulled by its target hue $H^{target}_1$ due to its intention to realize its target. So, the change of hue during the learning process is the sum of the influences from peers and the community's own inclination, which is defined by Equation 3.12.

Figure 3.10: Domain Update during the Learning Process

\[
H^{current}_{i,t+1} = H^{current}_{i,t} + A_H \times \left( \left( \sum_{j=1}^{\#\text{neighbors}} W_{ji,t} \times \left( H^{current}_{j,t} - H^{current}_{i,t} \right) \right) + W_{i,t} \times \left( H^{target}_{i,t} - H^{current}_{i,t} \right) \right) \tag{3.12}
\]

where $H^{current}_{i,t+1}$ refers to the new hue after the learning process. $H^{current}_{i,t}$ refers to the current hue of community $i$. $H^{current}_{j,t}$ refers to the current hue of community $j$. $W_{ji,t}$ refers to the influence of community $j$ on community $i$ at the current time. $W_{i,t}$ refers to the tendency of community $i$ to reach its own target hue. $H^{target}_{i,t}$ refers to the current target hue of community $i$. $A_H$ denotes susceptibility and is defined in Equation 3.13 as:

\[
A_H = e^{-\lambda R / S}, \tag{3.13}
\]

where $S$ refers to saturation. The other parameters are the same as in Equation 3.11.

$A_S$ and $A_H$ are the community's susceptibility to influence on saturation and hue, respectively. Susceptibility to influence on saturation ($A_S$) and hue ($A_H$) decreases with increasing resources, because a community obtains greater success if it acquires more resources, which in turn inhibits the strength of the influences exerted by other communities. In addition, as saturation increases, a discipline becomes more susceptible to change. When the resource level is high and the discipline is saturated, members are more likely to experiment with new ideas.
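The two learning updates of Sections 3.5.2 and 3.5.3 can be traced numerically with the Python sketch below. It is a minimal illustration under stated assumptions, not the model's code: the function names are hypothetical, the coefficient `lam = 3.0` follows the initial value given in Table 3.1 for Equation 3.13, saturation is assumed to be positive (Equation 3.13 divides by it), hue differences are taken literally as in Equation 3.12 without wrap-around, and no clamping of the results is applied. The example state is the fourth row of Table 5.2, read here as a neighbor weight of 1 and a self weight of 0.

```python
import math


def susceptibility_saturation(resources, lam=3.0):
    """A_S of Equation 3.11."""
    return math.exp(-lam * resources)


def susceptibility_hue(resources, saturation, lam=3.0):
    """A_H of Equation 3.13; assumes saturation > 0."""
    return math.exp(-lam * resources / saturation)


def learn(hue, sat, res, target_hue, w_self, neighbors, lam=3.0):
    """Apply Equations 3.10 and 3.12 once.

    neighbors is a list of (weight, hue, saturation) tuples.
    Returns the new (hue, saturation) pair.
    """
    a_s = susceptibility_saturation(res, lam)
    a_h = susceptibility_hue(res, sat, lam)
    # Equation 3.10: peers pull saturation up or down by the cosine of the hue angle.
    d_sat = sum(w * s_j * math.cos(math.radians(h_j - hue))
                for w, h_j, s_j in neighbors)
    # Equation 3.12: peers and the community's own target pull the hue.
    d_hue = sum(w * (h_j - hue) for w, h_j, _ in neighbors)
    d_hue += w_self * (target_hue - hue)
    return hue + a_h * d_hue, sat + a_s * d_sat


# Community at (H, S, B) = (0, 1, 1) with one fully weighted neighbor at hue 180.
new_hue, new_sat = learn(0.0, 1.0, 1.0, target_hue=0.0, w_self=0.0,
                         neighbors=[(1.0, 180.0, 1.0)])
print(f"{new_hue:.2f} {new_sat:.2f}")  # 8.96 0.95
```

Under these assumptions the sketch gives a hue of about 8.96 and a saturation of about 0.95, matching the first two components of the next color listed for that row in Table 5.2.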
3.5.4 Updating the Resource of a Community

During the learning process, communities purchase instruments, organize meetings and forums, or spend time on new materials. The amount of resources consumed is proportional to the degree of change in saturation and hue. Resource consumption for learning is defined as follows:

\[
B_{i,t+1} = B_{i,t} - \left( C_H \times |\Delta H| + C_S \times |\Delta S| \right), \tag{3.14}
\]

where

\[
\Delta H = \text{Dissimilarity}\left( H^{current}_{i,t+1}, H^{current}_{i,t} \right), \qquad \Delta S = S_{i,t+1} - S_{i,t}.
\]

$B_{i,t+1}$ refers to the resource of community $i$ at the next time step. $B_{i,t}$ refers to the resource of community $i$ at the current time. $C_H$ and $C_S$ are constants that convert changes of hue and saturation to resources, respectively.

3.6 Innovation Process

Innovation changes the norms of the community, i.e., the target hue in the Colorscape model, because changing the target hue is a strategy for a community to adapt to its environment. Moving the target hue of a community toward its current hue can decrease resource consumption during the learning process, which in turn improves its sustainability. The distance between current and target hue is defined as flexibility, which is an important requirement for innovation [90]. Therefore, a requirement for innovation is that flexibility exceeds a threshold:

\[
D_{ii,t} \ge T_{Innovation}, \tag{3.15}
\]

where $D_{ii,t}$ is the distance between the current hue and the target hue, and $T_{Innovation}$ is the tolerance.

In addition, there are two kinds of innovation patterns: reorganization and specialization. Reorganization means that the community starts transforming itself by moving its accepted target toward the current state. Specialization, on the other hand, means branching out new communities. Whether reorganization or specialization happens is determined by a parameter called reorganization tendency. The innovation process is depicted in Figure 3.11.
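The resource cost of learning (Equation 3.14) and the innovation condition (Equation 3.15) can be sketched in a few lines of Python. The function names are hypothetical; `c_h = c_s = 1` and `tolerance = 0.2` follow the initial values in Table 3.1, and the use of "greater than or equal" in the threshold test is an assumption about the comparison intended in Equation 3.15.

```python
def dissimilarity(x, y):
    """Normalized circular distance between two hues in degrees (Equation 3.9)."""
    d = abs(x - y)
    return d / 180 if d <= 180 else (360 - d) / 180


def learning_cost(hue_old, hue_new, sat_old, sat_new, c_h=1.0, c_s=1.0):
    """Resources consumed by one learning step (bracketed term of Equation 3.14)."""
    delta_h = dissimilarity(hue_new, hue_old)
    delta_s = sat_new - sat_old
    return c_h * abs(delta_h) + c_s * abs(delta_s)


def can_innovate(current_hue, target_hue, tolerance=0.2):
    """Innovation condition of Equation 3.15 (flexibility at or above tolerance)."""
    return dissimilarity(current_hue, target_hue) >= tolerance


# A community with brightness 1 moves from hue 0 to about 8.96 and from
# saturation 1 to about 0.95, the change listed in the fourth row of Table 5.2.
brightness = 1.0 - learning_cost(0.0, 8.9617, 1.0, 0.9502)
print(round(brightness, 2))   # 0.9
print(can_innovate(0.0, 90.0), can_innovate(0.0, 10.0))  # True False
```

With unit cost coefficients, the remaining brightness of about 0.9 agrees with the third component of the next color in that row of Table 5.2.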
Figure 3.11: Flow Chart of Innovation

3.6.1 Reorganization

The reorganization process affects the hue of the target color, which is the weighted sum of the target colors of the influential neighbors and the resistance to change. Communities are influenced by the target colors of neighbors because the target of a community determines its future direction and can be seen as its vision. As shown in Figure 3.12, $H^{target}_1$ is influenced by $H^{target}_2$, $H^{target}_3$, and $H^{current}_1$.

Figure 3.12: Updating the Domain during the Innovation Process

\[
H^{target}_{i,t+1} = H^{target}_{i,t} + A_H \times \left( \left( \sum_{j=1}^{\#\text{neighbors}} W_{ji,t} \times \left( H^{target}_{j,t} - H^{target}_{i,t} \right) \right) + W_{i,t} \times \left( H^{current}_{i,t} - H^{target}_{i,t} \right) \right) \tag{3.16}
\]

where $H^{target}_{i,t+1}$ refers to the new target hue after the reorganization process. $H^{target}_{i,t}$ refers to the current target hue of community $i$. $H^{target}_{j,t}$ refers to the current target hue of community $j$. $W_{ji,t}$ refers to the influence of community $j$ on community $i$ at the current time. $W_{i,t}$ refers to the resistance of community $i$ to retain its own current hue. $H^{current}_{i,t}$ refers to the current hue of community $i$. $A_H$ is the susceptibility of community $i$ and is defined in Equation 3.13.

Innovation may refer to incremental and emergent or radical and revolutionary changes in thinking, products, processes, or organizations [101]. Therefore, innovation requires additional resources. Resource consumption during innovation is defined as follows:

\[
B_{i,t+1} = B_{i,t} - C_H \times |\Delta H|, \tag{3.17}
\]

where $B_{i,t+1}$ refers to the brightness of community $i$ at the next time step. $B_{i,t}$ refers to the brightness of community $i$ at the current time. $\Delta H$ refers to the change of the target hue of community $i$. $C_H$ is a constant used to convert a change of hue to the resources needed for the innovation process; it is the same as $C_H$ in Equation 3.14.

3.6.2 Specialization

Specialization corresponds to the fact in the real world that new communities split from the original community if the current community cannot match the expectations of all members.
When specialization occurs, a new community is created. The new community occupies the nearest empty cell to the current community. If there are no empty cells, then specialization cannot happen. The underlying reason is the carrying capacity, which is defined as the maximum number of communities that the current environment can sustain [72]. After the new community is created, its current color is the same as that of the original community. Its target color, on the other hand, is generated randomly within a range, as shown in Figure 3.13.

Figure 3.13: Specialization

3.7 Grow and Fade

Following the innovation process, if the resources of a community cannot maintain its current state, then $R_{s,t}$ is decreased, and the processes of interaction, learning, and innovation start over. The iteration continues until the remaining resources can maintain the current state or $R_{s,t}$ is equal to 0. When $R_{s,t}$ is equal to 0, the community fades and is removed from the current context. On the other hand, if the community has enough resources to maintain its state and a neighbor cell is empty, then the community extends to occupy the neighbor cell with a small probability. This captures evolutionary dynamics by retaining those communities that are fit to survive in the current environment.

3.8 Heterogeneous Adaption

Individual communities can adaptively change the weights of their interconnections with other communities based on environmental feedback so as to maximize their fitness [58]. Fitness refers to the resources the community gains: the more resources the community gains, the higher its fitness becomes, and the fitness decreases if fewer resources are acquired. Furthermore, the weights of the interconnections evolve along with the fitness. The equations for updating the weights fall into two groups, which correspond to the weights of neighbors and the weight of self, respectively.
\[
W_{ji,t+1} =
\begin{cases}
W_{ji,t} + \dfrac{W_{ji,t}}{W_{i,t} + \sum_{k=1}^{\#\text{neighbors}} W_{ki,t}} \times (1 - W_{ji,t}) & \text{if fitness increases} \\[2ex]
W_{ji,t} - \dfrac{W_{ji,t}}{W_{i,t} + \sum_{k=1}^{\#\text{neighbors}} W_{ki,t}} \times W_{ji,t} & \text{if fitness decreases} \\[2ex]
W_{ji,t} & \text{otherwise}
\end{cases}
\tag{3.18}
\]

\[
W_{i,t+1} =
\begin{cases}
W_{i,t} + \dfrac{W_{i,t}}{W_{i,t} + \sum_{k=1}^{\#\text{neighbors}} W_{ki,t}} \times (1 - W_{i,t}) & \text{if fitness increases} \\[2ex]
W_{i,t} - \dfrac{W_{i,t}}{W_{i,t} + \sum_{k=1}^{\#\text{neighbors}} W_{ki,t}} \times W_{i,t} & \text{if fitness decreases} \\[2ex]
W_{i,t} & \text{otherwise}
\end{cases}
\tag{3.19}
\]

where $W_{ji,t}$ refers to the original interconnection weight of community $j$ to community $i$. $W_{ji,t+1}$ refers to the new interconnection weight after feedback based on fitness. $\sum_{k=1}^{\#\text{neighbors}} W_{ki,t}$ refers to the sum of the weights of all neighbors. $W_{i,t}$ refers to the tendency of community $i$ to reach its own target hue. The range of a weight is between 0 and 1. If fitness rises, the weight should increase toward 1 in proportion to the contribution of community $j$, i.e., $W_{ji,t} / (W_{i,t} + \sum_{k=1}^{\#\text{neighbors}} W_{ki,t})$. On the contrary, if fitness falls, the weight should decrease toward 0 in proportion to the contribution. It is worth noting that the link between $j$ and $i$ is removed if $W_{ji}$ is smaller than a threshold.

3.8.1 Initialization

Table 3.1 describes all the state variables and their initial values in the simulation model. The significance of each variable is discussed in the following.

1. Carrying capacity (initial community number) refers to the size of the whole scientific society that is composed of single communities interconnected with each other.
   In biology, the term minimum viable population [103] refers to the lower bound on the population of a species that allows it to survive. So, it is expected that the initial community number is related to diversity and resilience.

Table 3.1: Initial Values of State Variables

| Parameter Name | Range | Initial Value |
|---|---|---|
| Carrying Capacity | [10, 200] | 100 |
| Stop Time | [1, ∞) | 1000 |
| Startup Funding | [1, 2] | 2 |
| Parameter $F_{i,t}$ in Equation 3.1 | [0.1, 1] | 0.5 |
| Tolerance | [0, 1] | 0.2 |
| Reorganization Tendency | [0, 1] | 0.5 |
| Parameter $c$ in Equation 3.2 | [0, 1] | 0.5 |
| Parameter $R_{max}$ in Equation 3.2 | [0, 1] | 0.9 |
| Proportion of resources to advance maturity | [0, 1] | Random |
| Max Increment of Saturation Per Step ($S_{max}$ in Equation 3.4) | [0.1, 1] | 0.5 |
| Min Increment of Saturation Per Step ($S_{min}$ in Equation 3.4) | [0, 1] | 0.1 |
| $\tau$ in Equation 3.4 | (0, ∞) | 100 |
| $C_W$ in Equation 3.6 | [0, 1] | 0.5 |
| $\lambda$ in Equation 3.13 | (0, ∞) | 3 |
| Resources Cost to Push Hue ($C_H$ in Equation 3.17) | [0, 1] | 1 |
| Resources Cost to Push Saturation ($C_S$ in Equation 3.14) | [0, 1] | 1 |
| Current color | HSB range | Random |
| Target color | HSB range | Random |
| Initial weight of self | [0, 1] | Random |
| Initial weight of neighbor | [0, 1] | Random |
| Weight to grow | [0, 1] | Random |

2. Startup Funding (parameter $F_{i,t}$ in Equation 3.1) indicates the external funding allocated to the community. Research funding could be structured to encourage the formation of new communities [91]. Research funding also affects the development of existing communities.
3. Tolerance (the threshold for innovation to happen) and reorganization tendency determine how frequently innovation occurs and which type of innovation occurs. Since innovation changes the norms of the community, it is of interest to investigate the relationship between the type of innovation and diversity.
4. Parameters $c$ and $R_{max}$ in Equation 3.2 determine the form of the maintenance function, which in turn decides whether or not the community could fade out.
5. $S_{max}$, $S_{min}$, and $\tau$ in Equation 3.4 determine how much maturity can be gained per time tick.
   On the other hand, the more maturity a community gains, the more resources the community consumes, which increases the likelihood of fading of the community. So $S_{max}$, $S_{min}$, and $\tau$ are important parameters when the total available resources are limited.
6. $C_W$ in Equation 3.6 determines the changes of the weights of links, which in turn determine the influences of peer communities during the processes of learning and innovation.
7. $\lambda$ in Equation 3.13 determines the slope of the susceptibility curve of a community. It determines the extent to which the community can be changed further.
8. $C_H$ and $C_S$ in Equation 3.14 determine the relationship between domain change intensity, maturity, and the corresponding resource consumption. So, these two parameters are expected to affect the rate of fading.

Chapter 4
Implementation of Simulation Model

4.1 Introduction to Repast

Repast is an acronym for the Recursive Porous Agent Simulation Toolkit [77], a free and open source agent-based modeling toolkit that simplifies model creation and use. Repast Simphony provides a rich variety of features, including the following:

• Models can be developed in pure Java, Groovy, flowcharts, or any mixture of them.
• A pure Java model execution environment includes built-in results logging and graphing tools that make it easy to change the appearance of agents.
• The context is based on a flexible hierarchy that supports the modeling and visualization of 2D and 3D environments.
• The discrete event scheduler is fully concurrent and multithreaded.
• All models developed with Repast are object-oriented.

In general, a standard Repast model is based on contexts and projections. Frequently used projections include grid, continuous space, network, and geography. Figure 4.1 shows how context, sub context, and projection interact.
Figure 4.1: Contexts and Projections

4.2 Implementation of Agents

Figure 4.2 represents part of the class diagram of this simulation model of open science communities, where there are four main classes: Community, SubCommunity, Neighbor, and CommunityStyle. Community is the major research object in the simulation model; it has two types of state variables and three functions. SubCommunity is used when a community occupies multiple cells. A community is comprised of one or more SubCommunities. The CommunityStyle class is used to render Community and SubCommunity, so as to show the correct color according to the states of the domain. The Neighbor class represents communities connected with the current community.

Figure 4.2: Class Diagram of Model

A community is represented by its genome and state, whose details are described as follows:

Genome
• Target color is (H, S, B) = (H, 1, 1), where H refers to the discipline of the community, whose range is [0, 360).
• $W_g$ is the probability for a community to grow, i.e., to occupy neighbor places.
• $W_S$ is the propensity of the community to move toward the target.
• $W_{N_k}$ denotes the influence exerted on the community by the kth neighbor.

State
• H, S, and B of the current color represent domain, maturity, and resource, respectively.
• Age refers to the time period during which the community has existed in the context.
• Width is the number of cells the community occupies.
• The resource level R allocated at the current time is different from B, which represents the overall resources held by the community.

4.3 Visual Snapshots of the Simulation View

The following three groups of figures depict snapshots of the Colorscape model over time with 2D, scale-free, and dynamic communication contexts, respectively. These figures have two points in common. One is that communities become more colorful as a result of increasing maturity. The other is that clusters of similar communities emerge as a result of boundary processes among communities.
Figure 4.3: Snapshots of 2D Communication Context (panels a to d)

Figure 4.4: Snapshots of Scale-free Communication Context (panels a to d)

Figure 4.5: Snapshots of Dynamic Communication Context (panels a to d)

Chapter 5
Verification, Validation and Evaluation

The evaluation of a simulation model involves two major activities: verification, and validation, which includes conceptual and operational validation [9]. Conceptual validation aims to assure that the conceptual model is consistent with the system under investigation [79]. Operational validation substantiates the accuracy of the model's behavior against the system behavior for its intended purpose and domain of applicability [79].

Figure 5.1: Overview of Verification and Validation [92]

Also, verification and validation can be conducted at the micro and macro level, respectively. The strategies for the different levels of verification and validation are summarized in Table 5.1.

Table 5.1: Verification and Validation at Micro and Macro Level

| Level | Verification | Conceptual Validation | Operational Validation |
|---|---|---|---|
| Micro | Unit test at the level of single function | Ontological adequacy: ground each equation on theory | Activity of single agent against theory; sensitivity analysis with respect to single agent |
| Macro | Integration test at the level of components of agent | Conceptual validity against theory; conceptual validity determined by experts | Activities of set of agents against theory; activities of set of agents against empirical evidence; cross-model validation; sensitivity analysis with respect to set of agents |

5.1 Verification

Verification is the process of determining that a computer model, simulation, or federation of models and simulation code and their associated data accurately represent the developer's conceptual description and specifications [1]. To achieve this goal, unit and integration tests are used.
5.1.1 Micro Verification

Micro verification is carried out by unit tests at the level of a single function. Unit testing involves determining the correctness of the simulation program at the function level. All functions are tested using boundary and error conditions, and the outputs are observed for consistency against expected regularities. For demonstration purposes, the following example illustrates testing of the resource allocation module.

The resource allocation strategy in the baseline model allocates resources evenly across the communities. The total amount of resources each community receives is equal to the resources allocated to each cell times the number of cells the community occupies, plus the contributions from technology transfer.

1. When communities make no contributions and there is one cell per community:

| Resources allocated per cell | Resources received by each community |
|---|---|
| 0 | 0 |
| 0.5 | 0.5 |
| 1 | 1 |

2. When communities make no contributions and there are two cells per community:

| Resources allocated per cell | Resources received by each community |
|---|---|
| 0 | 0 |
| 0.5 | 1 |
| 1 | 2 |

3. When communities make a contribution of 1 and there is one cell per community:

| Resources allocated per cell | Resources received by each community |
|---|---|
| 0 | 1 |
| 0.5 | 1.5 |
| 1 | 2 |

4. When communities make a contribution of 1 and there are two cells per community:

| Resources allocated per cell | Resources received by each community |
|---|---|
| 0 | 1 |
| 0.5 | 2 |
| 1 | 3 |

5.1.2 Macro Verification

Macro verification is carried out by integration tests at the level of collectives of agents, covering interaction, learning, reorganization, specialization, fade, and growth. Integration testing is the activity of software testing in which individual software modules are combined and tested as a group [102]. For our simulation model, we focus on the behavior of the Community class, since community behavior is the focal aspect of our study.

The learning process is composed of several functions, such as updating the influences of neighbors and calculating the changes of hue, saturation, and brightness.
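Returning briefly to the unit test of Section 5.1.1, the four cases listed there can be expressed as a table-driven check. The Python sketch below is only an illustration of that testing style: the function `resources_received` is a hypothetical stand-in for the resource allocation module, implementing the stated rule (resources per cell times number of cells, plus contribution), and the expected values are the ones listed in the four cases.

```python
def resources_received(per_cell, cells, contribution):
    """Resources a community receives under uniform allocation (hypothetical helper)."""
    return per_cell * cells + contribution


# (contribution, cells per community, [(resources per cell, expected), ...])
CASES = [
    (0, 1, [(0, 0), (0.5, 0.5), (1, 1)]),
    (0, 2, [(0, 0), (0.5, 1), (1, 2)]),
    (1, 1, [(0, 1), (0.5, 1.5), (1, 2)]),
    (1, 2, [(0, 1), (0.5, 2), (1, 3)]),
]


def run_cases():
    """Check every row of the four cases; return the number of rows checked."""
    checked = 0
    for contribution, cells, rows in CASES:
        for per_cell, expected in rows:
            assert resources_received(per_cell, cells, contribution) == expected
            checked += 1
    return checked


print(run_cases())  # 12
```

Keeping the expected values in a data table makes it straightforward to add boundary and error conditions as further rows.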
Table 5.2 records the preconditions and expected values (the column Next Color) of the integration test for the learning process.

Table 5.2: Summary of the Integration Test for the Learning Process

| Community 1: Current Color | Community 1: Target Color | Community 1: Receptivity | Neighbor: Current Color | Community 1: Next Color |
|---|---|---|---|---|
| (0,0,0) | (0,1,1) | 1 | (180,1,1) | (180,0,0) |
| (0,1,0) | (0,1,1) | 1 | (180,1,1) | (180,0,0) |
| (0,1,0) | (90,1,1) | 1 | (180,1,1) | (180,0,0) |
| (0,1,1) | (0,1,1) | 1 | (180,1,1) | (8.96,0.95,0.9) |
| (0,0,0) | (0,1,1) | 0 | (180,1,1) | (0,0,0) |
| (0,1,1) | (0,1,1) | 0 | (180,1,1) | (0,1,1) |
| (0,0,0) | (0,1,1) | 0 | (90,1,1) | (0,0,0) |
| (0,0,0) | (180,1,1) | 0 | (90,1,1) | (180,0,0) |

In the above table, the first three rows show that the current color of a community with receptivity of 1 is changed to the current color of the interacting neighbor after the learning process, no matter what the community's target color is. The fourth row differs from the first three in that the quotient of the community's resource divided by its saturation is equal to 1, which in turn makes its susceptibility (Equation 3.13) greater than 0. So, the current color in the fourth row is the result computed by Equation 3.12. The last four rows illustrate that the current color of a community with receptivity of 0 is always pulled toward its target color, independent of the influences from neighbors.

5.2 Validation

There are two major types of validation: conceptual validation and operational validation.

5.2.1 Conceptual Validation

Conceptual validation refers to ontological adequacy, achieved by grounding the underlying generative mechanisms of the model on theories and/or empirical evidence. Table 5.3 lists the evidence used to validate each subprocess of the Colorscape model.
Table 5.3: Summary of Conceptual Validation of Each Subprocess

| Subprocess | Evidence |
|---|---|
| Interaction between environment and community | Prey-predator models [46]; observed trends in NSF investments [60] |
| Relationship between maturity and resources | U-shaped model of the knowledge maturing process [78]; Kuhn's paradigm change theory [53] |
| Updating intensity of communities' influences | Homophily theory [15] |
| Domain dynamics of a community | Boundary processes [96]; social learning theory [88] |
| Maturity dynamics of a community | The formation process of DNA computing [5] |
| Community reorganization | Panarchy theory [37] |
| Community specialization | Panarchy theory [37] |
| Fading process dynamics | Panarchy theory [37] |
| Community growth dynamics | Panarchy theory [37] |

Panarchy is the structure in which systems of nature and systems of humans, as well as combined human-natural systems, are interlinked in continual adaptive cycles of growth, accumulation, restructuring, and renewal [37]. Therefore, it is used in this study to validate the subprocesses of reorganization, specialization, grow, and fade.

5.2.2 Micro Operational Validation

For micro operational validation, we compare the behavior of a single agent to the expected regularities. The following illustrates three micro operational validation strategies used for the Colorscape model.

1. According to the Homophily theory, the intensity of influences from similar peers is greater than that from different peers. So, one way to conduct micro operational validation is to investigate the impact of differences between the weights of influences of the neighbors associated with an agent.
2. Another strategy is to undertake sensitivity analysis with respect to single agents. For example, an agent with fewer resources is more likely to fade than agents with more resources. In addition, an agent with higher receptivity has a stronger intention to change its domain toward its neighbors, compared with agents having lower receptivity.
3. When an unexpected phenomenon occurs, we need to trace it back to the internal mechanisms of the model by viewing it as a white box. If the rationale behind the unexpected phenomenon is found and it matches either existing theories or empirical rules, then the model is validated with respect to the case.

5.2.3 Macro Operational Validation

For macro operational validation, we focus on the global emergent behavior based on agent interactions, i.e., external validation against the real world. There are various strategies for macro operational validation, such as comparison of simulation outputs to target systems, empirical rules, cross-model validation, and sensitivity analysis [80]. Firstly, validation can be conducted via comparison of simulation model outputs and the actual data collected from the system under investigation. If data are not available, empirical rules can be used to determine the validity of the model, e.g., the presence of a power law in cities' population, financial markets, and internet sites [11]. In addition, we can evaluate the impact of a specific parameter by changing its value while keeping the others unchanged. If the result is consistent with the expected regularities, then we increase our confidence in the correctness of the simulation model under this specific case. Finally, unexpected phenomena could emerge, since the Colorscape model aims to study complex systems. When such unexpected phenomena occur, it is necessary to trace back to the model and check the control and data flows step by step. If the unexpected phenomenon can be interpreted reasonably, then the model is validated under this phenomenon.

5.2.3.1 Emergence of Communities

Figure 5.2 presents the evolving states of communities over time during a single run of the Colorscape model. Initially, the colors of communities are grey due to their low maturity.
As the simulation unfolds, states of communities become increasingly colorful due to increasing maturity through community sustainment, interaction, learning, and innovation processes. After a long run, clusters with similar color patterns emerge, which suggests formation of related disciplines as a result of communication and boundary processes.

Figure 5.2: Growth and Formation of Community Clusters

Figure 5.3(a) presents the ideal core/periphery network pattern. Figure 5.3(b) depicts the domain-domain network pattern of the OBO community. Figure 5.3(c) is a snapshot of the network of the ColorScape model. From the visual comparison of these three figures, one can observe similar network structures such as the presence of core communities with a large number of links surrounded by periphery communities.

Figure 5.3: Emergent Network Patterns. (a) Core/Periphery Network; (b) OBO Domain-Domain Network; (c) Snapshot of the ColorScape Model

5.2.3.2 Comparison with Institutions around Department of Energy

Figure 5.4(a) depicts how the clustering coefficient of DOE in nanoscale science changes from 1990 to 2005 [60]. Figure 5.4(b) depicts the clustering coefficient gathered by running the ColorScape model from time step 1 to 100. From these two figures, we observe very similar trends, i.e., the clustering coefficient oscillates within a limited range.

Figure 5.4: Comparison of Clustering Coefficient (panels a and b)

Figure 5.5(a) shows the number of institutions and the average degree of institutions in nanoscale science collaborations with DOE from 1990 to 2005 [60]. Figures 5.5(b) and 5.5(c) present the number of communities and the average degree, respectively, gathered by running the ColorScape model from time step 1 to 100. From these three figures, we observe very similar trends, i.e., the number of communities increases gradually and the average degree fluctuates within a limited range.
This increases our confidence in the ColorScape model introduced in this dissertation, because it is capable of generating network patterns and metric outputs similar to the corresponding indicators of the institutional structure involved in nanoscale science at the Department of Energy (DOE).

Figure 5.5: Comparison of Communities Number and Average Degree (panels a, b, and c)

5.3 A Robust Evolutionary Framework for Validation

In this section, a robust evolutionary framework for validation based on a genetic algorithm is introduced. Its purpose is to find configuration parameters of the ColorScape model that produce results similar to the overlay map and OBO data in terms of the number of nodes, density, centrality, clustering coefficient, average path, and core/periphery ratio.

5.3.1 Design of the Validation Framework

The strategy used in the operational validation of the ColorScape model is shown in Figure 5.6. Each step is discussed in detail in the following sections.

Figure 5.6: Validation Framework

5.3.2 Gene Encoding

Gene design has two subprocesses: gene encoding and gene decoding. Gene encoding refers to the process of converting the configuration parameters to genes, which then evolve toward the region of the parameter space that yields accurate results with respect to the system data. In general, genes are represented as binary strings, where each element is 0 or 1. It is essential to determine how many bits are needed to represent a configuration parameter, which in turn is determined by the value range and the degree of precision needed. If the configuration parameter is an integer, then the binary representation of the integer is used as the gene. If the configuration parameter is a float, the degree of precision must be set in advance so that the binary representation of the float satisfies the requirements.
If the configuration parameter has a fixed number of feasible values, then the number of bits of the corresponding gene is determined by the total number of feasible values. The final gene is a string that consists of all the configuration parameters.

As shown in Figure 5.7, we select two parameters from a collection as an example. One is an integer that ranges between 0 and 6. The other is a float that ranges from 0 to 1. For the integral parameter m, three bits are required since the maximum value is 6. For the float parameter n, whose value is 0.75, there are four possible values since the precision is set to 0.25. So two bits are required to represent parameter n.

Figure 5.7: Gene Encoding

Since there are twelve allocation strategies, four bits are needed to represent the allocation strategy. Based on the same rationale, the number of bits of the other configuration parameters is determined. Figure 5.8 shows a sample gene after encoding the configuration parameters of the ColorScape model, where the total number of bits is 21.

Figure 5.8: Gene Example

5.3.3 Gene Decoding

As the inverse process of gene encoding, gene decoding aims to divide the gene into parts that correspond to the configuration parameters.

To calculate the fitness of each gene, the gene has to be decoded into configuration parameters of the ColorScape model. Then the ColorScape model is batch run with these parameters and returns the outputs. The total number of bits of the gene is 21, and different bits have different meanings. The following table gives the relationship between the bits and their meanings.
Table 5.4: Gene Decoding

Bits    Variable Name              Code   Value
0-1     Population                 00     10
                                   01     50
                                   10     100
                                   11     200
2-5     Allocation Strategy        0000   Uniform allocation with fixed external resource
                                   0001   Uniform allocation with technology transferring
                                   0010   Allocation proportional to contribution with fixed external resource
                                   0011   Allocation proportional to contribution with technology transferring
                                   0100   Allocation proportional to cluster size with fixed external resource
                                   0101   Allocation proportional to cluster size with technology transferring
                                   0110   Allocation proportional to importance of domains with fixed external resource
                                   0111   Allocation proportional to importance of domains with technology transferring
                                   1000   Competition allocation
                                   1001   P2PAllocation
                                   1010   Random allocation with fixed external resource
                                   1011   Random allocation with technology transferring
6-7     Resource                   00     0.1
                                   01     0.4
                                   10     0.7
                                   11     1.0
8-9     Tolerance                  00     0
                                   01     0.3
                                   10     0.6
                                   11     1.0
10-12   Reorganization Tendency    000    0
                                   001    0.1
                                   010    0.3
                                   011    0.5
                                   100    0.7
                                   101    0.9
                                   110    1.0
13-15   Receptivity                000    0
                                   001    0.1
                                   010    0.3
                                   011    0.5
                                   100    0.7
                                   101    0.9
                                   110    1.0
16-18   Communication Frequency    000    0.1
                                   001    0.2
                                   010    0.3
                                   011    0.4
                                   100    0.5
                                   101    0.6
                                   110    0.8
                                   111    1.0
19-20   Growth Threshold           00     0.5
                                   01     0.7
                                   10     0.8
                                   11     1.0

One noteworthy aspect is that some bit patterns have no corresponding value, for instance a receptivity code of 111. If such a pattern occurs, a recovery strategy is required. The recovery strategy used here is to take the old code modulo the number of valid codes. For a receptivity code of 111 (decimal 7), the new code is 7 % 7 = 0, i.e., 000.

5.3.4 Population Initialization

Population initialization involves generating a collection of genes of a predefined size. Each bit of a gene is assigned randomly as 0 or 1.
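The decoding, recovery, and random initialization steps described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (GeneDecoder, decodeField, randomGene) are hypothetical and are not taken from the ColorScape implementation, only three of the eight fields of Table 5.4 are shown, and the repair follows the 111 % 7 = 000 example by taking the code modulo the number of valid codes.

```java
import java.util.Random;

/** Hypothetical decoder for part of the 21-bit gene of Table 5.4 (illustrative names). */
final class GeneDecoder {
    // Value tables copied from Table 5.4 for three of the eight fields.
    static final double[] POPULATION = {10, 50, 100, 200};                  // bits 0-1
    static final double[] RECEPTIVITY = {0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0};  // bits 13-15
    static final double[] GROWTH_THRESHOLD = {0.5, 0.7, 0.8, 1.0};          // bits 19-20

    /**
     * Reads bits from..to (inclusive) as an unsigned binary number. A code with
     * no entry in the table is repaired by taking it modulo the number of valid
     * codes, mirroring the 111 % 7 = 000 example.
     */
    static double decodeField(String gene, int from, int to, double[] values) {
        int code = Integer.parseInt(gene.substring(from, to + 1), 2);
        return values[code % values.length];
    }

    /** Random initialization: every bit is drawn independently as 0 or 1. */
    static String randomGene(Random rng, int bits) {
        StringBuilder sb = new StringBuilder(bits);
        for (int i = 0; i < bits; i++) {
            sb.append(rng.nextBoolean() ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fields in Table 5.4 order; the receptivity code 111 is deliberately invalid.
        String gene = "01" + "0011" + "10" + "10" + "001" + "111" + "101" + "00";
        System.out.println(decodeField(gene, 0, 1, POPULATION));         // 50.0
        System.out.println(decodeField(gene, 13, 15, RECEPTIVITY));      // 0.0 after repair
        System.out.println(decodeField(gene, 19, 20, GROWTH_THRESHOLD)); // 0.5
        System.out.println(randomGene(new Random(), 21).length());       // 21
    }
}
```

Keeping one value table per field means the number of valid codes is simply the table length, so the same repair rule applies to every field without special cases.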
Once the total number of genes is assigned, the population is generated automatically. The size of the population used in the validation framework is 100.

5.3.5 Repair to the Genes

During the evolution, newly generated genes may fall outside the feasible range. In this case, a repair is needed to guarantee the validity of the gene. Considering the parameter m shown in Figure 5.7, if the generated value is 111, it is beyond the feasible range since the maximum value is 6. Two strategies can be used to make the repair. One is to take the new value modulo the maximum value, i.e., 7 % 6 = 1, so that the repaired gene is 001. The other strategy is to map the value randomly into the feasible domain.

5.3.6 The Fitness Function

In biology, natural selection is the process of eliminating members of a species that do not adapt well to the environment. In a genetic algorithm, the selection of genes is based on their fitness, an indicator of how close a gene's outputs are to the target. To quantitatively validate a simulation model, metrics have to be chosen and the corresponding values of these metrics for the real system have to be computed. The values of the metrics of the real system are the target. The fitness of each gene is inversely proportional to the distance between its outputs and the target metrics, as defined by Equation 5.1:

f(g) = \frac{1}{\sqrt{\sum_{i=1}^{6} (x_i - t_i)^2}},    (5.1)

where $g$ is the gene, $x_i$ is the $i$th element of the output metric vector given the gene, and $t_i$ is the $i$th element of the target metric vector. The metric vector includes six elements, each of which corresponds to a metric: the number of nodes, density, centrality, clustering coefficient, average path, and core/periphery ratio.

As a general measure of the degree of socio-technical interaction, we use and interpret density, centrality, clustering coefficient, average path length, and core/periphery ratio to characterize the target networks, including OBO and the overlay map.
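A minimal sketch of Equation 5.1 follows. The class name GeneFitness is hypothetical, the method accepts vectors of any length rather than exactly six metrics, and returning positive infinity for a perfect match (where Equation 5.1 is undefined) is an assumption of this sketch rather than something stated in the text.

```java
/** Hypothetical sketch of Equation 5.1: fitness is the inverse Euclidean distance. */
final class GeneFitness {
    static double fitness(double[] output, double[] target) {
        if (output.length != target.length) {
            throw new IllegalArgumentException("metric vectors differ in length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < output.length; i++) {
            double diff = output[i] - target[i];
            sumOfSquares += diff * diff;
        }
        // Assumption: a perfect match (distance 0) is treated as infinitely fit.
        return sumOfSquares == 0.0
                ? Double.POSITIVE_INFINITY
                : 1.0 / Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Distance between (3, 4) and (0, 0) is 5, so the fitness is 1/5.
        System.out.println(fitness(new double[] {3, 4}, new double[] {0, 0})); // 0.2
    }
}
```

Because Equation 5.1 uses raw differences, a metric with a large magnitude, such as the number of nodes, would dominate the distance unless the metrics are normalized first; the text does not say whether such scaling was applied.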
Except for the core/periphery ratio, the definitions of these metrics and their relation to creativity and innovation potential are presented in [113]. The core/periphery network pattern is considered a stable, sustainable, and innovative structure [52]. Given the same number of core members, an increasing number of periphery members is beneficial for bringing in new external ideas. The core/periphery ratio measures the ratio, expressed as a percentage, of members in the core to members in the periphery. The algorithm shown in Figure 5.9 describes the strategy used to compute the core/periphery ratio.

Figure 5.9: Core/Periphery Ratio

5.3.7 Termination Condition

There are two conditions that terminate the validation process:

1. The maximum number of iterations is reached.

2. A gene with fitness greater than a predefined threshold emerges.

5.3.8 Crossover

Reproduction is the process of generating the next generation of genes using two genetic operators, crossover and mutation. For crossover, there are two options: one-point crossover and two-point crossover. In one-point crossover, two genes exchange the parts that begin at a randomly selected cross point. In two-point crossover, two genes exchange the part between the first and the second cross point.

5.3.9 Mutation

Crossover is a binary operator, whereas mutation is a unary operator. The mutation probability is very small, namely 1%. For each gene, if a randomly generated number is less than the mutation probability, mutation occurs: our strategy randomly chooses a mutation point and then flips the bit at that point.

5.3.10 Selection

Selection is the process of updating the population by means of a fitness-based function. The higher the fitness of a gene, the more likely the gene is to be selected. From the population that includes both parents and children, a fixed number of genes are selected as the next generation.
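The crossover and mutation operators of Sections 5.3.8 and 5.3.9 can be illustrated with the following sketch on bit strings. The names (GeneOperators, onePointCrossover, twoPointCrossover, mutate) are hypothetical, and treating the exchanged segment in two-point crossover as the half-open index range from the first cross point up to, but not including, the second is an assumption, since the text does not specify whether the end points are inclusive.

```java
import java.util.Random;

/** Hypothetical sketch of the genetic operators on bit-string genes. */
final class GeneOperators {
    /** One-point crossover: the two genes swap their tails starting at index point. */
    static String[] onePointCrossover(String a, String b, int point) {
        return new String[] {
            a.substring(0, point) + b.substring(point),
            b.substring(0, point) + a.substring(point)
        };
    }

    /** Two-point crossover: the two genes swap the segment [first, second). */
    static String[] twoPointCrossover(String a, String b, int first, int second) {
        return new String[] {
            a.substring(0, first) + b.substring(first, second) + a.substring(second),
            b.substring(0, first) + a.substring(first, second) + b.substring(second)
        };
    }

    /** Flips the bit at the given index. */
    static String flipBit(String gene, int point) {
        char[] bits = gene.toCharArray();
        bits[point] = (bits[point] == '0') ? '1' : '0';
        return new String(bits);
    }

    /** With probability rate (0.01 in the text), flips one randomly chosen bit. */
    static String mutate(String gene, double rate, Random rng) {
        if (rng.nextDouble() < rate) {
            return flipBit(gene, rng.nextInt(gene.length()));
        }
        return gene;
    }

    public static void main(String[] args) {
        String[] one = onePointCrossover("000000", "111111", 2);
        System.out.println(one[0] + " " + one[1]); // 001111 110000
        String[] two = twoPointCrossover("000000", "111111", 2, 4);
        System.out.println(two[0] + " " + two[1]); // 001100 110011
    }
}
```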
For each gene, the probability of being selected is based on its fitness:

pp_i = \sum_{j=1}^{i} p_j,    (5.2)

p_j = \frac{f_j}{\sum_{k=1}^{N} f_k},    (5.3)

where $p_j$ is the probability for gene $j$ to be selected, $pp_i$ is the accumulated probability, and $f_j$ is the fitness of gene $j$. Gene $i$ is selected only if $pp_i \geq \mathrm{rand}(0,1) \geq pp_{i-1}$.

5.3.11 Equilibrium

During the evolution process, if the population does not change, this is an indication that equilibrium has been reached. The likely reason is that the population has converged to a local optimum. To break the equilibrium and continue the evolution, a mechanism named "kick the ball" is used. If the whole search space is pictured as a mountain range, kicking the ball moves a solution from one valley to another, where it may evolve into a better solution. In the genetic algorithm, when the population of genes does not change, a part of the population is randomly selected and its bits are randomly mutated. Compared with the mutation operator, kicking the ball flips multiple bits at a time rather than just one bit.

5.3.12 Implementation

The simulation model and the associated validation framework are implemented in RePast. RePast is open source software that facilitates the design and implementation of agent-based models. It provides mechanisms for both single and batch runs. However, the configuration parameters cannot be changed during either a single or a batch run. So, a new runner that inherits from the default runner of RePast is necessary to dynamically translate the gene to configuration parameters and return the outputs that are required for the computation of fitness.

As shown in Figure 5.10, the abstract GA class implements all functions except the fitness function. The fitness function is overridden by child classes that drive the simulation model after converting the gene to the corresponding configuration parameters. Figure 5.11 is the sequence diagram that captures the dynamic strategy used in the validation framework.
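The division of labor between the abstract GA class and its child classes, together with the fitness-proportional selection of Equations 5.2 and 5.3, can be sketched as follows. This is not the RePast-based implementation: the class names AbstractGa and CountOnesGa are hypothetical, the toy child class counts 1 bits instead of running a simulation, selection with replacement is an assumption, and the sketch assumes finite, positive fitness values.

```java
import java.util.Random;

/** Hypothetical skeleton: everything except fitness() lives in the abstract class. */
abstract class AbstractGa {
    protected final Random rng = new Random();

    /** Child classes decode the gene, run the simulation, and score its outputs. */
    protected abstract double fitness(String gene);

    /** Roulette-wheel pick: the first index i whose cumulative probability pp_i is at least r. */
    static int pick(double[] fitness, double r) {
        double total = 0.0;
        for (double f : fitness) {
            total += f;
        }
        double cumulative = 0.0;
        for (int i = 0; i < fitness.length; i++) {
            cumulative += fitness[i] / total; // p_i from Equation 5.3, summed as in 5.2
            if (r <= cumulative) {
                return i;
            }
        }
        return fitness.length - 1; // guard against floating-point round-off
    }

    /** Draws a next generation of the given size, with replacement (an assumption). */
    String[] select(String[] candidates, int size) {
        double[] fit = new double[candidates.length];
        for (int i = 0; i < fit.length; i++) {
            fit[i] = fitness(candidates[i]);
        }
        String[] next = new String[size];
        for (int i = 0; i < size; i++) {
            next[i] = candidates[pick(fit, rng.nextDouble())];
        }
        return next;
    }

    public static void main(String[] args) {
        AbstractGa ga = new CountOnesGa();
        String[] next = ga.select(new String[] {"0000", "0110", "1111"}, 2);
        System.out.println(String.join(",", next));
    }
}

/** Toy child class: counts 1 bits instead of running a simulation. */
final class CountOnesGa extends AbstractGa {
    @Override
    protected double fitness(String gene) {
        return 1.0 + gene.chars().filter(c -> c == '1').count();
    }
}
```

Keeping fitness() abstract lets the same selection code be reused for different targets, for example one child class per empirical data set.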
The main class invokes the genetic algorithm, which in turn invokes the simulation model that derives its fitness value. Because the genetic algorithm may evolve over multiple generations, there is a loop for the genetic algorithm until the termination condition is reached.

Figure 5.10: Class Diagram of Validation Framework

Figure 5.11: Sequence Diagram of Validation Framework

5.4 Comparison with Overlay Map

The overlay map [75] is a novel tool that presents relationships among disciplines based on citation data. After 100 generations, the best gene is discovered. Table 5.5 lists all the parameters and their values represented by the gene:

Table 5.5: The Best Configuration against Overlay Map

Name                       Value
Carrying Capacity          50
Startup Funding            2
External Resource          1
Tolerance                  0.6
Reorganization Tendency    0.1
Receptivity                1
Allocation Strategy        Proportional to Contribution with Technology Transferring
Communication Style        Homophily
Communication Frequency    0.6
Threshold to Grow          0.5

Figure 5.13 is the snapshot of the network generated by the ColorScape model given the configuration parameters shown in Table 5.5. By comparison, the similarities of Figure 5.12 and Figure 5.13 can be observed as follows:

1. The development levels of communities (fields) are different from each other.

2. Those communities with similar states form clusters.

3. Some communities have more links than others.

The above is the intuitive comparison of network patterns. To gain more confidence, a quantitative comparison is undertaken in terms of six metrics: number of nodes, density, centrality, clustering coefficient, average path, and core/periphery ratio. Table 5.6 presents the comparison of the network metrics generated by the ColorScape model against the corresponding metrics of the overlay map (expected values in the table).
Although the confidence intervals of the metrics derived from the simulation data do not always contain the corresponding values of the overlay map (only the interval of the core/periphery ratio does), the simulated values remain reasonably close to them. In addition, if we reduce the number of target metrics, the ColorScape model is able to generate outputs with confidence intervals that contain the expected values.

Figure 5.12: Overlay Map [75]

Table 5.6: Simulation Output vs. Overlay Map

Name                      Mean Value   Standard Deviation   90% Confidence Interval   Expected Value
Number of Nodes           296          79.581               [272, 320]                222
Density                   0.214        0.068                [0.194, 0.233]            0.139
Clustering Coefficient    0.574        0.050                [0.559, 0.589]            0.648
Centrality                0.354        0.040                [0.342, 0.366]            0.216
Average Path              1.847        0.086                [1.821, 1.873]            2.415
Core/Periphery Ratio      77.283       36.583               [65.482, 89.084]          73.0

5.5 Comparison with the OBO Domain-Domain Data

Figure 5.14 depicts the actual OBO network, where the nodes with the same color belong to the same group. Because all the groups are branches of biology, we consider them as a single domain. After 100 generations, the fittest gene is found. Table 5.7 lists all the parameters and their values represented by the gene.

Figure 5.13: Snapshot of the ColorScape Model against Overlay Map

Table 5.7: The Best Configuration against OBO Data

Name                       Value
Carrying Capacity          30
Startup Funding            2
External Resource          1
Tolerance                  0.6
Reorganization Tendency    0.5
Receptivity                0.9
Allocation Strategy        Uniform allocation with technology transferring
Communication Style        Homophily
Communication Frequency    1.0
Threshold to Grow          0.5

Because the ColorScape model studies the relationships between communities with similar or different domains, all the communities have to be categorized into domains based on their color so that they can be compared to the OBO network. To illustrate the process, consider Figure 5.15, which is a snapshot given the above configuration parameters.
Figure 5.14: OBO Domain-Domain Network

For Figure 5.15, communities can be divided into four domains based on the similarity of their colors, as shown in Figure 5.16. We compute the metrics for each of these domains and compare them with the metrics of the OBO network.

Table 5.8 presents the comparison of the network metrics generated by the ColorScape model against the corresponding metrics from empirical OBO data (expected values in the table). Since the confidence intervals of all metrics derived from the simulation data, except the core/periphery ratio, contain the corresponding values of the OBO network, we conclude that the ColorScape model can generate network structures similar to OBO.

In addition, the best configuration parameters against the OBO network are recorded in Table 5.7. From the table, we can observe that the best configuration has a medium level of tolerance (0.6), high receptivity (0.9), and a high communication frequency (1.0). These are indeed the quintessential characteristics of open source science communities.

Figure 5.15: Snapshot of ColorScape Model against OBO

Table 5.8: Simulation vs. OBO Data

Metrics                   Mean Value   90% Confidence Interval   Expected Value
Number of Nodes           55.633       [46.076, 65.190]          49
Density                   0.605        [0.521, 0.689]            0.549
Clustering Coefficient    0.846        [0.812, 0.881]            0.880
Centrality                0.355        [0.302, 0.407]            0.405
Average Path              1.404        [1.317, 1.491]            1.406
Core/Periphery Ratio      50.9         [37.2, 64.6]              23.5

5.6 Power Law

When the probability of the occurrence of an event is inversely proportional to its size, power laws are often expected [66]. Power laws appear in many systems, e.g., the distributions of the sizes of cities, earthquakes, forest fires, and people's personal fortunes.

Figure 5.17(a) shows the inequality of communities in terms of resources. Most communities hold relatively few resources, while a small fraction of communities hold relatively many resources.
To determine whether this can be interpreted as a power law, Figure 5.17(b) shows the relationship between the logarithm of the number of communities and their resources, as well as the corresponding linear regression curve. Since the R^2 of this fit is 0.86, there is evidence that the ColorScape model can exhibit a power law in its resource distribution.

Figure 5.16: Clusters of the Network of ColorScape Model against OBO

Figure 5.17: Distribution of Resources in ColorScape Model (panels a and b)

Chapter 6
Simulation Results and Evaluation

In this chapter, experiments are conducted to investigate the impact of scientific community traits (i.e., receptivity, flexibility, reorganization tendency) and environmental constraints (i.e., interaction topologies, carrying capacity, resource allocation strategies) on the innovation performance (e.g., diversity and resilience) of GPS.

6.1 Interaction Topologies

The experiments in this section test six types of interaction topologies and their effects on the diversity and resilience of GPS:

1. One-dimensional grid: Each community has two neighbors, one on its left and one on its right.

2. Two-dimensional grid: Each community is embedded in a Moore neighborhood; that is, it has eight neighbors surrounding it. Figure 3.1 includes a snapshot of the 2D grid.

3. Random network: The edges between any pair of nodes are created with equal probability.

4. Random group network: Nodes within a group have a higher probability of building links with each other than nodes in different groups.

5. Scale-free network: Nodes with more links are more likely to be selected to build links. Figure 3.1 includes a snapshot of the scale-free network.

6. Dynamic network: Communities choose to communicate with other communities based on preferences dictated by the selected social communication theories.
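To make the first three topologies in the list above concrete, the following sketch enumerates neighbors on the 1D and 2D grids and builds a random network. All names (TopologySketch, oneDNeighbors, twoDNeighbors, randomNetwork) are hypothetical. Wrapping the 1D grid into a ring and skipping cells outside the 2D grid are assumptions, since the text does not describe how borders are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of three of the interaction topologies. */
final class TopologySketch {
    /** 1D grid: the left and right neighbors of community i among n communities. */
    static int[] oneDNeighbors(int i, int n) {
        return new int[] {(i - 1 + n) % n, (i + 1) % n}; // wrap-around is an assumption
    }

    /** 2D grid: up to eight surrounding cells; cells outside the grid are skipped. */
    static List<int[]> twoDNeighbors(int row, int col, int rows, int cols) {
        List<int[]> result = new ArrayList<>();
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                if (dr == 0 && dc == 0) {
                    continue; // a cell is not its own neighbor
                }
                int r = row + dr;
                int c = col + dc;
                if (r >= 0 && r < rows && c >= 0 && c < cols) {
                    result.add(new int[] {r, c});
                }
            }
        }
        return result;
    }

    /** Random network: each unordered pair of nodes is linked with the same probability p. */
    static boolean[][] randomNetwork(int n, double p, Random rng) {
        boolean[][] adjacency = new boolean[n][n];
        for (int a = 0; a < n; a++) {
            for (int b = a + 1; b < n; b++) {
                if (rng.nextDouble() < p) {
                    adjacency[a][b] = true;
                    adjacency[b][a] = true;
                }
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        System.out.println(twoDNeighbors(1, 1, 3, 3).size()); // 8 for an interior cell
        System.out.println(twoDNeighbors(0, 0, 3, 3).size()); // 3 for a corner cell
    }
}
```

A random group network could reuse randomNetwork with two probabilities, a higher one for pairs in the same group and a lower one otherwise; that extension is not shown here.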
6.2 Measuring Innovation Potential and Performance

Since we are interested in observing potential relations between the structure of the social network and the innovation outputs of a community, two types of metrics are considered: innovation metrics and network structure metrics that pertain to integrated differentiation. We propose a hierarchy of metrics to examine relations between scientific community traits, the structure of communication networks, and innovation performance. The hierarchy shown in Figure 6.1 delineates the relationship between the layers of the evaluation framework.

6.2.1 Innovation Metrics

In the evaluation framework shown in Figure 6.1, there are three innovation metrics: robustness, resilience, and interdisciplinarity. Two of these metrics are used in the following experiments, resilience and interdisciplinarity, where diversity is suggested to be a useful proxy indicator for interdisciplinarity [74] [71].

6.2.1.1 Diversity

The process of knowledge creation is based on the combination and elaboration of existing knowledge. Diverse sources of knowledge challenge existing solutions, ignite new ideas, and lead to more impactful solutions [60]. So, diversity is a proxy indicator for innovation potential and capacity. There are three dimensions related to diversity: variety, balance, and disparity [89].

Variety can be computed as the number of clusters of communities in the whole environment. Each cluster is composed of similar communities. To classify communities into clusters, we use the QT (Quality Threshold) clustering algorithm [39]. The QT clustering algorithm needs a predefined diameter that indicates the maximum difference among members in a cluster. A candidate cluster for each community is then built by including other communities within the predefined diameter.
The cluster with the most members is selected, and then we recursively run the above steps on the set of communities that remains after removing the communities in the selected cluster. Algorithm 1 is the pseudo code for the QT algorithm:

Figure 6.1: The Evaluation Framework

Algorithm 1 QTAlgorithm(Community[] communities, double diameter)
    Vector<Vector<Community>> result = new Vector<Vector<Community>>();
    Community[][] clusterArray = new Community[communities.length][];
    for i = 0 to communities.length - 1 do
        /* Find cluster for each community */
        Community[] cluster = findCluster(communities, communities[i], diameter);
        clusterArray[i] = cluster;
    end for
    int indexMax = findMaxCluster(clusterArray);
    result.add(new Vector<Community>(Arrays.asList(clusterArray[indexMax])));
    communities = removeCommunities(communities, clusterArray[indexMax]);
    if communities.length > 0 then
        /* Recursively call the algorithm with the reduced set */
        Vector<Vector<Community>> tmpResult = QTAlgorithm(communities, diameter);
        result.addAll(tmpResult);
    end if
    return result

Balance indicates inequality in terms of the resources each community holds. It is calculated using the Gini coefficient [16], which is a measure of the inequality of a distribution, with a value of 0 expressing total equality and a value of 1 maximal inequality [100]. The Gini coefficient is calculated as follows:

G_N = \frac{\sum_{i=1}^{n} (2i - n - 1) x_i}{(n - 1) \sum_{i=1}^{n} x_i},    (6.1)

where $n$ is the total number of communities and $x_i$ is the resource level of community $i$.

Disparity refers to the degree of difference of each community, that is, the dissimilarity of communities based on their current color.

6.2.1.2 Resilience

In part, innovation is the process of finding alternative, more effective ways to address challenges and seize opportunities. Resilience, on the other hand, is the capacity to adapt and restore in constructive ways while undergoing changes, so as to retain essentially the same function. Hence, innovation is change, but resilience is survival.
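The balance measure of Equation 6.1 can be sketched as below. The class name Gini is hypothetical. The sketch sorts the resource levels in non-decreasing order before applying the formula, which is the ordering the standard form of this expression assumes; the text itself does not state the ordering, so this is an assumption. The sketch also requires at least two communities and a positive total resource.

```java
import java.util.Arrays;

/** Hypothetical sketch of the Gini coefficient in Equation 6.1. */
final class Gini {
    static double gini(double[] resources) {
        int n = resources.length;
        if (n < 2) {
            throw new IllegalArgumentException("at least two communities are required");
        }
        double[] x = resources.clone();
        Arrays.sort(x); // assumption: x_1 <= x_2 <= ... <= x_n
        double numerator = 0.0;
        double total = 0.0;
        for (int i = 1; i <= n; i++) {
            numerator += (2.0 * i - n - 1) * x[i - 1];
            total += x[i - 1];
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("total resource must be positive");
        }
        return numerator / ((n - 1) * total);
    }

    public static void main(String[] args) {
        System.out.println(gini(new double[] {1, 1, 1, 1}));  // 0.0, total equality
        System.out.println(gini(new double[] {0, 0, 0, 10})); // 1.0, maximal inequality
    }
}
```

With the (n - 1) factor in the denominator, a single community holding all resources yields exactly 1, which matches the stated range of the coefficient.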
Due to the presence of uncertainty in the evolution of the innovation landscape, resilience is an essential property for a scientific community to sustain its innovation capacity.

Resilience is the capacity of a system to absorb disturbance and reorganize while undergoing changes so as to still retain essentially the same function, structure, identity, and feedbacks [94]. Based on this definition, we define resilience as the extent of disturbance of the system that reduces the ratio of active communities to the initial set of communities below a specific threshold.

6.2.2 Network Metrics

Structural properties of networks as they relate to creative output pertain to integrated differentiation [87]. As a general measure of the degree of social interaction, we use density, centrality, and clustering coefficient to determine their potential roles in and relation to innovativeness. Low density and high centrality communities are expected to exhibit higher degrees of innovation capacity [25]. Cliquish networks with low average path lengths are known to be effective in knowledge creation and diffusion [23].

6.3 Simulation Results

Using the ColorScape model, we conducted a series of exploratory experiments to examine how innovation capacity and the sustainability of the innovation ecosystem relate to community interaction topologies, connectivity, and resource allocation strategies. Table 3.1 lists the configuration parameters and their initial values.

6.3.1 Diversity vs. Carrying Capacity

In this experiment, we explore the variation of diversity in relation to the number of communities within a specific topology. Figure 6.2 evaluates variety, disparity, and balance across combinations of two factors, the number of communities and the 1D/2D topology. In Figure 6.2, we observe that variety and disparity increase with the initial community size, called Carrying Capacity (CC).
In the 2D topology, disparity increases with CC up to a critical threshold, after which further increase in dissimilarity diminishes. Variety is computed with the QT clustering algorithm using a pre-selected diameter that denotes the maximum difference allowed among members within a cluster. In this experiment, the diameter is set to 10, indicating that the hue difference among communities within a cluster can be up to 10. Therefore, the maximum variety is 360/10 = 36. That is, diversity cannot increase indefinitely with CC.

Figure 6.2: Diversity vs. Initial Community Numbers (panels a to d)

Based on Figure 6.2(d), the comparison between 1D and 2D suggests that the 2D topology is more conducive to fostering variety, with a lower degree of uncertainty, than the 1D topology. Also, the limited sphere of interaction exhibited in the 1D topology inhibits diffusion of influence and hence leads to increased time to reach equilibrium.

Next, to evaluate the impact of neighbor size and hence the sphere of influence within the 1D topology, we gradually increased the interaction window from 2 to 8 neighbors. The observations depicted in Figure 6.3 suggest that the interaction window positively affects variety and the underlying uncertainty (i.e., dispersion) up to a level, beyond which variety stops improving while uncertainty increases.

Figure 6.3: Variety vs. Neighbor Size in 1D

6.3.2 Diversity vs. External Resource

The resource allocation strategy used in the baseline model is to distribute all resources uniformly among communities. The total available resource is the sum of the contributions of communities and external resources. Figure 6.4 depicts the change in diversity with respect to available external resources.

Figure 6.4: Diversity vs. Resource Allocated Per Time

The abscissa indicates the amount of resources allocated to each community per time tick. In the 1D topology, the rate of increase in variety slows and stabilizes over time.
On the other hand, the 2D topology seems to be less sensitive to external resource, indicating a higher potential for resilience than 1D.

When external resource is low, only a small number of communities can survive. As external resource increases, more and more communities can survive, which leads to increased diversity. This trend continues up to a point beyond which more resources can only increase the number of communities within a cluster rather than the number of clusters. On the other hand, communities in the 2D topology have more neighbors than those in the 1D topology, which makes them more likely to form clusters. Communities within a cluster have similar domains, which helps them improve maturity with less resource consumption during the learning process discussed in Section 3.5. Thus, communities in the 2D topology have higher maturity and more resources left, so that the second term in Equation 3.1 is large enough to sustain all communities. For policy makers, it is noteworthy that, beyond this point, more funds do not lead to higher diversity; they only result in more resources held by communities.

6.3.3 Diversity vs. Reorganization

The experiment in this section aims to find out the relationship between diversity and reorganization. Figure 6.5 depicts the change in variety, disparity, and balance against different levels of reorganization tendency.

From Figure 6.5, we can observe that variety and disparity decrease with increasing reorganization tendency, which means that reorganization has negative effects on variety and disparity. On the other hand, specialization has positive effects on variety and disparity. This is consistent with the functionality of specialization and reorganization. Specialization facilitates the creation of a new community with a target color different from that of the current community. Reorganization, however, involves pulling the target color toward the current color, causing convergence.

Figure 6.5: Diversity vs. Reorganization Tendency

6.3.4 Diversity vs. Receptivity

In this experiment, we considered alternative interaction topologies (random and random group networks) to discern the relation between variety and community receptivity. Receptivity of a community is defined as the ratio of neighbor influence to inertia. Connectedness is defined as the probability of building links between nodes.

Figure 6.6 indicates that there is a critical receptivity threshold, after which the behavior of low and high density communities diverges. A potential reason for this phenomenon is that low receptivity results in little influence from neighbors, so the interaction topology has little effect on variety. In environments with high receptivity, variety favors low connectivity. The reason is that more communication links cause convergence, which in turn decreases variety. However, communities with various levels of connectivity converge to the same stable level of variety. Similar patterns are observed in both random and random group networks.

Based on the experimental results, policy makers may encourage communities to be more receptive in a relatively low density environment to reach high variety. This conclusion is supported by earlier reports and findings [25].

Figure 6.6: Variety in Random and Random Group Network

6.3.5 Resilience of Different Network Topologies

Resilience is defined as the extent of disturbance on the system that significantly reduces the ratio of active communities to CC when external resource is set to maximum [94]. To compute resilience, the number of communities under maximum resource availability (i.e., CC) is set as the base reference level for each topology. Figure 6.7 depicts how the number of active communities varies with external resources for three kinds of network topologies.

Figure 6.7: Number of Active Communities

As resources are gradually reduced, the ratio of the number of communities to CC is computed.
The loss ratio is defined as one minus this ratio, and the loss ratios are ranked to identify resilient topologies. According to Table 6.1, the scale-free network has the highest resilience, and the random group network has higher resilience than the random network, because the loss ratio of the scale-free network is the smallest and that of the random network is the largest when external resources decrease to 0.7. Figure 6.8 confirms that the random group network exhibits higher resilience than the random network.

Table 6.1: Resilience of Different Network Topologies

             Random                    Random Group              Scale Free
Resources    Number of     Loss        Number of     Loss        Number of     Loss
             Communities   Ratio       Communities   Ratio       Communities   Ratio
1            43.77         0           35.57         0           71.43         0
0.9          42.3          0.03        34.27         0.04        66.47         0.07
0.8          37.43         0.14        30.63         0.14        60.7          0.15
0.7          25.33         0.42        22.83         0.36        52.2          0.27

Figure 6.8: Comparison of Random and Random Group Network on Resilience

6.3.6 Relationship between Diversity and Network Metrics

The data used to study the relationship between diversity and network metrics are gathered from the previous experiments involving sensitivity analysis on receptivity. Each pair of density and variety is classified into buckets of identical width, i.e., 0.1 in terms of density. If the density falls into the range [0, 0.1), the pair belongs to the bucket 0.1. If the density falls into the range [0.1, 0.2), the pair belongs to the bucket 0.2. After grouping, the variety of a bucket is the average over all pairs in that bucket.

Figure 6.9 shows that variety increases with density up to a point. After that point, variety decreases with increasing density in both random and random group networks.

Figure 6.9: Variety vs. Density in Random and Random Group Network

Figure 6.10 plots variety against degree centrality. Variety increases with centrality up to a point. Beyond that point, variety decreases with increasing centrality.

Figure 6.10: Variety vs. Centrality in Random and Random Group Network

In [40], Hohn examines the relationship between species diversity and population density in diatom populations, which is shown in Figure 6.11. Since scientific communities can be viewed as an ecosystem, it is reasonable to compare the phenomena of ecosystems to those of scientific communities. From this figure, we can see species diversity increasing with population density up to a point. As density increases beyond this threshold, diversity starts declining. The density in [40] is defined as the number of individuals per species, which is different from the density defined in our research. However, the two definitions are related: the more individuals a species has, the stronger the dependency among its members due to shared but limited resources.

Figure 6.11: Species Diversity vs. Population Density in [40]

In [69], the following proposition about centrality and creativity is presented: individuals with greater centrality are likely to have higher creativity up to a certain level. Beyond this level, greater centrality may constrain creativity. This trend is consistent with our experimental results.

6.3.7 Sustainability, Resource Availability, and Connectedness

In ecology, sustainability refers to the ability of biological systems to remain diverse and productive over time. In the domain of creativity, sustainability can be interpreted as the effectiveness of communities in utilizing resources. So, we relate it to success rate, which measures the extent to which communities are effective in making use of resources to improve their maturity while maintaining themselves. Success rate is defined as the ratio of the number of active communities remaining at the end of the simulation to CC.

Figure 6.12 depicts the relationship between resource availability, interconnectedness, and success rate. The experimental results suggest that if resource availability increases while connectedness is decreased, the success rate increases.
Also, when resource is at a high level, success rate decreases with increasing connectedness. In addition, when resource is at a low level, success rate decreases with decreasing connectedness. A plausible explanation for this observation is that higher resource availability leads to higher variety. Under high variety, larger connectivity causes each community to be pulled toward multiple different cognitive niches, resulting in a lack of focus, which in turn costs communities more resources and hence decreases the survival rate. On the other hand, lower resource availability leads to lower variety. Under low variety, strong connectivity results in more communities sharing similar states and benefiting from each other through a symbiotic relation, which in turn increases the overall survival rate.

Figure 6.12: Success Rate vs. Resource

Based on these preliminary observations, policy makers may encourage communities to build highly connected clusters if resource availability is low. On the other hand, under moderate to high resource availability, loosely connected clusters may be more effective in promoting an environment conducive to sustainability.

6.3.8 Disparity vs. Resource and Connectedness

Creativity partly involves combination and elaboration of existing knowledge. Therefore, we use diversity as a proxy indicator for collective creativity. As discussed earlier, there are three dimensions related to diversity: variety, balance, and disparity [89]. In this section, we focus on the disparity dimension. Disparity indicates the degree of inequality, which can be measured by the Gini coefficient [100]. The coefficient ranges from 0 to 1, where 0 and 1 refer to perfect equality and extreme inequality, respectively.

Figure 6.13 depicts the relationship between resource availability, connectedness, and disparity.
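To make the disparity measure concrete, the following is a minimal sketch of a Gini coefficient computation. It is not taken from the ColorScape implementation: the function name `gini`, the rank-weighted formula chosen, and the sample holdings are illustrative assumptions, and this section does not state which community attribute the model feeds into the coefficient.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here: no holdings means no inequality
    # Rank-weighted form for sorted data x_1 <= ... <= x_n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical holdings of four communities.
equal = gini([5, 5, 5, 5])    # every community holds the same amount
skewed = gini([0, 0, 0, 20])  # one community holds everything
```

With a finite sample of n values, this form reaches at most (n - 1)/n, so the upper bound of 1 is approached only as the number of communities grows.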
As the resource level increases from 0.5 to 1, disparity increases with resource availability when the degree of connectedness is low, since more resources lead to a higher success rate, which in turn results in disparity. On the other hand, when the resource level increases further, disparity decreases. This is due to a decreased need for interaction for sustainment, which in turn decreases inequality. In addition, disparity increases with decreasing connectivity, possibly because high connectivity increases convergence and thereby decreases disparity.

Figure 6.13: Disparity vs. Resource

Based on the previous two experiments, Table 6.2 summarizes how disparity and success rate relate to resource availability and connectedness.

Table 6.2: Success Rate and Disparity

Resource Level   Resource Trend   Connectedness   Disparity   Success Rate
Low              Down             Down            Up          Down
High             Up               Down            Up          Up
High             Up               Up              Down        Down

6.4 Experiments on Resource Allocation Strategy

Understanding the influence of resource distribution across communities is critically important for informed decision-making in science and innovation policy development. The following experiments focus on the impact of resource allocation strategies on diversity. For the allocation strategies, we identify seven options:

1. Allocate resources uniformly among communities.
2. Allocate resources proportional to the contributions of communities.
3. Allocate resources proportional to the size of the cluster formed by similar communities.
4. Allocate resources proportional to the importance of domains.
5. Fully competitive allocation.
6. Peer-to-peer (P2P) lending.
7. Random allocation.

In uniform allocation, resources are allocated to communities equally regardless of their states. Resource allocation proportional to contribution is a reward mechanism. Communities with larger contributions receive more resources.
Under the allocation proportional to cluster size, the larger the cluster a community belongs to, the more resources the community receives. With allocation proportional to importance of domains, disciplines with higher priority receive more resources. The competitive allocation strategy is analogous to the prey/predator model, where resources are distributed among domains, and communities compete for resources. Under random allocation, resources are allocated equally to a randomly selected set of communities regardless of their state. P2P lending involves a contract-bid protocol. The community that invites others for collaboration is called the sponsor. Other communities in the same domain respond with a bid that indicates the ratio of the resources the community gets to the resources the sponsor will receive. The community that answers the call is called the respondent. After receiving all the bids, the sponsor selects the bid with the highest resource gain.

Figure 6.14 presents the class diagram of the resource allocation module, where all classes are inherited from a single class named ResourceAllocation that declares two functions implemented by sub-classes. The classes of the different allocation strategies each have a single public function, i.e., allocationResources(). In addition, the common part concerning total resources is extracted into a class named TotalResource that has two public functions, i.e., fixed() and techTransfer(), which are distinguished by whether or not there is a mechanism to transfer technology.

6.4.1 Design of Resources Allocation Strategies

6.4.1.1 Uniform Allocation

Uniform allocation means that resources are allocated to communities equally regardless of the states of communities. Figure 6.15 represents the process of uniform allocation.
Given the total resource ($R_T$) and the total number of communities ($N$), each community receives resources ($R_i$) amounting to:

\[ R_i = R_T \cdot \frac{1}{N} \tag{6.2} \]

Figure 6.14: Class Diagram of Resources Allocation

Figure 6.15: Flow Chart of Uniform Allocation

6.4.1.2 Proportional to Contribution

Resource allocation proportional to contribution is a reward mechanism in that communities with larger contributions receive proportionally more resources. Figure 6.16 represents the process of resource allocation proportional to contributions of communities.

In Figure 6.16, contributions provided by a community ($C_i$) are moderated by the product of its maturity and resource. This is based on the hypothesis that communities with higher maturity and resources are expected to be more productive. Each community receives resources ($R_i$) amounting to:

\[ R_i = R_T \cdot \frac{C_i}{\sum_{j=1}^{N} C_j} \tag{6.3} \]

where $R_T$ is the total available resources and $N$ is the total number of communities.

Figure 6.16: Flow Chart of Allocation Proportional to Contribution

6.4.1.3 Proportional to Cluster Size

Allocation of resources proportional to cluster size refers to distribution of resources proportional to the size of the cluster to which a community belongs. The purpose of this allocation strategy is to encourage communities to form larger clusters. Figure 6.17 represents the process involved in deciding how much resource to allocate to a community. Given a total resource ($R_T$), each community receives resources amounting to:

\[ R_i = R_T \cdot \frac{S_i}{\sum_{j=1}^{N} S_j} \tag{6.4} \]

where $S_i$ is the size of the cluster community $i$ belongs to, and $N$ is the total number of communities.

For the sake of illustration, consider an example with three communities, two of which form a cluster, and $R_T = 10$. Each community within the cluster receives 10*2/(2+2+1) = 4, while the community outside the cluster receives 10*1/(2+2+1) = 2. So, a community within the cluster receives twice as many resources as the community outside it.
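Equations 6.2 to 6.4 share one structure: a community's share is its weight divided by the sum of all weights. The sketch below illustrates this under stated assumptions. The function names are hypothetical and do not correspond to the classes in Figure 6.14, the maturity and resource values are made up, and treating $C_i$ as exactly the product of maturity and resource is a simplification of the "moderated by" relation described above.

```python
def allocate_uniform(total, n):
    """Equation 6.2: every community receives total / n."""
    return [total / n for _ in range(n)]


def allocate_proportional(total, weights):
    """Equations 6.3 and 6.4: share of community i is weights[i] / sum(weights)."""
    weight_sum = sum(weights)
    if weight_sum == 0:
        # Assumption (not from the source): fall back to a uniform split.
        return allocate_uniform(total, len(weights))
    return [total * w / weight_sum for w in weights]


# Worked example from the text: three communities, two of them in one cluster.
cluster_sizes = [2, 2, 1]
by_cluster = allocate_proportional(10, cluster_sizes)

# Contribution taken as maturity * resource; the numbers are made up.
maturity = [0.9, 0.5, 0.2]
resource = [4.0, 4.0, 2.0]
contributions = [m * r for m, r in zip(maturity, resource)]
by_contribution = allocate_proportional(10, contributions)
```

Here `by_cluster` reproduces the shares of 4, 4, and 2 from the example, and both proportional rules distribute the full total.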
Figure 6.17: Flow Chart of Allocation Proportional to Cluster

6.4.1.4 Proportional to Importance of Domains

Some domains receive more funds than others; e.g., nanotechnology receives more funds than conventional physics. Figure 6.18 represents the process of allocation proportional to importance of domains. Each community $i$ receives resources ($R_i$) defined as follows:

\[ R_i = R_T \cdot \frac{W_j}{N_j} \tag{6.5} \]

where $R_T$ is the total resource, and $j$ indicates the domain community $i$ belongs to. $W_j$ and $N_j$ denote the importance of domain $j$ and the total number of communities in domain $j$, respectively.

Figure 6.18: Flow Chart of Allocation Proportional to Importance of Domains

For the sake of illustration, consider an example where the whole range of hue is divided into three domains, [-60, 60), [60, 180), and [180, 300), whose importance is 0.6, 0.3, and 0.1, respectively. If the number of communities in the domain [-60, 60) is 3 and the total resource is 100, each community in the domain [-60, 60) is allocated 100*0.6/3 = 20.

6.4.1.5 Competitive Allocation

The competitive allocation strategy is similar to the prey/predator model, where resources are distributed among domains, and communities receive resources from the domain they belong to. The whole range of hue is divided into 360 domains. Each domain has a fixed amount of resources at the beginning of each time interval. If a community attains the resources within its domain, the resources in that domain become 0. The process is represented in Figure 6.19.

Figure 6.19: Flow Chart of Competitive Allocation

6.4.1.6 P2P Lending

P2P lending introduces a call-for-proposals mechanism on top of fully competitive allocation: a community can request collaboration with other communities if its domain does not have sufficient resources. The community calling others for collaboration is called the sponsor.
When P2P lending occurs, other communities in that domain respond with a bid that indicates the ratio of the resources received by the community to the resources the sponsor receives. The community that answers the call is called the respondent. The bid a respondent submits is proportional to the resources it holds; that is, the more resources it has, the higher the ratio of profit the community expects to receive. After receiving all the bids, the sponsor community selects the bid with the highest profit. The process is represented in Figure 6.20.

Figure 6.20: Flow Chart of P2P Lending

6.4.1.7 Random Allocation

Random allocation involves distributing resources to communities randomly regardless of their state. Figure 6.21 represents the process of random allocation.

6.4.2 Network Pattern vs. Resource Allocation Strategy

In this section, emerging network patterns are qualitatively and visually examined to gain insight into the impacts of resource allocation strategies on diversity. Figure 6.22 depicts network structures generated under allocation strategies 1 (i.e., uniform allocation), 2 (i.e., proportional to contributions), and 3 (i.e., proportional to the size of clusters). We observe that the network under uniform allocation has the highest diversity, while strategies 2 and 3 lead to relatively lower diversity.

In Figure 6.23, the predefined ratios of resources allocated to the disciplines indicated by red, green, and blue colors are 60%, 30%, and 10%, respectively. We observe that two types of network patterns emerge under allocation proportional to the significance of domains. Figure 6.23(a) shows a case where the number of communities in each domain is proportional to the domain's importance. However, the communities granted the most resources may not be as successful as expected, as exhibited in Figure 6.23(b) by the relatively small number of red communities. A potential reason is that the cluster of red domains interacts at high frequency with the clusters of green and blue domains.
This interaction could have incurred significant resource cost during the learning process, resulting in a decreased number of red domains.

Figure 6.21: Flow Chart of Random Allocation

Figure 6.22: Strategy 1 vs. Strategy 2 vs. Strategy 3. (a) Strategy 1; (b) Strategy 2; (c) Strategy 3

To explore the impact of the relation between domains on the final network pattern, we design an experiment where the blue communities and the green communities cannot connect with each other, so that the red communities are aligned between the green and the blue communities. The simulation model is run thirty times with different random seeds, and the final network patterns can be categorized into two types, which are shown in Figure 6.24.

In Figure 6.24(a), all red communities have changed to either blue or green communities. Because blue communities and green communities cannot connect with each other, the final pattern of type 1 is an isolated cluster of blue or green communities. In Figure 6.24(b), red communities are pulled by both green and blue communities, so that more resources are consumed and their maturity develops slowly. In addition, because the current model is based on Homophily theory, the intensity of influence from peers is proportional to their similarity rather than to how many resources the community holds. Therefore, red communities cannot thrive under such network alignment, although most resources are allocated to them.

Figure 6.23: Allocation Proportional to Importance of Domains. (a) Expected Case; (b) Unexpected Case

Figure 6.24: Patterns in Network Configuration Experiment. (a) Isolated clusters of blue or green communities; (b) Red communities struggling to thrive

Figure 6.25 compares competitive allocation and P2P lending allocation. We do not observe a significant difference in diversity, which suggests that a higher success rate, as observed in P2P lending, may not result in significantly higher diversity.
Figure 6.25: Competitive Allocation vs. P2P Lending. (a) Competitive Allocation; (b) P2P Lending

6.4.3 Variety vs. Resource Allocation Strategy

For the resource allocation strategy, two aspects are considered. One is the resource size, and the other is the allocation mechanism, which is defined in terms of the seven categories listed in the previous section. For the resource size, two options are examined: a fixed amount of total resources and dynamic allocation with technology transferring. As shown in Table 6.3, we examine twelve allocation strategies: the two resource-size options are combined with every allocation mechanism except competitive allocation and P2P lending, each of which appears once.

Table 6.3: Allocation Strategies

Symbol   Resource Allocation Strategy
A1       Uniform allocation with fixed external resource
A2       Uniform allocation with technology transferring
A3       Allocation proportional to contribution with fixed external resource
A4       Allocation proportional to contribution with technology transferring
A5       Allocation proportional to cluster size with fixed external resource
A6       Allocation proportional to cluster size with technology transferring
A7       Allocation proportional to importance of domains with fixed external resource
A8       Allocation proportional to importance of domains with technology transferring
A9       Competitive allocation
A10      P2P allocation
A11      Random allocation with fixed external resource
A12      Random allocation with technology transferring

The experiments with each allocation strategy are conducted 30 times, and the average variety is recorded as shown in Figure 6.26. Variety is computed as the number of clusters of communities within the environment. Each cluster is composed of communities that are similar in terms of their hue. Based on Figure 6.26, we discern the following:

1. Key area investment with technology transferring (A8) results in the highest variety. This is similar to the case where domains with lower priority still have potential to advance, yet the environment promotes development of domains related to priorities.
2. Uniform allocation (A1, A2) leads to higher variety compared to resource allocation proportional to contributions (A3, A4).
3. Competitive allocation (A9) results in higher variety than P2P lending (A10). The underlying reason is that P2P lending allows communities to share resources, which in turn causes both lender and borrower communities to fade out under limited resources.

Figure 6.26: Variety vs. Allocation Strategies

Chapter 7
Comparison of Communication Theories in Terms of Innovation Performance

7.1 Introduction

Under the globalization driven by advances in computer and communication technology, the flow of information through communication networks is independent of space and time, because people can share knowledge and make contributions simultaneously anywhere in the world [61]. Furthermore, the mechanisms for the emergence and evolution of communication networks can be abstracted into several communication theories. Although communication theories describe the internal mechanisms of social communication networks, little research has been conducted to implement computational models using them. Moreover, no research has compared communication theories in terms of their effects on innovation performance.
Communication networks and the organizational forms of the 21st century are undergoing rapid and dramatic changes [32]. There exist theories that focus on the role of interaction mechanisms in explaining the emergence and evolution of communication networks. One advantage of analyzing system dynamics from the perspective of socio-technical networks is the ability to analyze data at various levels, such as individual, dyad, triad, organizational, and interorganizational [61]. Homophily, preferential attachment, and exchange theory are mainly about the dyad relationship, where a communication tie from community A to community B can be predicted by the communication tie from community B to community A. On the other hand, balance and structural hole theory analyze the triad relationship, where the communication tie between communities A and B can be predicted by a third community C that is associated with both A and B. In addition, these theories differ from each other in that they study the internal mechanisms of communication networks from different perspectives, which include communities' traits, self-interest, and discrepancy in resources. Because homophily, preferential attachment, structural hole, exchange, and balance theory analyze communication networks at different levels and from different perspectives, it is important to use them to model the dynamics between communities and to compare them in terms of network patterns and innovation metrics.

7.2 Homophily

Homophily theory explores the emergence of communication networks based on the similarities of network members' traits [61]. Similarity eases communication, fosters trust, and increases the predictability of behavior [15]. On the basis of homophily, communities select similar communities to communicate with.

7.2.1 Model Design

The following process is described from the viewpoint of a community called the current community.
At each time interval, the current community randomly selects another community to communicate with based on their similarity: the higher the similarity between the current community and the target community, the higher the probability of building communication between them. Figure 7.1 depicts the process of communication guided by the Homophily theory.

Figure 7.1: Process of Communication using Homophily Theory

The following equations describe how to update the influences of neighbors based on homophily.

\[ W_{ji,t} = \begin{cases} W_{ji,t-1} + C_W \cdot I_{ji,t} \cdot (1 - W_{ji,t-1}) & \text{if } I_{ji,t} \ge 0 \\ W_{ji,t-1} + C_W \cdot I_{ji,t} \cdot W_{ji,t-1} & \text{otherwise} \end{cases} \tag{7.1} \]

where $W_{ji,t}$ is the influence of neighbor $j$ at the current time. $C_W$ is a number between 0 and 1 and is inversely proportional to inertia (resistance to change in a community). $I_{ji,t}$ is the intensity of change in the influence, which is defined as:

\[ I_{ji,t} = (1 - D_{ji,t})^4 - (1 - \bar{D}_{i,t})^4 \tag{7.2} \]

where $D_{ji,t}$ is the dissimilarity, which is equal to the distance between community $i$ and community $j$ in terms of current hue at time $t$ (Equation 3.8). $\bar{D}_{i,t}$ is the average distance between community $i$ and all of its neighbors at time $t$. This function grows much faster when the dissimilarity between $i$ and $j$ becomes smaller in comparison to the average dissimilarity, resulting in a higher intensity $I_{ji,t}$.

The dissimilarity between communities $i$ and $j$ is defined as follows:

\[ D_{ji,t} = \mathrm{Dissimilarity}(H_{i,t}, H_{j,t}) \tag{7.3} \]

where $H_{i,t}$ is the hue of community $i$ at time tick $t$, and $H_{j,t}$ is the hue of community $j$ at time tick $t$.

\[ \mathrm{Dissimilarity}(x, y) = \begin{cases} \dfrac{|x - y|}{180} & \text{if } |x - y| \le 180 \\ \dfrac{360 - |x - y|}{180} & \text{otherwise} \end{cases} \tag{7.4} \]

7.2.2 Validation

We designed an experiment with three communities A, B, and C. The similarity between communities A and B is 80%, while the similarity between A and C is 20%. The experiment examines how likely community A is to communicate with B or with C under homophily theory.
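Before turning to the outcome, the update rule in Equations 7.1 to 7.4 can be sketched for this three-community setup. This is an illustrative reading of the equations, not the ColorScape code. The function names, the values chosen for `c_w` and `w_start`, and the mapping of similarity to dissimilarity as 1 minus similarity are assumptions made for the example.

```python
def dissimilarity(hue_x, hue_y):
    """Equation 7.4: circular distance between two hues, scaled to [0, 1]."""
    diff = abs(hue_x - hue_y)
    if diff <= 180:
        return diff / 180
    return (360 - diff) / 180


def intensity(d_ji, d_mean):
    """Equation 7.2: positive when neighbor j is closer than the average neighbor."""
    return (1 - d_ji) ** 4 - (1 - d_mean) ** 4


def update_influence(w_prev, c_w, i_ji):
    """Equation 7.1: the result stays in [0, 1] when w_prev and c_w do."""
    if i_ji >= 0:
        return w_prev + c_w * i_ji * (1 - w_prev)
    return w_prev + c_w * i_ji * w_prev


# Community A with neighbors B (80% similar) and C (20% similar).
# Assumption: dissimilarity = 1 - similarity.
d_ab, d_ac = 0.2, 0.8
d_mean = (d_ab + d_ac) / 2
c_w = 0.5      # hypothetical value in (0, 1)
w_start = 0.5  # hypothetical initial influence for both neighbors

w_b = update_influence(w_start, c_w, intensity(d_ab, d_mean))
w_c = update_influence(w_start, c_w, intensity(d_ac, d_mean))
```

Under these assumptions, the intensity for B is positive and the intensity for C is negative, so one update step raises the influence of B above its starting value and lowers that of C.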
The simulation model runs 100 times, and community A communicates with community B in 97 of these runs, as shown in Figure 7.2. To amplify the difference in similarity, a square operation is used, which gives community B a much higher probability than community C of being selected by community A.

Figure 7.2: Communication Frequency vs. Similarity

7.3 Structural Hole

Structural holes are places where communities are not connected, so that other communities may exploit them by investing their social capital to indirectly link two or more unconnected communities [61]. The community that fills the structural hole becomes a broker in relationships among others. As an early Italian saying puts it, "between two fighters, the third benefits" [17]: the community acting as broker can benefit from the different knowledge and expertise of other communities. Two kinds of information benefits for the broker are identified in [19]: access and timing. Access means getting information that others may not get. Timing refers to getting information earlier than peers.

7.3.1 Model Design

The following process is described from the viewpoint of a community denoted as C0. At each time interval, a community C1 not connected to C0 is first selected at random. Then a community C2 not connected to C1 is selected at random. Finally, two links are built: one between C0 and C1, and one between C0 and C2. The process is depicted in Figure 7.3.

Figure 7.3: Process of Communication using Structural Hole Theory

7.3.2 Validation

Burt [17] points out that an individual's effective network size determines its success potential; that is, an individual with a larger effective network is more likely to succeed. The ties among a person's network partners attenuate the effective network size, which reaches its maximum value (i.e., 1) when partners are not connected to one another. On the other hand, the effective network size takes its minimum value (i.e., 0) if partners are connected to one another.
Meanwhile, the clustering coefficient measures how close the neighbors are to being a clique, and it is equal to 1 if the neighbors are fully connected. So, we use 1 - clustering coefficient to represent effective network size, which yields 0 for fully connected networks and 1 for isolated networks. In addition, communities' success is reflected by their resources. The following experiment captures the relation between effective network size and resources: the effective network size and resources of each community are recorded at each time step, and the average value of resources is then computed for each value of effective network size. Based on Figure 7.4, we observe that the resources held by communities increase with communities' effective network size. A potential reason is that a larger effective network size means more opportunities around the community, and hence an increase in its resource levels.

Figure 7.4: Resource vs. Effective Network Size under Structural Hole Theory

7.4 Preferential Attachment

Preferential attachment is a process where resources are distributed among individuals according to how much they already have, i.e., the rich get richer. Communities may prefer to connect to others with more resources in order to steer their own direction toward potentially more resources. Alternatively, communities may intend to connect to peers with a larger number of links, which indicates a more central position and larger influence within the network. So, there are two branches of preferential attachment theory: preferential attachment based on resources, and preferential attachment based on links.

7.4.1 Preferential Attachment Based on Resources

Preferential attachment based on resources means that communities with more resources are more likely to be selected for communication. Figure 7.5 shows the process of building connections under preferential attachment based on resources.
Figure 7.5: Communication Process of Preferential Attachment based on Resources

Under preferential attachment based on resources, communities with more resources have larger influence on others. The following equation describes how to update the influences of neighbors using preferential attachment based on resources.

\[ W_{ji,t} = \frac{R_{j,t}}{\sum_{k=1}^{N} R_{k,t}} \tag{7.5} \]

where $W_{ji,t}$ is the influence of neighbor $j$ on community $i$ at time $t$, $R_{j,t}$ is the resources of community $j$, and $N$ is the total number of communities.

7.4.2 Preferential Attachment Based on Links

Preferential attachment based on links means that communities with more links are more likely to be selected for communication. Figure 7.6 delineates the process of using preferential attachment based on links to build connections among communities.

Figure 7.6: Communication Process of Preferential Attachment based on Links

Under preferential attachment based on links, communities with more links have larger influence on others. The following equation describes how to update the influences of neighbors using preferential attachment based on links.

\[ W_{ji,t} = \frac{L_{j,t}}{\sum_{k=1}^{N} L_{k,t}} \tag{7.6} \]

where $W_{ji,t}$ is the influence of neighbor $j$ on community $i$ at time $t$, $L_{j,t}$ is the number of links of community $j$, and $N$ is the total number of communities.

7.4.3 Validation

Under suitable circumstances, preferential attachment can generate a power law [106]. For preferential attachment based on resources, introduced in Section 7.4.1, we run the simulation 30 times and output the resources of communities at the end of each run. Figure 7.7(a) depicts the inequality of communities in terms of resources. Most communities hold relatively few resources, while a small fraction of communities hold relatively many. This observation is indicative of the presence of a power law. Figure 7.7(b) shows the relationship between the log value of the number of communities and their resources, as well as the corresponding linear regression curve.
Since the $R^2$ for this fit is 0.92, the ColorScape model suggests the presence of a power law in the resource distribution.

Figure 7.7: Communities' Resources. (a) Histogram of Communities' Resources; (b) Linear Regression of Logarithmic Value of Resources

For preferential attachment based on links, introduced in Section 7.4.2, we run the simulation 30 times and output the number of links of communities at the end of each run. Figure 7.8(a) depicts the inequality of communities in terms of links. Most communities have relatively few links, while a small fraction of communities hold relatively many, which is indicative of the presence of a power law. One deviation is that communities with two links outnumber those with one link. This is because the ColorScape model has a specialization mechanism in which a new community is generated and a link between the original and the new community is built, and the creation of this link does not follow preferential attachment theory. Figure 7.8(b) shows the relationship between the log value of the number of communities and their links, as well as the corresponding linear regression curve. Since the $R^2$ for this fit is 0.85, the ColorScape model suggests the presence of a power law in the link distribution.

Figure 7.8: Communities' Links. (a) Histogram of Communities' Links; (b) Linear Regression of Logarithmic Value of Links

7.5 Balance Theory

7.5.1 Model Design

Heider's balance theory [38] states: "my friend's friend is my friend; my friend's enemy is my enemy; my enemy's friend is my enemy; my enemy's enemy is my friend", which means that friends have similar attitudes, while enemies have different opinions, toward a third object. For scientific communities to remain creative, they may desire to communicate with neighbors that include both similar and dissimilar communities in order to maintain a highly diverse environment.
When balance theory is used to study the activities of scientific communities, communities try to keep their interactions balanced, in terms of discipline, across the communities they communicate with. To illustrate the triad relationship that balance theory focuses on, consider the following example. From the perspective of community A, the probability of building a link between A and B is determined by the peer communities associated with A. If A has more neighbors that are similar to it than neighbors that are dissimilar, and the similarity between A and B is lower than the average, then A is more likely to communicate with B. On the contrary, if A has fewer similar neighbors than dissimilar neighbors, and the similarity between A and B is lower than the average, then A is less likely to communicate with B. These relations are described in Table 7.1.

Table 7.1: Illustration of Building Links Based on Balance Theory

| Similarity between A and B | Number of Neighbors Similar to A | Probability of Building Link |
|---|---|---|
| High | More | Low |
| High | Less | High |
| Low | More | High |
| Low | Less | Low |

The similarity between communities is determined by their disciplines, as defined in Equations 7.3 and 7.4.

When balance theory is used to select peers to communicate with, if more communities are above the average dissimilarity, a community either decreases the influence of a community with above-average dissimilarity or randomly selects another community with below-average dissimilarity in order to reach balance. On the other hand, if more communities are below the average dissimilarity, it either decreases the influence of a community with below-average dissimilarity or randomly selects another community with above-average dissimilarity in order to reach balance. The process of using balance theory to build the communication network is shown in Figure 7.9.
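The decision rule of Table 7.1 can be sketched as a small function. This is only an illustration under assumptions that the text does not state: a neighbor counts as "similar" when its similarity to A is at or above the average over A's neighbors, a tie between similar and dissimilar neighbors counts as "Less", and a community without neighbors is treated as open to new links. The name `link_tendency` is hypothetical.

```python
def link_tendency(similarity_ab, neighbor_similarities):
    """Return "high" or "low" link probability following Table 7.1.

    similarity_ab: similarity between A and the candidate B.
    neighbor_similarities: similarities between A and its current neighbors.
    """
    if not neighbor_similarities:
        return "high"  # assumption: nothing to balance against yet
    average = sum(neighbor_similarities) / len(neighbor_similarities)
    similar = sum(1 for s in neighbor_similarities if s >= average)
    dissimilar = len(neighbor_similarities) - similar
    b_is_similar = similarity_ab >= average
    more_similar_neighbors = similar > dissimilar
    # Table 7.1: the probability is low when B would reinforce the
    # existing majority (High/More or Low/Less), and high otherwise.
    return "low" if b_is_similar == more_similar_neighbors else "high"


# A has three similar neighbors and one dissimilar neighbor.
mostly_similar = [0.9, 0.8, 0.7, 0.1]
# A has one similar neighbor and three dissimilar neighbors.
mostly_dissimilar = [0.9, 0.2, 0.1, 0.15]
```

In this sketch, a dissimilar candidate is favored when A is surrounded mostly by similar neighbors, and a similar candidate is favored in the opposite case, which matches the four rows of the table.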
Figure 7.9: Process of Communication Using Balance Theory

The following discussion delineates the process of updating the influence of neighbors using balance theory. Balance is defined as the equilibrium to which the disciplines of communities converge. Where there are differences in opinion between communities, their need for balance motivates them to increase their communication frequency with one another in order to reach an agreement. Given the assumption that communities' opinions are based on their disciplines, a larger difference in discipline results in a larger difference in opinion. So, communities would like to communicate more with communities that differ more from them in order to reach balance. The ColorScape model is based on boundary processes that drive interacting communities to move toward each other, which in turn reduces their differences. Such interactions, guided by balance theory and boundary processes, cause the dissimilarities between communities to change dynamically. During each interaction, if the dissimilarity between community $i$ and its neighbor $j$ is above the average dissimilarity between community $i$ and all of its neighbors, the communication frequency between communities $i$ and $j$ increases. Otherwise, their communication frequency decreases. Equation 7.7 describes how the influences of neighbors are updated:

$$\Delta W_{ji,t} = \sin\left(\frac{\pi}{2}\left(D_{ji,t} - \bar{D}_{i,t}\right)\right), \qquad W_{ji,t+1} = W_{ji,t} + \Delta W_{ji,t}, \tag{7.7}$$

where $D_{ji,t}$ is the dissimilarity between community $i$ and its neighbor $j$ at time $t$, and $\bar{D}_{i,t}$ is the average dissimilarity between community $i$ and all its neighbors. $W_{ji,t}$ and $\Delta W_{ji,t}$ are the influence of community $j$ on community $i$ and the increment of that influence, respectively. In the extreme case where $D_{ji,t}$ takes its maximum value of 1 and $\bar{D}_{i,t}$ is close to 0, the maximum of $D_{ji,t} - \bar{D}_{i,t}$ is 1, and $\Delta W_{ji,t} = \sin(\frac{\pi}{2}(1 - 0)) = 1$. On the other hand, when $D_{ji,t}$ takes its minimum value of 0 and $\bar{D}_{i,t}$ is close to 1, the minimum is $\Delta W_{ji,t} = \sin(\frac{\pi}{2}(0 - 1)) = -1$.
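The update in Equation 7.7 can be illustrated with a short sketch. The function names and the dictionary layout are assumptions made for this example, and the sketch applies the increment as written, without clamping the resulting influence to any range, since no bound is stated in the text.

```python
import math


def influence_increment(dissimilarity, average_dissimilarity):
    """Delta W of Equation 7.7 for one neighbor."""
    return math.sin(math.pi / 2 * (dissimilarity - average_dissimilarity))


def update_influences(weights, dissimilarities):
    """Apply Equation 7.7 to every neighbor of one community.

    weights: neighbor id -> current influence W_{ji,t}
    dissimilarities: neighbor id -> dissimilarity D_{ji,t}
    Returns a new dict with W_{ji,t+1}.
    """
    average = sum(dissimilarities.values()) / len(dissimilarities)
    return {j: weights[j] + influence_increment(d, average)
            for j, d in dissimilarities.items()}


# Hypothetical community with two neighbors of equal initial influence.
updated = update_influences({"a": 0.5, "b": 0.5}, {"a": 0.8, "b": 0.2})
```

Neighbor `a` is more dissimilar than average, so its influence grows, while the influence of the less dissimilar neighbor `b` shrinks by the same amount.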
Figure 7.10 depicts the change in $\Delta W_{ji,t}$ over $D_{ji,t}$ for different values of $\bar{D}_{i,t}$.

Figure 7.10: Influences Change with Dissimilarity

7.5.2 Validation

In [54], Lane posits that relationships between organizations can be categorized as positive or negative. It is pointed out in [107] that the condition for a network to be balanced is that the product of the relation ties between organizations is positive. According to homophily theory, communities would like to communicate with communities similar to them. So, for community $i$, the communication ties between community $i$ and others with higher-than-average similarity are viewed as positive relationships. On the contrary, the communication ties between community $i$ and others with lower-than-average similarity are viewed as negative relationships. We plot the positive and negative relationships, which correspond to the numbers of communities with higher and lower similarity respectively, in Figure 7.11(a). The product of all relation ties, where each positive tie is represented by 1 and each negative tie by -1, is plotted in Figure 7.11(b). Based on these two figures, we can observe that the product of all ties oscillates between -1 and 1. In complex adaptive systems, there are three fundamental kinds of attractors: fixed point attractors, limit cycle attractors, and chaotic attractors. Figure 7.11(b) shows that a limit cycle attractor exists, in which the exact state of the system cannot be predicted, although we know it will be either -1 or 1. This means that the system reaches a dynamic balance, in contrast to the fixed balance in [107], in which the numbers of positive and negative ties remain dynamically the same, as shown in Figure 7.11(a). Such a dynamic balance leaves communities satisfied with their status, and the communication network formed by them is balanced.

Figure 7.11: Relations under Balance Theory. (a) Positive and Negative Relations Change over Time; (b) The Product of Relations
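The sign product used in this validation can be sketched in a few lines. The helper names are hypothetical, and treating a tie whose similarity exactly equals the average as negative is an assumption of the sketch.

```python
def tie_signs(similarities):
    """+1 for ties with above-average similarity, -1 otherwise."""
    average = sum(similarities) / len(similarities)
    return [1 if s > average else -1 for s in similarities]


def tie_product(signs):
    """Product of all tie signs; +1 indicates a balanced configuration."""
    product = 1
    for sign in signs:
        product *= sign
    return product


# Two hypothetical snapshots of one community's ties.
balanced = tie_product(tie_signs([0.9, 0.8, 0.2, 0.1]))    # two negative ties
unbalanced = tie_product(tie_signs([0.9, 0.3, 0.2, 0.1]))  # three negative ties
```

Because the product depends only on whether the number of negative ties is even or odd, a single tie crossing the average flips it between 1 and -1, which is consistent with the oscillation described for Figure 7.11(b).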
7.6 Exchange Theory

7.6.1 Model Design

According to exchange theory, the necessary condition for the realization of a network tie is a discrepancy in resources. In the ColorScape model, the discrepancy in resources between communities is reflected by the brightness component of the HSB color model. When a community cannot obtain enough resources by solving problems in its own domain, it tries to solve inter-disciplinary problems by collaborating with peers. Once community $i$ finds an inter-disciplinary problem with potential funds, it asks other communities for collaboration. Another community $j$ that can solve the problem may be willing to collaborate. A link between communities $i$ and $j$ is then created. In such cases, what communities exchange are resources based on their knowledge and skills. Unlike balance theory, exchange theory interprets communication at the dyad level, which means that the communication link is determined by the two parties involved in the collaboration.

InnoCentive [45] is a website that uses a challenge-driven innovation mechanism to connect companies that have problems to be solved with users who would like to capitalize on their knowledge. When a company has a problem to be solved, it may post the problem on this website. Users who are interested in the problem may submit their solutions, one of which is selected by the sponsor company. The author of the selected solution is rewarded.

The P2P lending process is defined as follows:

1. Divide the discipline into 36 domains (i.e., 10 degrees per domain), and put some problems in each domain.
2. If the domain a community inhabits has problems, then the community receives the corresponding funding up to a threshold, i.e., the maximum value a community can obtain per time step.
3. If the domain a community inhabits does not have problems, the community looks through neighboring domains until it finds a domain with problems. Then the community calls for proposals to collaborate.
4. All other communities receive the invitation. Only those communities within the domain respond with a bid that shows the ratio of the funding the community gets to the resources the sponsor gets.
5. The community that initiates the call for proposals chooses the bid with the highest ratio.

Three kinds of collaboration may occur during the P2P lending process: two parties within a discipline, two parties across disciplines, and three parties. Type I collaboration takes place when the sponsor community receives a reply from communities within the domain where the problem exists. Type II collaboration is interdisciplinary collaboration, which happens when the domain of the problem lies just between the domains of the sponsor and the responder communities. Type III collaboration occurs when there are two responders whose domains are immediately adjacent to the domain of the problem. These collaborations are illustrated in Figure 7.12, where the purple and blue circles represent the sponsor and the responder respectively.

Figure 7.12: P2P Collaborations

If several communities can collaborate, a community may choose one that helps reduce its dependency on other communities. To reduce this dependency, community $i$ seeks to forge links with communities that are not yet connected to it. In [61], Monge describes this as network extension, which means that organizations can seek to increase the number of exchange alternatives by creating new network links.

Each communication theory has two functions. One is to build a communication network. The other is to update the weights of neighbors. When exchange theory is used to update the links connecting to neighbors, the weights increase when an exchange occurs during a time interval.
Otherwise, the weights of links connecting to neighbors decrease if no exchanges happen between them during a time interval.

7.6.2 Validation

7.6.2.1 Resource Accessibility

Brass [13] claims that an organization's access to resources is reflected by closeness centrality, which refers to the extent to which people, groups, and organizations can reach all others in a network through a minimum of intermediaries. This implies that communities with higher closeness have access to more resources. Further, Brass [14] found that the measure of centrality correlated with reputational measures of power, which in turn influences the organization's ability to obtain resources. The closeness centrality is calculated as shown in Equation 7.8 [65]:

$$C_i = \frac{N - 1}{\sum_{j=1}^{N-1} d_{i,j}}, \tag{7.8}$$

where $d_{i,j}$ is the minimum distance between communities $i$ and $j$, and $N$ is the total number of communities in the network.

We design an experiment in which the simulation of the ColorScape model is replicated 30 times. At the end of each run, the closeness centrality and resources of each community are recorded. For each level of centrality, the average level of resources of the corresponding communities is computed. Figure 7.13 depicts how the average resources of communities change with their closeness centrality.

Based on Figure 7.13, we can observe that the average resource level increases with closeness centrality. The underlying reason may be that higher closeness centrality, which stems from direct connections with peers, results in a larger number of similar communities due to boundary processes. Thus, a community with higher closeness centrality is likely to survive, because it has more opportunities for collaboration with similar peers.
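As an illustration of Equation 7.8, closeness centrality on an unweighted network can be computed with a breadth-first search. This is a minimal sketch: the adjacency-dictionary representation and the choice to return 0 for a community that cannot reach every other community are assumptions made here, not details given in the text.

```python
from collections import deque


def closeness_centrality(adjacency, source):
    """C_i = (N - 1) / sum_j d(i, j) for an unweighted, undirected graph.

    adjacency: node -> iterable of neighboring nodes.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    n = len(adjacency)
    if len(distance) < n:
        return 0.0  # assumption: undefined on disconnected graphs, use 0
    total = sum(distance.values())
    return (n - 1) / total if total else 0.0


# Hypothetical star network: community 0 is linked to 1, 2, and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
center = closeness_centrality(star, 0)
leaf = closeness_centrality(star, 1)
```

In the star example the hub reaches every other community in one step, so its closeness is higher than that of any leaf.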
Figure 7.13: Resource Availability along with Closeness Centrality

7.6.2.2 Law of N-Squared

Krackhardt [51] identified the constraint of the "Law of N-Squared", which simply notes that the number of potential links in a network organization increases geometrically with the number of people. We increase the total number of communities from 10 to 190, and then count, for each community, the number of communities with which it may build collaborations in P2P lending. For example, when the total number of communities is 10 and the number of potential target communities for each community $i$ is $x_i$, the total number of potential target communities is $\sum_{i=1}^{10} x_i$. For each case, the simulation model is run 30 times to obtain the average value. Figure 7.14 shows that the number of target communities increases geometrically with population.

Figure 7.14: Number of Target Communities vs. Population

7.6.2.3 Iron Law of Oligarchy

The other constraint identified by Krackhardt [51] is the "Iron Law of Oligarchy", which is the tendency for groups and social systems, even fervently democratic ones, to end up under the control of a few people. Figure 7.15 depicts how the network guided by exchange theory evolves over time. The initial number of communities is 20. At the beginning, communities start communicating with each other. More and more communication happens over time, so that communities become tightly connected. Furthermore, the boundary process pulls communities toward each other in terms of their domain. Because each cell in the resource landscape can support only one community, communities within the same cell have to search for collaboration. Although such collaboration may happen, communities still fade out because of the small portion of resources that the collaborating community is willing to share. Thus, the total number of communities decreases until only the last winning communities remain within their resource cells.
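The exchange mechanism behind these networks is the five-step P2P lending process of Section 7.6.1, and steps 3 to 5 can be sketched as follows. All names, the hue-to-domain mapping, the symmetric outward search over the circular hue axis, and the status labels are assumptions made for illustration; the text does not specify how neighboring domains are traversed or how ties are broken.

```python
N_DOMAINS = 36  # 360 degrees of hue split into 10-degree domains (step 1)


def domain_of(hue):
    """Map a hue in degrees to its domain index 0..35."""
    return int(hue // 10) % N_DOMAINS


def nearest_domain_with_problems(start, problems):
    """Step 3: look outward from `start` until a domain has problems."""
    for step in range(N_DOMAINS // 2 + 1):
        for d in ((start + step) % N_DOMAINS, (start - step) % N_DOMAINS):
            if problems.get(d, 0) > 0:
                return d
    return None


def p2p_step(sponsor, hues, problems, bids):
    """Return (status, partner) for one community in one time step.

    hues: community -> hue in degrees
    problems: domain index -> number of open problems
    bids: community -> bid ratio (funding for bidder / resources for sponsor)
    """
    home = domain_of(hues[sponsor])
    target = nearest_domain_with_problems(home, problems)
    if target is None:
        return ("idle", None)
    if target == home:
        return ("self-funded", None)  # step 2: no collaboration needed
    # Step 4: only communities inside the target domain respond.
    responders = [c for c in hues
                  if c != sponsor and domain_of(hues[c]) == target]
    if not responders:
        return ("no-bid", None)
    # Step 5: the sponsor picks the bid with the highest ratio.
    return ("collaborate", max(responders, key=lambda c: bids.get(c, 0.0)))


hues = {"i": 5, "j": 25, "k": 27, "m": 200}
bids = {"j": 0.4, "k": 0.6, "m": 0.9}
result = p2p_step("i", hues, {2: 1}, bids)
```

Here community `m` offers the highest ratio but lies outside the problem's domain, so the sponsor `i` ends up collaborating with `k`.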
Figure 7.15: Emergent Networks over Time

7.7 Experiments on Communication Theories

In this section, experiments are conducted to investigate the impact of scientific community traits (i.e., receptivity, flexibility) and environmental constraints (i.e., external resources, communication strategies) on the innovation potential and performance of GPS.

Table 7.2 lists all the parameters that can be changed in the following experiments.

Table 7.2: Experimental Parameters

| Name | Default Value |
|---|---|
| Carrying Capacity | 60 |
| Startup Funding | 2 |
| External Resource | 2 |
| Tolerance | 0.6 |
| Reorganization Tendency | 0.5 |
| Receptivity | 0.5 |
| Allocation Strategy | P2PAllocation |
| Communication Style | Balance |
| Communication Frequency | 0.4 |
| Threshold to Grow | 0.5 |

7.7.1 Variety vs. External Resource

Figure 7.16 depicts how diversity changes with respect to the size of the external resources injected into the environment. The abscissa indicates the amount of resources allocated to each community per time tick. For all the communication theories, variety increases with external resources. Also, the magnitude of variety is very similar across theories, indicating that these communication theories do not differ significantly in their effects on variety under the P2P allocation strategy.

After setting the communication frequency to 0.1, we depict the change in variety over external resources in Figure 7.17. From this figure, we observe that variety is less sensitive to external resources, since it remains almost unchanged. Based on the observations in Figures 7.16 and 7.17, policy-makers need to be cognizant that increasing funding does not always help increase variety, especially for communities with relatively low communication frequency.

Figure 7.16: Variety vs. External Resources at Moderate Communication Frequency

Figure 7.17: Variety vs. External Resources at Low Communication Frequency

7.7.2 Sustainability vs. Resource Availability

In ecology, sustainability refers to the ability of biological systems to remain diverse and productive over time. In the domain of creativity, sustainability can be interpreted as the effectiveness of communities in utilizing resources. So, we relate it to the success rate, which measures the extent to which communities are effective in making use of resources to improve their maturity while maintaining themselves. The success rate is defined as the ratio of the number of active communities remaining at the end of the simulation to the carrying capacity. Figure 7.18(a) shows that sustainability increases with external resources, while its rate of increase gradually declines as external resources grow. This suggests the presence of an asymptote, toward which sustainability moves with increasing external resource levels. If the communication frequency is decreased to 0.1, the change in sustainability over resources is as depicted in Figure 7.18(b), in which similar trends are observed but at a relatively larger scale; that is, sustainability increases with decreasing communication frequency. As scientific communities can be viewed as artificial ecosystems, the communication frequency is similar to the evolution frequency. A lower evolution frequency leads to fewer species being eliminated.

Figure 7.18: Sustainability vs. External Resources. (a) High Communication Frequency; (b) Low Communication Frequency

7.7.3 Sustainability vs. Receptivity

Figure 7.19(a) shows the change in sustainability with respect to varying levels of receptivity at low communication frequency. The receptivity of a community is defined as the ratio of neighbor influence to inertia. We observe that sustainability hardly varies with receptivity at low communication frequency. When the communication frequency increases to 0.7, sustainability vs. receptivity is depicted in Figure 7.19(b), which shows sustainability decreasing with increasing receptivity.
Based on this comparison, decision-makers may develop policies that encourage communities to be more independent if there is too much interaction in the domain.

Figure 7.19: Sustainability vs. Receptivity. (a) Low Communication Frequency; (b) High Communication Frequency

7.7.3.1 Variety vs. Receptivity

Figure 7.20 shows the change in diversity with respect to varying levels of receptivity. The receptivity of a community is defined as the ratio of neighbor influence to inertia. This figure shows that variety increases with increasing receptivity for all the theories. When receptivity is low, the communication theories lead to similar variety, because they have no effect on the interactions between communities if communities have little influence on each other. In addition, in comparison to the other theories, exchange theory is less sensitive to receptivity in terms of variety. A potential reason is that communication guided by exchange theory is based on the distribution of resources on the innovation landscape, which is not directly related to variety. Variety under the balance, homophily, and structural hole theories increases monotonically with receptivity, because these three communication theories build connections based on community traits that are directly related to variety.

Figure 7.20: Variety vs. Receptivity under Low Communication Frequency

Figure 7.20 shows the relation between variety and receptivity under low communication frequency. When the communication frequency is increased to 0.7, the relation between variety and receptivity is as depicted in Figure 7.21, which shows the opposite trend to that in Figure 7.20. When the communication frequency is low, variety increases with receptivity. On the other hand, variety decreases with increasing receptivity under high communication frequency.
This comparison suggests that decision makers may consider promoting policies that encourage communities to be more open if the scientific domain has relatively few communication activities. On the contrary, decision makers may encourage communities to increase their inertia if more communication activities occur in the domain.

Figure 7.21: Variety vs. Receptivity under High Communication Frequency

7.7.3.2 Innovation Potential

It is shown in [25] that communities with low density and high centrality are expected to exhibit higher innovation potential. Figure 7.22 depicts the change in density and centrality over receptivity under various communication theories.

Based on these figures, we can observe that, as receptivity increases, all communication theories except preferential attachment based on links lead to the emergence of communication networks with small density and large centrality. When communities are guided by preferential attachment based on links, they are always willing to connect to those with a larger number of links, which directly determines the structure of the communication network. So, under preferential attachment based on links, receptivity does not play as significant a role in density and centrality as it does under the other theories. In addition, under the homophily theory, density decreases significantly with increasing receptivity, while centrality remains almost the same, which demonstrates that receptivity is a positive factor that improves innovation potential.

Figure 7.22: Innovation Potential (panels (a) to (f))

7.7.3.3 Knowledge Diffusion Efficiency

Cliquish networks with low average path lengths exhibit the small-world phenomenon and are known to be effective in knowledge creation and diffusion [23]. It has been shown that the small-world structure is an efficient architecture for new knowledge to diffuse [22]. A small-world network structure is identified by a high clustering coefficient and a short average path length.
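The two small-world indicators named above can be computed as in the following sketch. It assumes an undirected, connected network stored as a dictionary of neighbor sets, uses the average of local clustering coefficients (one common definition among several), and counts nodes with fewer than two neighbors as having clustering 0. These choices and the function names are assumptions of this example rather than details from the dissertation.

```python
from collections import deque
from itertools import combinations


def average_clustering(adjacency):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for node, neighbors in adjacency.items():
        k = len(neighbors)
        if k < 2:
            continue  # contributes 0 by the convention assumed here
        linked_pairs = sum(1 for u, v in combinations(neighbors, 2)
                           if v in adjacency[u])
        total += 2 * linked_pairs / (k * (k - 1))
    return total / len(adjacency)


def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0


# Hypothetical network: a triangle (0, 1, 2) with a pendant node 3.
network = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
clustering = average_clustering(network)
path_length = average_path_length(network)
```

A network is usually called small-world when its clustering is much higher than that of a random graph of the same size and density while its average path length stays comparably short, so in practice both values are compared against such a baseline.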
Figure 7.23 shows the change in clustering coefficient and average path length over receptivity under different communication theories, with the communication frequency set to 0.4. From these figures, we can observe that the communication networks guided by balance, exchange, homophily, preferential attachment based on links, and preferential attachment based on resources show a decreasing clustering coefficient and an increasing average path length as receptivity increases. This demonstrates that receptivity is a negative factor for communities under these theories in forming a small world. In addition, balance theory leads to the highest knowledge diffusion efficiency, since it results in the highest clustering coefficient and one of the shortest average path lengths. Based on this experimental result, decision-makers may encourage communities to remain self-centered in order to form a small-world network when the communication frequency is moderate.

Figure 7.23: Knowledge Diffusion Efficiency. (a) Clustering Coefficient vs. Receptivity; (b) Average Path vs. Receptivity

7.7.3.4 Network Patterns

When the communication frequency is high, the emergent networks generated by the balance and exchange theories always have a high density. So, we decrease the communication frequency to 0.1, which means that each community has a probability of 0.1 of undertaking communication in each time interval. The following figures (Figures 7.24, 7.25, and 7.27) show the network patterns formed by communities under the homophily, structural hole, preferential attachment, balance, and exchange theories. Based on the comparison, the networks under the homophily and exchange theories exhibit emergent clusters of communities with similar hue, as depicted in Figure 7.24. In addition, the network guided by preferential attachment based on links (shown in Figure 7.25) exhibits the property of a scale-free network, since its link distribution (shown in Figure 7.26) follows a power law with $R^2 = 0.81$.
Moreover, the networks under the balance, structural hole, and preferential attachment based on resources theories exhibit a core/periphery structure, in which a core of highly connected communities emerges, as shown in Figure 7.27. The core/periphery ratios of these three networks are 4, 11, and 6 respectively, indicating that more core communities are surrounded by fewer periphery communities. A similar phenomenon is exhibited in the OBO network, as shown in Table 5.8.

Figure 7.24: Networks Generated under Homophily and Exchange Theory. (a) Homophily Theory; (b) Exchange Theory

Figure 7.25: The Network Generated under Preferential Attachment Based on Links Theory

Figure 7.26: Communities' Links. (a) Histogram of Communities' Links; (b) Linear Regression of Logarithmic Value of Links

Figure 7.27: Networks under Balance, Structural Hole, and Preferential Attachment Based on Resources Theory. (a) Balance Theory; (b) Structural Hole Theory; (c) Preferential Attachment Based on Resources Theory

Chapter 8
Conclusions

In this chapter we outline our findings and discuss them in the context of collective creativity in global participatory science. We also delineate future research avenues for extending the current model to resolve its limitations.

8.1 Findings and Discussion

In this study, we conceptualized and simulated the growth and development of scientific communities in terms of a complex adaptive communication system that follows the principles of creative artificial ecosystems.

Using social communication theories as behavioral rules of agents can facilitate the development of a new layer over the existing Agent Communication Language (ACL) [31] framework, which is based on speech-act theory [7]. The new layer enables the specification of communication mechanisms over the basic primitives provided by ACL. The communication protocol manages connections between agents from different perspectives, including communities' traits, self-interest, and discrepancy in resources.

In addition, the RGV (Robust Generative Validation) strategy presented in this dissertation can help researchers address the Verification and Validation (V&V) challenges of ABM, e.g., counterintuitive emergent behavior, as well as structural and parametric uncertainty. The usefulness of the RGV framework is examined through the validation of the ColorScape model against the empirical OBO data and the science overlay map.

This research provides a computer-aided tool for science policy development, a theme that aims to provide a scientifically rigorous quantitative basis that policy makers can use to assess the impact of their decisions on the growth and development of scientific fields. The main objective of this research is to explore the impact of scientific community traits (i.e., receptivity, flexibility, reorganization tendency) and environmental constraints (i.e., interaction topologies, resource allocation strategies, socio-technical communication preferences) on the innovation potential (e.g., diversity, sustainability, and resilience) of GPS. Based on the experiments conducted with the ColorScape model, we draw the conclusions discussed in the following sections.

8.1.1 ColorScape: A General Purpose Model

The ColorScape model introduced in this dissertation is a general-purpose creative artificial innovation ecosystem model that can mimic the behavior of both traditional and open innovation communities. The model is conceptually grounded and validated in terms of its capability to generate metrics similar to those of the science overlay map [75] and the empirical OBO network [86].

8.1.2 Community's Traits vs. Diversity

In low density networks, increasing levels of receptivity improve diversity up to a level. On the contrary, diversity decreases with increasing receptivity in highly coupled networks.
In environments with high receptivity, diversity favors low connectivity because the presence of dense communication channels causes convergence, which in turn decreases diversity. Experimental results suggest encouraging communities to be more receptive in relatively low density environments to attain higher levels of diversity.

Reorganization adversely affects diversity. On the other hand, specialization has positive effects on diversity. Reorganization and specialization strategies help communities adapt to their environment when the community cannot meet the expectations of its members. This observation is consistent with the functionality of specialization and reorganization. Specialization facilitates the creation of a new community with a different target color from the current community, while reorganization involves pulling the target color toward the current color, causing convergence.

8.1.3 Environmental Constraints vs. Diversity, Sustainability, and Resilience

The size of the carrying capacity of the knowledge ecosystem has positive effects on diversity. Yet, there is a point of diminishing returns. Increasing the number of communities improves the probability of forming more clusters comprised of similar communities. But there is a maximum diversity given a fixed scientific spectrum and the maximum difference within clusters. So, diversity cannot increase indefinitely with the carrying capacity. By the same token, increasing external resources leads to increased diversity up to a point, beyond which more resources can only increase the number of communities within a cluster rather than the number of clusters. For policy makers, it is noteworthy that neither external resources nor the initial number of communities can keep diversity increasing, i.e., there is a tradeoff between the available funding and the expected level of diversity.

Disparity increases with resources up to a point. Beyond that point, disparity decreases with increasing resources.
At the same time, lower connectedness causes higher disparity. Disparity increases with the level of resource availability, because more resources lead to a higher success rate, which in turn results in disparity. On the other hand, beyond that point, disparity decreases because of the decreased need for interaction for sustainment, which in turn decreases inequality. In addition, disparity increases with decreasing connectivity; this is due to decreased convergence under low connectivity. To meet the desired level of disparity, policy-makers need to consider the connectedness of the social communication network when making decisions on allocating external funding.

The 2D topology is more resilient than the 1D topology, and scale-free networks have higher resilience than random and random group networks. The 1D topology corresponds to relations between upstream and downstream organizations. Generally speaking, upstream organizations lead frontier research and determine the direction of future research in their domains. Downstream organizations mainly transfer the technology developed by upstream organizations into products. The 2D topology adds collaborations between organizations at the same level in the organizational ecological chain.

Communities with highly connected clusters under low levels of resource availability can experience high levels of sustainability. On the contrary, under moderate levels of resources, loosely connected clusters are more likely to survive. A plausible explanation for this observation is that higher resource availability leads to higher variety. Under high variety, greater connectivity causes each community to be pulled toward multiple different cognitive niches, resulting in a lack of focus that costs communities more resources and hence decreases the survival rate. On the other hand, lower resource availability leads to lower variety.
Under low variety, however, strong connectivity results in more communities sharing similar states and benefiting from each other through a symbiotic relation, which in turn increases the overall survival rate. Based on these observations, policy-makers may encourage communities to build highly connected clusters if resource availability is low.

8.1.4 Network Metrics vs. Variety

One goal of this research is to identify a metric hierarchy, two layers of which are network metrics and attributes. Network metrics include density and centrality, while attributes include diversity. Little research has been undertaken to clarify the relation between these two layers. Based on the experiments conducted with the ColorScape model, we observe that variety increases with density and centrality up to a point, beyond which variety is inhibited. Similar results are also found in [40][69]. According to these observations, researchers may estimate expected levels of variety based on the density and centrality of social networks; that is, neither low nor high density/centrality results in high variety. High variety occurs at moderate density/centrality.

8.1.5 Allocation Strategies vs. Variety

Key area investment with technology transferring results in the highest level of variety. This is similar to the case where domains with lower priority still have the potential to advance, yet the environment promotes the development of domains related to priorities. Additionally, the communities granted the most resources may not be as successful as expected if the domain with the top priority is located between several significantly different domains. A potential reason is that the domain with top priority is pulled in several different directions, which could incur a significant resource cost during the learning process, resulting in a decreased number of communities among those granted the most resources.
Policy-makers should also consider the interaction networks around the domain with the top priority when making decisions about funding allocation.

8.1.6 Communication Strategies vs. Diversity, Sustainability, and Innovation Potential

Examined communication strategies are not significantly different from each other, especially in regard to the relation between variety and external resources under the P2P allocation strategy. Increasing funding does not always help increase variety, especially for those communities with relatively low communication frequency. Communication theories change the strategy communities use to select targets, and local niches emerge from these selections. Different formations of local niches result in different local diversity, but the local diversity does not influence the global diversity significantly. In addition, low communication frequency results in fewer interaction activities, so fewer resources are consumed. External resources therefore have little effect on variety at low communication frequency. As scientific communities can be viewed as artificial ecosystems, the communication frequency is similar to the evolution frequency. Lower evolution frequency leads to fewer species being eliminated. This suggests that sustainability favors low communication frequency. Based on this observation, policy-makers may discourage inter-organizational activities when resources are limited.

Under low communication frequency, openness and receptivity lead to higher variety. On the contrary, variety decreases with increasing receptivity under high communication frequency. The potential reason is that higher receptivity results in more communities sharing similar states under low communication frequency, benefiting from each other through a symbiotic relation, which in turn increases the overall survival rate.
On the other hand, under high communication frequency, higher receptivity results in the convergence of communities, which in turn leads to more communities inhabiting the same domain. Under the P2P allocation strategy, one domain can only sustain a fixed number of communities, so the survival rate decreases with increasing receptivity. This is also why sustainability decreases with increasing receptivity at high communication frequency.

Receptivity is a positive factor that improves innovation potential for communities under high communication frequency. At high communication frequency, higher receptivity results in a lower survival rate, which in turn leads to lower density and lower centrality. However, density decreases at a higher rate than centrality does, resulting in a significant difference between levels of density and centrality. Networks with low density and high centrality are attributed with higher innovation potential [25]. Compared with the previous finding that sustainability and variety decrease with increasing receptivity at high communication frequency, this indicates a tradeoff between sustainability, variety, and innovation potential. Decision-makers have to take this observation into consideration to develop policies that balance these three indicators.

Networks governed by the homophily and the exchange theories yield clusters of similar communities. Under the homophily theory, communities communicate with similar peer communities. Under the exchange theory, the transaction occurs when communities solve problems by collaboration, during which collaborating communities become similar. So, both of these theories help local clusters of similar communities to emerge. Based on this finding, policy-makers need to be cognizant that local niches are likely to exist in networks of communities guided by the homophily or the exchange theory.
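The two mechanisms described above, homophily-driven target selection and exchange-driven convergence, can be illustrated with a minimal Python sketch. It is not the ColorScape implementation: the representation of a community's state as a three-component tuple, the use of Euclidean distance as the dissimilarity measure, the function names, and the `rate` parameter are all assumptions made for illustration only.

```python
import math


def distance(a, b):
    """Euclidean distance between two community states (assumed dissimilarity measure)."""
    return math.dist(a, b)


def pick_homophilous_target(states, i):
    """Homophily (sketch): community i selects the peer whose state is closest to its own."""
    peers = [j for j in range(len(states)) if j != i]
    return min(peers, key=lambda j: distance(states[i], states[j]))


def exchange_step(states, i, j, rate=0.25):
    """Exchange (sketch): collaborating communities i and j each move toward the other.

    `rate` is a hypothetical parameter: the fraction of the gap each side closes,
    so the gap between i and j shrinks by a factor of (1 - 2 * rate).
    """
    a, b = states[i], states[j]
    states[i] = tuple(x + rate * (y - x) for x, y in zip(a, b))
    states[j] = tuple(y + rate * (x - y) for x, y in zip(a, b))
```

Repeating these two steps over a population would be expected to draw already-similar communities closer together, which is in line with the clustering reported above. The sketch makes no claim about the quantitative behavior of the actual model.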
8.2 Extensions

The main emphasis of the ColorScape model presented in this dissertation is the interconnection among communities. Hence, the model can be used to simulate the behavior of networks formed by communities among which dynamic relationships exist. These types of communities include the following [22]:

Shared Instrument: The main objective of such communities is to increase access to a scientific instrument. Shared Instrument collaboratories often provide remote access to expensive scientific instruments such as telescopes. For such communities, the ColorScape model can help discern effective strategies for improving the collective use of expensive instruments.

Virtual Community of Practice: A network of individuals who share a research area and communicate online. Virtual Communities may share news of professional interest, advice, techniques, or pointers to other resources online.

Virtual Learning Communities: These aim to increase the knowledge of participants, but are not necessarily aimed toward conducting original research.

Distributed Research Centers: These are similar to a university research center, but they are operated at a distance. They attempt to aggregate scientific talent, efforts, and resources beyond the level of individual researchers.

For networks comprised of the types of communities listed above, the ColorScape model needs to be slightly modified according to the specific characteristics of the relations between each type of community, so that it can simulate the dynamics of emergent networks and facilitate the analysis of the output data.

8.3 Limitation and Future Research

One limitation of the ColorScape model is its inability to generate network patterns similar to the science overlay map, although structural indicators such as density, centrality, and clustering coefficient are sufficiently similar. The potential reason is that communities select target communities to communicate with globally.
It may be valuable to limit the scope of potential targets that communities can select. This strategy aims to encourage more local niches to emerge, a pattern observed in contemporary research development, which shows relatively tightly coupled clusters and fewer international connections.

Besides the communication theories already implemented in the ColorScape model, there are other theories that can also be embedded in the model.

Public goods theory [81] explains the economics of collective ownership, such as public bridges, parks, and libraries, which is distinguished from private ownership. Two characteristics of public goods are noteworthy: impossibility of exclusion and jointness of supply. A determining factor in generating public goods is critical mass [67], which is defined as the minimum interest that drives the majority of people to realize the public good.

Cognitive social structure [50] characterizes an individual community's perceptions of the social network. The theory can be used to build the community's understanding of the network, which is partial and may be different from the real network. The communities in the partial network perceived by a community are its candidates for future communication. Two steps are needed: the first is to build the partial network, and the second is to select communities to communicate with within the partial network.

Cognitive consistency theory [42] argues that communities are satisfied with their positions in the communication network if their associated peer communities are connected with one another. Assuming the set of neighbors of a community A is S, the influence of a community B in S on A is proportional to the number of B's links to other communities in S.

This research provides a computer-aided tool (i.e., the ColorScape model) for science policy development, so that decision-makers can assess the impact of their decisions on the growth and development of scientific fields in advance.
By undertaking the experiments presented in Chapters six and seven, decision-makers can alter communities' traits, resource allocation strategies, and socio-technical communication preferences to examine their impacts on innovation potential and performance. In addition, the resource allocation module and the communication preferences module can be extended or replaced by other modules that decision-makers are interested in. Hence, the ColorScape model can be customized to facilitate conducting abstract thought experiments for exploring effective strategies, while allowing informed decision-making for science and innovation policy.

Bibliography

[1] Missile Defense Agency. Department of defense documentation of verification, validation & accreditation (vv&a) for models and simulations. Technical report, Missile Defense Agency, 2008.
[2] P. Ahrweiler, A. Pyka, and N. Gilbert. A new model for university-industry links in knowledge-based economies. Journal of Product Innovation Management, 2010.
[3] H. Aldrich. Organizations evolving. Thousand Oaks.
[4] Teresa M Amabile, Sigal G Barsade, Jennifer S Mueller, and Barry M Staw. Affect and creativity at work. Administrative Science Quarterly, 2006.
[5] Martyn Amos. Theoretical and Experimental DNA Computation. Springer, 2005.
[6] W. Brian Arthur, Steven N. Durlauf, and David A. Lane. The economy as an evolving complex system. Addison-Wesley, 1997.
[7] John Langshaw Austin. How to Do Things With Words. Harvard University Press, 2005.
[8] Robert Axelrod. The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press.
[9] Jerry Banks. Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice. Wiley-Interscience, 1998.
[10] Arjun Bhutkar. Synthetic biology: Navigating the challenges ahead. The Journal of Biolaw & Business, 2005.
[11] Aharon Blank and Sorin Solomon.
Power laws in cities population, financial markets and internet sites (scaling in systems with a variable number of components). Physica A: Statistical Mechanics and its Applications, 287:279-288, 2000.
[12] Eric Bonabeau. Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences of the United States of America, 2002.
[13] Daniel J. Brass. Being in the right place: a structural analysis of individual influence in an organization. Administrative Science Quarterly, 29:519-539, 1984.
[14] Daniel J. Brass. Technology and the structuring of jobs: Employee satisfaction, performance, and influence. Organizational Behavior and Human Decision Processes, 35:216-240, 1985.
[15] Daniel J. Brass. A social network perspective on human resources management. Research in personnel and human resources management, 1995.
[16] Iain Buchan. Calculating the gini coefficient of inequality. Technical report, Northwest Institute for BioHealth Informatics, 2002.
[17] R. S. Burt. Structural holes: The social structure of competition. Harvard University Press, 1992.
[18] Ronald S. Burt. Social contagion and innovation: Cohesion versus structural equivalence. The American Journal of Sociology, 1987.
[19] Ronald S. Burt. The gender of social capital. Rationality and Society, 1998.
[20] James S. Coleman. Individual Interests and Collective Action: Studies in Rationality and Social Change. Cambridge University Press, 1986.
[21] James S. Coleman. Foundations of Social Theory. Belknap Press of Harvard University Press, 1998.
[22] R. Cowan. The dynamics of collective invention. Journal of Economic Behavior & Organization, 52:513-532, 2003.
[23] R. Cowan and N. Jonard. Network structure and the diffusion of knowledge. Journal of Economic Dynamics and Control, 2004.
[24] Susan Cozzens. A deeper look at the visualization of scientific discovery in the federal context. Technical report, Georgia Tech, 2008.
[25] C. Dhanaraj and A. Parkhe.
Orchestrating innovation networks. Academy of Management Review, 2006.
[26] Virginia Dignum, Frank Dignum, and Liz Sonenberg. Design and analysis of organization adaptation in agent systems. In Agent-Directed Simulation and Systems Engineering. Wiley, 2009.
[27] B. Edmonds and E. Chattoe. When simple measures fail: Characterising social networks using simulation. Technical report, Social Network Analysis: Advances and Empirical Applications Forum, 2005.
[28] Vernon J. Ehlers. The future of u.s. science policy. Science, 279:302, 1998.
[29] Joshua M. Epstein. Generative Social Science: Studies in Agent-Based Computational Modeling. Princeton University Press, 2007.
[30] Henry Etzkowitz. The triple helix of university-industry-government relations implications for policy and evaluation. Technical report, Retrieved from http://ssi.sagepub.com/cgi/doi/10.1177/05390184030423002, 2002.
[31] FIPA. Fipa acl message structure specification. Technical report, Foundation for Intelligent Physical Agents, 2002.
[32] J. Fulk and G. DeSanctis. Articulation of communication technology and organizational form. In Shaping organizational form: Communication, connection, and community. Thousand Oaks, 1999.
[33] Paul J. Gemperline. Statistical evaluation of visualization methods. Technical report, Department of Chemistry, East Carolina University, 2007.
[34] N. Gilbert. A simulation of the structure of academic science. Sociological Research Online, 2, 1997.
[35] N. Gilbert, P. Ahrweiler, and A. Pyka. Learning in innovation networks: some simulation experiments. Innovation in complex social systems, 2010.
[36] Peter A. Gloor, Maria Paasivaara, Detlef Schoder, and Paul Willems. Finding collaborative innovation networks through correlating performance with social network structure. Technical report, MIT, 2007.
[37] Lance H. Gunderson and C. S. Holling. Panarchy: Understanding Transformations in Human and Natural Systems. Island Press, 2001.
[38] F. Heider.
Attitudes and cognitive organization. Journal of Psychology, 21:107-112, 1946.
[39] L.J Heyer, S. Kruglyak, and S. Yooseph. Exploring expression data: identification and analysis of coexpressed genes. Genome Research, 1999.
[40] Matthew H Hohn. The relationship between species diversity and population density in diatom populations from silver springs, florida. Transactions of the American Microscopical Society, 1961.
[41] J. H. Holland. Hidden Order: How Adaptation Builds Complexity. Perseus Books, 1995.
[42] Paul W. Holland and Samuel Leinhardt. The statistical analysis of local structure in social networks. NBER Working Paper Series, w0044, 1974.
[43] George Caspar Homans. Social behavior: Its elementary forms. Harcourt, Brace & World, 1961.
[44] C. Hovland and E. Hunt. The computer simulation of concept attainment. Behavioral Science, 5:265-267, 1960.
[45] InnoCentive. Challenge driven innovation. http://www.innocentive.com/seekers/challenge-driven-innovation, 2011.
[46] Alex Kacelnik. Timing and foraging: Gibbon's scalar expectancy theory and optimal patch exploitation. Learning and Motivation, 33, 2002.
[47] David Kaiser, Vincent Lepinay, and David Jones. Predictive modeling of the emergence and development of scientific fields. Technical report, Massachusetts Institute of Technology, 2010.
[48] David Klahr and Herbert A. Simon. The dynamics of collective invention. Psychological Bulletin, 125:524-543, 1999.
[49] Genevieve J. Knezo. Federal research and development: Budgeting and priority-setting issues, 109th congress. Technical report, Resources, Science, and Industry Division, 2006.
[50] D. Krackhardt. Cognitive social structures. Social Networks, 9:109-134, 1987.
[51] David Krackhardt. Constraints on the interactive organization as an ideal type. The Post-Bureaucratic Organization, pages 211-222, 1994.
[52] V. Krebs and J. Holley. Building sustainable communities through network building. Technical report, 2002.
[53] Thomas S. Kuhn.
The Structure of Scientific Revolutions. University Of Chicago Press, 1996.
[54] D.R. Lane. Spring 2001 theory workbook. http://www.uky.edu/~drlane/capstone/persuasion/bal.htm, 2001.
[55] Michael W. Macy and Robert Willer. From factors to actors: Computational sociology and agent-based modeling. Annual Review of Sociology, 2002.
[56] Jon McCormack. Artificial ecosystems for creative discovery. Technical report, Centre for Electronic Media Art Faculty of Information Technology, Monash University Clayton 3800, Australia, 2007.
[57] R. McDermott. Learning across teams. Knowledge Management Review, 1999.
[58] John H. Miller and Scott E. Page. Complex Adaptive Systems, An introduction to computational models of social life. Princeton University Press, 2007.
[59] Melanie Mitchell and Charles E. Taylor. Evolutionary computation: An overview. Annual Reviews, 1999.
[60] Susan A. Mohrman and Caroline S. Wagner. The dynamics of knowledge creation: Phase one assessment of the role and contribution of the department of energy's nanoscale science research centers. Technical report, Center for Effective Organizations Marshall School of Business University of Southern California, 2008.
[61] Peter R. Monge and Noshir S. Contractor. Theories of Communication Networks. Oxford, 2003.
[62] nanoHUB.org. Network for computational nanotechnology. http://nanohub.org, 2010.
[63] NEESgrid. Network for earthquake engineering simulation. http://it.nees.org/, 2010.
[64] M. E. J. Newman. The structure of scientific collaboration networks. PNAS, 98:404-409, 2001.
[65] M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks, 27:39-54, 2005.
[66] M. E. J. Newman. Power laws, pareto distributions and zipf's law. Contemporary Physics, 46:323-351, 2005.
[67] P.E. Oliver. Formal models of collective action. Annual Review of Sociology, 19:271-300, 1993.
[68] Tim Pawlenty, Edward G. Rendell, and Raymond C. Scheppach.
Higher education, mandates and unintended consequences: An analysis of the moe mandate in hr 4137. National Governors Association, 2008.
[69] Jill E. Perry-Smith. The social side of creativity: A static and dynamic social network perspective. The Academy of Management Review, 2003.
[70] Peter Pirolli. An elementary social information foraging model. Proceedings of the 27th international conference on Human factors in computing systems, 2009.
[71] Alan L. Porter and Ismael Rafols. Is science becoming more interdisciplinary? measuring and mapping six research fields over time. Scientometrics, 2009.
[72] Daniel I. Prajogo and Pervaiz K. Ahmed. Relationships between innovation stimulus, innovation capacity, and innovation performance. R&D Management, 2006.
[73] Andreas Pyka, Nigel Gilbert, and Petra Ahrweiler. Agent-based modelling of innovation networks - the fairytale of spillover. Complexity, 2009.
[74] I. Rafols and M. Meyer. Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience. Scientometrics, 2009.
[75] Ismael Rafols, Alan L. Porter, and Loet Leydesdorff. Science overlay maps: a new tool for research policy and library management. Technical report, Science and Technology Policy Research, University of Sussex, 2010.
[76] John H. Reed, Gretchen Jordan, and Edward Vine. Impact evaluation framework for technology deployment programs. Technical report, US Department of Energy, 2007.
[77] Repast. Repast. http://repast.sourceforge.net/, 2010.
[78] V. Riss and Hans Friedrich Witschel. What is organizational knowledge maturing and how can it be assessed? Proceedings of I-KNOW '09 and I-SEMANTICS '09, 2009.
[79] Jr Rykiel. Testing ecological models: the meaning of validation. Ecological Modeling, 1996.
[80] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, and S. Tarantola. Global Sensitivity Analysis: The Primer. John Wiley & Sons.
[81] P. Samuelson.
The pure theory of public expenditure. Review of Economics and Statistics, 36:387-389, 1954.
[82] National Science and Technology Council. National nanotechnology initiative: The initiative and its implementation plan. Technical report, Retrieved from http://ssi.sagepub.com/cgi/doi/10.1177/05390184030423002, 2000.
[83] Scott Shane. Encouraging university entrepreneurship? the effect of the bayh-dole act on university patenting in the united states. Journal of Business Venturing, 2004.
[84] J. Shrager and P. Langley. Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufman, 1990.
[85] SKIN. Skin. http://cress.soc.surrey.ac.uk/SKIN, 2010.
[86] Barry Smith, Michael Ashburner, Cornelius Rosse, Jonathan Bard, William Bug, Werner Ceusters, Louis J Goldberg, Karen Eilbeck, Amelia Ireland, Christopher J Mungall, The OBI Consortium, Neocles Leontis, Philippe Rocca-Serra, Alan Ruttenberg, Susanna-Assunta Sansone, Richard H Scheuermann, Nigam Shah, Patricia L Whetzel, and Suzanna Lewis. The obo foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 2007.
[87] R. Stankiewicz. Technology as an autonomous socio-cognitive system. Dynamics of Science-Based Innovation, 1992.
[88] Steven R. Corman and Craig R. Scott. Perceived networks, activity foci, and observable communication in social collectives. Communication Theory, 1994.
[89] Andrew Stirling. Diversity and ignorance in electricity supply investment: Addressing the solution rather than the problem. Energy Policy, 22:195-216, 1994.
[90] Richard Swedberg. Entrepreneurship: The Social Science View. Oxford University Press, USA, 2000.
[91] Bill Valdez and Julia Lane. The science of science policy: a federal research roadmap. Technical report, National science and technology council, 2008.
[92] Stephen Vincent. Input Data Analysis. Compuware Corporation, 1998.
[93] Caroline S. Wagner. The new invisible college, Science for development.
Brookings Institution Press, 2008.
[94] Brian Walker. Resilience, adaptability and transformability in social-ecological systems. Ecology and Society, 2004.
[95] Karl E Weick. The Social Psychology of Organizing. McGraw-Hill Humanities/Social Sciences/Languages, 1979.
[96] Etienne Wenger. Communities of Practice: Learning, Meaning, and Identity. Cambridge University Press, 1998.
[97] Etienne Wenger. Communities of practice and social learning systems. Organization, 7:225-246, 2000.
[98] M.A West and J.L. Farr. Innovation and Creativity at Work. John Wiley & Sons, 1990.
[99] Wikipedia. Complex system. http://en.wikipedia.org/wiki/Complex_system, 2010.
[100] Wikipedia. Gini coefficient. http://en.wikipedia.org/wiki/Gini_coefficient, 2010.
[101] Wikipedia. Innovation. http://en.wikipedia.org/wiki/Innovation, 2010.
[102] Wikipedia. Integration testing. http://en.wikipedia.org/wiki/Integration_testing, 2010.
[103] Wikipedia. Minimum viable population. http://en.wikipedia.org/wiki/Minimum_viable_population, 2010.
[104] Wikipedia. Resource allocation. http://en.wikipedia.org/wiki/Resource_allocation, 2010.
[105] Wikipedia. Scientific community. http://en.wikipedia.org/wiki/Scientific_community, 2010.
[106] Wikipedia. Preferential attachment. http://en.wikipedia.org/wiki/Preferential_attachment, 2011.
[107] Wikipedia. Social balance theory. http://en.wikipedia.org/wiki/Social_balance_theory, 2011.
[108] Levent Yilmaz. Dynamics of collective creativity and open innovation in scientific commons, complex adaptive systems perspective. Technical report, Auburn University, 2008.
[109] Levent Yilmaz. An agent simulation study on conflict, community climate and innovation in open source communities. International Journal of Open Source Software and Processes, 2009.
[110] Levent Yilmaz. On the synergy of conflict and collective creativity in open innovation socio-technical ecologies. International Conference on Computational Science and Engineering, 2009.
[111] Levent Yilmaz, Guangyu Zou, and Osman Balci. A robust evolutionary strategy for generative validation of agent-based models using adaptive simulation ensembles. 2011 IEEE/ACM Winter Simulation Conference, 2011.
[112] Guangyu Zou and Levent Yilmaz. A computational model of collective creativity and innovation in virtual open source science networks: What distinguishes innovative virtual communities? 2010 IEEE/ACM Winter Simulation Conference, 2010.
[113] Guangyu Zou and Levent Yilmaz. Dynamics of knowledge creation in global participatory science communities: open innovation communities from a network perspective. Comput Math Organ Theory, 2010.