Towards Big Data: Digitising Economic and Business History

Finnish economic and business historians initiated an ambitious project in 2002, concluded by 2007, to analyse digitised news agency data in order to create a model to predict the behaviour of business enterprises. This project, entitled MetaSignal (later MetaAlert), was a joint venture between historians, journalism researchers, engineering scholars and economists working at the University of Jyväskylä and the Tampere University of Technology. The aim was nothing less than to create an artificial intelligence (AI) that could learn from the past to predict the future. The AI was intended to automatically compile, categorise and analyse available online information to find so-called weak signals in a massive flow of information. To ‘teach’ the AI, the project used a massive news agency database, including roughly 20 million business newsfeeds from the early 1970s to the early 2000s. For the first time in Finnish historical research, the project also used digitised full-text New York Times newspaper data from the 1850s onwards, together with databases containing information about listed companies and stock market prices over an extended period of time.

Needless to say, this bold initiative failed, as the project did not have sufficient human resources or computational power circa 15 years ago to reach its goals. Nevertheless, as for the outcomes, the project did identify publications and networks that were valuable at the time and at least interesting even from today's perspective. One must bear in mind that the internet was still a newcomer at the turn of the millennium; thus, there were still many uncertainties as to which direction it would develop in and which would be the most usable tools to find information on various topics. Moreover, the databases on the emerging markets of the internet were also at a developing stage, and so was the price of information: the price of annual use of the databases used in the project roughly doubled every year.
By analysing the data available from open sources at the time and comparing it to the data purchased from the databases in the market, the project found, for example, that the very origins of the contemporary newsfeeds could be traced to a few well-established, old news agency firms or media companies.1 It was not until the emergence of the digital camera, smartphones and social media that the supremacy of these companies began to collapse, at least to a certain extent.
The project members did not necessarily even notice at the time how fast the environment around them was changing. The project participants travelled to Stanford to learn about the latest trends in Silicon Valley and report their findings to the steering group. Therefore, it was the historians and the other humanists who were the first to inform the others in the meetings with the funding agency Tekes (the Finnish Funding Agency for Technology and Innovation, nowadays known as Business Finland) about interesting emerging companies in the United States, like Facebook.2
The MetaSignal project was just one outcome of a long tradition of compiling and using massive databases, distant reading methods and, most importantly, sophisticated methods among economic and business historians to analyse numerical and textual data. The use of a massive database to predict future trends in the MetaSignal project was not, obviously, a ground-breaking idea. On the contrary, computerised methods have been used in social sciences in this respect at least from the early 1960s, when the first attempts were made at the RAND laboratories.3
Economic and business historians have been the forerunners in digital history data gathering and analysis for decades. This chapter attempts to discuss the major developments internationally and, in some specific cases, in Finland in the fields of digital economic and business history, concentrating on some of our own projects, as well as research outcomes by economic and business historians at the University of Jyväskylä and within our networks. We are not claiming that our projects are unique or ahead of their time in the field of economic and business history; on the contrary. However, we feel that these projects (such as the aforementioned MetaSignal) are indeed illustrative cases of the possibilities and challenges facing historians in the digital era.
After a section introducing the use of digitised data in economic and business history, we will briefly discuss the methodological challenges in the use of these methods, followed by sections concentrating on event data analysis and challenges involved in using various databases (with some examples). Thereafter, we focus our narrative on the use of digital sources and methods in business history. In the concluding discussion, we will address the challenges and opportunities offered by digitised sources, followed by some exposition of the remaining challenges.

Big Data in Economic and Business History
Big data is at the heart of economic history research, and has already been so for decades.4 Big Structures, Large Processes, Huge Comparisons, by Charles Tilly, a famous historical sociologist, was a book published in the mid-1980s that highlighted some of the early efforts in such scholarship. Tilly's classic studies urged researchers to study the macro-level societal structures systematically, to better understand large processes of change.5 Tilly was also one of the forerunners of 'social science history', pushing sociological understanding to advance historical research. Economic historians were also part of this process and, to a certain extent, the first ones to explore and exploit the possibilities of social scientific methods and data in historical research.
Since the time of publication of Tilly's book, the datasets compiled and used by economic historians have become larger and more varied: numeric data is nowadays more often 'born digital'; and besides numbers, even economic historians are today more often using high-resolution digital images and digitised texts. The quantity of available data has increased dramatically, whereas the costs of storage have decreased, even though there is now a new challenge for academia arising from the costs of the best datasets and digitised library collections.6 As Guttman and colleagues (2018: 269) note: 'A key characteristic of modern "big data" is that the volume of stored data exceeds human analytic capacity and pushes against the boundaries of currently-available computing power. For that reason, the magnitude of "big" is continually growing.'
By its principles, economic history research does not differ substantively from other types of historical research: economic historians compile data from original (archival) sources to provide answers to questions posed by scholars. What differs, though, is that the questions asked are often based on testable theoretical frameworks originating from social sciences and usually require a massive amount of data that, in turn, cannot be analysed without sorting the data into a database format and using some sort of quantitative methods. However, economic historians were forced to compile these types of datasets themselves for decades, whereas today there is a large amount of ready-made data available, starting with various text corpora (for example, digitised newspapers), statistical data provided by different national and international authorities (such as census records) and databases compiled by researchers, authorities and private enthusiasts in different fields, including genealogical associations. This latter kind of 'citizen participation' or 'citizen science' in compiling data will most likely increase in the future, as will different kinds of official, linked register data. Nevertheless, even today, researchers studying especially the ancient and early modern eras are forced to compile most of their datasets themselves, whereas those concentrating on more contemporary periods and topics have to face the challenges associated with the already existing datasets.
Using digitised sources is at the very core of international economic history. Computerised methods were embedded into economic history research during the 'Cliometric Revolution' in the 1960s and 1970s, when the so-called 'historical economics' tradition emerged first in the United States, then later in Europe. The first researchers in this tradition were mostly trained as economists (such as Alfred Conrad and John Meyer, and then Robert Fogel and Douglass C. North), and they used their theories, models and econometric methods to study and understand controversial topics in history, like the productivity and profitability of slavery. Obviously, mainstream historians were not totally convinced by their studies and methods, especially as some of the advocates of the 'new economic history' took historians head on vis-à-vis many big topics.7 By the turn of the millennium, this battle had settled down, as more historians had adopted cliometric methods as part of their toolkit and as 'social science history' had become more common. Simultaneously, economists are taking history research more seriously. Nevertheless, the major journals in economic history today are more oriented towards economics than they were back in the 1950s.8
The most obvious outcomes of the 'new economic history' have been the historical growth studies in different countries, compiled together in the Maddison Project database maintained at the University of Groningen.9 Historical national account series and other long-run societal and economic time series form a basis for all comparative macroeconomic studies of history. These include data on population, prices, wages, structure of the economy (size of agriculture, industry and services), foreign and domestic trade, urbanisation, central (and local) government expenditures and, finally, GDP (per capita), which is based on all the other data series listed above. Historical national accounts have made comprehensive comparisons over long periods of time more credible between a growing number of countries. These datasets have been game changers in the field and have occupied a substantial role in the debates over long-run economic growth. Angus Maddison (2001) published his initial global growth figures spanning 2,000 years at the turn of the millennium, but he had already started putting these numbers out in various publications from the 1980s onwards. Obviously, his early figures were rather tentative, and the GDP per capita estimates in general for many developing states were too low. Recent efforts, for example, by Stephen Broadberry10 and others, have exposed some of the flaws in these figures and extended our knowledge of not just European and Western development patterns, but also economic performance in Asia and Africa. These figures are now changing the debate over global trade and the so-called Great Divergence; that is, when and how China fell behind the West in the last 500 years.11 In recent studies, the focus has shifted to account for new areas of interest, such as well-being and inequality.12 Consequently, the existing Finnish historical national accounts from 1860 onwards were compiled by Riitta Hjerppe and the growth studies research group in the 1970s and 1980s, comprising 13 volumes in total, and they are still the benchmark in the study of Finnish economic history.13
Business historians, in turn, have been more focused on actors and related activities in the economy, whether by private persons, entrepreneurs, business enterprises or other groups. These actors represent the 'visible hand' of the aggregate economic system. Research on these actors, in turn, helps us to understand the evolution of economic structures. By looking at 19th-century American railway companies, Alfred Chandler Jr. (1977) created the basic framework for business strategy research. The methods used by modern business historians are more often qualitative, and the quantitative methods used are typically more descriptive than statistical.14 Nevertheless, big data and methods used to analyse digitised databases have become more important also for business historians. This is simply because the data produced by entrepreneurs and enterprises over time are in most cases in numerical form and/or massive in volume.15 Even early modern businessmen, such as 13th-century Commenda traders in Genoa or late-18th-century Finnish businessmen, produced a massive amount of letters and ledgers; some of those have lately been converted into a digitised format. The recent historical business data is already of digital origin. The shift to increasingly digitised material enables researchers to utilise larger quantities of material in qualitative research, and opens new ways to collect and analyse the material, including the use of AI in data mining and analysis.

Use of Quantitative and Qualitative Methods to Tackle Digital Sources
The use and analysis of quantitative data has been a hallmark of economic history research, especially since the turn towards more quantitative economic history, as we have already discussed. The aim of this more economics-influenced research has often been to attempt to find causal relationships between different phenomena; namely, to measure which factors explain changes in phenomena proxied by various time series, cross-sectional or panel data. For example, during the past decades, there have been many attempts to compile data on, better measure and understand the dynamics of pre-industrial economies; for instance, to clarify the role of women, children and families in pre- and early-industrialising societies.16 Alongside the time series (or panels) of economic development, much attention has been placed on the study of the equal or unequal distribution arising from this development.17
From the 1950s onwards, econometric tools such as regression analysis have emerged as a typical way of estimating the relationships between economic variables. Regression analysis is today a common tool both in economics and social sciences, and also in economic history. Thus, in order to understand what has been written in the field during the past decades, one has to be familiar with at least the basics of this method; or, rather, the set of regression and other econometric techniques for modelling and analysing several variables. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given a set of independent variables; for example, what was the importance of education, investments or policy indicators for economic growth or, as we have done, the effect of new technologies on the wages of employees at different skill levels.18
Certain aspects of regression analysis have also been criticised, such as the over-reliance on measures of statistical significance.19 Historians are particularly worried about how well such methods are suited to the analysis of time series, as the observable and unobservable factors might change over time, and the sources of data are similarly subject to change. Some of the research has become perhaps even overly technical by nature, thus losing its relevance for broader historical narratives.20 Finally, causal relationships are hard to pinpoint, especially from more qualitative data,21 and in econometrics the very idea that causality could be ascertained from regression analysis has become quite contested.
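To make the idea of estimating a conditional expectation concrete, the following is a minimal sketch of a one-variable ordinary least squares regression. The variable names and the figures are invented purely for illustration and do not come from any of the studies cited here; the function `ols_simple` is likewise a hypothetical helper, not a method used in our own work.

```python
def ols_simple(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary for the slope to be identified")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented illustration data: years of schooling and log wages.
years_of_schooling = [2.0, 4.0, 6.0, 8.0, 10.0]
log_wage = [1.1, 1.5, 1.9, 2.3, 2.7]

intercept, slope = ols_simple(years_of_schooling, log_wage)
# The fitted line is the estimated conditional expectation of log wage
# given schooling: intercept + slope * years_of_schooling.
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The fitted slope only describes an association in the sample; as noted above, reading it as a causal effect requires further assumptions that are themselves contested.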
Another way to analyse causal relationships is by using counterfactual modelling: namely, to analyse a scenario of 'what if' the phenomenon had not occurred or a different historical trajectory had taken place. Economic history also has a long tradition of counterfactual analysis, starting from the early writings of the Nobel Prize-winning economic historian Robert Fogel. Those models have, however, been criticised time and again by historians.22

Event Data Analysis
Although the methods used by economic historians could and should be criticised for certain shortcomings, they are nevertheless something that other historians might wish to emulate when using digitised, 'big data' sources. These methods can also be used when analysing qualitative, textual datasets, by introducing 'binary thinking' to the analysis; that is, coding the textual data to enable quantitative analysis. We have used, for example, 'event data analysis' to code actions and activities found in historical data, like the 'strategic actions' of companies. The basis for event data analysis can be found in historical events that are arranged according to their sequence. The coding of events (for example, strategic actions) enables comparing different actors, such as companies or business groups.23
While reducing texts to ones and zeroes might lead to over-simplifications, the use of more open methods, such as fuzzy-set Qualitative Comparative Analysis (fs/QCA), has proved to be suitable for historical inquiries, as the set-theoretic relations frequently reveal more plausible causal relations than simple correlations.24 Moreover, these types of methods can also be used to extrapolate larger datasets from smaller samples, for which statistical analysis has typically been near impossible. Often, the dichotomy between small-N qualitative case studies and large-N statistical studies has been overstated.25 Essentially, they follow the same underlying logic of research. The best way to avoid the pitfalls of each is to engage in both or combine the strengths of each approach. These types of methods have been further developed by some Finnish business history and management scholars in particular.26
In international comparisons, comparable data, contexts and how the data helps make broader points about processes all play a role. For Finnish historians, though, even the question of the relevance of comparisons might sometimes alter the way in which we think about the sources and data. One of our own examples is from some years ago, when we were using a large-N database which comprised information on Finnish and Swedish sailors. Thus, an obvious perspective for us was to compare these two countries in our analyses. For readers outside Scandinavia, however, this did not make much sense: the reviewers and editors of journals saw Sweden and Finland as complementary rather than as interesting comparative cases in terms of our research question, and the paper was rejected time and again, before we fully realised this challenge and changed the paper accordingly.27
This type of categorisation is something we have tried to develop further also in our bibliometric work focusing on analysing trends in business history scholarship. As categories of the contents of journal articles in the ready-made databases (such as WoS or Scopus) are always subjective, we introduced certain measures to make such categorisations more objective in our study. Obviously, these are again methods used previously in other fields, but ones that can also be adapted to the study of economic and business history debates. For example, we engaged several researchers to do categorisations of previously published business history articles simultaneously, and then used either 'consensus' or average categorisations, or the results of 'voting'. In the latter case, the 'votes' (zeros or ones by each individual doing the categorisation) for each category were summed up, and thereafter these sums of votes were calculated as a percentage of the maximum possible number of votes. These percentages were then taken to be the share of each category and as a basis for further statistical analysis, namely to study why certain business history articles received the most citations.28 The next obvious step is to introduce these bibliometric techniques to book-format publications, which would help us gauge the trends in a publication format that historians prefer, again broadening the analysis of interdisciplinary transference.
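The 'voting' procedure described above for the bibliometric categorisation can be sketched in a few lines. This is only an illustration under stated assumptions: the coder names, the category labels and the votes are invented, the function `category_shares` is a hypothetical helper, and we assume that the maximum possible number of votes for a category equals the number of coders.

```python
def category_shares(votes):
    """Turn 0/1 votes per coder into percentage shares per category.

    `votes` maps each coder to a dict of category -> 0 or 1.
    Assumption: each coder may cast at most one vote per category, so
    the maximum possible number of votes equals the number of coders.
    """
    n_coders = len(votes)
    if n_coders == 0:
        raise ValueError("at least one coder is required")
    categories = sorted({c for ballot in votes.values() for c in ballot})
    shares = {}
    for category in categories:
        total = sum(ballot.get(category, 0) for ballot in votes.values())
        shares[category] = 100.0 * total / n_coders
    return shares


# Invented votes by four coders on one article.
example_votes = {
    "coder_a": {"strategy": 1, "finance": 0, "labour": 1},
    "coder_b": {"strategy": 1, "finance": 0, "labour": 0},
    "coder_c": {"strategy": 1, "finance": 1, "labour": 0},
    "coder_d": {"strategy": 0, "finance": 0, "labour": 0},
}

shares = category_shares(example_votes)
print(shares)  # {'finance': 25.0, 'labour': 25.0, 'strategy': 75.0}
```

Because coders may assign an article to several categories, the shares need not sum to 100; whether to rescale them before further statistical analysis is a separate design choice.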

Making Big Data Work: Databases and Their Challenges
As we have shown here, economic and business historians have been engaged in creating their own databases for a long time by using a variety of primary sources.29 The data collected from the original primary source material has typically been stored as digital images, Word and Excel files on the researchers' own computers, and perhaps distributed via email or cloud services when sharing was needed, for example, to make a common writing project easier. That is the case even today in many instances. Regardless, currently there is a growing number of ready-made databases that have to a certain extent eased the work of economic and business historians, yet at the same time they have provided new types of challenges. First of all, the availability of these databases has motivated researchers to study topics for which the data is (easily) available, and to find connections between those variables for which we have information. To study Finnish economic and business history, it might be challenging to use some of the international datasets, as information on Finland might be lacking, or is otherwise irrelevant or even incorrect. Some of the most important international databases, however, do have some data for Finland as well, like the Maddison Project database described above; Clio-Infra (http://www.clio-infra.eu/), EH-net databases (http://eh.net/databases/), Global Price and Income History (http://gpih.ucdavis.edu/) and Swedish historical monetary statistics 1668-2008 (http://www.riksbank.se/research/historicalstatistics).
The challenge is, however, that in many of these datasets the data on Finland is to a certain degree confusing and even misleading. This, in turn, relates to the fact that the data has been compiled from national statistical sources or from previous research. In the Finnish case, we simply still lack some of the basic research; thus, the datasets are using the existing figures for Finland. The Maddison database, for example, gives figures for Finnish GDP (per capita) for certain benchmark years over the last 2,000 years, derived using inter- and extrapolation methods. Nonetheless, Finnish growth studies have produced more exact figures so far only from the 1860s onwards. Currently, though, there is a project at the University of Jyväskylä to fill the gap from the 1500s to the mid-1850s in order to have more reliable, internationally comparable time series for Finland as well.30 This will, hopefully, make Finland more appealing as a unit to be used in international comparisons: currently, Finland is missing from a number of international studies simply because comparable data does not yet exist.
Some of the international databases have been especially valuable also for Finnish economic and business historians. Besides those noted above, two specific datasets recently used by Finnish scholars are worth noting: the Soundtoll Registers Online (STRO) compilation (soundtoll.nl) and the Swedish Seamen's House enrolment database.
The STRO compilation is a good example of how digitised, large databases can be constructed at reasonable cost and in a limited amount of time.31 The STRO database is based on the archival data created at the Sound Toll in Elsinore, Denmark, which was established in the late 15th century and lasted until 1857. The STRO database includes nearly all the ships and their cargoes that passed the Danish Sound from 1634 to 1856, comprising 1.4 million ships. Of these ships, roughly 2.4%, that is 35,000 ships, came from or headed towards Finland. In order to understand Finnish international trade and shipping, the STRO is especially important, as the Danish Sound was the only route for Finnish export and import trade to markets beyond the Baltic for centuries. The Baltic trade as a whole, in turn, was of utmost importance in understanding the early modern and modern growth of Europe, as this trade was, as Milja van Tielhof puts it, 'the mother of all trades'.32 The Danish Sound data used in previous research33 was mainly based on the Sound Toll Tables compilation by Nina Ellinger Bang and Knud Korst in the 1920s and 1930s.34 Their data, though, covered only the period up until the early 1780s, and later Hans Christian Johansen extended the period up until the mid-1790s.35 Thus, from the Finnish perspective, the STRO is fascinating as it covers the era from the late 18th century until the mid-19th century, which was in many respects an emerging era for Finnish export trade and shipping.
Nevertheless, although the STRO data is highly valuable for research in general and for Finnish history research in particular, it also entails many challenges that can at the moment only be partly solved in the online dataset. The names of places and commodities are currently being made uniform, as are the different units used (weights, sizes, etc.); moreover, there are a number of mistakes in the dataset that might have been present already when the entries of the original customs data were made, or that were introduced later during the data-entry process of the database. At the moment, there is an extensive project in Leipzig, overseen by Dr. Werner Scheltjens, to modify the data further; this version, STRO 2.0, will be launched in the coming years. Finnish economic historians are also collaborating closely in this work in order to have even better data to use to study Finnish long-term trade patterns.36
Another important database used by Finnish economic historians is the Swedish Seamen's House enrolment dataset. This database was compiled at the turn of the millennium by the Swedish National Archives in collaboration with the Swedish Genealogical Association. The database includes roughly 650,000 enrolment cases and 26 million data points from nine Swedish coastal towns and one Finnish town (Kokkola). Researchers at the University of Jyväskylä gained full access to the database more than 10 years ago, only to find out that there were many challenges with the data. Indeed, the database is a good, or bad, example of the challenges inherent in these types of databases.
First, the researchers did not have full access to the data in the beginning, which made quantitative analysis impossible. Many similar genealogical databases have been designed to help users find detailed information on, for example, their ancestors, not to perform statistical analyses. Second, the data did not need to be exact in terms of values and figures to serve genealogical inquiries and, therefore, numeric and textual data sometimes became mixed in the datasheets. This all meant that it took almost a decade for the researchers first to clean up the data, enrich it with additional information, and then standardise the monetary and other units (especially tonnage measures of ships) before it could really be used. This led to the third challenge that, again, is unfortunately rather common in many research projects using ready-made digitised databases. Namely, the database used in the research is to some degree different from the one that is available online at the Swedish National Archives website, and the researchers cannot, in accordance with the signed contract, publish the data they are using. Thus, hopefully in the future, the Swedish National Archives will publish the modified dataset separately on their website; this would be helpful for the research community at large, as this database is certainly highly valuable and the results have already been published in some of the most notable publishing forums.37 There is already an initial agreement between the project researchers and the archive to publish the data in one form or another.
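The second challenge described above, numeric and textual values mixed in the same field, can be illustrated with a small sketch that separates the two. This is not the procedure actually used in the project; the function `split_value` and the sample field values are hypothetical, and the sketch assumes that a comma in a number is a decimal separator, which would have to be checked against the real source.

```python
import re

# First run of digits, optionally with a decimal part after "." or ",".
_NUMBER = re.compile(r"(\d+(?:[.,]\d+)?)")


def split_value(raw):
    """Split a mixed field into (number or None, remaining text)."""
    match = _NUMBER.search(raw)
    if match is None:
        # Nothing numeric: keep the whole entry as a textual note.
        return None, raw.strip()
    number = float(match.group(1).replace(",", "."))
    rest = raw[:match.start()] + " " + raw[match.end():]
    note = " ".join(rest.split())  # collapse repeated whitespace
    return number, note


# Invented field values of the kind a datasheet might contain.
print(split_value("ca 120,5 lasts"))  # (120.5, 'ca lasts')
print(split_value("unknown"))         # (None, 'unknown')
print(split_value("87"))              # (87.0, '')
```

Keeping the leftover text as a separate note, rather than discarding it, preserves qualifiers such as 'ca' and the original unit for the later standardisation step.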

Digitising Business History
The magnitude of 'big' is also continually growing in the field of business history.38 In practice, qualitative researchers can utilise much larger volumes and more types of data than before and, on the other hand, different tools of analysis. The major development trend of recent decades is the diversification of the research field. Although the mainstream debate is still focused on businesses, entrepreneurs and entrepreneurship, the perspectives of research have widened over the last decades to cover a broad range of business-related themes. For example, the importance of interest groups, entrepreneurship of women and minorities, developing economies and environmental issues as part of business practices have emerged as major topics of discussion.39 Even though most of the research is still carried out in corporate archives, relying largely on textual material such as minutes and memos, the broadening of the scope of inquiry has left the source material quite sparse.
Finland's strength in business history research has traditionally been a comprehensive and open public archival service, which has guaranteed access to first-class material. One of the most important of these institutions is the National Archives (Kansallisarkisto), which has provided access to, among other things, abundant government documents, but also many archive collections of individuals and some private organisations. In Finland, the state has a strong position in society, and state documents, for example, contain not only information on legislation and administration, but also a huge variety of useful reports produced by various government organisations. The availability of sources has been supported by legislation under which a public authority document is in principle public.40 Moreover, this also covers state-owned enterprises, depending on their legal form of action. Such archives also cover very interesting research sites that are difficult to access in many other countries. The archives of the state-owned telecom company (PTL Tele/Sonera) are available to researchers up to the year 1994, when the company changed its legal form from a public authority into a limited liability company. An even more important archive for Finnish business historical research is the Central Archives for Finnish Business Records (Elinkeinoelämän Keskusarkisto), where the archives of many Finnish companies are currently located and easily accessible to scholars. Often, such archives require a licence to use, which typically does not form an obstacle to academic research. For example, a large number of private telecoms documents are available up until the 2010s.
Despite the quality of the archive service, access to archival material and its quality are still key issues. For a private company, handing over the archive to the archival establishment is voluntary. The quality and usability vary on a case-by-case basis. At worst, even the material of important companies has been virtually lost. For example, when a large company, whose older archive sources are conveniently located in the National Archives, was asked about its late-1990s archives, it became clear that the company had outsourced the management of these archives to a private archive management company, which in turn had transferred the material to its own repositories. Worst of all, there was not even a list available for that material. Moreover, the private archive management company does not provide any 'extra services' without an extra charge. Hence, to even find out whether the archives are relevant for scientific use would require a laborious and costly preliminary inquiry. On the other hand, some companies have already digitised their archives. However, even if the material is in digital format, there is no guarantee that it will be accompanied by a proper search engine and metadata, that the archive will be properly organised, or that the researcher will have full access to the database.
Business history has a long tradition of using digital images and optical character recognition (OCR) techniques, similar to economic history. In this way, scholars themselves have digitised a considerable amount of material. These efforts have already greatly accelerated the utilisation of broader amounts of information. The resulting collections are mostly private, rather limited databases. When talking about the possibilities of these images and personal collections, it should be borne in mind that these are not usually complete sets. A business history scholar, rarely paid for their efforts in this regard, usually has to photograph only the 'necessary' documents. For this reason, these private collections usually serve specific research questions. It is clear that large-scale digitisation of the material should be done by archives or large, well-funded projects in a professional and systematic way, leading to a publication of the data in a commonly used form. Unfortunately, the digitisation projects of the aforementioned key Finnish institutions are still only in their infancy. Digitisation has, first and foremost, captured the oldest material. On the other hand, new machine reading technologies are promising and will surely improve the usability of data in the future. Up until this point, very positive developments have taken place vis-à-vis search engines, making it easier for the researcher to find material from traditional archives.
Discussion of the business history method has touched upon the usability of the history research method in social sciences (such as organisational research), and how business historians can contribute to these discussions. Qualitative history research that takes into account temporal processes, contexts and coincidences has also been seen to be instrumental in building and modifying theoretical understanding. 41 However, defining the method of historical research has become a problem: instead of a clearly defined method and source series, qualitative history research often takes advantage of different perspectives and sets of sources that may change as the research process evolves. The problem arises because historians are not accustomed to describing these research processes with the precision that is customary in social sciences, which in turn have begun to take replicability seriously. This debate has highlighted the need for business historians to pay more attention to describing their methods. 42 This requirement can also be viewed against the development of digital analytical methods. Since the idea of such methods is to automate the work, event data must be coded in different ways, which in turn requires precision as well as continuous justification of choices. In this way, methodological precision and connection to theoretical models will become a more central part of the historian's daily routine.
Digitisation of business history sources and methods allows not only the use of qualitative data in larger quantities, but also more intensive research collaboration. A particularly interesting example of using digital methods in business history research is the 'Digital History of Telco and Exchanges in Finland and Sweden' consortium. 43 The project includes researchers, social scientists and historians from Aalto University and several Swedish universities, including the Stockholm School of Economics. Moreover, one of the authors of this chapter has participated in this collaboration. At the heart of this 'DigHist' project is a database, which includes the digital business archives of four business enterprises. Two telecommunications companies and two exchanges from Sweden and Finland have been selected for the project. The most relevant parts of these archives have been digitised. The coded digitised material is shared between the members of the consortium. For example, the database contains key sections of the Finnish state-owned telecom company's (PTL Tele/Sonera) archives (95 digitised archive boxes). Some of these have been digitised from the collections of the Finnish National Archives, and the others from the material held by the current company, Telia. Consequently, in one project, we were able to perform searches on all of the 764 Executive Team meetings (including attachments) that took place between 1981 and 1998. The software used also allows for the indexing of material and the linking of different documents to each other. Materials related to an interesting event can be assembled into a set that makes it easy to view relevant documents together. The sheer amount of data and the search functions make it possible to efficiently compile information on the desired topics.
At best, this type of working method enables quantitative exploitation of qualitative material and analysis. In addition, by working closely together, the project scholars have been able to develop a unique research design centred around collaboration across institutions and disciplines. Data availability, a common desktop and teamwork enable a highly effective and accurate research process combining different areas of expertise. Practical experience has shown that such a method also poses challenges. Finding information in a huge amount of data requires good knowledge of the case and the materials. To know what kind of potentially interesting things have happened in the company being researched (namely, the terms used in the company at different times), it is important that someone in the research group is knowledgeable about the subject and sources of the case. Again, easy and partially mechanical availability of the material may blind the scholar. Too narrow a focus on certain source series and 'relevant' documents may obscure the importance of the historical process and context, leaving the strengths of historical research untapped. In any case, such a way of working has proved to be a promising way of combining digital tools and theoretical knowledge with methods of historical research. 44

Discussion and Further Challenges
Digitisation is part of the development of technology and society, and hence something that naturally enhances economic and business history research. Its direct impact is related to the available material, the amount and usability of which are greatly improved as digitisation proceeds further. In many cases, such as in business archives, digitisation could potentially proceed much faster, but such efforts have been hampered by the lack of funding and expertise. Although digitising research materials and methods does not in itself bring anything other than more efficient tools for managing the research process, at its best it can also speed up and facilitate the development of methodology and science, as well as international collaborations.
For some, digitisation itself changes human and social sciences. At the heart of such 'Google of archives' thinking are massive increases in the amount of data and improvements in search functions. According to Berry and Fagerjord (2017), up until now, digitisation in human sciences can almost completely be understood as a mechanism for sorting availability and dissemination of material in large quantities. The discussion has dealt with technical issues that are considered to be part of the archive's or library's work. In fact, even the tools are not new in principle. As we have discussed here, databases and quantitative methods have been used for a long time, even before PCs became available. Economic history has been a forerunner in this, and the lessons learned by economic and business historians (both successes and failures) could and should, we argue, also be used more broadly among other historians. As Berry and Fagerjord conclude, the actual contribution of digitisation has to 'move beyond the purely instrumental and mechanical automation of processing of humanities materials'. 45 Excessive and straightforward trust in digitisation is methodologically problematic. Using keywords, for example, the desired documents can be found quickly from a large cache of data, yet a poor choice of keywords can lead the scholar to miss key contributions. Moreover, context and other areas are easily missed. The same applies to quantitative research, in which the researcher still needs to understand what dimensions and weights meant at different times and in different contexts. Otherwise, the researcher may unknowingly twist the history of an event in a way that reinforces his or her own hypotheses. 46
In reality, the need for someone to know the empirical material thoroughly does not disappear with the new digital collections and large-N methods. This is also an important starting point in all historical studies using, for example, regression analysis: you need to know the units you are analysing before you analyse them, and what information you might still be missing from your analysis.
Furthermore, the sheer amount of information is a methodological problem, because the researcher needs to separate the necessary pieces from a massive amount of data. The committee which explored options for developing Finnish state-owned businesses published a 154-page report in 1985. However, the same committee delivered to the National Archives material that takes up about two shelf metres. Most of this consisted of unorganised documents, which contained numerous versions of the same memoranda, meeting invitations and drafts. 47 If this material were to be digitised and searched, in practice the same document would appear among the results tens of times, but only a few documents would increase our knowledge of the subject itself. We had a similar challenge with the MetaSignal project: the large database containing news agency newsfeeds delivered the same information, in the worst cases, dozens of times. On the one hand, we could use this information to show the 'hype' around various topics, but on the other, it hindered the possibilities of performing proper quantitative analysis.
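The duplication problem described above can be illustrated with a minimal sketch in Python. It is not the method used in the MetaSignal project; the function names and the sample newsfeeds are hypothetical, and the sketch assumes that duplicates differ only in letter case, punctuation and spacing. Real archival material would typically require fuzzier matching.

```python
import re
from collections import Counter


def normalise(text: str) -> str:
    """Lower-case the text, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def collapse_duplicates(items):
    """Group items by normalised text.

    Returns a dict mapping each normalised key to the first original item
    seen, and a Counter recording how many copies of each key occurred.
    """
    first_seen = {}
    counts = Counter()
    for item in items:
        key = normalise(item)
        counts[key] += 1
        first_seen.setdefault(key, item)
    return first_seen, counts


# Hypothetical newsfeed items, invented purely for illustration.
feeds = [
    "Company A announces merger.",
    "company a announces  merger",
    "Company B reports quarterly loss.",
    "COMPANY A ANNOUNCES MERGER!",
]

unique, counts = collapse_duplicates(feeds)
for key, original in unique.items():
    print(f"{counts[key]} copies: {original}")
```

Keeping the copy count alongside each unique item serves both purposes mentioned above: the unique items can be passed on to quantitative analysis, while the counts remain available as a rough indicator of the 'hype' around a topic.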
The creation and use of large digital collections require collaboration between several state and private actors. In the Finnish case, a specific role is played by the Finnish National Archives, which is responsible for the official documents created by the different state authorities. The Central Archives for Finnish Business Records (Elka), in turn, is the most important institution vis-à-vis private business archives and collections. The official collections (and to a certain extent also private archives) can be divided into roughly three groups, each entailing some specific challenges in today's digital world.
The first group consists of the 'old' paper archives, the total volume of which today is roughly 220 shelf-kilometres at the Finnish National Archives. Only a small fraction of this 'old' archival data has been or will ever be digitised; today, there are already 85 million digitised pictures available at the National Archives. The goal is to digitise 20 per cent of this old material, mainly archives from the 1920s onwards. Nevertheless, the bulk of this material will also remain in paper format in the future.
Second, a large amount of official documents in paper format, created since the 1970s during the era of bureaucratisation, resides with different state authorities and is to be moved to the National Archives in the coming years. The volume of these documents is around 135 shelf-kilometres. On the whole, this will mean that future historians who are looking for official documentation from 1970 onwards will have to contend with digitised archives only. To a certain degree, the same is occurring in the private sector as well.
The third challenge relates to the so-called born-digital documents. The new service for acquiring born-digital material is to be launched in 2021; moreover, a pilot project for private archives was under construction in 2020. Whatever the archival solution will be, for both public and private documents, the format will be digital. Thus, future historians will definitely need versatile skills to be able to use the digitised data and archives effectively.
In general, historical sciences have been the pioneers in the utilisation of information technology since the 1970s. Since then, digitisation has inevitably progressed, but as we have noticed, often slowly and sporadically. However, the advancement of digitisation has brought undeniable advantages. Economic history at large has been forging ahead of other fields of history in using big data, digitised sources and quantitative methods. To use the rhetoric embedded in the theoretical debates of the discipline itself, economic history might eventually lose its comparative advantage as other, larger fields of history catch up in using these data and methods, which in itself would be a good turn for debates about big issues in history, such as trade, slavery, development, environment, conflicts and so on.
There is a plethora of other challenges ahead for the field of history too, and for economic and business history in particular. First, what materials should be digitised and when? This reflects the priorities among the scholars and the institutions that produce and maintain such records. Often, those priorities are not the same, which can create friction among the stakeholders. It also concerns the resources and technologies available to facilitate such processes. Second, who has access and to what? While many archival collections are open access, some are not. And most published articles and books are not open access, which limits their use among scholars who are not institutionally linked and especially those who are located outside Western academic institutions. The same, of course, applies to the first concern about what materials are digitised; for example, are business and economic records from the developing world less accessible than those from the West? Third, new methods are emerging to analyse both the data itself and research trends, including bibliometric and AI methods. These methods can offer great insights, but they can also be used to direct funds towards the most 'popular' types of research at the expense of topics that are, at least in terms of perception, more marginal. This can foster groupthink and could be detrimental to smaller, interdisciplinary fields like economic and business history. Finally, there are great challenges among the various fields of history to remaster quantitative techniques to be able to make use of the new 'big data', given that the so-called cultural turn from the 1980s until the early 2000s had no real interest in quantitative analysis and that the 'Cliometric Revolution' often took economic historians to departments of economics. Now, there is a greater demand to bring back quantitative historians, who have the requisite skill set to work with these types of data and methods. However, to achieve that, humanities will have to compete with other fields, with higher wages and better resources, so this process will likely take some time.
. . .
As stated at the outset of this chapter, our MetaSignal project failed 15 years ago. Would it be possible today to create such an AI that uses historical data to predict the future? A number of similar software solutions have already been created, using various kinds of data sources. However, with sources and algorithms similar to those we were using, it is highly unlikely that the project would succeed even with today's computational power. Moreover, although digital methods in humanities and social sciences have developed significantly over the past 15 years, the use of these methods is still lagging behind the digitisation of sources. Nevertheless, it would certainly be beneficial to have historians on board to develop similar kinds of projects in the future as well. AI methods are certainly already in use to deal with large datasets and analytical projects, and eventually they will become the cornerstones of historical analysis more broadly, although historians will have to exercise careful control over these efforts and remember the points of caution we have reflected on in this chapter.