BACKGROUNDPrimary care databases provide a unique resource for healthcare research, but most researchers currently use only the Read codes for their studies, ignoring information in the free text, which is much harder to access.OBJECTIVESTo investigate how much information on ovarian cancer diagnosis is 'hidden' in the free text and the time lag between a diagnosis being described in the text or in a hospital letter and the patient being given a Read code for that diagnosis.DESIGNAnonymised free text records from the General Practice Research Database of 344 women with a Read code indicating ovarian cancer between 1 June 2002 and 31 May 2007 were used to compare the date at which the diagnosis was first coded with the date at which the diagnosis was recorded in the free text. Free text relating to a diagnosis was identified (a) from the date of coded diagnosis and (b) by searching for words relating to the ovary.RESULTS90% of cases had information relating to their ovary in the free text. 45% had text indicating a definite diagnosis of ovarian cancer. 22% had text confirming a diagnosis before the coded date; 10% over 4 weeks previously. Four patients did not have ovarian cancer and 10% had only ambiguous or suspected diagnoses associated with the ovarian cancer code.CONCLUSIONSThere was a vast amount of extra information relating to diagnoses in the free text. Although in most cases text confirmed the coded diagnosis, it also showed that in some cases GPs do not code a definite diagnosis on the date that it is confirmed. For diseases which rely on hospital consultants for diagnosis, free text (particularly letters) is invaluable for accurate dating of diagnosis and referrals and also for identifying misclassified cases.