Focused crawling in depression portal search: A feasibility study

Abstract

Our previous study of search services in the area of depressive illness has documented the significant human cost required to set up and maintain closed-crawl parameters. It also showed that domain coverage is much less than that of whole-of-web search engines. Here we report on the feasibility of techniques for achieving greater coverage at lower cost. We found that acceptably effective crawl parameters could be automatically derived from a DMOZ depression category list, with dramatic saving in effort. We also found evidence that focused crawling could be effective in this domain: relevant documents from diverse sources are extensively interlinked; many outgoing links from a constrained crawl based on DMOZ lead to additional relevant content; and we were able to achieve reasonable precision (88%) and recall (68%) using a J48-derived predictive classifier operating only on URL words, anchor text and text content adjacent to referring links. Future directions include implementing and evaluating a focused crawler. Furthermore, the quality of information in returned pages (measured in accordance with the principles of evidence-based medicine) is vital when searchers are consumers. Accordingly, automatic estimation of web site quality and its possible incorporation in a focused crawler is the subject of a separate concurrent study.

Keywords

focused crawler, hypertext classification, mental health, depression, domain-specific search.

Proceedings of the 9th Australasian Document Computing Symposium, Melbourne, Australia, December 13, 2004. Copyright for this article remains with the authors.

Introduction

Depression is a major public health problem, being a leading cause of disease burden [13] and the leading risk factor for suicide. Recent research has demonstrated that high quality web-based depression information can improve public knowledge about depression and is associated with a reduction in depressive symptoms [6]. Thus, the Web is a potentially valuable resource for people with depression. However, a great deal of depression information on the Web is of poor quality when judged against the best available scientific evidence [8, 10]. It is thus important that consumers can locate depression information which is both relevant and of high quality.

Recently, in [15], we compared examples of two types of search tool which can be used for locating depression information: whole-of-Web search engines such as Google, and domain-specific (portal) search services which include only selected sites. We found that coverage of depression information was much greater in Google than in portals devoted to depression.

BluePages Search (BPS) is a depression-specific search service offered as part of the BluePages depression information site. Its index was built by manually identifying and crawling areas on 207 Web servers containing depression information. It took about two weeks of intensive human effort to identify these areas (seed URLs) and define their extent by means of include and exclude patterns. Similar effort would be required at regular intervals to maintain coverage and accuracy. Despite this human effort, only about 17% of relevant pages returned by Google were contained in the BPS index.

One might conclude from this that the best way to provide depression-portal search would be to add the word 'depression' to all queries and forward them to a general search engine such as Google. However, in other experiments in [15] relating to quality of information in search results, we showed that substantial amounts of the additional relevant information returned by Google were of low quality and not in accord with the best available scientific evidence. The operators of the BluePages portal (ANU's Centre for Mental Health Research) were keen to know if it would be feasible to provide a portal search service featuring:

1. increased coverage of high-quality depression information;
2. reduced coverage of dubious, misleading or unreliable information; and
3. significantly reduced human cost to maintain the service.

We have attempted to answer these questions in two parts. Here we attempt to determine whether it is feasible to reduce human effort by using a directory of depression sites maintained by others as a seed list and by using focused crawling techniques to avoid the need to define include and exclude rules. We also investigate whether the content of a constrained crawl links to significant amounts of additional depression content, and whether it is possible to tell which links lead to depression content.

A separate project is under way to determine whether it is feasible to evaluate the quality of depression sites using automatic means. Its results will be reported elsewhere. If the outcomes of both projects are favourable, the end result may be a focused crawler capable of preferentially crawling relevant content from high-quality sites.

Focused crawling - related work

Focused crawlers, first described by de Bra et al. [2] for crawling a topic-focused set of Web pages, have been frequently studied [3, 1, 5, 9, 12].

A focused crawler seeks, acquires, indexes, and maintains pages on a specific set of topics that represent a relatively small portion of the Web. Focused crawlers require a much smaller investment in hardware and network resources but may nevertheless achieve high coverage of their target topic.

A focused crawler starts with a seed list containing URLs that are relevant to the topic of interest; it crawls these URLs and then follows the links from these pages to identify the most promising links, based on both the content of the source pages and the link structure of the web [3]. Several studies have used simple string matching of these features to decide if the next link is worth following [1, 5, 9]. Others used reinforcement learning to build domain-specific search engines from similar features. For example, McCallum et al. [11] used Naive Bayes classifiers to classify hyperlinks based on both the full text of the sources and anchor text on the links pointing to the targets.
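For concreteness, the crawl loop just described can be sketched in a few lines of Python. This is only an illustration: the score_link function is a stand-in for the link classifiers discussed later in this paper, and the regular-expression link extraction is a deliberate simplification.

```python
import heapq
import re
from urllib.parse import urljoin
from urllib.request import urlopen

def score_link(anchor_text):
    """Placeholder relevance score based on anchor text.
    A real focused crawler would use a trained classifier (see later sections)."""
    return 1.0 if "depression" in anchor_text.lower() else 0.1

def best_first_crawl(seed_urls, max_pages=50):
    # Priority queue of (-score, url); higher-scoring links are fetched first.
    frontier = [(-1.0, url) for url in seed_urls]
    heapq.heapify(frontier)
    visited = set()
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue
        visited.add(url)
        # Extract (href, anchor text) pairs and enqueue the most promising ones.
        for href, anchor in re.findall(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>',
                                       html, re.I | re.S):
            target = urljoin(url, href)
            if target not in visited:
                heapq.heappush(frontier, (-score_link(anchor), target))
    return visited
```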
A focused crawler should be able to decide whether a page is worth visiting before actually visiting it. This raises the general problem of hypertext classification. In traditional text classification, the classifier looks only at the text in each document when deciding what class it belongs to. Hypertext classification is different because it tries to classify documents without needing the content of the document itself. Instead, it uses link information. Chakrabarti et al. [3] used the hypertext graph, including in-neighbours (documents citing the target document) and out-neighbours (documents that the target document cites), as input to some classifiers.

Our work also used link information. We tried to predict the relevance of uncrawled URLs using three features: anchor text, text around the link and URL words.

Resources

This section describes the resources used in our experiments: the BluePages search service; the data from our previous domain-specific search experiments; the DMOZ depression directory listing; and the Weka machine learning toolkit.

BluePages Search

BluePages Search (BPS) is a search service offered as part of the existing BluePages depression information site. Crawling, indexing and search were performed by the Panoptic search engine.

The list of web sites that made up the BPS index was manually identified from the Yahoo! Directory and by querying general search engines using the query term 'depression'. Each URL from this list was then examined to find out if it was relevant to depression before it was selected. The fencing of web site boundaries was a much bigger issue: a lot of human effort was needed to examine all the links in each web site to decide which links should be included and which excluded. Areas of 207 web sites were selected. These areas sometimes included a whole web server, sometimes a subtree of a web server and sometimes only some individual pages. Newspaper articles (which tend to be archived after a short time), potentially distressing, offensive or destructive materials, and dead links were excluded during the selection process.

A simple example of seeds and boundaries is:

• seed = www.counselingdepression.com/, and
• include patterns = www.counselingdepression.com/

In this case, every link within this web site is included. In complicated cases, however, some areas should be included while others are excluded. For instance, examining www.drada.org would result in a seed URL together with a set of include and exclude patterns, meaning that everything within the web site should be crawled except for pages about bipolar disorder.
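A sketch of how such include and exclude patterns might be applied when deciding whether a candidate URL falls inside a crawl boundary is given below. The pattern lists are illustrative only and are not the actual BPS configuration for any site.

```python
def in_crawl_boundary(url, include_patterns, exclude_patterns):
    """Return True if url matches an include pattern and no exclude pattern.
    Patterns are treated as simple URL prefixes, as in the example above."""
    url = url.lower()
    if any(url.startswith(p.lower()) for p in exclude_patterns):
        return False
    return any(url.startswith(p.lower()) for p in include_patterns)

# Illustrative usage only; these are not real BPS patterns.
include = ["www.counselingdepression.com/"]
exclude = ["www.counselingdepression.com/private/"]
print(in_crawl_boundary("www.counselingdepression.com/articles/x.html", include, exclude))  # True
print(in_crawl_boundary("www.counselingdepression.com/private/notes.html", include, exclude))  # False
```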
Data from our previous work

In our previous work, we conducted a standard information retrieval experiment, running 101 'depression' queries against six engines of different types: two health portals, two depression-specific search engines, one general search engine and one general search engine where the word 'depression' was added to each query if not already present (GoogleD). We then pooled the results for each query and employed research assistants to judge them. We obtained 2778 judged URLs and 1575 relevant URLs from all the engines. We used these URLs as a base in the present study.

We found that, over the 101 queries, GoogleD returned more relevant results than the domain-specific engines; 683 relevant results were retrieved by GoogleD. As GoogleD was the best performer in obtaining the most relevant results, we also used it as a base engine to compare with other collections in the present work.

DMOZ

DMOZ is the Open Directory Project, "the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors". We started with the Depression directory (http://www.dmoz.org/Health/Mental_Health/.../Depression/).

Weka

Weka was developed at the University of Waikato in New Zealand [16]. It is a data mining package which contains machine learning algorithms. Weka provides tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Weka was used in our experiments for the prediction of URL relevance using hypertext features. It was used because it provided many classifiers, was easy to use and served our purposes well in predicting URL relevance.

Experiment 1 - Usefulness of a DMOZ category as a seed list

A focused crawler needs a good seed list of relevant URLs as a starting point for the crawl. These URLs should span a variety of web site types so that the crawler can explore the Web in many different directions. Instead of using a manually created list, we attempted to derive a seed list from a publicly available directory - DMOZ. Because depression sites on the web are widely scattered, the diversity of content in DMOZ is expected to improve coverage. Using DMOZ also allows us to leverage off the categorisation work being done by volunteer editors.

DMOZ seed generation

We started from the 'depression' directory on the DMOZ site. This directory is intended to contain links to relevant sites and subsites about depression. The directory, however, also had a small list of 12 within-site links to other DMOZ directories, which may or may not be relevant to depression. We only needed to do some minor boundary selection for these links to include the relevant directories. For example, directories such as Medications/Antidepressants/ were included because they are related to depression; these links were selected simply because their URLs contain the term 'depression' (such as childhood_depression) or 'antidepressants'. The seed URLs, as a result, included the above links and all the links to depression-related sites and subsites from this directory.

Include patterns corresponding to the seed URLs were generated automatically. In general, the include pattern was the same as the URL, except that default page suffixes such as index.htm were removed. Thus, if the URL referenced the default page of a server or web directory, the whole server or whole directory was included. If the link was to an individual page, only that page was included.
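The rule just described can be captured in a few lines. The list of default page names below is an assumption for illustration (the text only mentions index.htm explicitly), and the example.org URLs are placeholders.

```python
DEFAULT_PAGES = ("index.htm", "index.html", "default.htm", "default.asp")  # assumed set

def include_pattern_for(seed_url):
    """Derive an include (prefix) pattern from a seed URL.

    If the URL points at a default page, strip the page name so the whole
    server or directory is included; otherwise include only that page."""
    for page in DEFAULT_PAGES:
        if seed_url.lower().endswith("/" + page):
            return seed_url[: -len(page)]   # keep the trailing slash
    return seed_url

print(include_pattern_for("http://www.example.org/depression/index.htm"))
# -> http://www.example.org/depression/
print(include_pattern_for("http://www.example.org/depression/faq.html"))
# -> http://www.example.org/depression/faq.html
```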
The manual effort required to identify the seed URLs and define their extent varied greatly between BPS and DMOZ. While it took about two weeks of intensive effort in the BPS case, only a small fraction of that effort was required in the DMOZ case.

Comparison of the DMOZ collection and the BPS collection

This experiment aimed to find out whether a constrained crawl from the low-cost DMOZ seed list can lead to domain coverage comparable to that of the manually configured BPS crawl.

After identifying the DMOZ seed list and include patterns as described above, we used the Panoptic crawler to build our DMOZ collection. We then ran the 101 queries from our previous study and obtained 779 results. We attempted to judge the relevance of these results using the 1575 known relevant URLs (see Section 3.2) and to compare the DMOZ results with those of BPS.

Table 1: Comparison of relevant URLs in DMOZ and BPS results of running 101 queries.

Table 1 shows that 186 out of 227 judged URLs (a pleasing 81%) from the DMOZ collection were relevant. However, the percentage of judged results (30%) was too low to allow us to validly conclude that DMOZ was a good collection. Since we no longer had access to the services of the judges from the original study, we attempted to confirm that a reasonable proportion of the unjudged documents were relevant to the general topic of depression by sampling URLs and judging them ourselves.

We randomly selected two non-overlapping lists of 50 URLs from the unjudged results and made relevance judgments on these. In the first list, we obtained 35 relevant results and in the second list, 34 URLs were relevant. Because there was close agreement between the proportion relevant in each list, we were confident that we could extrapolate the results to give a reasonable estimate of the total number of relevant pages returned. Extrapolation suggests 381 relevant URLs among the unjudged results (69 of the 100 sampled URLs were relevant; applying this 69% rate to the 552 unjudged results gives approximately 381). Overall, we were therefore able to obtain 567 (186 + 381) relevant URLs from the DMOZ set. This number was not as high as that of BPS, but it was relatively high (72% relevant URLs in the DMOZ result set compared to 91% in the BPS result set). Therefore, we could conclude that the DMOZ list is an acceptably good, low-maintenance starting point for a focused crawl in this domain.
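The extrapolation above is simple to reproduce; the following few lines restate the arithmetic using only figures quoted in the text.

```python
total_results   = 779        # results returned by the DMOZ collection over the 101 queries
judged          = 227        # results with existing relevance judgments
judged_relevant = 186        # judged results found relevant (81%)
sample_relevant = 35 + 34    # relevant URLs in two random samples of 50 unjudged results

unjudged = total_results - judged                                          # 552
estimated_unjudged_relevant = round(unjudged * sample_relevant / 100)      # ~381
estimated_total_relevant = judged_relevant + estimated_unjudged_relevant   # ~567
share = 100 * estimated_total_relevant / total_results
print(estimated_total_relevant, f"{share:.1f}%")  # 567 '72.8%' (quoted as 72% above, versus 91% for BPS)
```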
Experiments 2A-2C - Additional link-accessible relevant information

Although some focused crawlers can look a few links ahead to predict relevant links at some distance from the currently crawled URLs [7], the immediate outgoing links are of most immediate interest. We performed three experiments to gauge how much additional relevant information is accessible one link away from the existing crawled content. If little additional relevant content is linked to from pages in the original crawl, the prospects of successful focused crawling are very low. Figure 1 shows an illustration of the one-link-away set of URLs from the DMOZ crawl.

Figure 1: Illustration of the one-link-away collection derived from the DMOZ crawl.

The first experiment (2A) involved testing whether outgoing links from the BPS collection were relevant, while the second (2B) compared the outgoing link sets of BPS and DMOZ to see if DMOZ was really a good place to lead a focused crawler to additional relevant content. The last experiment (2C) attempted to find out whether URLs relevant to a particular topic link to each other.

Experiment 2A: Outgoing links from the BPS collection

The data used for this experiment included:

• the BPS outgoing link set, containing all URLs linked to by pages in the BPS collection; and
• 2 sets of judged-relevant URLs: those retrieved by BPS and those retrieved by the other engines.

Our previous work concluded that BPS did not retrieve as many relevant documents as GoogleD because of its small coverage of sites. We wanted to find out whether focused crawling techniques have the potential to raise BPS performance by crawling one step away from BPS. Among 954 relevant pages retrieved by all engines except for BPS, BPS failed to index 775 pages. The extended crawl yielded 196 of these 775 pages, or 25.3%. In other words, an unrestricted crawler starting from the original BPS crawl would be able to reach an additional 25.3% of the known relevant pages in only a single step from the existing pages. In fact, the true number of additional relevant pages is likely to be higher because of the large number of unjudged pages.

It is unclear whether the additional relevant content in the extended BPS crawl would enable more relevant documents to be retrieved than in the case of GoogleD. Retrieval performance depends upon the effectiveness of the ranking algorithm as well as on coverage.

Experiment 2B: Comparison of outgoing links between BPS and DMOZ

This experiment compared the outgoing link sets of BPS and DMOZ to find out whether the DMOZ seed list could be used instead of the BPS seed list to guide a focused crawler to relevant areas of the web. The following data were used:

• 2 sets of outgoing links, one from the BPS collection and one from the DMOZ collection; and
• 2 sets of all judged URLs and judged-relevant URLs from our previous work.

Table 2: Comparison of relevant outgoing link URLs in the BPS and DMOZ collections.

From our previous work, we obtained 2778 judged URLs which were used here as a base to compare relevance. Table 2 shows that even though the outgoing link collection of DMOZ was more than double the size of that of BPS, more outgoing BPS pages were judged. Among the judged pages, BPS and DMOZ had 196 and 158 relevant pages respectively in their outgoing link sets. Although DMOZ had fewer known relevant pages than BPS, the proportion of relevant pages to judged pages was quite similar for both collections (78% for DMOZ and 79% for BPS). This result, together with the size of each outgoing link collection, implied that (1) the DMOZ outgoing link set contained quite a large number of relevant URLs which could potentially be accessed by a focused crawler, and (2) the DMOZ seed list could lead to much better coverage than the BPS seed list.

Experiment 2C: Linking patterns between relevant pages

We performed an experiment very similar to the one described in Section 5.1, with the purpose of finding out whether relevant URLs on the same topic are linked to each other. Instead of using the whole BPS collection of 12,177 documents as the seed list, we only chose the 621 known relevant URLs. The following data were used:

• the BPS outgoing link set from the above, containing all URLs linked to by BPS known relevant URLs; and
• judged-relevant URLs from our previous work.

The outgoing link collection of the BPS known relevant URLs contained 5623 URLs. Of these, 158 were known relevant. This was a very high number compared to the 196 known relevant URLs obtained from the much bigger set of all outgoing link URLs (containing over 40,000 URLs) in the previous experiment. This experiment suggests that relevant pages tend to link to each other, which is good evidence supporting the feasibility of the focused crawling approach.

Experiment 3 - Hypertext classification

After downloading the content of the seed URLs and extracting links from them, a focused crawler needs to decide which links to follow and in what order, based on the information it has available. We used hypertext classification to make this decision, predicting the relevance of unvisited target URLs from features of the pages that link to them.

Collection of URLs for training and testing

For both the BPS and DMOZ crawls, we collected all immediate outgoing URLs satisfying the following two conditions: (1) the URL was known relevant or known irrelevant, and (2) the URLs pointing to it were also relevant. We collected 295 relevant and 251 irrelevant URLs for our classification experiment.

Features

Several papers in the field used the content of crawled URLs, anchor text, URL structure and other link graph information to predict the relevance of the next unvisited URLs [1, 5, 9]. Instead of looking at the content of the whole document pointing to the target URL, Chakrabarti [4] used 50 characters before and after a link and suggested that this method was more effective. Our work is somewhat related to all of the above. We used the following features to predict the relevance of a target URL:

• anchor text on the source pages: all the text appearing on the links to the target page from the source pages;
• text around the link: 50 characters before and 50 characters after the link to the target page on each source page; and
• URL words: words appearing in the URL of the target page.

We accumulated all words for each of these features to form 3 vocabularies from which all stop words were eliminated. URL words separated by a comma, a full stop, a special character or a slash were parsed and treated as individual words. URL extensions such as .html, .asp, .htm and .php were also eliminated. The end result was 1,774 distinct words in the anchor text vocabulary, 874 distinct words in the URL vocabulary, and 1,103 distinct words in the content vocabulary.

For purposes of illustration, Table 3 shows the features extracted from each of six links to the same URL. Assume that we would like to predict the relevance of www.ndmda.org to depression and that we have six already-crawled pages pointing to it in our crawled collection. From each of these pages, features are extracted in the form of anchor text words and the words within a range of at most 50 characters before and after the link pointing to www.ndmda.org. (We first extracted the 50-character string and then eliminated markup and stopwords, sometimes leaving only a few words.) No words were extracted in one case because the remaining text contained only stop words and/or numbers, which were stripped off. The URL words for the target URL were obtained by parsing www.ndmda.org in the same way.

Table 3: Features for www.ndmda.org after removing stop words and numbers. Target URL: www.ndmda.org. Columns: source URL, anchor text, content around the link. (Example feature words from two of the referring links: depression, bipolar, support, alliance, american, psychiatric.)
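A rough sketch of how the three feature types could be extracted for a target URL is given below. The HTML handling, the stop word list and the tokenisation details are assumptions for illustration; they are not the exact procedure used in the experiments.

```python
import re

STOPWORDS = {"the", "and", "of", "to", "a", "in", "for", "www"}   # illustrative subset
URL_EXTENSIONS = {"html", "htm", "asp", "php"}

def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def url_words(url):
    """Split a URL on slashes, dots, commas and other special characters,
    dropping common page extensions, stop words and numbers."""
    parts = re.split(r"[/\.,_\-?=&:]+", url.lower())
    return [p for p in parts
            if p and p not in URL_EXTENSIONS and p not in STOPWORDS and not p.isdigit()]

def link_features(source_html, target_url):
    """Return (anchor_words, context_words, url_words) for links to target_url."""
    anchor_words, context_words = [], []
    for match in re.finditer(r'<a[^>]+href="([^"]*)"[^>]*>(.*?)</a>', source_html, re.I | re.S):
        if target_url not in match.group(1):
            continue
        anchor_words += tokenize(re.sub(r"<[^>]+>", " ", match.group(2)))
        # 50 characters of page text before and after the link, markup stripped.
        before = source_html[max(0, match.start() - 50):match.start()]
        after = source_html[match.end():match.end() + 50]
        context_words += tokenize(re.sub(r"<[^>]+>", " ", before + " " + after))
    return anchor_words, context_words, url_words(target_url)
```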
Input data

We treated the three vocabularies containing all features as the attribute set for classification, and computed the term frequency and inverse document frequency (tf.idf) weight for each feature attached to each of the URLs specified in Section 6.1, using the following formula [14]:

tf.idf(t, d) = tf(t, d) x log(n / df(t))

where t is a term, d is a document, tf(t, d) is the frequency of t in d, n is the total number of documents and df(t) is the number of documents containing t.

By this means we obtained a list of URLs, each associated with the tf.idf values for all terms in the 3 vocabularies. A learning algorithm was then run in Weka to learn and predict whether these URLs were relevant or irrelevant. We also used boosting and bagging algorithms to try to boost the performance of the different classifiers.
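A small sketch of this weighting, using the formula above; the toy documents are invented purely for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists (e.g. the feature words gathered for each URL).
    Returns one {term: tf.idf} dict per document, using tf(t,d) * log(n / df(t))."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

# Toy example: feature words for three hypothetical target URLs.
docs = [["depression", "support", "alliance"],
        ["depression", "medication"],
        ["weather", "depression", "map"]]
print(tfidf_vectors(docs)[0])  # 'depression' gets weight 0 because it appears in every document
```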
Classifiers

We compared a range of classification algorithms provided by Weka:

Classifier: Description
ZeroR: Zero rule. Predicts the majority class. Used as a baseline.
NaiveBayes: Statistical method. Assumes independence of attributes. Uses conditional probability and Bayes rule.
ComplementNaiveBayes: Class for building and using a Complement class Naive Bayes classifier.
J48: C4.5 algorithm. A decision tree learner with pruning.
Bagging: Class for bagging a classifier to reduce variance.
AdaBoostM1: Class for boosting a nominal class classifier using the AdaBoost M1 method.

When training and testing, we used stratified cross-validation, i.e. 10-fold cross-validation in which the collection is split into ten parts and each part is held out for testing in turn while the classifier is trained on the remainder; the operation is repeated 10 times. The results were then averaged and a confusion matrix was drawn up to determine the numbers of correctly and incorrectly classified URLs.
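The evaluation protocol can be sketched as follows. Here train and predict stand in for whichever classifier is being tested, and the fold construction shown is a simplified form of stratification rather than Weka's exact procedure.

```python
import random
from collections import defaultdict

def stratified_folds(labelled_urls, k=10, seed=0):
    """labelled_urls: list of (features, label). Returns k folds in which each
    class is spread evenly, approximating stratified cross-validation."""
    by_class = defaultdict(list)
    for item in labelled_urls:
        by_class[item[1]].append(item)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for items in by_class.values():
        rng.shuffle(items)
        for i, item in enumerate(items):
            folds[i % k].append(item)
    return folds

def cross_validate(labelled_urls, train, predict, k=10):
    """Train on k-1 folds, test on the held-out fold, and pool the confusion counts."""
    folds = stratified_folds(labelled_urls, k)
    confusion = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for i in range(k):
        test = folds[i]
        train_data = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(train_data)
        for features, label in test:
            guess = predict(model, features)
            key = ("t" if guess == label else "f") + ("p" if guess == "relevant" else "n")
            confusion[key] += 1
    return confusion
```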
Measures

We used three measures to analyse how a classifier performed in categorizing the URLs. We denote as true positives and true negatives the relevant and irrelevant URLs, respectively, that were correctly predicted by the classifier. Similarly, false positives and false negatives are the irrelevant and relevant URLs, respectively, that were incorrectly predicted. The three measures are:

• Accuracy: the proportion of all URLs that were classified correctly;
• Precision: the proportion of correctly predicted relevant URLs out of all the URLs that were predicted as relevant; and
• Recall: the proportion of relevant URLs that were correctly predicted out of all the relevant URLs.

Although accuracy is an important measure, a focused crawler would be more interested in following the links from the predicted relevant set to crawl other potentially relevant pages. Thus, precision and recall are better indicators of its likely performance.
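In terms of the confusion counts just defined, the three measures are computed as below. The example counts are made up for illustration and are not taken from Table 5.

```python
def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Hypothetical confusion counts for a two-class (relevant / irrelevant) problem.
tp, fp, tn, fn = 200, 30, 220, 96
print(round(accuracy(tp, fp, tn, fn), 3),   # 0.769
      round(precision(tp, fp), 3),          # 0.87
      round(recall(tp, fn), 3))             # 0.676
```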
Results and discussion

The results of some representative classifiers are shown in Table 5. ZeroR represented a realistic performance "floor", as it classified all URLs into the largest category, i.e. relevant. As expected, it was the least accurate. Naive Bayes and J48 performed best. Naive Bayes was slightly better than J48 on recall, but the latter was much better in obtaining higher accuracy and precision. Out of 228 URLs that J48 predicted as relevant, 201 were correct (88.15%). However, out of the 264 URLs predicted as relevant by Naive Bayes, only 206 (78.03%) were correct. Overall, the J48 algorithm was the best performer among all the classifiers used.

Table 5: Performance of the classifiers (accuracy, precision and recall, in %).

We found that bagging did not improve the classification result, while boosting showed some improvement in recall (from 64.74% to 68.13%) when the J48 algorithm was used.

We also performed other experiments in which only one set of features, or a combination of two sets of features, was used. In all cases, we observed that accuracy, precision and recall were all worse than when all three sets of features were combined.

Our best results, as detailed in Table 5, showed that a focused crawler starting from a set of relevant URLs, and using J48 to predict the relevance of future URLs, could obtain a precision of 88% and a recall of 68% using the combined features.

We wished to compare these performance levels with the state of the art, but were unable to find in the literature any applicable results relating to the topic of depression. We therefore decided to compare our predictive classifier with a more conventional content classifier. We built a 'content classifier' for 'depression', using only the content of the target documents instead of the features used in our experiment. The best accuracies obtained from the two classification systems were very similar: 78% for the content classifier and 77.8% for the predictive version. Content classification showed slightly worse precision but better recall.

We concluded from this comparison that hypertext classification is quite effective in predicting the relevance of uncrawled URLs. This is quite pleasing, as a lot of unnecessary crawling can be avoided.

Finally, we explored two variant methods for feature selection. We found that generating features using stemmed words caused a reduction in performance, as did reducing the feature set using a feature selection algorithm.

Conclusions and future work

Weeks of human effort were required to set up the current BPS depression portal search service, and considerable ongoing effort is needed to maintain its coverage and accuracy. Our investigations of the viability of a focused crawling alternative have resulted in three key findings.

First, web pages on the topic of depression are strongly interlinked despite the heterogeneity of their sources. This is consistent with the literature for other topic domains and provides a good foundation for focused crawling in the depression domain. The one-link-away extensions to the closed BPS and DMOZ crawls contained many relevant pages.

Second, although somewhat inferior to the expensively constructed BPS alternative, the DMOZ depression category features a diversity of sources and seems to provide a seed list of adequate quality for a focused crawl in the depression domain. This is very good news for the maintainability of the portal search because of the very considerable labour savings. Other DMOZ categories may provide good starting points for other domains.

Third, predictive classification of outgoing links into relevant and irrelevant categories, using source-page features such as anchor text, content around the link and URL words of the target pages, achieved good results. Using the J48 decision tree algorithm, as implemented by Weka, we obtained high accuracy, high precision and relatively high recall.

Given the promise of the approach, there is obvious follow-up work to be done on designing and building a domain-specific search portal using focused crawling techniques. In particular, it may be beneficial to rank the URLs classified as relevant in order of degree of relevance so that a focused crawler can decide on visiting priorities. Also, appropriate data structures are needed to hold accumulated information for unvisited URLs (i.e. anchor text and nearby content for each referring link). This information needs to be updated as additional links to the same target are encountered.
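One possible shape for such a data structure is sketched below: a per-URL record that accumulates anchor and context words as new referring links are found, plus a priority ordering by predicted degree of relevance. The scoring function is a placeholder for the trained classifier, and the example.org URLs are hypothetical.

```python
import heapq
from collections import defaultdict

class Frontier:
    """Accumulates link evidence for unvisited URLs and orders them by score."""

    def __init__(self, score_fn):
        self.score_fn = score_fn                    # e.g. classifier estimate of relevance
        self.evidence = defaultdict(lambda: {"anchor": [], "context": []})

    def add_link(self, target_url, anchor_words, context_words):
        # Merge evidence from every referring link seen so far.
        self.evidence[target_url]["anchor"] += anchor_words
        self.evidence[target_url]["context"] += context_words

    def next_urls(self, limit=10):
        # Re-score with all accumulated evidence and return the most promising URLs.
        scored = [(-self.score_fn(ev), url) for url, ev in self.evidence.items()]
        return [url for _, url in heapq.nsmallest(limit, scored)]

# Placeholder scorer: counts occurrences of the word 'depression' in the evidence.
toy_score = lambda ev: sum(w == "depression" for w in ev["anchor"] + ev["context"])
frontier = Frontier(toy_score)
frontier.add_link("http://example.org/a", ["depression", "support"], ["help"])
frontier.add_link("http://example.org/b", ["weather"], ["rain"])
print(frontier.next_urls())  # ['http://example.org/a', 'http://example.org/b']
```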
Another important question will be how to persuade Weka to output a classifier that can be easily plugged into the focused crawler's architecture. Since the best performing classifier in these trials was a decision tree, this should be relatively straightforward.

Once a focused crawler is constructed, it will be necessary to determine how to use it operationally. We envisage operating without any include or exclude rules, but we will need to decide on appropriate stopping conditions. If none of the outgoing links from a page are classified as likely to lead to relevant content, should the crawl stop, or should some unpromising links be followed?

Because of the requirements of the depression portal operators, site quality must be taken into account in building the portal search service. Ideally, the focused crawler should take site quality into account when deciding whether to follow an outgoing link, but this may or may not be feasible. Another, more expensive, alternative would be to crawl using relevance as the sole criterion and to filter the results based on quality.

Site quality estimation is the subject of a separate study, yet to be completed. In the meantime, it seems fairly clear from our experiments that it will be possible to increase coverage of the depression domain at dramatically lower cost by starting from a DMOZ category seed list.

Verifying whether techniques found useful in this project also extend to other domains is an obvious future step. Other health-related areas are the most likely candidates because of the focus on quality of information in those areas.

Acknowledgments

We gratefully acknowledge the contribution of Kathy Griffiths and Helen Christensen in providing expert input about the depression domain and about BluePages, and of John Lloyd and Eric McCreath for their advice.
References

[1] C. C. Aggarwal, F. Al-Garawi and P. S. Yu. On the design of a learning crawler for topical resource discovery. ACM Transactions on Information Systems, Volume 19, Number 3, 2001.

[2] P. De Bra, G. Houben, Y. Kornatzky and R. Post. Information retrieval in distributed hypertexts. In Proceedings of the 4th RIAO Conference, pages 481–491, 1994.

[3] S. Chakrabarti, M. Berg and B. Dom. Focused crawling: A new approach to topic-specific Web resource discovery. In Proceedings of the 8th International World Wide Web Conference, 1999.

[4] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson and J. Kleinberg. Automatic resource compilation by analyzing hyperlink structure and associated text. In Proceedings of the Seventh International Conference on World Wide Web, pages 65–74, Elsevier, 1998.

[5] J. Cho, H. Garcia-Molina and L. Page. Efficient crawling through URL ordering. In Proceedings of the Seventh World Wide Web Conference, 1998.

[6] H. Christensen, K. M. Griffiths and A. F. Jorm. Delivering interventions for depression by using the Internet: Randomised controlled trial. British Medical Journal, Volume 328, Number 7434, page 265, 2004.

[7] M. Diligenti, F. M. Coetzee, S. Lawrence, C. L. Giles and M. Gori. Focused crawling using context graphs. In Proceedings of the 26th VLDB Conference, Cairo, Egypt, 2000.

[8] G. Berland, M. Elliott, L. Morales, J. Algazy, R. Kravitz, M. Broder, D. Kanouse, J. Munoz, J. Puyol, M. Lara, K. Watkins, H. Yang and E. McGlynn. Health information on the Internet: Accessibility, quality, and readability in English and Spanish. The Journal of the American Medical Association, Volume 285, Number 20, 2001.

[9] M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalhaim and S. Ur. The shark-search algorithm. An application: tailored Web site mapping. In Proceedings of the Seventh World Wide Web Conference, 1998.

[10] K. Griffiths and H. Christensen. The quality and accessibility of Australian depression sites on the World Wide Web. The Medical Journal of Australia, Volume 176, 2002.

[11] A. McCallum, K. Nigam, J. Rennie and K. Seymore. Building domain-specific search engines with machine learning techniques. In Proceedings of the AAAI Spring Symposium on Intelligent Agents in Cyberspace, 1999.

[12] F. Menczer, G. Pant and P. Srinivasan. Evaluating topic-driven Web crawlers. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001.

[13] C. J. L. Murray and A. D. Lopez (editors). The Global Burden of Disease. Global Burden of Disease and Injury Series, Harvard University Press, Cambridge MA, 1996.

[14] G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Technical report, 1987.

[15] T. T. Tang, N. Craswell, D. Hawking, K. M. Griffiths and H. Christensen. Quality and relevance of domain-specific search: A case study in mental health. To appear in the Journal of Information Retrieval (special issue).

[16] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann.