Evaluating C++ Design Pattern Miner Tools
Lajos Fülöp, Tamás Gyovai and Rudolf Ferenc
University of Szeged, Department of Software Engineering
{flajos|ferenc}@inf.u-szeged.hu
Abstract

Many articles and tools have been proposed over the years for mining design patterns from source code. These tools differ in several aspects, thus their fair comparison is hard. Besides the basic methodology, the main differences are that the tools operate on different representations of the subject system and that the pattern definitions differ as well. In this paper we first provide a common measurement platform for three well-known pattern mining systems, Columbus, Maisa and CrocoPat. Then we compare these tools on four C++ open-source systems: DC++, WinMerge, Jikes and Mozilla. Columbus can discover patterns from the C++ source code itself, while Maisa and CrocoPat require the representation of a software system in a special textual format, so we extended Columbus to provide the common input for them.

We compared these tools in terms of speed, memory consumption and the differences between the hits. While the first two aspects showed comparable results, the recognition capabilities were quite diverse. This is probably due to the significant differences in how the patterns to be recognized are formalized by the tools. Therefore we conclude that a more precise and formal description of design patterns is needed.

Keywords: Design pattern mining, Tool evaluation, Columbus, Maisa, CrocoPat

1. Introduction

Design patterns are well-known structures in the software engineering community and well-tried solutions for common recurring problems. Gamma et al. [12] collected several object-oriented design patterns and gave an informal definition for them. It is very important that these definitions were informal, because this way a pattern can be used in a wider context. Due to the imprecise pattern definitions, the structures implemented based on them could vary in different contexts. Design patterns also differ in the aspect of whether they are used intentionally or only in a casual way: a programmer can apply a design pattern without actually knowing about it.

Design patterns are employed in many areas of software development. Originally, their main aim was to develop better software systems by using good solutions. Another good property of design patterns is that their documentation in a software system can simplify maintenance and program understanding. This is especially true for large software systems. Unfortunately, developers usually do not provide this documentation, so there is a big need to discover design patterns from source code. Therefore, in the past years various design pattern miner tools have been developed. These tools differ in several aspects.

One of the important aspects is the programming language. There are tools for discovering patterns from Java source code like Ptidej [13, 21] and Fujaba [11], and tools also exist for mining patterns from C++ code like Columbus [2, 9]. Certain tools like CrocoPat [3] and Maisa [20] work on their own special textual format, which describes the facts about the source code needed to find design patterns. These tools are general, because they can work with any programming language; only the appropriate textual input has to be prepared from the source code.

Another aspect is the method used to discover design patterns, which can be quite diverse. Columbus uses graph matching, while Maisa solves constraint satisfaction problems (CSP). CrocoPat has a new method to find structures in large graphs: it makes an effective representation of relations in graphs. Other special and interesting methods also exist, like pattern inference, which was presented by Tonella and Antoniol [22].

Our aim in this article was to compare three design pattern miner tools: Columbus, Maisa and CrocoPat. We chose these tools because it was possible to prepare a common input for them with our front end, Columbus. Our previous work enabled us to provide the input for Maisa [10], while in the case of CrocoPat we created a new plug-in for Columbus which is able to prepare the appropriate input. Finally, the tools have been compared in three aspects: differences between the hits, speed and memory requirements. We think that these are the most important aspects of a design pattern miner tool. We did not analyze whether a found design pattern hit is true or false; we examined these tools only with regard to structural hits and the differences in this respect.

We will proceed as follows. In Section 2, we will discuss some works similar to ours. In Section 3 we will introduce the design pattern miner tools which were compared. Section 4 describes our comparison approach, and our results are presented in Section 5. Finally, in Section 6 we will present some conclusions and outline directions for future work.

2. Related work

In this section we show some works similar to ours, and we also present new and interesting methods in the area of design pattern mining.

In our previous work we presented a method to differentiate true and false hits [8]. We employed machine learning methods to filter out false design pattern hits. First, we ran our design pattern miner tool that discovers patterns based on structural descriptions. Afterwards, we classified these hits as being true or false, and finally we calculated predictive information for the hits. We trained a decision tree based on the classified values and on the predictive information, from which we were able to mine true design pattern instances.

Design pattern detection was also accomplished by the integration of two existing tools – Columbus [9] and Maisa [20] – in our previous work [10]. This method combined the extraction capabilities of the Columbus reverse engineering system with the pattern mining ability of Maisa. First, the C++ code was analyzed by Columbus. Then the collected facts were exported to a clause-based design notation understandable for Maisa. Afterwards, this file was analyzed by Maisa, and instances were searched that matched the previously given design pattern descriptions. Maisa approached the recognition problem as a constraint satisfaction problem. We will get back to this method in Section 3.

Arcelli et al. [1] proposed three categories for design pattern mining tools, considering the information which was used during the detection process. These categories are the "entire" representation of design patterns, the minimal set of key structures that a design pattern consists of, and the sub-components of design patterns. They dealt with the last category, and two tools concerning it, FUJABA and SPQR, were introduced and compared. The basis of the comparison was how a tool decomposes a design pattern into smaller pieces. The conclusion was that the decomposition methods of the two examined systems are very similar, and finally they argued for the benefits of sub-patterns.

Guéhéneuc et al. [14] introduced a comparative framework for design recovery tools. The purpose of the authors' framework was not to rank the tools but to compare them along qualitative aspects. The framework contained eight aspects: Context, Intent, Users, Input, Technique, Output, Implementation and Tool. These aspects were sorted into 53 criteria, which were demonstrated on two systems, Ptidej and LiCoR. The major motivation for this framework is that, although there are a lot of design recovery tools, the comparison between them is very hard due to the fact that they have very different characteristics in terms of representation, output format and implementation techniques. This framework provides an opportunity for comparing not only similar systems, but also systems which are different. We note that this comparison differs from ours, because we compare systems in a practical way. Namely, we want to show and compare how many patterns a tool can find, and how much time and memory it needs for searching, while their comparison is rather theoretical and does not compare the discovering efficiency of the tools.

Kaczor et al. [16] proposed a bit-vector algorithm for design pattern identification. The algorithm's initialization step converts the design pattern motif and the analyzed program model into strings. To model the design patterns and the analyzed program, six possible relations can be used between elements: association, aggregation, composition, instantiation, inheritance and dummy. The authors gave an efficient Iterative Bit-vector Algorithm to match the string representations of the design patterns and the analyzed program. They compared their implementation with explanation-based constraint programming and metric-enhanced constraint programming approaches.

Beyer et al. [3, 4] have developed a system that is able to work with large graphs effectively. The effectiveness of the system is based on binary decision diagrams, which represent the relations compactly. They have developed the relation manipulation language (RML) for manipulating n-ary relations and a tool implementation (CrocoPat) that is an interpreter for the language. The RML language is very expressive, thus it is able to describe design patterns, design metrics or other structures. In Section 3.2 we will present this tool in more detail.

Costagliola et al. [6] based their approach on a visual language parsing technique. The design pattern recognition task was reduced to recognizing sub-sentences in a class diagram, where each sub-sentence corresponds to a design pattern specified by an XPG grammar. Their process consists of two phases: the input source code is translated into a class diagram represented in SVG format; then DPRE (Design Pattern Recovery Environment) recovers design patterns using an efficient LR-based parsing approach.

Tonella and Antoniol [22] presented an interesting approach to recognize design patterns. They did not use a library of design patterns as others did but, instead, discovered recurrent patterns directly from the source code. They employed concept analysis to recognize groups of classes sharing common relations. The reason for adopting this approach was that a design pattern can be considered as a formal concept. They used inductive context construction, which then helped them to find the best concept.
3. The compared design pattern miner tools

In this study we compared the design pattern mining capabilities of three tools, namely Maisa, CrocoPat and Columbus. Maisa and CrocoPat cannot analyze source code, so we extended our Columbus framework (whose original task was to analyze C++ source code and build an ASG – Abstract Semantic Graph – representation from it) to produce input files for the two tools. So, our study was based on the same input facts, this way ensuring a fair-minded comparison, because eventual parsing errors affected all tools in the same way. We illustrate this process in Figure 1.

Figure 1. Common framework

In the next sections we will introduce the compared design pattern miner tools. We will show every tool's design pattern description language on the well-known Factory Method pattern [12].

3.1. Maisa

Maisa is a software tool [20] for the analysis of software architectures, developed in a research project at the University of Helsinki. The key idea in Maisa is to analyze design-level UML diagrams and compute architectural metrics for early quality prediction of a software system.

In addition to calculating traditional (object-oriented) software metrics such as the Number of Public Methods, Maisa looks for instances of design patterns (either generic ones such as the well-known GoF patterns or user-defined special ones) in the UML diagrams representing the software architecture. Maisa also incorporates metrics from different types of UML diagrams and execution time estimation through extended activity diagrams.

Maisa uses constraint satisfaction [17], which is a generic technique that can be applied to a wide variety of tasks, in this case to mining patterns from software architectures or software code. A constraint satisfaction problem (CSP) is given as a set of variables and a set of constraints restricting the values that can be assigned to those variables. Maisa's design pattern description language is very similar to Prolog. Figure 2 shows the description of the Factory Method pattern in Maisa.

class("Product").
class("ConcreteProduct").
extends("ConcreteProduct","Product").
!same(Product,ConcreteProduct).
class(Creator).
method("Creator.FactoryMethod()").
has("Creator","Creator.FactoryMethod()").
returns("Creator.FactoryMethod()","Product").
extends("ConcreteCreator","Creator").
method("ConcreteCreator.FactoryMethod()").
has("ConcreteCreator","ConcreteCreator.FactoryMethod()").
creates("ConcreteCreator.FactoryMethod()","ConcreteProduct").
implements("ConcreteCreator.FactoryMethod()","Creator.FactoryMethod()").
returns("ConcreteCreator.FactoryMethod()","Product").
!same(Creator,ConcreteCreator).
!same(Product,Creator).

Figure 2. Factory Method pattern in Maisa
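For reference, the facts listed in Figure 2 correspond to C++ source code shaped roughly like the following sketch. This is our own illustrative example (not taken from any of the analyzed systems); the class and method names simply mirror the participant roles used in Figures 2-4, and this is the kind of structure that all three tools try to recognize as a Factory Method instance.

// Illustrative C++ shape of the Factory Method pattern (GoF [12]).
class Product {                          // abstract Product participant
public:
    virtual ~Product() {}
    virtual void Use() = 0;
};

class ConcreteProduct : public Product {
public:
    void Use() override {}
};

class Creator {                          // abstract Creator participant
public:
    virtual ~Creator() {}
    virtual Product* FactoryMethod() = 0;   // pure virtual factory method
    void AnOperation() {                    // calls the factory method
        Product* p = FactoryMethod();
        p->Use();
        delete p;
    }
};

class ConcreteCreator : public Creator {
public:
    Product* FactoryMethod() override {     // overrides it and creates the product
        return new ConcreteProduct();
    }
};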
3.2. CrocoPat

In Section 2 we already introduced the CrocoPat tool [3] briefly, and now we describe it in more detail. First we will show how the CrocoPat interpreter works, and then we will introduce the relational manipulation language. Finally, we will also mention binary decision diagrams (BDD).

As mentioned previously, CrocoPat is an interpreter and it executes RML programs. First, CrocoPat reads the graph representation in Rigi Standard Format (RSF) [19] from the standard input. Afterwards, the RML description is processed and a BDD representation is created from it. Finally, the RML program is executed and an RSF output is produced.

RML (Relational Manipulation Language) is very similar to logic programming languages like Prolog, but it also contains techniques of imperative programming languages. Hence, it is very expressive and it can describe design patterns among other structures. Unfortunately, we have not found any design pattern library in RML, so we had to create the descriptions of the patterns ourselves. Figure 3 shows the Factory Method description in CrocoPat.

AbstractClass(X) := CLASS(X) & ABSTRACT(X);
ConcreteProduct(Cpr,Pr) := CLASS(Cpr) & Product(Pr) &
Creator(Cr,Pr) := AbstractClass(Cr) & ASSOCIATION(Cr,Pr) & Product(Pr) &
CreatMethods(Cr,Pr,M) := Creator(Cr,Pr) & HASMETHOD(Cr,M);
CreatorFM(Cr,Pr,FM) := CreatMethods(Cr,Pr,FM) & VIRTUAL(FM) &
CreatorAM(Cr,Pr,AM,FM) := CreatMethods(Cr,Pr,AM) &
ConcreteCreator(Ccr,Pr,Cr,Cpr) := CLASS(Ccr) & ASSOCIATION(Ccr,Pr) &
    Product(Pr) & Creator(Cr,Pr) & TC(INHERITANCE(Ccr,Cr)) &
CCreatorFM(Ccr,Pr,Cpr,M) := ConcreteCreator(Ccr,Pr, ,Cpr) &
    HASMETHOD(Ccr,M) & VIRTUAL(M) & !PUREVIRTUAL(M) &
FactoryMethod(Prod,Creat,CProd,CCreat,CreatFM,CreatAM,CcreatFM) :=
    Creator(Creat,Prod) & ConcreteProduct(CProd,Prod) &
    ConcreteCreator(CCreat,Prod,Creat,CProd) &
    CreatorFM(Creat,Prod,CreatFM) & CreatorAM(Creat,Prod,CreatAM,CreatFM) &
    CCreatorFM(CCreat,Prod,CProd,CcreatFM) &

Figure 3. Factory Method pattern in CrocoPat

To sum up, the main goals of Beyer et al. [4] were efficiency and easy integration with other tools when they developed CrocoPat. Integration was facilitated by the import and export of relations in the simple Rigi Standard Format (RSF), and efficiency was achieved by representing the relations as binary decision diagrams [5].
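The heart of the RML description in Figure 3 is relational calculation, for example the transitive closure TC(INHERITANCE(...)). As a purely illustrative sketch of this idea (our own code, deliberately using plain bitsets instead of the binary decision diagrams [5] that CrocoPat actually uses), the following C++ fragment closes a small inheritance relation transitively and queries it:

#include <bitset>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Toy relational engine: a relation over N classes is a set of bit rows.
// CrocoPat stores such relations as BDDs [5]; bitsets keep the sketch short.
constexpr std::size_t N = 4;                  // number of classes in the toy model
using Relation = std::vector<std::bitset<N>>; // Relation[i][j] == true <=> (i,j) in relation

Relation transitiveClosure(Relation r) {
    // Warshall's algorithm: close the relation under composition.
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t i = 0; i < N; ++i)
            if (r[i][k]) r[i] |= r[k];
    return r;
}

int main() {
    const std::vector<std::string> cls = {"Product", "ConcreteProduct",
                                          "Creator", "ConcreteCreator"};
    Relation inheritance(N);
    inheritance[1][0] = true;   // ConcreteProduct inherits from Product
    inheritance[3][2] = true;   // ConcreteCreator inherits from Creator

    Relation tc = transitiveClosure(inheritance);
    // Query corresponding to TC(INHERITANCE(Ccr,Cr)) with Cr bound to Creator:
    for (std::size_t i = 0; i < N; ++i)
        if (tc[i][2])
            std::cout << cls[i] << " transitively inherits from " << cls[2] << "\n";
}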
3.3. Columbus

Columbus is a reverse engineering framework which has been developed in cooperation between FrontEndART Ltd., the University of Szeged and the Software Technology Laboratory of Nokia Research Center. Columbus is able to analyze large C/C++ projects and to extract facts from them. The main motivation for developing the Columbus system has been to create a general framework that combines a number of reverse engineering tasks and provides a common interface for them. Thus, Columbus is a framework tool which supports project handling, data extraction, data representation, data storage, filtering and visualization. All these basic tasks of the reverse engineering process are accomplished for the specific needs by using the appropriate modules (plug-ins) of the system. Some of these plug-ins are provided as basic parts of Columbus, while the system can be extended to meet other reverse engineering requirements as well. This way we have got a versatile and easily extendible framework.

One of the plug-ins is CAN2Dpm, which discovers design patterns. The design patterns are described in DPML (Design Pattern Markup Language) files, which store information about the structure of the design patterns. CAN2Dpm recognizes design patterns in the following way. First, Columbus analyzes the source code and builds an Abstract Semantic Graph (ASG) that contains all the information about the source code. Then CAN2Dpm loads a DPML file, which also basically describes a graph. Afterwards it tries to match this graph to the ASG using our algorithm described in previous work [2]. Figure 4 shows the Factory Method pattern described in DPML.

The other two plug-ins of Columbus shown in Figure 1, CAN2Maisa and CAN2CrocoPat, are responsible for creating the input files for Maisa and CrocoPat, respectively.

<?xml version='1.0'?>
<!DOCTYPE DesignPattern SYSTEM 'dpml-1.6.dtd'>
<DesignPattern name='Factory Method'>
  <Class id='id10' name='Creator' isAbstract='true'>
    <Association ref='id30' />
    <Operation id='id11' name='FactoryMethod' kind='normal'
               isVirtual='true' isPureVirtual='true'>
      <hasTypeRep ref='id50'/>
    </Operation>
    <Operation id='id12' name='AnOperation' kind='normal'>
      <calls ref='id11'/>
      <hasTypeRep ref='id54'/>
    </Operation>
  </Class>
  <Class id='id20' name='ConcreteCreator'>
    <Base ref='id10' />
    <Association ref='id30' />
    <Operation id='id21' name='FactoryMethod' kind='normal'
               isVirtual='true' isPureVirtual='false'>
      <creates ref='id40' />
      <hasTypeRep ref='id50'/>
    </Operation>
  </Class>
  <Class id='id30' name='Product' isAbstract='true'></Class>
  <Class id='id40' name='ConcreteProduct' isChangeable='true'>
    <Base ref='id30' />
  </Class>
  <TypeRep id='id1'/>
  <TypeRep id='id50'>
    <hasReturnTypeRep ref='id52'/>
  </TypeRep>
  <TypeRep id='id52'>
    <TypeFormerType ref='id30'/>
  </TypeRep>
  <TypeRep id='id54'>
    <hasReturnTypeRep ref='id56'/>
  </TypeRep>
  <TypeRep id='id56'>
    <TypeFormerType ref='id1'/>
  </TypeRep>
</DesignPattern>

Figure 4. Factory Method pattern in DPML
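To make the matching step more concrete, the following sketch (our own simplification, not the actual CAN2Dpm algorithm described in [2]) enumerates candidate role assignments over a toy class model and keeps those that satisfy the structural constraints encoded in Figure 4: an abstract Creator with a pure virtual factory method, and a ConcreteCreator derived from it that overrides the method. In a real miner the factory method is identified structurally, not by its identifier; the hard-coded name below is only for brevity.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Minimal class model; a real ASG contains far more detail.
struct Method { std::string name; bool isVirtual; bool isPureVirtual; };
struct ClassInfo {
    std::string name;
    bool isAbstract;
    std::vector<int> bases;       // indices of base classes
    std::vector<Method> methods;
};

// Does 'derived' inherit (directly or transitively) from 'base'?
bool inheritsFrom(const std::vector<ClassInfo>& m, int derived, int base) {
    for (int b : m[derived].bases)
        if (b == base || inheritsFrom(m, b, base)) return true;
    return false;
}

bool hasPureVirtual(const ClassInfo& c, const std::string& op) {
    for (const Method& mm : c.methods)
        if (mm.name == op && mm.isPureVirtual) return true;
    return false;
}

bool overridesNonPure(const ClassInfo& c, const std::string& op) {
    for (const Method& mm : c.methods)
        if (mm.name == op && mm.isVirtual && !mm.isPureVirtual) return true;
    return false;
}

int main() {
    std::vector<ClassInfo> model = {
        {"Creator",         true,  {},  {{"FactoryMethod", true, true}}},
        {"ConcreteCreator", false, {0}, {{"FactoryMethod", true, false}}},
        {"Product",         true,  {},  {}},
        {"ConcreteProduct", false, {2}, {}},
    };
    // Brute-force role assignment over (Creator role, ConcreteCreator role).
    for (std::size_t cr = 0; cr < model.size(); ++cr)
        for (std::size_t cc = 0; cc < model.size(); ++cc) {
            if (cr == cc) continue;
            if (model[cr].isAbstract &&
                hasPureVirtual(model[cr], "FactoryMethod") &&
                inheritsFrom(model, (int)cc, (int)cr) &&
                overridesNonPure(model[cc], "FactoryMethod"))
                std::cout << "candidate: Creator=" << model[cr].name
                          << ", ConcreteCreator=" << model[cc].name << "\n";
        }
}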
4. Comparison approach

In this section we present the comparison approach for the investigated design pattern mining tools, concerning:

• The found design pattern instances. Differences are caused by several reasons. The different tools use different techniques to define and describe what a design pattern is. The recognition algorithms are also different. The comparison of the found design pattern instances is just one of the several kinds of evaluation that should be considered, therefore we also measure other important characteristics like speed and memory.

• Speed. Speed is measured by the amount of time taken by the tool to perform the selected design pattern mining on the selected C++ project. The differences in the measured times for the examined systems are described in Section 5.2.

• Memory usage. Memory usage is another performance measure. We measured the total memory required for the design pattern mining task. This was complicated because the memory usage of CrocoPat is fixed, and only the Columbus source code was available to extend it to provide us with memory usage statistics. The applied memory measuring methods for the examined tools are described in Section 5.3.

We did the comparison on four open-source, small-to-huge systems, to make the benchmark results independent from system characteristics like size, complexity and application domain. These four real-life, freely available C++ systems are the following:

• DC++ 0.687. Open-source client for the Direct Connect protocol that allows sharing files over the Internet [7].

• WinMerge 2.4.6. Open-source visual text file differencing and merging tool for Win32 platforms [23].

• Jikes 1.22-1. Compiler that translates Java source files as defined in The Java Language Specification into the byte-coded instruction set and binary format defined in The Java Virtual Machine Specification [15].

• Mozilla 1.7.12. All-in-one open-source Internet application suite [18]. We used a checkout dated March 12.

Table 1 presents some information about the analyzed projects. The first row shows how many source and header files were analyzed in the evaluated software systems. The second row lists the size of these source and header files. The last two rows were calculated by the metric calculator plug-in of Columbus, and give information about the total lines of code (LOC) and the number of classes. Under the term LOC we mean every line in the source code that is not empty and is not a comment line (also known as "logical lines of code").

Table 1. Size information of the projects

All tests were run on the same computer, so the measured values are independent from the hardware and thus the results are comparable. Our test computer had a 3 GHz Intel Xeon processor with 3 GB of memory. In the next section we describe our benchmark results and evaluate them.

5. Results

In this section we present our results concerning the differences between the design pattern instances found, the running time and the memory requirements. In the next subsection we start with the discovered pattern instances, and then compare the time efforts of the tools. Finally, we show the memory requirements of the design pattern miner tools.
5.1. The design pattern instances found

In this section we present our experiments regarding the pattern instances found by the compared design pattern miner tools. Unfortunately, Maisa did not contain descriptions of the patterns Bridge, Chain of Responsibility, Decorator, State, Strategy and Template Method (the results for these are marked with dashes in our tables). We investigated the differences between the tools manually, so we checked and compared the found instances and the descriptions of the design patterns in all of the cases. We will not explain every difference, because there is not enough space for it, but we will present the most common causes.

First, we summarize our results on DC++ in Table 2.

Table 2. DC++ hits

This was a small software system, so it did not contain too many design pattern instances. Maisa found two Adapter Class instances, while CrocoPat and Columbus found none. This is due to the fact that the definition of the Adapter Class in Maisa differed from those in Columbus and CrocoPat. In Maisa the Target participant class was not abstract and the Request method of the Target class was not pure virtual, while in Columbus and CrocoPat these features were required. We examined the two Adapter Class hits of Maisa, and we found that the Targets were indeed not abstract in these cases and the Request operations were not pure virtual. Columbus and CrocoPat found 14 State and 14 Strategy design pattern instances. The cause of the identical number of hits is that the State and Strategy patterns share the same static structure, so their descriptions in the tools were identical as well.

Table 3 shows the results of the tools in the case of WinMerge.

Table 3. WinMerge hits

Maisa found two more Adapter Object instances in WinMerge than Columbus. In the first case the difference was caused by the fact that the Request method of a participant Adapter Object class was required to be virtual in Columbus, while Maisa did not have this precondition. In the second case the pattern found by Maisa had a Target participant that was not abstract, which was a requirement in Columbus. If we relaxed the description of this pattern in Columbus, it found these two instances too. The best solution would be if an exact definition existed for this pattern in both tools. CrocoPat found six Adapter Object instances, while Columbus found only three. The cause was that if Columbus found a pattern instance with certain participant classes and another pattern instance existed with the same participant classes but with different methods, Columbus considered it as being the same pattern. This is a very important difference between Columbus and CrocoPat, so we will refer to it several times. Maisa found a Builder in WinMerge but the two other tools did not, because the Builder pattern representation in Maisa did not contain the Director participant, while the two other tools did contain it. In the case of State, Strategy and Template Method the differences were due to the fact that Columbus counted pattern instances participating with different methods only once, like in the case of Adapter Object.

Next, we describe our experiments on the design pattern instances found in Jikes (see Table 4).

Table 4. Jikes hits

Maisa found an Adapter Class, while Columbus and CrocoPat did not. The reason was the same as in the case of DC++, namely that in Maisa the Target participant class was not abstract and the Request method of the Target class was not pure virtual, while in Columbus and CrocoPat these features were required. In the case of Adapter Object Maisa missed a lot of hits, while Columbus and CrocoPat could discover a lot of design pattern instances. It looked like CrocoPat found more instances, but Columbus counted repeating patterns with different operations only once; actually, these tools found the same pattern instances. In Maisa the Builder pattern representation did not contain the Director participant, so Maisa found an incomplete Builder instance in Jikes. CrocoPat did not find any Mediator in Jikes, while Maisa found four. This is due to the fact that Maisa described Mediator in a very special way: it contained a Mediator with two Colleagues, but Concrete Mediators were missing. The description of Mediator in CrocoPat required a Mediator abstract class with a child ConcreteMediator class, too. In the case of Proxy, every tool discovered 53 instances, but CrocoPat also counted repeating patterns with different methods. Maisa found 21 more instances because it did not require an abstract Proxy participant class in the Proxy design pattern. In the case of State and Strategy it seemed that Columbus found fewer design pattern instances, but it counted every repeated pattern with different methods only once. Maisa found 23 Visitor patterns that the two other tools did not. This is due to the loose description of this pattern in Maisa.

Table 5 shows our experiments in the case of Mozilla.

Table 5. Mozilla hits

A lot of design pattern instances were found, like in the case of State, where CrocoPat found 7662 and Columbus discovered 722 instances. This huge difference was due to the fact that the found design pattern instances were not grouped by CrocoPat, that is, if a design pattern contained a class with child classes where the number of child classes could be arbitrary, every repeated child class with the common parent appeared as a new hit. Columbus recognized this situation and handled it correctly. In the case of Adapter Class the causes of the differences were the same as in Jikes and in DC++ examined earlier. Columbus did not count repeated instances in the case of Adapter Object, so it actually found the same instances as CrocoPat, but Maisa missed some because of its different pattern description. CrocoPat and Columbus found the same instances of the Bridge pattern, but Columbus counted the repeating patterns with different operations only once. Maisa found false Builder instances again, because the description of this pattern did not contain the Director participant class. Maisa found Factory Method instances while the two other tools did not. This is due to the fact that the two other tools defined Factory Method with an abstract Product and an abstract Creator participant class, while Maisa did not require these participants to be abstract. CrocoPat did not find any Mediator instance in Mozilla, while Maisa discovered two instances. This is because Maisa described Mediator in a very special way: it contained a Mediator with two Colleagues, but Concrete Mediators were missing. In the case of Prototype, Singleton, State, Strategy and Template Method the differences were again caused by the fact that CrocoPat counted every repeated pattern instance, while Columbus counted these repeated ones with different operations only once.

Because of space limitations we cannot explain every difference, but we have shown the common reasons. Basically, the found design pattern instances would be the same in most of the cases if we could disregard the following common causes of differences:

• Different definitions of design patterns. We have found that there were some specific reasons for the tools discovering different pattern instances. In some cases the main reason was that a design pattern description missed a participant, like in the case of the Builder pattern in Maisa. In this case the pattern definition did not contain the Director participant, thus the instances discovered by Maisa differed from the results of the other tools. For example, the results of Maisa in WinMerge for the Builder pattern differed from those of CrocoPat and Columbus.

• Precision of pattern descriptions. Another difference was how precise and strict the pattern descriptions were. For example, in the case of Jikes the differences in the numbers of found Adapter Class instances were caused by the fact that CrocoPat and Columbus defined the Target as abstract while Maisa did not.

• Differences in algorithms. We have also perceived differences in the design pattern miner algorithms. Columbus and Maisa counted the repeated instances with different operations only once, while CrocoPat counted every one of them.
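The counting difference described in the last bullet is easy to state precisely: Columbus and Maisa report one hit per distinct tuple of participant classes, while CrocoPat reports one hit per role binding, including bindings that differ only in the participating operations. A small, purely illustrative C++ sketch of such grouping (our own code, not taken from any of the tools):

#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Hit {
    std::vector<std::string> participantClasses;  // e.g. {Creator, ConcreteCreator}
    std::vector<std::string> participantMethods;  // operations bound to the roles
};

// Count hits per role binding (CrocoPat-style) and per distinct set of
// participant classes (Columbus-style).
int main() {
    std::vector<Hit> hits = {
        {{"Creator", "ConcreteCreator"}, {"make()"}},
        {{"Creator", "ConcreteCreator"}, {"makeOther()"}},  // same classes, other method
        {{"Widget",  "ButtonWidget"},    {"create()"}},
    };

    std::set<std::vector<std::string>> grouped;
    for (const Hit& h : hits) grouped.insert(h.participantClasses);

    std::cout << "ungrouped hits: " << hits.size()    << "\n";  // 3
    std::cout << "grouped hits:   " << grouped.size() << "\n";  // 2
}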
5.2. Speed

In this section we present and compare the speed of the three assessed design pattern miner tools. We wanted to measure only the search time for the patterns, therefore we divided the running time into two parts: an initialization part and a pattern mining part. Tables 6, 7, 8 and 9 contain the values of the pattern mining time only, while Table 10 contains the initialization times of the tools (time format: hh:mm:ss).

The design pattern mining time was measured in the following way:

• Columbus. We took into account only the graph matching time, so we did not consider the time while the ASG was loaded. The graph loading time is presented in Table 10.

• CrocoPat. In the case of CrocoPat, we prepared a small tool which executed CrocoPat and measured its running time. We measured the time needed for every pattern mining procedure on every subject software system. Next, we also measured the time for the subject systems with an empty RML program, because this way we could measure the time necessary to reserve the memory and to prepare the BDD representation (initialization time). These results are shown in Table 10. Finally, we subtracted the initialization time from the full running time for every result, and this way obtained the pattern matching times.

• Maisa. Maisa created statistics for every pattern mining procedure, which contained information about the time necessary for pattern mining, so we used these generated statistics. Contrary to CrocoPat, there was no need to subtract the initialization time, because the time values in the generated statistics measured only the pattern mining phase. However, we also show the initialization time for Maisa in Table 10.
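For CrocoPat, the measurement therefore boils down to timing two external runs and subtracting them. The following sketch shows the idea under our assumptions: the command lines are placeholders (as described above, CrocoPat takes an RML program and reads the RSF facts from standard input), and empty.rml stands for the empty program used as the initialization baseline.

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <string>

// Time one external command, in seconds.
static double timeCommand(const std::string& cmd) {
    auto start = std::chrono::steady_clock::now();
    int rc = std::system(cmd.c_str());                  // run the external tool
    auto stop = std::chrono::steady_clock::now();
    if (rc != 0) std::cerr << "warning: command failed: " << cmd << "\n";
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    // Placeholder file names; adjust paths and options to the actual setup.
    const std::string factsFile = "project.rsf";
    double full = timeCommand("crocopat factory_method.rml < " + factsFile);
    double init = timeCommand("crocopat empty.rml < " + factsFile);

    std::cout << "initialization time: " << init << " s\n";
    std::cout << "pattern mining time: " << (full - init) << " s\n";
}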
First we show our results for DC++ (see Table 6).

Table 6. DC++ times

In this case the required time was very small for every assessed pattern miner tool, therefore they can be considered as equal. This is due to the small size of the DC++ system; hence the design pattern instances were discovered very quickly in this system by all three tools.

The time requirements for discovering patterns in WinMerge (see Table 7) and Jikes (see Table 8) were different.

Table 7. WinMerge times

Table 8. Jikes times

Columbus was very fast in the case of larger patterns, because it could filter out [2] a lot of class candidates at the beginning of the discovering process. In contrast, Columbus was slower in the case of smaller patterns, because in these cases a lot of class candidates remained for the detailed discovering process. CrocoPat's and Maisa's time requirements were very balanced.

Finally, Table 9 shows the results for Mozilla.

Table 9. Mozilla times

In most cases, CrocoPat delivered the best results, but in certain cases Columbus and Maisa were faster. Columbus was slow when it could filter out only a small number of class candidates at the beginning of the discovering process. The CSP algorithm of Maisa was also slow in this case.

Our conclusion was that the best tool regarding speed in general is CrocoPat, but in some cases Columbus was faster.

Table 10. Initialization times
5.3. Memory usage

In this section we introduce and compare the memory usage of the three compared design pattern miner tools. We measured the memory requirements of every design pattern mining procedure, but we show our results summarized in one table, because we found them very similar.

The memory measurement for the examined tools was accomplished in the following way:

• Columbus. We extended the tool so that it reports its memory usage.

• CrocoPat. CrocoPat's memory usage is constant and can be set as a command line parameter. Therefore, we created a script that executed CrocoPat iteratively, from 1 megabyte of reserved memory up to 200 megabytes, for every pattern mining process. We took the smallest possible value with which the pattern mining process still completed successfully.

• Maisa. Maisa did not report the memory usage in its statistics, so we measured it by simply monitoring its peak memory usage in the task manager.

Our experiment proved that the memory usage strongly depended on the size of the analyzed projects and was independent of the searched design patterns. This was true for every pattern miner tool, as can be seen in Table 11.

Table 11. Memory requirements in megabytes

In the case of Columbus the reserved memory was very large compared to the other tools. This is due to the fact that Columbus is a general reverse engineering framework and design pattern detection is only one of its many features. For this reason it uses an ASG representation, which contains all information about the source code (including detailed facts about statements and expressions not needed for design pattern detection) for all kinds of tasks. Right now, for technical reasons, the design pattern miner plug-in of Columbus does not work without the ASG (although it does not have to use it), but we wish to fix this in the future. Therefore, we also measured the memory needed by Columbus without the ASG and show these numbers in parentheses.

Note that in the case of CrocoPat and Maisa the reserved memory was smaller because their input contained only the information about the source code necessary for pattern detection.

After examining Table 11 we can conclude that regarding memory requirements Maisa's performance was the best.
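The CrocoPat measurement described above is a simple linear search for the smallest memory reservation that still lets the run finish. A minimal sketch of that loop follows; the actual CrocoPat invocation is abstracted behind a callback, because the reservation is passed on its command line and the concrete option is not spelled out here. The stub runner exists only to make the sketch self-contained.

#include <functional>
#include <iostream>

// Find the smallest memory reservation (in MB) for which a pattern mining
// run still completes successfully. 'runWithMemoryMB' is expected to start
// CrocoPat with the given reservation and report success or failure.
int smallestWorkingMemory(const std::function<bool(int)>& runWithMemoryMB,
                          int lowMB = 1, int highMB = 200) {
    for (int mb = lowMB; mb <= highMB; ++mb)
        if (runWithMemoryMB(mb)) return mb;   // first success = smallest reservation
    return -1;                                // nothing up to highMB was enough
}

int main() {
    // Stub: pretend the mining needs at least 23 MB.
    auto stubRunner = [](int mb) { return mb >= 23; };
    std::cout << "smallest working reservation: "
              << smallestWorkingMemory(stubRunner) << " MB\n";
}

A binary search over the range would need fewer runs, but the script described above simply stepped through the range, which is what the sketch mirrors.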
6. Conclusions and future work

In this paper we have presented a comparison of three design pattern miner tools: Columbus, Maisa and CrocoPat. We have compared them regarding pattern hits, speed and memory consumption. We guaranteed the common input for the tools by analyzing the source code with the front end of Columbus and by creating plug-ins for producing the required files for the tools. This way, as a "side effect" of this work, we have extended our Columbus Reverse Engineering Framework with plug-ins for Maisa and CrocoPat. We conclude that the fastest tool is CrocoPat, and Maisa requires the least memory, while Columbus is an all-in-one solution for design pattern detection from C++ source code with performance comparable to the other two specialized tools.

Originally, Gamma et al. [12] defined the design patterns to develop object-oriented applications in forward engineering. Therefore, the pattern definitions were informal to make them easier to use in different languages and contexts. Consequently, the design pattern miner tools have a big problem in common, which is how to define a given design pattern. In this paper we have shown that the tools found different design pattern instances on common inputs, mostly because of their different pattern definitions. Hence, a formal description of design patterns is very desirable.

In the future we plan to create a design pattern catalog for reverse engineers, where every design pattern will have a strict and formal description. With this new collection of design patterns the drawbacks presented in Section 5.1 (different definitions of design patterns, precision of pattern descriptions and differences in algorithms) can be avoided.

References

[1] F. Arcelli, S. Masiero, C. Raibulet, and F. Tisato. A Comparison of Reverse Engineering Tools based on Design Pattern Decomposition. In Proceedings of the 15th Australian Software Engineering Conference (ASWEC'05), pages 677–691.

[2] Z. Balanyi and R. Ferenc. Mining Design Patterns from C++ Source Code. In Proceedings of the 19th International Conference on Software Maintenance (ICSM 2003), pages 305–314. IEEE Computer Society, Sept. 2003.

[3] D. Beyer and C. Lewerentz. CrocoPat: Efficient pattern analysis in object-oriented programs. In Proceedings of the 11th IEEE International Workshop on Program Comprehension (IWPC 2003), pages 294–295. IEEE Computer Society, 2003.

[4] D. Beyer, A. Noack, and C. Lewerentz. Efficient Relational Calculation for Software Analysis. In Transactions on Software Engineering (TSE'05), pages 137–149. IEEE, 2005.

[5] R. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. In Transactions on Computers, pages 677–691. IEEE Computer Society, Feb. 1986.

[6] G. Costagliola, A. D. Lucia, V. Deufemia, C. Gravino, and M. Risi. Design Pattern Recovery by Visual Language Parsing. In Proceedings of the 9th Conference on Software Maintenance and Reengineering (CSMR'05), pages 102–111. IEEE Computer Society, Mar. 2005.

[7] The DC++ Project. http://sourceforge.net/projects/dcplusplus/

[8] R. Ferenc, A. Beszédes, L. Fülöp, and J. Lelle. Design Pattern Mining Enhanced by Machine Learning. In Proceedings of the 21st International Conference on Software Maintenance (ICSM 2005), pages 295–304. IEEE Computer Society, 2005.

[9] R. Ferenc, A. Beszédes, M. Tarkiainen, and T. Gyimóthy. Columbus – Reverse Engineering Tool and Schema for C++. In Proceedings of the 18th International Conference on Software Maintenance (ICSM 2002), pages 172–181. IEEE Computer Society, 2002.

[10] R. Ferenc, J. Gustafsson, L. Müller, and J. Paakki. Recognizing Design Patterns in C++ programs with the integration of Columbus and Maisa. Acta Cybernetica, 15:669–682, 2002.

[11] The Fujaba Homepage. http://www.cs.uni-paderborn.de/cs/fujaba/

[12] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[13] Y.-G. Guéhéneuc and N. Jussien. Using explanations for design patterns identification. In Proceedings of the IJCAI Workshop on Modelling and Solving Problems with Constraints, pages 57–64, Aug. 2001.

[14] Y.-G. Guéhéneuc, K. Mens, and R. Wuyts. A Comparative Framework for Design Recovery Tools. In Proceedings of the 10th Conference on Software Maintenance and Reengineering (CSMR'06), pages 123–134. IEEE Computer Society, 2006.

[15] IBM Jikes Project. http://jikes.sourceforge.net/

[16] O. Kaczor, Y.-G. Guéhéneuc, and S. Hamel. Efficient Identification of Design Patterns with Bit-vector Algorithm. In Conference on Software Maintenance and Reengineering (CSMR'06), pages 175–184. IEEE Computer Society, 2006.

[17] A. K. Mackworth. The logic of constraint satisfaction. Artificial Intelligence, 58(1-3):3–20, 1992.

[18] The Mozilla Homepage. http://www.mozilla.org/

[19] H. A. Müller, K. Wong, and S. R. Tilley. Understanding Software Systems Using Reverse Engineering Technology. In Proceedings of ACFAS, 1994.

[20] J. Paakki, A. Karhinen, J. Gustafsson, L. Nenonen, and A. Verkamo. Software Metrics by Architectural Pattern Mining. In Proceedings of the International Conference on Software: Theory and Practice (16th IFIP World Computer Congress), pages 325–332, 2000.

[22] P. Tonella and G. Antoniol. Object oriented design pattern inference. In Proceedings of the International Conference on Software Maintenance (ICSM '99), pages 230–238, Washington, DC, USA, 1999. IEEE Computer Society.

[23] WinMerge. http://sourceforge.net/projects/winmerge/