Sep 25, 2008

large makefiles with variables

Our example makefile didn't use any variables. Let's introduce some, to see if they help us out:

CC = gcc
CFLAGS = -g -O2
OBJECTS = main.o foo.o

main.exe : $(OBJECTS)
	$(CC) $(CFLAGS) $(OBJECTS) -o main.exe

main.o : main.c
	$(CC) $(CFLAGS) -c main.c

foo.o : foo.c
	$(CC) $(CFLAGS) -c foo.c

This makefile looks a lot like the old makefile, except that a lot of the commands have been replaced with variable substitutions. What make does is replace the variables with their values in the target, dependency, and command sections of the rules. That lets you specify some things in one place, making the makefile easier to maintain. In our example, we use $(CC) to specify the compiler, so we could set it to something else if we wanted to without having to change the whole makefile.

Here's another trick that GNU make lets you do. In the above makefile, we had to include the rule for compiling sources into objects twice - once for each source file. That could get tiresome when we have dozens of sources, so let's define a pattern instead. This pattern will be used whenever make needs to compile any source:

%.o : %.c
	$(CC) $(CFLAGS) -c $<

Here, we have used the percent (%) character to denote the part of the target and dependency that matches whatever the pattern is used for, and $< is a special variable (imagine it as $(<)) that means "the first dependency". Another useful variable is $@, which means "the target". Our Makefile now looks like this:

CC = gcc
CFLAGS = -g -O2
OBJECTS = main.o foo.o

main.exe : $(OBJECTS)
	$(CC) $(CFLAGS) $(OBJECTS) -o main.exe

%.o : %.c
	$(CC) $(CFLAGS) -c $<

Now, if we need to add more source files, we only have to update the line that defines the OBJECTS variable!

Note that make is pretty smart about a lot of things already, like how to build object files. In our above example, we could have left out the last rule completely! Our choice of CC and CFLAGS as names was no coincidence, since those are the variables that make's built-in rules use. To see a list of all the built-in rules, consult the make documentation or run "make -p".
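For example, with nothing in the makefile but the variable definitions, GNU make can still build main.o from its built-in pattern rule. A quick sketch (the exact command echoed may vary between make versions):

$ cat Makefile
CC = gcc
CFLAGS = -g -O2
$ make main.o
gcc -g -O2 -c -o main.o main.c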

The reference manual for make (run "info make") contains many more examples and nifty tricks for building makefiles, but this section covered the bare minimum that you'll need to know to manage your projects with make.

Sep 24, 2008

uppercase any filenames with lowercase chars

#!/bin/sh
# uppercase any filenames with lowercase chars
for file in "$@"
do
    if [ -f "$file" ]
    then
        # quote the character classes so the shell doesn't glob them
        ucfile=`echo "$file" | tr '[:lower:]' '[:upper:]'`
        if [ "$file" != "$ucfile" ]
        then
            mv -i "$file" "$ucfile"
        fi
    fi
done

Jul 18, 2008

Search and replace all files in dir

Method I:

#!/bin/sh
# edit every .txt file that contains "Hello" (grep -il lists matching files)
for file in $(grep -il "Hello" *.txt)
do
    sed -e "s/Hello/Goodbye/ig" "$file" > /tmp/tempfile.tmp
    mv /tmp/tempfile.tmp "$file"
done

Method II:

find /path/to/start/from/ -type f | xargs perl -pi -e 's/applicationX/applicationY/g'



Method III:

#!/bin/sh
myname="/tmp/`whoami``date +%d%m%H%M%S`"
if test -e "$myname"
then
    echo "$0: Cannot make directory $myname (already exists)" 1>&2
    exit 1
fi
mkdir "$myname"
for FILE in "$@"
do
    # write the edited copy into our scratch directory, then move it back
    sed 's/old_string/new_string/g' "$FILE" > "$myname/${FILE}new_tmp"
    mv "$myname/${FILE}new_tmp" "$FILE"
done
rmdir "$myname"

Replace old_string with the string you want to replace and new_string with the replacement string.

Note: This script makes use of the /tmp directory.

The sed command uses the s/old/new/ substitution syntax (the same syntax Perl uses) to search for and replace strings. Once you have created the script, enter the following at the Unix command line prompt:

sh script_name file_pattern

Replace script_name with the filename of the script, and file_pattern with the file or files you want to modify. You can specify the files that you want to modify by using a shell wildcard, such as *.html.
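For example, if you saved the script above as replace.sh (an illustrative name), you would run it over all the HTML files in the current directory with:

sh replace.sh *.html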

May 14, 2008

The process table and the nice command


The kernel maintains a list of all the current processes in a "process table"; you can use the ps command to view the contents of this table.

Each process can also be assigned a priority, or "niceness" level; a value which ranges from -20 to 19. A priority of "-20" means that the process will be given access to the CPU more often, whereas a priority of "19" means that the process will only be given CPU time when the system is idle.

You can use the nice and renice commands to specify and alter these values for specific processes.
Process creation

From your shell prompt (bash), you will usually instruct the system to run an application for you; for example, "vi". This will then cause your "bash" process to "fork" off a new process. The initial process is referred to as the "parent process", and the process which it forked as the "child process".

The process table contains the parent PID (PPID), and uses this to track which processes spawned which other ones.
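You can ask ps to show the parent PID directly. A quick sketch (the PIDs shown are only illustrative):

$ ps -o pid,ppid,comm
  PID  PPID COMMAND
 1013  1011 bash
 1218  1013 ps

Here the ps process was forked by the bash shell: its PPID is bash's PID.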
System Processes

As well as the standard user processes that you would expect to find running, such as your shell and perhaps your editor, there are also several system processes that you would expect to find running. Examples of these include the cron daemon (crond), which handles job scheduling, and the system log daemon (syslogd), which handles the logging of system messages.
Scheduling Command execution with batch (cron) jobs

There are two methods of scheduling jobs on a Unix system. One is called at, which is used for once-off batch jobs. The other is called cron, which is used for regularly run tasks.

The at jobs are serviced by the "at daemon (atd)".
at

SYNTAX:
at [-f script] TIME


This command is used to schedule batch jobs.

You can either give it a script to run with the "-f" parameter, or you can type the commands to be executed after you've entered the command.

The "TIME" parameter can be in the form of HH:MM, or "now + n minutes". There are several other complicated methods of specifying the time, which you should look up in the man page for at(1).

debian:~# at now + 5 minutes
warning: commands will be executed using /bin/sh
at> echo hello!
at> <EOT>
job 1 at 2004-03-12 13:27


We have now scheduled a job to run in 5 minutes' time; that job will simply display (echo) the string "hello!" to stdout.

To tell at that you're finished typing commands to be executed, press Ctrl-D; at will then display the <EOT> marker that you can see above.
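You can also hand at a script to run with the -f switch; for example, to run a script called backup.sh (an illustrative name) at 23:00 tonight:

debian:~# at -f backup.sh 23:00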
atq

SYNTAX:
atq


This command displays the current batch jobs that are queued:

debian:~# atq
1 2004-03-12 13:27 a root
debian:~#


This is the job that we queued earlier.

The first number is the "job id", followed by the date and time that the job will be executed, followed by the user who the job belongs to.
atrm

SYNTAX:
atrm <job id>


This command simply removes jobs from the queue.

debian:~# atrm 1
debian:~# atq
debian:~#


We've now removed our scheduled job from the queue, so it won't run.

Let's add another one, and see what happens when it is executed:

debian:~# at now + 1 minute
warning: commands will be executed using /bin/sh
at> touch /tmp/at.job.finished
at> <EOT>
job 3 at 2004-03-12 13:27
debian:~# atq
3 2004-03-12 13:27 a root
debian:~# date
Fri Mar 12 13:26:57 SAST 2004
debian:~# date
Fri Mar 12 13:27:04 SAST 2004
debian:~# atq
debian:~# ls -l /tmp/at.job.finished
-rw-r--r-- 1 root root 0 Mar 12 13:27 /tmp/at.job.finished


As you can see, we scheduled a job to execute one minute from now, and then waited for a minute to pass. You'll notice how it was removed from the queue once it was executed.
cron

The cron jobs are serviced by the "cron daemon (crond)".
crontab

SYNTAX:
crontab [ -u user ] { -l | -r | -e }
crontab [ -u user ] filename


You can use the crontab command to edit, display and delete existing cron tables.

The "-u" switch lets the root user specify another user's crontab to perform the operation on.

Table 7.1. crontab options
-l lists current crontab
-r removes current crontab
-e edits current crontab

If a filename is specified instead, that file is made the new crontab.

The syntax for a crontab is as follows:

# minute hour day month weekday command

Example:

# minute hour day month weekday command
0 1 * * * backup.sh


This cron job will execute the backup.sh script, at 01:00 every day of the year.

A more complicated example:

# minute hour day month weekday command
5 2 * * 5 backup-fri.sh


This cron job will execute the backup-fri.sh script, at 02:05 every Friday.

Weekdays are as follows:

01 - Monday
02 - Tuesday
etc.
07 - Sunday (0 and 7 both mean Sunday)
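Most cron implementations also accept three-letter names in the weekday and month fields, which can be easier to read; for example, the Friday backup again:

# minute hour day month weekday command
5 2 * * fri backup-fri.sh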


Note:

There is also a "system crontab", which differs slightly from the user crontabs explained above. You can find the system crontab in a file called /etc/crontab.

You can edit this file with vi; you must not use the crontab command to edit it.

You'll also notice that this file has an additional field, which specifies the username under which the job should run.

debian:~# cat /etc/crontab
# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file.
# This file also has a username field, that none of the other crontabs do.

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user command
25 6 * * * root test -e /usr/sbin/anacron || run-parts --report /etc/cron.daily
47 6 * * 7 root test -e /usr/sbin/anacron || run-parts --report /etc/cron.weekly
52 6 1 * * root test -e /usr/sbin/anacron || run-parts --report /etc/cron.monthly
#


Some of the daily system-wide jobs that run are:

1. logrotate - this checks to see that the files in /var/log don't grow too large.

2. find - this builds the locate database, used by the locate command.

3. man-db - this builds the "whatis" database, used by the whatis command.

4. standard - this makes a backup of critical system files from the /etc directory, namely your passwd, shadow and group files - these backups are given a .bak extension.

Monitoring system resources

The following commands are vital for monitoring system resources:
ps and kill

The ps command displays the process table.

SYNTAX:
ps [auxwww]

a -- select all with a tty except session leaders
u -- select by effective user ID - shows username associated with each process
x -- select processes without controlling ttys (daemon or background processes)
w -- wide format


debian:~# ps
PID TTY TIME CMD
1013 pts/0 00:00:00 bash
1218 pts/0 00:00:00 ps


debian:~# ps auxwww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.4 0.1 1276 496 ? S 13:46 0:05 init
root 2 0.0 0.0 0 0 ? SW 13:46 0:00 [kflushd]
root 3 0.0 0.0 0 0 ? SW 13:46 0:00 [kupdate]
root 4 0.0 0.0 0 0 ? SW 13:46 0:00 [kswapd]
root 5 0.0 0.0 0 0 ? SW 13:46 0:00 [keventd]
root 140 0.0 0.2 1344 596 ? S 13:46 0:00 /sbin/syslogd
root 143 0.0 0.3 1652 836 ? S 13:46 0:00 /sbin/klogd
root 151 0.0 0.1 1292 508 ? S 13:46 0:00 /usr/sbin/inetd
daemon 180 0.0 0.2 1388 584 ? S 13:46 0:00 /usr/sbin/atd
root 183 0.0 0.2 1652 684 ? S 13:46 0:00 /usr/sbin/cron
root 682 0.0 0.4 2208 1256 tty1 S 13:48 0:00 -bash
root 1007 0.0 0.4 2784 1208 ? S 13:51 0:00 /usr/sbin/sshd
root 1011 0.0 0.6 5720 1780 ? S 13:52 0:00 /usr/sbin/sshd
root 1013 0.0 0.4 2208 1236 pts/0 S 13:52 0:00 -bash
root 1220 0.0 0.4 2944 1096 pts/0 R 14:06 0:00 ps auxwww


The USER column is the user to whom that particular process belongs; the PID is that process's unique Process ID. You can use this PID to send signals to a process using the kill command.

For example, you can signal the "sshd" process (PID = 1007) to quit, by sending it the terminate (TERM) signal:

debian:~# ps auxwww | grep 1007
root 1007 0.0 0.4 2784 1208 ? S 13:51 0:00 /usr/sbin/sshd
debian:~# kill -SIGTERM 1007
debian:~# ps auxwww | grep 1007


The "TERM" signal is the default that the kill command sends, so you can leave the signal parameter out usually.

If a process refuses to exit gracefully, you can send it a KILL signal, e.g. "kill -SIGKILL <pid>". Unlike TERM, the KILL signal cannot be caught or ignored, so the process is terminated immediately.
top

The top command will display a running process table of the top CPU processes:

14:15:34 up 29 min, 2 users, load average: 0.00, 0.00, 0.00
20 processes: 19 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 1.4% user, 0.9% system, 0.0% nice, 97.7% idle
Mem: 257664K total, 45104K used, 212560K free, 13748K buffers
Swap: 64224K total, 0K used, 64224K free, 21336K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1 root 0 0 496 496 428 S 0.0 0.1 0:05 init
2 root 0 0 0 0 0 SW 0.0 0.0 0:00 kflushd
3 root 0 0 0 0 0 SW 0.0 0.0 0:00 kupdate
4 root 0 0 0 0 0 SW 0.0 0.0 0:00 kswapd
[ ... ]


nice and renice

SYNTAX:
nice -<niceness> <command>


Example:

To run the sleep command with a niceness of "-10":

debian:~# nice --10 sleep 50


If you then run the top command in a different terminal, you should see that the sleep's NI column has been altered:

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
2708 root -9 -10 508 508 428 S< 0.0 0.1 0:00 sleep


renice

SYNTAX:
renice <priority> [ -p <pid> ] [ -u <user> ]

The "-p pid" parameter specifies the PID of a specific process, and the "-u user" parameter specifies a specific user, all of whose currently running processes will have their niceness value changed.
Example:

To renice all of user "student"'s processes to a value of "-10":

debian:~# renice -10 -u student
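You can verify the result with ps, which can display the nice value of each process. A quick sketch (supported output fields vary between ps versions):

debian:~# ps -o pid,ni,comm -u student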


vmstat

The vmstat command gives you statistics on the virtual memory system.

SYNTAX:
vmstat [delay [count]]


debian:~# vmstat 1 5
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 212636 13748 21348 0 0 4 4 156 18 1 1 98
0 0 0 0 212636 13748 21348 0 0 0 0 104 12 0 0 100
0 0 0 0 212636 13748 21348 0 0 0 0 104 8 0 0 100
0 0 0 0 212636 13748 21348 0 0 0 0 104 10 0 0 100
0 0 0 0 212636 13748 21348 0 0 0 0 104 8 0 0 100
debian:~#


Field descriptions:

Table 7.2. procs
r processes waiting for run time
b processes in uninterruptible sleep
w processes swapped out but otherwise runnable

Table 7.3. memory
swpd virtual memory used (Kb)
free idle memory (Kb)
buff memory used as buffers (Kb)

Table 7.4. swap
si memory swapped in from disk (kB/s)
so memory swapped out to disk (kB/s)

Table 7.5. io
bi blocks received from a block device (blocks/s)
bo blocks sent to a block device (blocks/s)

Table 7.6. system
in interrupts per second (including the clock)
cs context switches per second

Table 7.7. cpu
us user time as a percentage of total CPU time
sy system time as a percentage of total CPU time
id idle time as a percentage of total CPU time
system monitoring tools:

It is often useful to be able to keep a historical record of system activity and resource usage. This is useful to spot possible problems before they occur (such as running out of disk space), as well as for future capacity planning.

Usually, these tools are built by using system commands (such as vmstat, ps and df), coupled together with rrdtool or mrtg, which store the data and generate graphs.

rrdtool: http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/

mrtg: http://people.ee.ethz.ch/~oetiker/webtools/mrtg/

Some of the more complex monitoring systems that have been built using these tools include the following:

Cacti: http://www.raxnet.net/products/cacti/

Zabbix: http://www.zabbix.com/

All of the above tools are open source and free to use.
ulimit:

You may use the bash shell built-in command "ulimit" to limit the system resources that your processes are allowed to consume.

The following is an excerpt from the bash man page:

SYNTAX:
ulimit [-SHacdflmnpstuv [limit]]

Provides control over the resources available to the shell and to
processes started by it, on systems that allow such control.

The -H and -S options specify that the hard or soft limit is set for
the given resource. A hard limit cannot be increased once it is set;
a soft limit may be increased up to the value of the hard limit. If
neither -H nor -S is specified, both the soft and hard limits are set.

The value of limit can be a number in the unit specified for the
resource or one of the special values hard, soft, or unlimited, which
stand for the current hard limit, the current soft limit, and no
limit, respectively. If limit is omitted, the current value of the
soft limit of the resource is printed, unless the -H option is given.
When more than one resource is specified, the limit name and unit are
printed before the value.

Other options are interpreted as follows:

-a All current limits are reported
-c The maximum size of core files created
-d The maximum size of a process's data segment
-f The maximum size of files created by the shell
-l The maximum size that may be locked into memory
-m The maximum resident set size
-n The maximum number of open file descriptors
(most systems do not allow this value to be set)
-p The pipe size in 512-byte blocks (this may not be set)
-s The maximum stack size
-t The maximum amount of cpu time in seconds
-u The maximum number of processes available to a single user
-v The maximum amount of virtual memory available to the shell

If limit is given, it is the new value of the specified resource (the
-a option is display only). If no option is given, then -f
is assumed. Values are in 1024-byte increments, except for -t, which
is in seconds, -p, which is in units of 512-byte blocks, and -n and
-u, which are unscaled values. The return status is 0 unless an
invalid option or argument is supplied, or an error occurs while
setting a new limit.


On a Debian system, the default ulimit settings should appear as follows:

debian:~# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 256
virtual memory (kbytes, -v) unlimited


The common use of this command is to prevent long-running processes, such as web servers (e.g., Apache) and CGI scripts, from leaking memory and consuming all available system resources. Using the ulimit command to reduce the locked memory and memory size options before starting up your web server would mitigate this problem.

Another common use is on shell servers where users may not "tidy up" after themselves; you can set the CPU time limit in /etc/profile, thus having the system automatically terminate long-running processes.
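For example, adding a line like the following to /etc/profile would have the system kill any user process once it has consumed 10 minutes of CPU time (600 seconds is just an illustrative value):

ulimit -t 600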

Another example specifically relates to core files. You'll notice that the default core file size is set to 0. This means that when an application crashes (for example, with a "segmentation fault"), it does not leave behind a core file (a file containing the memory contents of the application at the time it crashed). This core file can prove invaluable for debugging, but can obviously be quite large, as it will be the same size as the amount of memory the application was consuming at the time! Hence the default "0" value.

However, to enable core dumps, you can specify the "-c" switch:

debian:~# ulimit -c
0
debian:~# ulimit -c 1024
debian:~# ulimit -c
1024


Any further applications launched from this shell which crash will now generate core dump files of up to 1024 blocks in size.

The core files are normally named "core" or sometimes processname.core, and will be written to the current working directory of the specific application that crashed.
Working with log files

On a Linux system, you should find all the system log files are in the /var/log directory.

The first place you should look if you were experiencing problems with a running system is the system "messages" logfile.

You can use the tail command to see the last few entries:

$ tail /var/log/messages


It's sometimes useful to keep the log scrolling in a window as entries are added, and you can use tail's -f (follow) flag to achieve this:

$ tail -f /var/log/messages
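You can also pipe the output through grep to watch for particular entries only, for example ssh activity:

$ tail -f /var/log/messages | grep sshd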


Other files of interest in /var/log:

1. auth.log -- log messages relating to system authentication

2. daemon.log -- log messages relating to running daemons on the system

3. debug -- debug level messages

4. syslog -- system level log messages

5. kern.log -- kernel messages

The process which writes to these logfiles is called "syslogd", and its behavior is configured by /etc/syslog.conf.

Each log message has a facility (kern, mail, news, daemon) and a severity (debug, info, warn, err, crit). The syslog.conf file uses these to determine where to send the messages.
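For example, lines like these in /etc/syslog.conf (illustrative only; your distribution's file will differ) send all kernel messages to one file and anything of severity err or worse to another:

kern.*          /var/log/kern.log
*.err           /var/log/errors.log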

As machines continue to run over time, these files can obviously become quite large. Rather than the system administrator having to manually trim them, there is a utility called "logrotate".

This utility can be configured to rotate (backup and compress) old log files and make way for new ones. It can also be configured to only store a certain amount of log files, making sure that you keep your disk space free.

The file which controls this behavior is /etc/logrotate.conf. See the logrotate(8) man page for details on the syntax of this file.
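As a sketch of that syntax, a stanza like the following (the file name and values are only illustrative) rotates a log weekly, keeps four old copies, and compresses them:

/var/log/messages {
    weekly
    rotate 4
    compress
}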

May 9, 2008

GetOptions

http://www.perl.com/doc/manual/html/lib/Getopt/Long.html#EXAMPLES

GetOptions is called with a list of option-descriptions, each of which consists of two elements: the option specifier and the option linkage. The option specifier defines the name of the option and, optionally, the value it can take. The option linkage is usually a reference to a variable that will be set when the option is used. For example, the following call to GetOptions:

GetOptions("size=i" => \$offset);

will accept a command line option ``size'' that must have an integer value. With a command line of ``--size 24'' this will cause the variable $offset to get the value 24.

Alternatively, the first argument to GetOptions may be a reference to a HASH describing the linkage for the options, or an object whose class is based on a HASH. The following call is equivalent to the example above:

%optctl = ("size" => \$offset);
GetOptions(\%optctl, "size=i");

For the other options, the values for argument specifiers are:

!

Option does not take an argument and may be negated, i.e. prefixed by ``no''. E.g. ``foo!'' will allow --foo (with value 1) and -nofoo (with value 0). The option variable will be set to 1, or 0 if negated.

+

Option does not take an argument and will be incremented by 1 every time it appears on the command line. E.g. ``more+'', when used with --more --more --more, will set the option variable to 3 (provided it was 0 or undefined at first).

The + specifier is ignored if the option destination is not a SCALAR.

=s

Option takes a mandatory string argument. This string will be assigned to the option variable. Note that even if the string argument starts with - or --, it will not be considered an option on itself.

:s

Option takes an optional string argument. This string will be assigned to the option variable. If omitted, it will be assigned ``'' (an empty string). If the string argument starts with - or --, it will be considered an option on itself.

=i

Option takes a mandatory integer argument. This value will be assigned to the option variable. Note that the value may start with - to indicate a negative value.

:i

Option takes an optional integer argument. This value will be assigned to the option variable. If omitted, the value 0 will be assigned. Note that the value may start with - to indicate a negative value.

=f

Option takes a mandatory real number argument. This value will be assigned to the option variable. Note that the value may start with - to indicate a negative value.

:f

Option takes an optional real number argument. This value will be assigned to the option variable. If omitted, the value 0 will be assigned.

A lone dash - is considered an option, the corresponding option name is the empty string.

A double dash on itself -- signals end of the options list.

May 6, 2008

Cat examples

http://www.uwm.edu/cgi-bin/IMT/wwwman?topic=cat(1)&msection=

EXAMPLES

1. To display the file notes, enter:
cat notes

If the file is longer than one screenful, it scrolls by too quickly to
read. To display a file one page at a time, use the more command.

2. To concatenate several files, enter:
cat section1.1 section1.2 section1.3 > section1

This creates a file named section1 that is a copy of section1.1
followed by section1.2 and section1.3.

3. To suppress error messages about files that do not exist, enter:
cat -s section2.1 section2.2 section2.3 > section2

If section2.1 does not exist, this command concatenates section2.2 and
section2.3. Note that the message goes to standard error, so it does
not appear in the output file. The result is the same if you do not
use the -s option, except that cat displays the error message:
cat: cannot open section2.1

You may want to suppress this message with the -s option when you use
the cat command in shell procedures.

4. To append one file to the end of another, enter:
cat section1.4 >> section1

The >> in this command specifies that a copy of section1.4 be added to
the end of section1. If you want to replace the file, use a single >
symbol.

5. To add text to the end of a file, enter:
cat >> notes
Get milk on the way home


Get milk on the way home is added to the end of notes. With this
syntax, the cat command does not display a prompt; it waits for you to
enter text. Press the End-of-File key sequence (Ctrl-D, as above) to
indicate you are finished.

6. To concatenate several files with text entered from the keyboard,
enter:
cat section3.1 - section3.3 > section3

This concatenates section3.1, text from the keyboard, and section3.3
to create the file section3.

7. To concatenate several files with output from another command, enter:
ls | cat section4.1 - > section4

This copies section4.1, and then the output of the ls command to the
file section4.

8. To get two pieces of input from the terminal (when standard input is a
terminal) with a single command invocation, enter:
cat start - middle - end > file1

If standard input is a regular file, however, the preceding command is
equivalent to the following:
cat start - middle /dev/null end > file1

This is because the entire contents of the file would be consumed by
cat the first time it saw - (dash) as a file argument. An End-of-File
condition would then be detected immediately when - (dash) appeared
the second time.

Apr 29, 2008

why snort content doesn't work

Posted by PanamaJax on November 17, 2005 20:47:23

win32 based...
running in packet mode works fine

in IDS mode:
alert tcp any any -> any any (msg:"TCP traffic";) works fine

alert tcp any any -> any any (flow: to_server, established; content: "test"; msg: "Saw Test";) - does absolutely nothing when 'test' is sent by any method. Messenger service, http, telnet, ftp etc.

Based on the default virus.rules it should alert on an email being sent with a .vbs attachment but it doesn't, through pop or exchange.

Everything seems to work fine, just not with a content statement. I've tried every permutation I can come up with, in/out bound, stateless etc etc etc and I don't get it.

If a user were to type 'google' in the web browser and there was an active rule:
alert tcp any any -> any any (flow: to_server, established; content: "google"; nocase; msg: "Saw google";)

shouldn't that trigger the alert?

Posted by brevizniak on November 26, 2005 18:36:50

yes it should work fine. There are some things to check.

- you actually see test in the traffic when running with -dve
- you can see both sides of the connection
- the flow preprocessor is enabled
- The stream4 preprocessor is enabled
- you are not testing on the machine you are snorting from
This is because a lot of systems have cards that compute a checksum in hardware, so if you test and sniff on the same machine snort may ignore the traffic because of a bad checksum.

Bro supported notice

Builtin Policy Files

A Bro policy script is the basic analyzer used by Bro to determine which network events are alarm-worthy. A policy can also specify what actions to take and how to report activities, as well as determine what activities to scrutinize. Bro uses policies to determine which activities to classify as hot, or questionable in intent. These hot network sessions can then be flagged, watched, or responded to via other policies or applications as necessary, such as calling rst to reset a connection on the local side, or adding an IP address block to a main router's ACL (Access Control List). The policy files use the Bro scripting language, which is discussed in great detail in the Reference Manual.

Policy files are loaded using an @load command. The semantics of @load are "load in this script if it hasn't already been loaded", so there is no harm in loading something in multiple policy scripts. The following policy scripts are included with Bro. The first set are all on by default, and the second group can be added by adding them to your site/brohost.bro policy file.

Bro Analyzers are described in detail in the Reference Manual. These policy files are loaded by default:
site defines local and neighbor networks from static config
alarm open logging file for alarm events
tcp initialize BPF filter for SYN/FIN/RST TCP packets
login rlogin/telnet analyzer (or to ensure they are disabled)
weird initialize generic mechanism for detecting unusual events
conn access and record connection events
hot defines certain forms of sensitive access
frag process TCP fragments
print-resources on exit, print resource usage information, useful for tuning
signatures the signature policy engine
scan generic scan detection mechanism
trw additional, more sensitive scan detection
http general http analyzer, low level of detail
http-request detailed analysis of http requests
http-reply detailed analysis of http replies
ftp FTP analysis
portmapper record and analyze RPC portmapper requests
smtp record and analyze email traffic
tftp identify and log TFTP sessions
worm flag HTTP-based worm sources such as Code Red
software track software versions; required for some signature matching
blaster looks for blaster worm
synflood looks for synflood attacks
stepping used to detect when someone logs into your site from an external net, and then soon logs into another site
reduce-memory sets shorter timeouts for saving state, thus saving memory. If your Bro is using < 50% of your RAM, try not loading this


These are not loaded by default:
Policy Description Why off by default
drop Include if site has ability to drop hostile remotes Turn on if needed
icmp icmp analysis CPU intensive and low payoff
dns DNS analysis CPU intensive and low payoff
ident ident program analyzer historical, no longer interesting
gnutella looks for hosts running Gnutella Turn this on if you want to know about this
ssl ssl analyzer still experimental
ssh-stepping Detects stepping stones where both incoming and outgoing connections are ssh Possibly too CPU intensive (needs more testing)
analy Performs statistical analysis only used in off-line analysis
backdoor Looks for backdoors only effective when also capturing bulk traffic
passwords Looks for clear text passwords may want to turn on if your site does not allow clear text passwords
file-flush Causes all log files to be flushed every N seconds may want to turn on if you are doing "real time" analysis


To modify which analyzers are loaded, edit or create a file in $BROHOME/site. If you write your own new custom analyzer, it goes in this directory too. To disable an analyzer, add "@unload policy.bro" to the beginning of the file $BROHOME/site/brohost.bro, before the line "@load brolite.bro". To add additional analyzers, add @load lines for them in $BROHOME/site/brohost.bro.
Notices

The primary output facility in Bro is called a Notice. The Bro distribution includes a number of standard Notices, listed below. The table contains the name of the Notice, the Bro policy file that generates it, and a short description of what the Notice is about.
Notice Policy Description
AckAboveHole weird Could mean packet drop; could also be a faulty TCP implementation
AddressDropIgnored scan A request to drop connectivity has been ignored (scan detected, but one of these flags is true: !can_drop_connectivity, never_shut_down, or never_drop_nets)
AddressDropped scan Connectivity w/ given address has been dropped
AddressScan scan The source has scanned a number of addrs
BackscatterSeen scan Apparent flooding backscatter seen from source
ClearToEncrypted_SS stepping A stepping stone was seen in which the first part of the chain is a clear-text connection but the second part is encrypted. This often means that a password or passphrase has been exposed in the clear, and may also mean that the user has an incomplete notion that their connection is protected from eavesdropping.
ContentGap weird Data has sequence hole; perhaps due to filtering
CountSignature signatures Signature has triggered multiple times for a destination
DNS::DNS_MappingChanged DNS Some sort of change WRT previous Bro lookup
DNS::DNS_PTR_Scan dns Summary of a set of PTR lookups (automatically generated once/day when dns policy is loaded)
DroppedPackets netstats Number of packets dropped as reported by the packet filter
FTP::FTP_BadPort ftp Bad format in PORT/PASV
FTP::FTP_ExcessiveFilename ftp Very long filename seen
FTP::FTP_PrivPort ftp Privileged port used in PORT/PASV
FTP::FTP_Sensitive ftp Sensitive connection (as defined in hot)
FTP::FTP_UnexpectedConn ftp FTP data transfer from unexpected src
HTTP::HTTP_SensitiveURI http sensitive URI seen in an HTTP request (matching patterns such as shadow|netconfig)
HotEmailRecipient smtp FIXME: Need Example, default = NULL
ICMP::ICMPAsymPayload icmp Payload in echo req-resp not the same
ICMP::ICMPConnectionPair icmp Too many ICMPs between hosts (default = 200)
IdentSensitiveID ident Sensitive username in Ident lookup
LocalWorm worm Worm seen in local host (searches for code red 1, code red 2, nimda, slammer)
LoginForbiddenButConfused login Interactive login seen using forbidden username, but the analyzer was confused in following the login dialog, so may be in error.
MultipleSigResponders signatures host has triggered the same signature on multiple responders
MultipleSignatures signatures host has triggered many signatures
OutboundTFTP tftp outbound TFTP seen
PasswordGuessing scan source tried too many user/password combinations (default = 25)
PortScan scan the source has scanned a number of ports
RemoteWorm worm worm seen in remote host
ResolverInconsistency dns the answer returned by a DNS server differs from one previously returned
ResourceSummary print-resources prints Bro resource usage
RetransmissionInconsistency weird possible evasion; usually just bad TCP implementation
SSL_SessConIncon ssl session data not consistent with connection
SSL_X509Violation ssl blanket X509 error
ScanSummary scan a summary of scanning activity, output once / day
SensitiveConnection conn connection marked "hot", See: Reference Manual section on hot IDs for more information.
SensitiveDNS_Lookup dns DNS lookup of sensitive hostname/addr; default list of sensitive hosts = NULL
SensitiveLogin login interactive login using sensitive username (defined in 'hot')
SensitivePortmapperAccess portmapper the given combination of the service looked up via the pormapper, the host requesting the lookup, and the host from which it's requiesting it is deemed sensitive
SensitiveSignature signatures generic for alarm-worthy
SensitiveUsernameInPassword login During a login dialog, a sensitive username (e.g., "rewt") was seen in the user's password. This is reported as a notice because it could be that the login analyzer didn't track the authentication dialog correctly, and in fact what it thinks is the user's password is instead the user's username.
SignatureSummary signatures summarize number of times a host triggered a signature (default = 1/day)
SynFloodEnd synflood end of syn-flood against a certain victim. A syn-flood is defined as more than SYNFLOOD_THRESHOLD (default = 15000) new connections having been reported within the last SYNFLOOD_INTERVAL (default = 60 seconds) for a certain IP.
SynFloodStart synflood start of syn-flood against a certain victim
SynFloodStatus synflood report of ongoing syn-flood
TRWAddressScan trw source flagged as scanner by TRW algorithm
TRWScanSummary trw summary of scanning activities reported by TRW
TerminatingConnection conn "rst" command sent to connection origin, connection terminated; triggered in the following policies: ftp and login (forbidden user id), hot (connection from host with spoofed IP address)
W32B_SourceLocal blaster report a local W32.Blaster-infected host
W32B_SourceRemote blaster report a remote W32.Blaster-infected host
WeirdActivity weird generic unusual, alarm-worthy activity


Note that some of the Notice names start with "ModuleName::" (e.g.: FTP::FTP_BadPort) and some do not. This is because not all of the Bro Analyzers have been converted to use the Modules facility yet. Eventually all notices will start with "ModuleName::".

To get a list of all Notices that your particular Bro configuration might generate, you can type:

sh . $BROHOME/etc/bro.cfg; bro -z notice $BRO_HOSTNAME.bro

Apr 28, 2008

Using a 'OR' condition in Signature payloads

Subject: Re: Using a 'OR' condition in Signature payloads

On Tue, Oct 31, 2006 at 00:32 -0800, Vern Paxson wrote:

> I believe what's going on is that "payload" is matching the TCP *byte-stream*
> rather than individual packets. As such, there's just one match to the
> pattern, since the .*'s eat up everything else in the byte-stream.

That's right.

> There's an option to just match packet payloads, but I don't recall what
> it is.

No, there is no option (UDP is matched packet-wise but even for UDP
Bro reports each signature-match only once per UDP flow).

Robin

How does Bro capture the traffic of ftp data connection

On Thu, Mar 15, 2007 at 12:01 +0800, you wrote:

> So how does it dynamically add the filter string to capture the
> temporary traffic?

It doesn't. Dynamically changing the BPF filter is too expensive as
it would need to be recompiled every time (and the filter would
quickly get huge).

If you want Bro to analyze the content of ftp-data sessions, you
need to manually override the pcap filter to include all packets,
e.g., by running with "-f tcp".

Robin

--
Robin Sommer * Phone +1 (510) 931-5555 * robin at icir.org
LBNL/ICSI * Fax +1 (510) 666-2956 * www.icir.org

Find Man

List all files that belong to the user Simon:

find . -user Simon

List all the directory and sub-directory names:

find . -type d

List all files in those sub-directories (but not the directory names)

find . -type f

List all the file links:

find . -type l

List all files (and subdirectories) in your home directory:

find $HOME

Find files that are over a gigabyte in size:

find ~/Movies -size +1024M

Find files that have been modified within the last day:

find ~/Movies -mtime -1

Find files that have been modified within the last 30 minutes:

find ~/Movies -mmin -30

Find .doc files that also start with 'questionnaire' (AND)

find . -name '*.doc' -name 'questionnaire*'

List all files beginning with 'memo' and owned by Simon (AND)

find . -name 'memo*' -user Simon

Find .doc files that do NOT start with 'Accounts' (NOT)

find . -name '*.doc' ! -name 'Accounts*'

Search for files which have read and write permission for their owner,
and group, but which the rest of the world can read but not write to.

find . -perm 664

Files which meet these criteria but have other permissions bits set
(for example if someone can execute the file) will not be matched.

Search for files which have read and write permission for their owner,
and group, but which the rest of the world can read but not write to,
without regard to the presence of any extra permission bits
(for example the executable bit).

find . -perm -664

This will match a file which has mode 0777, for example.

Search for files which are writeable by somebody (their owner, or their group, or anybody else).

find . -perm +222
or
find . -perm +g+w,o+w
or
find . -perm +g=w,o=w

All three of these commands do the same thing, but the first one uses
the octal representation of the file mode, and the others use the symbolic form.
The files don't have to be writeable by both the owner and group to be matched; either will do.
(Recent versions of GNU find deprecate the '+' form in favour of '/', e.g. -perm /222.)

Search for files which are writeable by both their owner and their group:

find . -perm -220
or
find . -perm -u+w,g+w

"Instead of getting married again, I'm going to find a woman I don't like and just give her a house." - Lewis Grizzard

Related Linux Bash commands:

fnmatch - Filename match
findutils documentation - 'Finding Files' doc with more detail on security considerations
grep - Search file(s) for lines that match a given pattern
locate - Find files - simple but fast
gawk - Find and Replace text within file(s)
xargs - Execute utility, passing constructed argument list(s)

Equivalent Windows XP command:
DIR /b /s - Display a list of files and (sub)folders

Feb 29, 2008

Java application development based on Struts-Spring-Hibernate

http://book.csdn.net/bookfiles/111/1001113461.shtml

A brief introduction to how Spring works

In the MVC pattern implemented with SSH (Struts+Spring+Hibernate), Spring acts as the control layer connecting Struts and Hibernate. The concepts associated with the Spring framework are the following:

Lightweight: "lightweight" is relative to heavyweight containers (EJB). Spring's core package is under 1MB, and the resources needed to use it are also small, so it can be used even on small devices.

Non-invasive: frameworks provide a large amount of functionality for users in order to cut development time and cost, but when an application makes heavy use of a framework's API it becomes heavily dependent on that framework; it cannot stand alone, and its components cannot be reused in other programs. Such frameworks are called invasive; Spring's goal is to be a non-invasive service framework.

Container: a container takes over the work of managing object relationships that you would otherwise code by hand. Spring provides container functionality: the container manages object lifecycles and the relationships between objects. You can configure object relationships and initial values in XML, so that once the container starts, all the objects are ready to use without writing any code to create them.

IoC/DI: Spring's most central concept is IoC (Inversion of Control), also known by its other name DI (Dependency Injection). With Spring you don't maintain dependencies between objects in your program; you simply declare them in XML, and the Spring container creates the relationships from the configuration at runtime. The objects themselves carry no relationships. An analogy: Zhang San and Li Si are two previously unrelated objects, but once they enter the "class" container, the container gives them the relationship "classmates". The container bestows relationships on objects, rather than the objects creating relationships between themselves, and the obvious benefit is loose coupling.

AOP (Aspect Oriented Programming): the other much-noted side of Spring is its AOP support; AOP is a sub-container that Spring supports. System service logic unrelated to the business logic (such as logging or security) is inserted into a service flow, with each such concern designed as a separate object; such objects are called Aspects. An analogy: on a round-the-world trip you pass through customs checkpoints in a number of countries; each checkpoint is an Aspect of the overall travel flow.

Demo (illustrating only what dependency injection, DI, a.k.a. inversion of control, IoC, is):

1. First, look at the original way of doing it:

// User.java
package org.myspring;
public class User {
    private String username;
    private int age;
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
}

// Test.java
package org.myspring;
public class Test {
    public static void main(String[] args) {
        User user = new User();
        user.setUsername("zhangsan");
        System.out.println(user.getUsername());
    }
}

This is the original approach, and the problem it creates is this: if you want to change "zhangsan" to "lisi", you have to modify the code of the Test class. That is tight coupling, where touching one class drags in changes to another. Loose coupling would mean being able to output a different username property without changing either User.java or Test.java, and that requires Spring's IoC/DI mechanism. The steps are as follows:

2. MyEclipse -> Add Spring Capabilities... -> add just the core package; this generates the applicationContext.xml configuration file.

3. Edit applicationContext.xml: in the XML editor pane, right-click -> Spring -> New Bean, and configure the bean in the dialog that appears.

zhangsan 25 4、新的测试类 //Test.java package org.myspring; import org.springframework.context.ApplicationContext; import org.springframework.context.support.FileSystemXmlApplicationContext; public class Test { public static void main(String[] args) { ApplicationContext c FileSystemXmlApplicationContext("src/org/myspring/applicationContext.xml"); User user=(User)context.getBean("user"); System.out.println("name:"+user.getUsername()+"; age:"+user.getAge()); } } 注意:在上面的代码中context.getBean()返回的是一个Object对象,需要进行相应的类对象的转换。在代码中没有出现用new来实例化对象的语句,实现了Test类跟User类的松耦合。对象的实例化都在xml配置文件中实现了。

Feb 13, 2008

Private constructor



Private constructors prevent a class from being explicitly instantiated by callers.

There are some common cases where a private constructor can be useful :

* classes containing only static utility methods
* classes containing only constants
* type safe enumerations
* singletons

These examples fall into two categories.
Object construction is entirely forbidden
No objects can be constructed, either by the caller or by the native class. This is only suitable for classes that offer only static members to the caller.

In these cases, the lack of an accessible constructor says to the caller: "There are no use cases for this class where you need to build an object. You can only use static items. I am preventing you from even trying to build an object of this class." See class for constants for an illustration.

If the programmer does not provide a constructor for a class, then the system will always provide a default, public no-argument constructor. To disable this default constructor, simply add a private no-argument constructor to the class. This private constructor may be empty. Somewhat paradoxically, creation of objects by the caller is in effect disabled by the addition of this private no-argument constructor.
Object construction is private only
Objects can be constructed, but only internally. For some reason, a class needs to prevent the caller from creating objects.

In the case of a singleton, the policy is that only one object of that class is supposed to exist. Creation of multiple objects of that class is forbidden.

In the case of JDK 1.4 type-safe enumerations, more than one object can be created, but again, there is a limitation. Only a specific set of objects is permitted, one for each element of the enumeration. (For JDK 1.5 enums, the creation of such objects is done implicitly by the Java language, not explicitly by the application programmer.)

See Also :
Type-Safe Enumerations
Class for constants
Factory methods
Singleton

Java Garbage Collection Interview Questions

Explain garbage collection?

Or

How you can force the garbage collection?

Or

What is the purpose of garbage collection in Java, and when is it used?

Or

What is Garbage Collection and how to call it explicitly?

Or

Explain Garbage collection mechanism in Java?

Garbage collection is one of the most important features of Java. The purpose of garbage collection is to identify and discard objects that are no longer needed by a program, so that their resources can be reclaimed and reused. A Java object is subject to garbage collection when it becomes unreachable to the program in which it is used.

Garbage collection is also called automatic memory management, as the JVM automatically removes unused objects from memory. Every class inherits the finalize() method from java.lang.Object; the finalize() method is called by the garbage collector when it determines that no more references to the object exist. In Java, it is a good idea to explicitly assign null to a variable when it is no longer in use.

On calling System.gc() or Runtime.gc(), the JVM tries to recycle the unused objects, but there is no guarantee when all the objects will be garbage collected. Garbage collection is an automatic process and can't be forced; there is no guarantee that garbage collection will start immediately upon a request via System.gc().

What kind of thread is the Garbage collector thread?

It is a daemon thread.

Can an object’s finalize() method be invoked while it is reachable?

An object’s finalize() method cannot be invoked by the garbage collector while the object is still reachable. However, an object’s finalize() method may be invoked by other objects.

Does garbage collection guarantee that a program will not run out of memory?

Garbage collection does not guarantee that a program will not run out of memory. It is possible for programs to use up memory resources faster than they are garbage collected. It is also possible for programs to create objects that are not subject to garbage collection.

What is the purpose of finalization?

The purpose of finalization is to give an unreachable object the opportunity to perform any cleanup, before the object gets garbage collected. For example, closing an opened database Connection.

If an object is garbage collected, can it become reachable again?

Once an object is garbage collected, it can no longer become reachable again.

Feb 11, 2008

auto_ptr

// Example 2: using an auto_ptr
//
void g()
{
    T* pt1 = new T;
    // now we have an allocated object

    // pass ownership to an auto_ptr object
    auto_ptr<T> pt2( pt1 );

    // use the auto_ptr the same way
    // we used to use a plain pointer
    *pt2 = 12;       // just like "*pt1 = 12;"
    pt2->SomeFunc(); // just like "pt1->SomeFunc();"

    // use get() to obtain the pointer's value
    assert( pt1 == pt2.get() );

    // use release() to take back ownership
    T* pt3 = pt2.release();

    // delete the object ourselves, because now
    // no auto_ptr owns it any more
    delete pt3;

} // pt2 no longer owns any pointer, so it won't
  // try to delete it... OK, no double delete

Finally, we can use auto_ptr's reset() function to make an auto_ptr own another object. If the auto_ptr already owns an object, it first deletes the object it owns, so calling reset() is like destroying the auto_ptr and creating a new one that owns the new object:

// Example 3: using reset()
//
void h()
{
    auto_ptr<T> pt( new T(1) );

    pt.reset( new T(2) );
    // deletes the first T, which was
    // allocated by "new T(1)"

} // finally, pt goes out of scope and
  // the second T is deleted as well

Feb 10, 2008

Alignment

Q&A: the efficiency gains from word-length alignment



People often ask what alignment is actually for. Today a colleague and I discussed how the data bus width and the alignment scheme affect program efficiency on the ARM platform: when defining structure data types, you should observe the word-length alignment rule to improve system efficiency. That prompted some thoughts I'd like to share; my knowledge is limited, so if anything is wrong, please point it out.
This post mainly explains: what exactly is this so-called alignment? How is data aligned? And why does alignment exist - that is, what efficiency difference does it make?

1.
First look at the following example:

#include <iostream>
using namespace std;

#pragma pack(4)
struct A
{
    char a;
    int b;
};
#pragma pack()

#pragma pack(1)
struct B
{
    char a;
    int b;
};
#pragma pack()

int main()
{
    A a;
    cout << sizeof(a);

    B b;
    cout << sizeof(b);
}

As I recall, the default in VC is 4-byte alignment, while under ADS it is 1-byte alignment.
Since this is a C/C++ community and everyone knows the PC best, I'll talk about alignment on the PC.
I've been away from PC work for quite a while, so if something is wrong, don't be polite - just say so.

You can see that under MS VC the results with 4-byte alignment and 1-byte alignment are completely different: 8 and 5 respectively.
Why such a result? This is byte alignment at work on x86. To speed up program execution,
some architectures are designed around alignment, usually taking the word length as the alignment boundary. A structure variable as a whole
is aligned on the alignment boundary of its largest member; for A, the whole structure is aligned on a 4-byte boundary, so sizeof(a) is 8 rather than 5.
Under the naive layout we'd originally imagine, A's members would be stored one right after another, and char+int would take only 5 bytes.
That difference is caused entirely by alignment.
Clearly, the aligned A wastes 3 bytes of storage compared with B.
So why align at all?
Because an architecture's decision to align or not is a trade-off between time and space.
Byte alignment saves time; presumably the designers chose to trade space for time.
Why does alignment improve efficiency and save time? This, I think, is the key point to understand.
On the PCs we commonly use, the bus width is 32 bits.
1. If accesses are aligned to the bus width,
then every read or write of a datum of <= 32 bits is guaranteed to complete in one transfer over the data bus,
with no extra cost whatsoever.
|1|2|3|4|5|6|7|8|
Position 1 is where a starts, and b starts at position 5. When accessing them:
accessing a transfers 8 valid bits over the bus in one go (the other 24 bits are unused);
accessing b transfers all 32 bits over the bus in one go.
Both reads and writes complete in a single pass.
(A quick aside: a read must first put the read address on the address bus, and then in the next
clock cycle the data comes back over the data bus from the external memory interface - two cycles in total;
a write puts the address and data on the corresponding buses in one go and is done,
so reads are half as fast as writes.)

2. Now look at what happens when data is accessed at an unaligned address:
|1|2|3|4|5|6|7|8|
Here a's address is unchanged, still 1, but because the layout is unaligned, b now sits at position 2.
Access now runs into an efficiency problem. Accessing a is still fine - one byte is read back -
but address 2 is not aligned to the bus width, and on many CPUs an operation at this address raises an error,
e.g. on SPARC and MIPS, whose hardware designs enforce alignment: an access at an unaligned address always faults.
x86, however, does support unaligned access.
It gets the result by performing multiple accesses and splicing the pieces together: concretely, it first reads back the three bytes 2,3,4 starting at address 1 and holds them,
then reads back one byte at address 5 and splices it with 2,3,4 to assemble the complete int - that is, b - and returns it.
As you can see, such an operation costs several times as much; clearly, word-aligned access is far more efficient.
Of course, that overhead only arises for multi-byte accesses; if you are still doing byte operations, the efficiency is about the same either way.

Development nowadays generally places a premium on performance, so there are two different ways of handling the alignment question:
1) One approach that trades space for time is to explicitly insert reserved members:
struct A{
    char a;
    char reserved1[3]; // trade space for time
    int b;
}a;
2) Write it however you like, and leave everything to the compiler's automatic alignment.
There is also the approach of defining logically related data together.

Many of the alignment pitfalls in code are implicit, for example during forced type conversions. Here is an example:

unsigned int i = 0x12345678;
unsigned char *p = NULL;
unsigned short *p1 = NULL;

p = (unsigned char *)&i;
*p = 0x00;
p1 = (unsigned short *)(p + 1);
*p1 = 0x0000;

The last two statements access an unsigned short variable from an odd boundary, which clearly violates the alignment rule.
On x86, operations like this merely cost efficiency, but on MIPS or SPARC they may be an error.

Array vs. List

Memory allocation
Most often, arrays are static, with their size defined upon creation. Additionally, the memory allocated for arrays is contiguous. Therefore, they are typically used when the maximum number of elements is known at design time. The drawback to this approach is that large arrays require large amounts of memory, which may go unused, especially those designed for a maximum number of elements that will often not approach their capacity. And on some platforms, such as certain handheld devices that use older operating systems, memory constraints could limit the size of the arrays you can use.

On the other hand, linked lists are usually dynamic. They can grow and shrink as needed at runtime. Due to this trait, linked lists are more appealing when the number of elements is unknown. Also, the linked list memory is allocated on an element-by-element basis and thus is rarely contiguous. The downside to being able to deal with uncertainty is that adding and deleting elements to linked lists requires more overhead than merely assigning values to preallocated array elements. But the only limits on how much memory may be allocated to a linked list are imposed by the size of the memory heap used by the application.

Accessing elements
The elements within arrays are accessed by their indices. Thus, data access is easy and fast if you know which element to retrieve. If you don’t know the index of the element needed, but the elements are sorted based on some key value, you can perform highly efficient search algorithms to locate specific elements. These algorithms allow only a minimal number of comparisons to locate a unique element. There are also several established and efficient algorithms for sorting and merging arrays. However, arrays are inefficient when the ordering of their elements is likely to change. Maintaining a sorted array upon element deletion or insertion could require the transfer of every element in the array.

Linked lists are usually traversed element by element until a match is found. Because the memory for linked lists is not guaranteed to be contiguous, this list traversal is the only method for searching the list (without involving the use of other data structures as indices). The upside of noncontiguous memory is that reordering the list simply involves manipulating the links. Insertion or deletion of an element requires only a couple of pointer modifications. The transfer of the actual data isn’t required at all.

Breaking the rules
Using language-specific constructs may allow for the best of both worlds. With C, a pointer to a variable or object can be used as an array of the corresponding type if it points to the first element of an allocated array. This allows a pointer to be used as an array, but when resizing is necessary, the realloc() function allocates a new block of memory and transfers all existing elements to the new location. This technique allows for dynamic resizing of an array while maintaining contiguous memory and element indexing.

With Java, the provided linked-list class offers an indexed linked list that supports all of the standard list methods (top, next, previous, etc.) as well as indexed operation. The indexOf(), get(), and set() methods allow array-like access to the elements of the list. Additionally, Java provides an ArrayList class that represents a resizable-array implementation of the list class. Both of these classes support methods for returning true arrays from their list representations.

Programming languages continue to become more advanced, and there is less distinction between the various types of data implementations as their structures expand to include the strengths and correct the deficiencies found in the standard models. However, it will always be important to remember where these structures originated and how they are still used within the newer classes. Although these newer implementations hide the details from the programmer, the computational overhead and resources required do not change.

Making the decision
If your data is best represented using a multidimensional structure, or the number of elements is known in advance and will remain consistent, an array is best. If your data is easily represented in one dimension, and the number of elements is unknown or is expected to change often throughout the operation of your program, a linked list is more efficient.

If your data will be searched and accessed often but will change infrequently, the array offers the least overhead for your expected operations. If you expect to be regularly adding or subtracting elements, especially if you need to maintain a sorted order, the versatility of the linked list will be of greater benefit.

In the end, your data and your operational needs will decide which structure is best for your application.

Stack Frames



Class 28: Stack Frames (1)


Held: Wednesday, 7 April 2004

Overview:

* About the back end
* Where do variables and values go?
* Stacks and stack frames
* Function and procedure calls
* Non-local variables

Moving to the Back End

* We've covered most of the front-end details: lexing, parsing, and basic semantic analysis.
* Now it's time to move on to the back end of the compiler. That is, the generation of code (perhaps intermediate code, perhaps assembly code) from the annotated parse tree.
* We'll look at the issue in steps.
o We'll consider some general issues (run-time environment, assembly code) this week.
o We'll start looking at particular translations starting next week.

Storing Variables in Memory

* A first consideration is how to handle the storage of variables and parameters in memory.
* As you know, in most modern languages it is possible to call procedures recursively and create new instantiations of the local variables for those procedures.
* In addition, when a function exits you no longer need access to its local variables.
* However, in languages that support the dynamic allocation of memory (e.g., most object-oriented languages), there are also some values that live beyond the function that created them.
* Typically, values that are only active during the lifetime of a procedure are allocated on a stack and values that are independent of procedure lifetime are allocated on a heap (see the sketch after this list).
o Most languages assume one stack and one heap.
* In modern architectures, some variables should be stored in registers to improve performance.
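A minimal C++ illustration of the two lifetimes (a sketch only; make_counter is an invented name):

#include <iostream>

int *make_counter() {
    int local = 0;           // stack: dies when make_counter returns
    int *heap = new int(0);  // heap: outlives this call
    ++local;
    ++*heap;
    return heap;             // returning &local would be an error
}

int main() {
    int *c = make_counter();
    std::cout << *c << "\n"; // the heap value is still alive here
    delete c;                // heap memory must be freed explicitly
}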

The Stack

* At the machine level, the stack is simply an area of memory that is allocated and deallocated in a stack-like manner.
* Typically, the stack starts at the high end of memory and the heap starts at the low end of memory.
o This design makes it possible to delay the decision of how much memory to use for heap and how much for stack until run time (i.e., you can grow either heap or stack until the two cross).
o This design suggests that stacks grow downward and shrink upward, like bungee cords.
* A designated register called the stack pointer keeps track of the end of the stack.

The Heap

* The heap is an even more amorphous area of memory. Parts of the area are allocated by explicit allocate calls (e.g., new) although the determination of which area to use is up to the system rather than the program.
* In many languages (including Pascal), programmers must manage the memory they allocate, freeing it when it is no longer needed.
o The system still must do some behind-the-scenes work to keep track of which memory the programmer has designated as in use and which is free.
* In some more modern languages, the system is in charge of keeping track of which memory is in use and freeing unused memory "automatically". This technique is commonly referred to as garbage collection.

Stack Frames

* Since a function will often require space for many variables (parameters, local variables, temporaries, etc.) it is more convenient to allocate all of that space at once.
o This means that we should predetermine the maximum amount of space a function will use.
o As long as we've determined that space, we might as well lay out the data in that space.
* The organization of local data for the invocation of a function is typically called a stack frame.
* A frame pointer indicates the beginning of the frame.
o Why have both frame pointer and stack pointer? At times, you need to keep track of other frames.
* What goes in a frame (or in accompanying registers)?
o Any local variables
o Parameters (subject to the caveats below)
o The return address (what statement to branch to when the method exits)
o Any temporaries
o Saved registers
o Space for the return value
o Other things ...

Function Calls

* How do we call a function?
* The caller places some of the formal parameters on the stack (often, in its own stack frame).
* The caller places some of the formal parameters in registers.
o If those registers are currently in use, the caller must store their current values on the stack.
* The caller places a return address and static link on the stack (often, in the next stack frame).
* The caller branches to the beginning of the called function.
* The called function allocates a new stack frame, updating the stack pointer.
* The called function executes.
* The called function stores its result in a register (or on the stack, in a more primitive implementation).
* The called function deallocates its stack frame.
* The called function branches back to the return address.
* The caller makes use of the result value.
* The caller restores any registers necessary.

Accessing Non-Local Variables

* In nested languages, like Pascal, it is possible to refer to a variable from a non-local scope. How do you get access to that variable?
* One possibility is to have every frame include a pointer to the frame of the enclosing scope (not necessarily the caller). This means that you have to trace backward an appropriate amount, but that amount can be computed at compile time. Such a pointer is typically called a static link.
* Another possibility is to use a global display which maps each scope to the appropriate stack frame.
* A third possibility is to pass all of the variables to the function and restore them afterwards. This can be particularly difficult to implement.
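As a rough illustration of the static-link idea, here is a C++ sketch that models frames and the static-link chain explicitly (the Frame struct and read_outer_x are invented for illustration; a real compiler lays this out itself and knows the hop count at compile time):

#include <iostream>

struct Frame {
    Frame *static_link; // frame of the lexically enclosing scope (not the caller)
    int x;              // a local variable of that scope
};

// Body of a procedure nested two scopes deep: to read the outermost
// scope's x, it follows the static link twice.
int read_outer_x(Frame *current) {
    Frame *f = current->static_link->static_link;
    return f->x;
}

int main() {
    Frame outer{nullptr, 42};
    Frame middle{&outer, 7};
    Frame inner{&middle, 1};
    std::cout << read_outer_x(&inner) << "\n"; // prints 42
}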

The function call process

I. Environment:

x86/WinXP/VC 6.0

II. Test case:
int swap(int a, int b)
{
int v;
v = a;
a = b;
b = v;
return v;
}

void main(void)
{
int a = 7;
int b = 10;
int c = 0;
c = swap(a,b);
return;
}

III. Analysis:
1: int swap(int a, int b)
2: {
00401020 push ebp
00401021 mov ebp,esp
00401023 sub esp,44h
00401026 push ebx
00401027 push esi
00401028 push edi
00401029 lea edi,[ebp-44h]
0040102C mov ecx,11h
00401031 mov eax,0CCCCCCCCh
00401036 rep stos dword ptr [edi]
3: int v;
4: v = a;
00401038 mov eax,dword ptr [ebp+8]
0040103B mov dword ptr [ebp-4],eax
5: a = b;
0040103E mov ecx,dword ptr [ebp+0Ch]
00401041 mov dword ptr [ebp+8],ecx
6: b = v;
00401044 mov edx,dword ptr [ebp-4]
00401047 mov dword ptr [ebp+0Ch],edx
7: return v;
0040104A mov eax,dword ptr [ebp-4]
8: }
0040104D pop edi
0040104E pop esi
0040104F pop ebx
00401050 mov esp,ebp
00401052 pop ebp
00401053 ret

Parameter handling inside the swap function:
1. Push the base pointer EBP.
2. Copy the stack pointer ESP into EBP.
3. Subtract 0x44 from ESP (to reserve the 68 bytes that will be filled with 0xCC).
4. Push EBX/ESI/EDI.
5. Load the address EBP-0x44 into the destination pointer EDI.
6. Set the counter ECX to 0x11 (that is, 17 dwords).
7. Fill the 17-dword region starting at EDI with 0xCCCCCCCC.

8. Fetch the pushed value of a (stored at EBP+8) into EAX.
9. Assign the value in EAX (parameter a) to v.
10. Fetch the pushed value of b (stored at EBP+0Ch) into ECX.
11. Assign the value in ECX (parameter b) to a.
12. Fetch the value of v (stored at EBP-4) into EDX.
13. Assign the value in EDX (variable v) to b.

14. Copy the return value (the value of v) into EAX.

15. Restore EDI/ESI/EBX.
16. Restore ESP from EBP (that is, the ESP at entry to swap).
17. Restore EBP to its value before the call to swap.
18. Return from swap.


10: void main(void)
11: {
00401070 push ebp
00401071 mov ebp,esp
00401073 sub esp,4Ch
00401076 push ebx
00401077 push esi
00401078 push edi
00401079 lea edi,[ebp-4Ch]
0040107C mov ecx,13h
00401081 mov eax,0CCCCCCCCh
00401086 rep stos dword ptr [edi]
12: int a = 7;
00401088 mov dword ptr [ebp-4],7
13: int b = 10;
0040108F mov dword ptr [ebp-8],0Ah
14: int c = 0;
00401096 mov dword ptr [ebp-0Ch],0
15: c = swap(a,b);
0040109D mov eax,dword ptr [ebp-8]
004010A0 push eax
004010A1 mov ecx,dword ptr [ebp-4]
004010A4 push ecx
004010A5 call @ILT+5(_swap) (0040100a)
004010AA add esp,8
004010AD mov dword ptr [ebp-0Ch],eax
16: return;
17: }
004010B0 pop edi
004010B1 pop esi
004010B2 pop ebx
004010B3 add esp,4Ch
004010B6 cmp ebp,esp
004010B8 call __chkesp (004010e0)
004010BD mov esp,ebp
004010BF pop ebp
004010C0 ret


15: c = swap(a,b);
0040109D mov eax,dword ptr [ebp-8]
004010A0 push eax
004010A1 mov ecx,dword ptr [ebp-4]
004010A4 push ecx
004010A5 call @ILT+5(_swap) (0040100a)
004010AA add esp,8
004010AD mov dword ptr [ebp-0Ch],eax
Analysis of the call to swap:
1. Copy the value of b into EAX.
2. Push EAX.
3. Copy the value of a into ECX.
4. Push ECX.
5. CALL the swap function.
6. Add 8 to the stack pointer, discarding the values of a and b.
7. Copy the return value in EAX into the variable c.

The stack during the call to swap:

0012FEC8 | EDI        |<-- ESP (stack pointer when execution reaches v = a)
0012FECC | ESI        |
0012FED0 | EBX        |
0012FED4 | 0xCCCCCCCC |
         | .          |
         | .          |
         | .          |
0012FF14 | 0xCCCCCCCC |
0012FF18 | EBP        |<-- EBP (frame pointer when execution reaches v = a)
0012FF1C | 0x004010AA | return address for when the call to swap completes
0012FF20 | ECX        | value of a
0012FF24 | EAX        | value of b


IV. Summary:

1. Actual arguments are pushed from right to left, and then the call pushes the return address.
2. The called function reads the argument values from the stack.
3. If the compiler optimizes, does the called function read its arguments from registers or from the stack? It depends on the CPU architecture and the number of parameters (worth analyzing further!).
4. If a function takes many parameters (say, more than 5), how are they passed?
5. A function's return value is generally placed in a register, such as EAX.

Deriving a struct's address from the address of one of its members

When writing C programs we sometimes need to obtain the address of a structure variable from the address of one of its member variables, especially when we want to use C to implement C++'s inheritance feature.
Our analysis of the problem is as follows:

Input: a structure definition type, the name member of one of that structure's member variables, and the member's address ptr
Output: the address of the structure variable containing this member
To make the analysis easier, we give a concrete example:
struct father_t {
int a;
char *b;
double c;
}f;
char *ptr = (char *)&(f.b);
//note: not ptr = f.b; here ptr is the address of b, not the address that b points to.
Given the way C lays out struct members in memory, a sits at offset 0, b at offset 4, and c at offset 8 (on a 32-bit system; the original diagram is not reproduced here).

From this layout we can see that we only need to take the member address ptr that we already know and subtract the member's relative offset within the structure, 4, to obtain the structure's address (ptr - 4).
Linux has a very nice macro for exactly this, called container_of, which lives in linux/kernel.h. Its definition is as follows:
/**
* container_of - cast a member of a structure out to the containing structure
*
* @ptr: the pointer to the member.
* @type: the type of the container struct this is embedded in.
* @member: the name of the member within the struct.
*
*/
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
The definition above can be analyzed as follows:

((type *)0)->member imagines a structure of type `type` whose starting address is 0. The compiler computes a member's address by adding the member's offset to the structure's starting address; since the starting address is 0, the member's "address" equals the member's offset from the beginning of the structure. So &((type *)0)->member takes the member's address, which equals its offset within the structure, and after the (size_t) cast its value is numerically the offset within the structure. That offset is what offsetof() computes.
typeof(((type *)0)->member) extracts the type of the member, and that type is used to declare the pointer __mptr. ptr is the caller's pointer to the member variable, and __mptr is a const pointer of the member's type pointing at the same place ptr points to (the assignment also gives the compiler a chance to type-check ptr).
(char *)__mptr converts the pointer to a byte pointer, so (char *)__mptr - offsetof(type,member) yields the structure's starting address (as a char * pointer), and the outer (type *) cast then turns that byte pointer into a pointer to the structure itself.
That is how a pointer to a structure is recovered from a pointer to one of its member variables: the pointer's type is converted from the member's type to the structure's type.
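For comparison, here is a portable approximation applied to the father_t example. It avoids the GCC-specific typeof and statement expression, using only the standard offsetof; CONTAINER_OF is our own name for this sketch, and here ptr keeps its natural type char **:

#include <cstddef>  // offsetof
#include <cstdio>

struct father_t {
    int a;
    char *b;
    double c;
};

// Portable core of the kernel's container_of: subtract the member's
// offset from the member's address to recover the struct's address.
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

int main() {
    father_t f;
    char **ptr = &f.b;  // address of the member, as in the text
    father_t *back = CONTAINER_OF(ptr, father_t, b);
    printf("%s\n", back == &f ? "recovered" : "bug");
}

The trade-off is that without typeof there is no compile-time check that ptr really points at a member of the named type.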

function pointer

// pt2Func is a pointer to a function which returns an int and takes a float and two chars
void PassPtr(int (*pt2Func)(float, char, char))
{
int result = (*pt2Func)(12, 'a', 'b'); // call using function pointer
cout << result << endl;
}

// execute example code - 'DoIt' is a suitable function as defined above in 2.1-4
void Pass_A_Function_Pointer()
{
cout << endl << "Executing 'Pass_A_Function_Pointer'" << endl;
PassPtr(&DoIt);
}
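The snippet above leans on a DoIt defined elsewhere in the original article. The following self-contained version supplies a stand-in DoIt (a hypothetical function of my own) with the required signature so the example compiles as-is:

#include <iostream>
using namespace std;

// stand-in for DoIt: matches the required signature,
// returning an int and taking a float and two chars
int DoIt(float f, char a, char b) {
    return static_cast<int>(f) + a + b;
}

void PassPtr(int (*pt2Func)(float, char, char)) {
    int result = (*pt2Func)(12, 'a', 'b'); // call through the pointer
    cout << result << endl;
}

int main() {
    PassPtr(&DoIt); // pass the function's address
}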

Bit Fields

Bit Fields allow the packing of data in a structure. This is especially useful when memory or data storage is at a premium. Typical examples:

* Packing several objects into a machine word. e.g. 1 bit flags can be compacted -- Symbol tables in compilers.
* Reading external file formats -- non-standard file formats could be read in. E.g. 9 bit integers.

C lets us do this in a structure definition by putting :bit length after the variable. i.e.


struct packed_struct {
unsigned int f1:1;
unsigned int f2:1;
unsigned int f3:1;
unsigned int f4:1;
unsigned int type:4;
unsigned int funny_int:9;
} pack;


Here the packed_struct contains 6 members: four 1-bit flags f1..f4, a 4-bit type and a 9-bit funny_int.

C automatically packs the above bit fields as compactly as possible, provided that the maximum length of the field is less than or equal to the integer word length of the computer. If this is not the case then some compilers may allow memory overlap for the fields whilst others would store the next field in the next word (see the comments on bit-field portability below).

Access members as usual via:

pack.type = 7;

NOTE:

* Only the n lower bits of a value will be assigned to an n-bit field. So type cannot take values larger than 15 (4 bits long).
* Bit fields are always converted to integer type for computation.
* You are allowed to mix ``normal'' types with bit fields.
* The unsigned definition is important; it ensures that no bit is used as a sign flag.
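A small self-contained test of the packed_struct above (the exact sizeof is implementation-dependent; 4 bytes is typical when all 17 bits fit into one machine word):

#include <cstdio>

struct packed_struct {
    unsigned int f1:1;
    unsigned int f2:1;
    unsigned int f3:1;
    unsigned int f4:1;
    unsigned int type:4;
    unsigned int funny_int:9;
};

int main() {
    packed_struct pack = {};
    unsigned v = 17;
    pack.type = 7;                          // fits: 4 bits hold 0..15
    printf("type   = %u\n", pack.type);     // 7
    pack.type = v;                          // 17 does not fit: only the low 4 bits survive
    printf("type   = %u\n", pack.type);     // 1 (17 mod 16)
    printf("sizeof = %zu\n", sizeof pack);  // typically 4 bytes
}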

End-Of-File (EOF) when reading from cin


When a program is reading from a disk file, the system "knows" when it gets to the end. This condition is called End-Of-File (EOF). All systems also provide some way of indicating an EOF when reading from the keyboard. This varies from system to system.

Dev-C++
Type: Enter Control-z Enter
MS Visual C++
Type: Enter Control-z Enter Enter
Reportedly there is a Microsoft patch that can be applied so that only one Enter is required after the Control-z. I wouldn't bother.
Other systems
Some may use other characters: control-D then Enter, or control-D followed by a control-Z, or ... .

You can just provide bad data to make cin fail in many cases. A student once claimed that typing "EOF" was the way to indicate an end-of-file from the console. Yes, it stops reading (because of an error) if you're reading numbers, but not when reading characters or strings!
Resetting after EOF

Although it doesn't make sense to read after an EOF on a file, it is reasonable to read again from the console after an EOF has been entered. The clear function allows this.

while (cin >> x) {
... // loop reading until EOF (or bad input)
}

cin.clear(); // allows more reading
cin >> n;
...
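Putting the pieces together, here is a minimal complete program (note that after bad input, as opposed to EOF, you would also want cin.ignore() to discard the offending characters before reading again):

#include <iostream>
using namespace std;

int main() {
    int x, sum = 0;
    while (cin >> x) {  // stops on EOF (e.g. Ctrl-Z / Ctrl-D) or bad input
        sum += x;
    }
    cout << "sum = " << sum << endl;

    cin.clear();        // drop the eof/fail state so we can read again
    cout << "one more number: ";
    int n;
    if (cin >> n)
        cout << "got " << n << endl;
}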

Find the number of set bits in a given integer

Q: Find the number of set bits in a given integer

Sol: Parallel Counting: MIT HAKMEM Count

HAKMEM (Hacks Memo) is a legendary collection of neat mathematical and programming hacks contributed mostly by people at MIT and some elsewhere. This source is from the MIT AI Lab, and this brilliant piece of code, originally in assembly, was probably conceived in the early 70's.

int BitCount(unsigned int u)
{
    unsigned int uCount;

    uCount = u
             - ((u >> 1) & 033333333333)
             - ((u >> 2) & 011111111111);
    return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}

Let's take a look at the theory behind this idea.

Take a 32-bit number n:

n = a31 * 2^31 + a30 * 2^30 + ... + ak * 2^k + ... + a1 * 2 + a0

Here a0 through a31 are the values of the bits (0 or 1) in the 32-bit number. Since the problem at hand is to count the number of set bits in the number, simply summing up these coefficients yields the solution: a0 + a1 + ... + a31.

How do we do this programmatically?

Take the original number n and store it in the count variable: count = n;

Shift the original number 1 bit to the right and subtract it from the original: count = n - (n>>1);

Now shift the original number 2 bits to the right and subtract: count = n - (n>>1) - (n>>2);

Keep doing this until you reach the end: count = n - (n>>1) - (n>>2) - ... - (n>>31);

Let's analyze what count holds now:

n       = a31 * 2^31 + a30 * 2^30 + ... + ak * 2^k + ... + a1 * 2 + a0
n >> 1  = a31 * 2^30 + a30 * 2^29 + ... + ak * 2^(k-1) + ... + a1
n >> 2  = a31 * 2^29 + a30 * 2^28 + ... + ak * 2^(k-2) + ... + a2
...
n >> k  = a31 * 2^(31-k) + a30 * 2^(30-k) + ... + ak
...
n >> 31 = a31

You can quickly see that (hint: 2^k - 2^(k-1) = 2^(k-1), so the powers telescope away and each bit ak contributes exactly ak):

count = n - (n>>1) - (n>>2) - ... - (n>>31) = a31 + a30 + ... + a0

which is what we are looking for.

int BitCount(unsigned int u)
{
    unsigned int uCount = u;
    do
    {
        u = u >> 1;
        uCount -= u;
    } while (u);
    return uCount;
}

This certainly is an interesting way to solve the problem. But what makes the HAKMEM version brilliant? It runs in constant time with constant memory!

int BitCount(unsigned int u)
{
    unsigned int uCount;

    uCount = u
             - ((u >> 1) & 033333333333)
             - ((u >> 2) & 011111111111);
    return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}

For those of you who are still wondering what's going on: it is basically the same idea, but instead of looping over the entire number, it sums the bits in blocks of 3 (octal digits) and counts all the blocks in parallel.

After the statement uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111); each octal digit of uCount holds the number of set bits in the corresponding octal digit of u.

To see this, take one block of 3 bits:

u      = a2 * 2^2 + a1 * 2 + a0
u >> 1 = a2 * 2 + a1
u >> 2 = a2

so u - (u>>1) - (u>>2) = a2 + a1 + a0, the number of set bits in that block. The AND masks keep each block's subtraction from picking up bits shifted in from the neighboring block.

The next step is to gather these per-block counts and sum them up. (uCount + (uCount >> 3)) adds each 3-bit count to its neighbor, so every other octal block now holds the sum of a pair of adjacent blocks; in effect, counts over blocks of 6 bits. The idea is not to let the sums spill over into adjacent blocks while adding them, and that is guaranteed: the maximum count in a block of 3 is 3 and in a block of 6 is 6, while 3 bits can represent values up to 7. To mask out the junk digits left over from the uCount >> 3 shift, the code ANDs with 030707070707, keeping only each 6-bit sum.

What does ((uCount + (uCount >> 3)) & 030707070707) hold now? It is sum0 + sum1 * 2^6 + sum2 * 2^12 + sum3 * 2^18 + sum4 * 2^24 + sum5 * 2^30, where sumK is the number of set bits in the K-th block of 6 bits, counting from the low end (the top block covers only 2 bits). What we need is sum0 + sum1 + sum2 + sum3 + sum4 + sum5. Since 2^6 = 64 is congruent to 1 modulo 63 (because 2^6 - 1 = 63), taking the whole value modulo 63 replaces every factor of 2^(6K) by 1 and yields exactly that sum; for a 32-bit word the total is at most 32, so it never wraps around 63.
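To convince yourself the parallel version agrees with the bit-by-bit definition, here is a small self-contained check (the test values are chosen arbitrarily):

#include <cstdio>

// HAKMEM parallel count, as above
int BitCount(unsigned int u) {
    unsigned int uCount = u
        - ((u >> 1) & 033333333333)
        - ((u >> 2) & 011111111111);
    return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}

// obvious one-bit-at-a-time count, for checking
int NaiveCount(unsigned int u) {
    int c = 0;
    for (; u; u >>= 1) c += u & 1;
    return c;
}

int main() {
    unsigned int tests[] = {0u, 1u, 7u, 0xFFu, 0xDEADBEEFu, 0xFFFFFFFFu};
    for (unsigned int t : tests)
        printf("%08X: hakmem=%d naive=%d\n", t, BitCount(t), NaiveCount(t));
}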

Feb 9, 2008

The difference between the heap and the stack

Posted by solost on October 9, 2004
I. Preliminaries: how a program's memory is laid out
The memory occupied by a program compiled from C/C++ is divided into the following parts:
1. Stack area (stack): allocated and freed automatically by the compiler; holds function parameter values, local variable values, and so on. It operates like the stack data structure.
2. Heap area (heap): generally allocated and freed by the programmer; if the programmer never frees it, the OS may reclaim it when the program ends. Note that this has nothing to do with the heap data structure; its allocation actually works more like a linked list.
3. Global (static) area (static): global variables and static variables are stored together, with initialized globals and statics in one region and uninitialized globals and statics in an adjacent region. Released by the system when the program ends.
4. Literal constant area: constant strings are stored here. Released by the system when the program ends.
5. Program code area: holds the binary code of the function bodies.



II. An example program
This one, written by an old hand, is very thorough:
//main.cpp
#include <stdlib.h>
#include <string.h>

int a = 0;                // global initialized area
char *p1;                 // global uninitialized area
int main()
{
    int b;                // stack
    char s[] = "abc";     // stack
    char *p2;             // stack
    char *p3 = "123456";  // "123456\0" is in the constant area; p3 itself is on the stack
    static int c = 0;     // global (static) initialized area
    p1 = (char *)malloc(10);
    p2 = (char *)malloc(20);
    // the 10- and 20-byte blocks just allocated live on the heap
    strcpy(p1, "123456"); // "123456\0" is in the constant area; the compiler may
                          // merge it with the "123456" that p3 points to
}


III. Heap and stack theory
2.1 How space is requested
stack:
Allocated automatically by the system. For example, when you declare a local variable int b in a function, the system automatically reserves space for b on the stack.
heap:
The programmer must request it explicitly and state its size. In C this is done with the malloc function,
e.g. p1 = (char *)malloc(10);
In C++ the new operator is used,
e.g. p2 = new char[10];
Note, however, that p1 and p2 themselves live on the stack.


2.2 How the system responds to a request
Stack: as long as the remaining stack space exceeds the requested space, the system provides the memory for the program; otherwise an exception reports a stack overflow.
Heap: first, know that the operating system keeps a linked list of free memory addresses. When the system receives a program's request,
it walks this list looking for the first heap node whose space exceeds the requested size, removes that node from the free list, and hands the node's space to the program. In addition, on most systems the size of this allocation is recorded at the block's first address, so that the delete statement in the code can release the memory correctly. And since the node found is not necessarily exactly the requested size, the system automatically puts the surplus back on the free list.

2.3 Limits on the size of a request
Stack: under Windows, the stack is a contiguous region of memory that grows toward lower addresses. This means the address of the top of the stack and the maximum capacity of the stack are fixed in advance by the system; under Windows the stack size is 2M (some say 1M; in any case a constant determined at compile time). If the requested space exceeds the stack's remaining space, an overflow is reported. So the space obtainable from the stack is fairly small.
Heap: the heap grows toward higher addresses and is a noncontiguous region of memory. This is because the system stores the free memory addresses in a linked list, which is naturally noncontiguous, and the list is traversed from low addresses toward high. The size of the heap is limited by the effective virtual memory in the computer system. Clearly, heap space is obtained more flexibly and is larger.


2.4 Comparison of allocation efficiency:
The stack is allocated automatically by the system and is faster, but the programmer has no control over it.
Heap memory is allocated by new; it is generally slower and prone to memory fragmentation, but it is the most convenient to use.
Also, under Windows the best approach is to allocate memory with VirtualAlloc: it reserves a block of memory directly in the process's address space, on neither the heap nor the stack. It is the least convenient to use, but it is fast and the most flexible.

2.5 What is stored on the heap and the stack
Stack: during a function call, (1) the first thing pushed is the address of the instruction after the call in the caller (the next executable statement after the call statement), (2) then the function's parameters (in most C compilers, the parameters are pushed from right to left), (3) then the function's local variables. Note: static variables are not pushed.
When the function call ends, (1) the local variables are popped first, (2) then the parameters, and (3) finally the stack top points at the address stored first, the next instruction in the caller, from which the program continues running.
Heap: generally, the size of the block is recorded at the head of the heap block. The concrete contents of the heap are arranged by the programmer.

2.6 Comparison of access efficiency
char s1[] = "aaaaaaaaaaaaaaa";
char *s2 = "bbbbbbbbbbbbbbbbb";
The "aaa..." string is assigned at run time,
while the "bbb..." string is fixed at compile time;
in later accesses, however, the array on the stack is faster than the string the pointer points at (for example, on the heap).
For example:
void main()
{
    char a = 1;
    char c[] = "1234567890";
    char *p = "1234567890";
    a = c[1];
    a = p[1];
    return;
}
The corresponding assembly code:
10: a = c[1];
00401067 8A 4D F1 mov cl,byte ptr [ebp-0Fh]
0040106A 88 4D FC mov byte ptr [ebp-4],cl
11: a = p[1];
0040106D 8B 55 EC mov edx,dword ptr [ebp-14h]
00401070 8A 42 01 mov al,byte ptr [edx+1]
00401073 88 45 FC mov byte ptr [ebp-4],al
The first form reads the string element directly into the register cl, while the second must first load the pointer value into edx and then read the character via edx, which is clearly slower.


2.7 Summary:
The difference between the heap and the stack can be seen through the following analogy:
Using the stack is like eating at a restaurant: you just order (make the request), pay, and eat (use it), then leave when you're full. You don't have to bother with the prep work of washing and chopping vegetables, or the cleanup of scrubbing dishes and pots. Its advantage is speed; its drawback is little freedom.
Using the heap is like cooking your favorite dish yourself: more trouble, but it suits your own taste exactly, and you have much more freedom.

Feb 5, 2008

Differences between Java and C++

JAVA and C++ are both object-oriented languages; that is, both can realize the object-oriented ideas of encapsulation, inheritance, and polymorphism. But because C++, in order to accommodate its large number of C users,

remained compatible with C, it ended up being merely a C with classes, which more or less compromised how thoroughly object-oriented it is! JAVA, by contrast, is a completely object-oriented language with cleaner syntax; it is smaller and easier to learn. It was designed on the basis of deep and careful study of many programming languages, discarding the shortcomings of other languages and fundamentally resolving C++'s inherent defects.

Java and C++ have more similarities than differences, but a few major differences between the two languages make Java easier to learn and its programming environment simpler.

I cannot list all the differences here; these are the more notable ones:

1. Pointers

JAVA does not let the programmer use pointers to access memory directly, and it adds automatic memory management, effectively preventing the pointer mishaps of C/C++, such as wild pointers crashing the system. This is not to say JAVA has no pointers: the virtual machine still uses them internally; outsiders simply may not use them. This benefits the safety of Java programs.

2. Multiple inheritance

C++ supports multiple inheritance, a feature of the language that lets a class be derived from several parent classes. Although multiple inheritance is powerful, it is complicated to use, causes a lot of trouble, and is not easy for compilers to implement. Java does not support multiple inheritance of classes, but it lets a class implement several interfaces (extends + implements), achieving the functionality of C++ multiple inheritance while avoiding the many inconveniences of the way C++ implements it.

3. Data types and classes

Java is a completely object-oriented language: all functions and variables must be part of a class. Apart from the primitive data types, everything else is a class object, including arrays. Objects combine data and methods, encapsulating them in classes, so that every object can realize its own traits and behavior. C++, on the other hand, allows functions and variables to be defined as globals. In addition, Java drops the struct and union of C/C++, eliminating needless trouble.

4. Automatic memory management

In a Java program, all objects are created on the heap with the new operator, which resembles C++'s new operator. The following statements create an object of class Read and then invoke that object's work method:

Read r=new Read(); r.work();



The statement Read r=new Read(); creates an instance of Read on the heap. Java reclaims unused memory automatically; the programmer does not need to delete anything. In C++, the programmer must release memory resources explicitly, which adds to the developer's burden. In Java, when an object is no longer in use, the garbage collector tags it for deletion. Java's garbage collector runs in the background as a thread, doing its work during idle time.

5. Operator overloading

Java does not support operator overloading, which is considered one of C++'s prominent features. Although classes in Java can by and large achieve the same functionality, much of the convenience of operator overloading is lost. The Java language leaves it out in order to keep Java as simple as possible.

6. Preprocessing

Java does not support preprocessing. C/C++ compilation includes a preprocessing phase, the well-known preprocessor. The preprocessor is convenient for developers but adds complexity to compilation. The JAVA virtual machine has no preprocessor; instead, the import statement it provides serves a purpose similar to the C++ preprocessor.

7. Java does not support default function arguments; C++ does

In C, code is organized into functions, and a function can access the program's global variables. C++ adds classes and class methods, functions attached to classes. C++ class methods are very similar to Java class methods; however, since C++ still supports C, nothing stops C++ developers from using plain functions, and the resulting mix of functions and methods makes programs messy.

Java has no standalone functions. As a purer object-oriented language than C++, Java forces developers to put every routine inside a class; in fact, implementing routines as methods encourages developers to organize their code better.

8. Strings

C and C++ do not support string variables; in C and C++ programs a null terminator marks the end of a string. In Java, strings are implemented as class objects (String and StringBuffer), and these classes are at the core of the Java language. Implementing strings as class objects has several advantages:

(1) The way strings are created and their elements accessed is consistent across the whole system;

(2) The Java string classes are defined as part of the Java language, not as a tacked-on extension;

(3) Java strings perform run-time checking, which helps eliminate some run-time errors;

(4) Strings can be concatenated with the "+" operator.

9. The goto statement

The "dreaded" goto statement is a relic of C and C++. It is a technically legal part of those languages, but using it causes confusing, hard-to-understand program structure; goto is mainly used for unconditional jumps and multiway branching. For these reasons, Java does not provide goto: it reserves goto as a keyword but does not support its use, keeping programs simple and readable.

10. Type conversion

In C and C++, data types are sometimes converted implicitly, which raises the issue of automatic coercion. For example, in C++ you can assign a floating-point value to an integer variable and its fractional part is discarded. Java does not support C++-style automatic coercion; if a conversion is needed, the program must perform an explicit cast.

11. Exceptions

JAVA's exception mechanism is used to catch exceptional events and strengthens the system's fault tolerance:

try{ //code that may raise an exception }catch(exceptionType name){ //handle it }



where exceptionType denotes the type of the exception. C++ has no mechanism quite this convenient.

Jan 18, 2008

J2EE



J2EE (Java 2 Enterprise Edition) is a solution for enterprise-level applications built on the Java 2 platform. The foundation of J2EE technology is the Java 2 platform: it has all the functionality of the J2SE platform while adding full support for technologies such as EJB, Servlet, JSP, and XML. Its ultimate goal is to be an architecture that supports enterprise application development and simplifies the complex problems of developing, deploying, and managing enterprise solutions. In fact, J2EE has become the industry standard and first-choice platform for enterprise development.

J2EE is not a product but a series of standards. Many products on the market implement J2EE, such as BEA WebLogic, IBM WebSphere, and the open-source JBoss.

J2EE is a standard put forward by Sun; a product that conforms to the standard is called an "implementation". Sun's own J2EE development kit contains one such implementation, and JBoss, WebLogic, and WebSphere are each implementations of the J2EE standard. Since JBoss, WebLogic, and WebSphere ship with the J2EE APIs themselves, you can get by without using Sun's J2EE implementation.

I. The concept of J2EE

At present the Java 2 platform comes in three editions: the Java 2 Platform Micro Edition (J2ME) for small devices and smart cards, the Java 2 Platform Standard Edition (J2SE) for desktop systems, and the Java 2 Platform Enterprise Edition (J2EE) for building server-side applications and services.

J2EE is an architecture that uses the Java 2 platform to simplify the complex problems tied to developing, deploying, and managing enterprise solutions. The foundation of J2EE technology is the core Java platform, the Java 2 Standard Edition. J2EE not only consolidates many advantages of the Standard Edition, such as the "write once, run anywhere" property, the JDBC API for convenient database access, CORBA technology, and a security model that can protect data in Internet applications, but also provides full support for EJB (Enterprise JavaBeans), the Java Servlet API, JSP (Java Server Pages), and XML technology. Its ultimate purpose is to be an architecture that lets enterprise developers dramatically shorten time to market.

The J2EE architecture provides a middle-tier integration framework to satisfy applications that need high availability, high reliability, and scalability without great expense. By providing a unified development platform, J2EE lowers the cost and complexity of developing multi-tier applications while providing strong support for integrating existing applications; it fully supports Enterprise JavaBeans, has good wizard support for packaging and deploying applications, adds directory support, strengthens the security mechanisms, and improves performance.

II. The advantages of J2EE

J2EE provides a sound mechanism for building business systems that are scalable, flexible, and easy to maintain:
Preserving existing IT assets: because enterprises must adapt to new business needs, it is important to leverage existing investment in enterprise information systems rather than draw up a whole new plan from scratch. What companies need is a server-side platform mechanism built on top of existing systems in an incremental (rather than radical, throw-everything-away) fashion. The J2EE architecture can fully exploit users' existing investments, such as the BEA Tuxedo, IBM CICS, IBM Encina, Inprise VisiBroker, and Netscape Application Server systems used by some companies. This is possible because J2EE has broad industry support and the participation of some important vendors in the enterprise-computing field. Each of these vendors offers its existing customers an upgrade path into the portable J2EE realm that does not require abandoning existing investments. And because products based on the J2EE platform can run on almost any operating system and hardware configuration, existing operating systems and hardware can be kept in service as well.

Efficient development: J2EE lets companies hand some common, tedious server-side tasks to middleware vendors to carry out, so developers can concentrate on building business logic, shortening development time accordingly. Advanced middleware vendors provide these sophisticated middleware services:

State management services -- developers write less code and need not worry about how state is managed, so programs get finished faster.
Persistence services -- developers can write applications without coding data-access logic, producing leaner, database-independent applications that are easier to develop and maintain.
Distributed shared-data-object caching services -- developers can build high-performance systems and greatly improve the scalability of the overall deployment.
Support for heterogeneous environments: J2EE can be used to develop portable programs deployable in heterogeneous environments. J2EE-based applications do not depend on any particular operating system, middleware, or hardware, so a soundly designed J2EE program needs to be developed only once to be deployed on a variety of platforms. This is crucial in the typical heterogeneous enterprise computing environment. The J2EE standard also lets customers buy off-the-shelf third-party J2EE-compatible components and deploy them into heterogeneous environments, saving the expense of crafting the entire solution themselves.
Scalability: an enterprise must choose a server-side platform that offers excellent scalability to satisfy the large numbers of new customers doing business on its systems. J2EE-based applications can be deployed on a variety of operating systems; for example, they can be deployed on high-end UNIX and mainframe systems, where a single machine can support 64 to 256 processors (something NT servers can only look up to). J2EE vendors provide a broad range of load-balancing strategies that can eliminate bottlenecks in the system and allow multiple servers to be deployed in an integrated fashion. Such deployments can reach thousands of processors, realizing highly scalable systems that meet the needs of future business applications.
Stable availability: a server-side platform must run around the clock to meet the needs of a company's customers and partners. Because the Internet is global and ever-present, even planned downtime at night can cause serious losses, and unplanned downtime can be catastrophic. J2EE deploys into reliable operating environments that support long-term availability. Some J2EE deployments run in Windows environments, but customers can also choose more robust operating systems such as Sun Solaris and IBM OS/390. The most robust operating systems reach 99.999% availability, or only about five minutes of downtime per year, the ideal choice for demanding real-time business systems.

III. J2EE's four-tier model

J2EE uses a multi-tier distributed application model. Application logic is divided by function into components, and the application components are distributed across different machines according to the tier they belong to. In fact, Sun designed J2EE precisely to address the drawbacks of the two-tier (client/server) model. In the traditional model, the client takes on too many roles and grows bloated; first-time deployment is fairly easy, but such systems are hard to upgrade or improve, scale poorly, and are often based on some proprietary protocol, usually a database protocol, which makes reusing business logic and presentation logic very difficult. J2EE's multi-tier enterprise application model splits the layers of the two-tier model into many tiers, and a multi-tier application can provide an independent tier for each kind of service. These are the typical four tiers of J2EE:

Client-tier components running on client machines
Web-tier components running on the J2EE server
Business-logic-tier components running on the J2EE server
Enterprise information system (EIS)-tier software running on the EIS server

J2EE application components
A J2EE application is built from components. A J2EE component is a software unit with an independent function; components are assembled with their related files into a J2EE application and interact with other components. The J2EE specification defines the following J2EE components:
Application client programs and applets are client-tier components.
Java Servlets and JavaServer Pages (JSP) are web-tier components.
Enterprise JavaBeans (EJB) are business-tier components.

Client-tier components
A J2EE application can be web-based or traditional.
Web-tier components: J2EE web-tier components can be JSP pages or Servlets. According to the J2EE specification, static HTML pages and applets do not count as web-tier components.

As the client tier in the figure (not reproduced here) shows, the web tier may contain certain JavaBean objects to handle user input and
send that input to enterprise beans running in the business tier for processing.

Business-tier components
Business-tier code implements the logic that serves a particular business domain such as banking, retail, or finance, and it is handled by enterprise beans running in the business tier. The figure (not reproduced here) shows how an enterprise bean receives data from a client program, processes it (if necessary), and sends it on to the EIS tier for storage; the process can also run in reverse.

There are three kinds of enterprise beans: session beans, entity beans, and message-driven beans. A session bean represents a transient interaction with a client program; when the client finishes executing, the session bean and its data are gone. An entity bean, by contrast, represents a permanent row in a database table; when the client program terminates or the server shuts down, an underlying service guarantees that the entity bean's data is saved. A message-driven bean combines the features of a session bean and a JMS message listener, allowing a business-tier component to receive JMS messages asynchronously.

Enterprise information system tier
The enterprise information system tier runs enterprise information system software, including infrastructure systems such as enterprise resource planning (ERP), mainframe transaction processing, database systems, and other legacy information systems. J2EE application components may need to access the enterprise information system, for example for database connectivity.

Next we describe J2EE's components, services, and APIs in more detail, and look at how to use and combine the different components and services flexibly when developing different kinds of enterprise applications with different needs and goals.

· Servlet

Servlets are the Java platform's CGI technology. A servlet runs on the server side and generates web pages dynamically. Compared with traditional CGI and the many CGI-like technologies, Java servlets are more efficient and easier to use. With servlets, repeated requests do not cause the same program to be loaded multiple times; concurrent access is supported by threads.

· JSP

JSP (Java Server Page) is a technology for mixing ordinary static HTML with dynamically generated output, in which respect it closely resembles Microsoft ASP, PHP, and similar technologies. Thanks to the formal separation of content from presentation, the work of producing web pages can be conveniently divided between page designers and programmers and then combined through JSP. At run time, a JSP page is first translated into a servlet, then compiled and run in servlet form, so its efficiency and functionality are no different from a servlet's; it is likewise very efficient.

· EJB

EJB defines a set of reusable components: enterprise beans. Developers can use these components to assemble distributed applications like building blocks. When the components are assembled, all the enterprise beans must be deployed into an EJB server (the usual J2EE application servers such as WebLogic and WebSphere are EJB servers). The EJB server, acting as the bridge between the container and the underlying platform, manages the EJB container and gives it access to system services. All EJB instances run inside an EJB container. The container provides system-level services and controls the EJB life cycle; it takes care of technical matters such as security, remote connectivity, life-cycle management, and transaction management on the developer's behalf, simplifying the development of business logic. EJB defines three kinds of enterprise beans:

◆ Session Beans

◆ Entity Beans

◆ Message-driven Beans

· JDBC

The JDBC (Java Database Connectivity) API is a standard interface for access to SQL (Structured Query Language) databases; it lets database developers write database applications using the standard Java API. The JDBC API is mainly used to connect to databases and issue SQL commands directly, executing ordinary SQL statements, dynamic SQL statements, and stored procedures with IN and OUT parameters. JDBC in Java is the counterpart of ODBC (Open Database Connectivity) on the Microsoft platform.

· JMS

JMS (Java Message Service) is a set of Java application interfaces that provide services for creating, sending, receiving, and reading messages. The JMS API defines a common set of application programming interfaces and corresponding semantics that let Java applications communicate with a variety of message-oriented middleware products, including IBM MQ-Series, Microsoft MSMQ, and the pure-Java SonicMQ. With the JMS API, developers do not need to master the usage of each different messaging product; they can drive all of these middleware products through the one uniform JMS API, maximizing the portability of messaging applications. JMS supports both point-to-point messaging and publish/subscribe messaging.

· JNDI

Because J2EE application components are generally distributed across different machines, a mechanism is needed so that component clients can look up and reference components and resources. In the J2EE architecture, JNDI (Java Naming and Directory Interface) is used to locate all kinds of objects, including EJBs, database drivers, JDBC data sources, and message connections. The JNDI API gives applications a uniform interface for performing standard directory operations, such as finding and locating an object by its attributes. Since JNDI is independent of any directory protocol, applications can also use JNDI to access particular directory services such as LDAP, NDS, and DNS.

· JTA

JTA (Java Transaction API) provides the standard interface for handling transactions in J2EE; it supports beginning, rolling back, and committing a transaction. In addition, the typical J2EE platform always provides a JTS (Java Transaction Service) as the standard transaction-processing service, and developers can use JTA to make use of JTS.

· JCA

JCA (J2EE Connector Architecture) is part of the J2EE architecture; it gives developers an architecture for connecting to all kinds of enterprise information systems (EIS, including ERP, SCM, CRM, and so on). For EIS vendors, it means they only need to develop one JCA-based EIS connection adapter for developers to be able to connect to and use their system from any J2EE application server. Implementing a JCA-based connection adapter involves J2EE's transaction-management, security-management, and connection-management service components.

· JMX

JMX (Java Management Extensions), whose predecessor was JMAPI, is devoted to solving the problem of managing distributed systems. JMX is a collection of application programming interfaces, extensible objects, and methods for developing seamlessly integrated management applications for systems, networks, and services across heterogeneous operating system platforms, system architectures, and network transport protocols. It is a complete development environment for network-management applications, providing at once the complete feature inventory that vendors need to collect, resource-inventory tables, graphical user interfaces, network APIs for accessing SNMP, host-to-host remote procedure calls, database access methods, and more.

· JAAS

JAAS (Java Authentication and Authorization Service) implements a Java version of the standard Pluggable Authentication Module (PAM) framework. JAAS can be used to authenticate users, making it possible to determine reliably and securely who is executing Java code. It can also implement user-based access control by authorizing users.

· JACC

JACC (Java Authorization Service Provider Contract for Containers) defines a connection contract between a J2EE application server and a particular authorization server, so that different authorization servers can be plugged into J2EE products.

· JAX-RPC

Using JAX-RPC (Java API for XML-based RPC), existing Java classes or Java applications can be repackaged and published as Web Services. JAX-RPC provides APIs for encoding and decoding RPC parameters (in/out), letting developers conveniently complete RPC calls using SOAP messages. Likewise, business applications that use EJB (Enterprise JavaBeans) can be wrapped as web services with JAX-RPC, and the web service's WSDL interface corresponds exactly to the original EJB's methods. JAX-RPC wraps the deployment and implementation of web services for the user; to the web-service developer, SOAP/WSDL becomes transparent, which helps speed up the web-service development cycle.

· JAXR

JAXR (Java API for XML Registries) provides an API for interacting with several kinds of registry services. JAXR lets clients access web services that comply with the JAXR specification; such web services are called registry services. Generally speaking, a registry service always runs in the form of a web service. JAXR supports three types of registry provider: JAXR Pluggable Provider, Registry-specific JAXR Provider, and JAXR Bridge Provider (supporting UDDI Registry and ebXML Registry/Repository, among others).

· SAAJ

SAAJ (SOAP with Attachments API for Java) is an enhancement of JAX-RPC that provides support for low-level manipulation of SOAP messages.

IV. The structure of J2EE

This component-based, platform-independent J2EE structure makes J2EE programs very simple to write, because business logic is packaged into reusable components and the J2EE server provides background services for every component type in the form of containers. Since you do not have to develop those services yourself, you can concentrate on solving the business problem at hand.

Containers and services

Container settings customize the underlying support provided by the J2EE server, which includes services such as security, transaction management, JNDI (Java Naming and Directory Interface) lookups, and remote connectivity. The most important of these services are listed below:

The J2EE security model lets you configure a web component or enterprise bean so that only authorized users can access system resources. Each client belongs to a particular role, and each role is only permitted to invoke particular methods. You declare the roles and the methods each role may invoke in the enterprise bean's deployment descriptor. Thanks to this declarative approach, you do not have to write the rules that enforce security.

The J2EE transaction management model lets you specify the relationships among all the methods that make up one transaction, so that all the methods in a transaction are treated as a single unit. When a client invokes a method in an enterprise bean, the container steps in to manage the transaction. Because the container manages the transaction, you do not have to code transaction boundaries in the enterprise bean; code to control distributed transactions would be remarkably complex. You simply declare the enterprise bean's transaction attributes in the deployment descriptor file rather than writing and debugging complex code, and the container reads the file and handles the enterprise bean's transactions for you.

The JNDI lookup service provides a unified interface to the enterprise's multiple naming and directory services, so that application components can access naming and directory services.

The J2EE remote client connectivity model manages the low-level interaction between a client and an enterprise bean. After an enterprise bean is created, a client can invoke its methods as though the bean were in the same virtual machine as the client.

The life cycle management model manages the creation and removal of enterprise beans. An enterprise bean passes through several states during its life cycle: the container creates the enterprise bean, moves it between the pool of available instances and the active state, and finally removes it from the container. Even though the enterprise bean's create and remove methods can be called, the container performs these tasks in the background.

V. An enterprise application example

Below we walk through a hypothetical J2EE implementation of an enterprise application to see how the various components and services are applied. Suppose the application is the sales system of a manufacturer/retailer of computer products. The sales system can publish product information through the company's own website and also deliver the product catalog to computer-products marketplaces. It can accept orders online (from the company's own website or from the marketplaces) and then hand them over to the internal enterprise management system for follow-up processing.

See figure 1: the enterprise application can be architected in the following way. At the core of the application are its two pieces of business logic, product catalog management and product order management, implemented with EJB and deployed in the EJB container. Because both the product catalog and the order information need to be persisted, JDBC is used to connect to the database and JTA is used to carry out the database access transactions.


Figure 1: J2EE application example

JSP/Servlets then implement the application's web presentation: online catalog browsing and online ordering. To deliver the product catalog to particular marketplaces, JMS implements asynchronous, message-based catalog transfer. So that more outside marketplaces can integrate with the catalog and ordering business, Web Services technology is used to wrap the implementation of the business logic. Since order processing must be handled by company employees, the company's internal user system and access-control service are integrated for the employees' convenience: JACC integrates the internal access-control service, JNDI integrates the internal user directory, and JAAS performs access control. Because an order transaction triggers follow-up operations in the company's ERP system (including warehousing, finance, and manufacturing), JCA is used to connect to the enterprise ERP.

Finally, to bring this application into the enterprise's overall systems-management framework, an administration client is built using the Application Client architecture (deployed on the same machine as the other enterprise-application management tools), and the enterprise application is managed through JMX.