
Mr. ROBERTS. He worked for INS. I don't think his primary responsibilities were monitoring.

Senator SIMPSON. It was one of them, though, was it not?
Mr. ROBERTS. It could have been, yes.

Senator SIMPSON. Thank you.

Senator Kennedy.

Senator KENNEDY. As I understand, historically about 80 percent of the testing has gone on in INS and about 20 outside. Is that your understanding?

Mr. ROBERTS. Yes.

Senator KENNEDY. It has been going on for some period of time, and I also understand that it is being reviewed within INS about whether it ought to be enhanced or reduced. We didn't get into that this morning and some time down the road maybe we will, but at least that is being reviewed. But the idea is that this contracting out has been going on over some period of time, as I understand it. I don't know whether the more recent problems that you commented on, whether those have been similar kinds of problems in the past, but they certainly have to be addressed and dealt with.

I think it is important for the Americans who are watching this to understand that even if they go on through, you have these very unfortunate incidents where you have trimming or cheating on these various programs. Even after they go through the test, the individuals still have to go through an INS system and clearance, isn't that right?

Mr. ROBERTS. Yes, sir.

Senator KENNEDY. They have to go through the interviews and other provisions. It isn't just the fact that you give them a stamp of approval, or any of these alleged sites give them the stamp of approval and they automatically get their citizenship.

Mr. KRIEGER. No; Senator, they must file an N-400 form, fingerprint form, pictures, and supportive documents with the INS. If they take the test with us and pass it, and it is the English examination, they still must be interviewed by an examiner to show that they can speak and understand ordinary English.

If I can just make one correction, sir, we are not contractors. We are authorized by the INS. We receive no funds from INS whatsoever to do this. The only funds that we receive and we divide with our sites are from the fees paid by the aliens to take the test or to take any other services.

Senator KENNEDY. Well, I think these issues have been raised and they are enormously troublesome. You are going to have an opportunity to address them and we will obviously give you a chance for that kind of response. You are certainly entitled to it, but obviously it creates an enormously adverse feeling among Americans; I am sure by those individuals who want to become citizens, too. I mean, most of them are out there playing by the rules, trying to do their work, trying to study, trying to qualify on this, and they find others are gimmicking the game on this thing. It is wrong from every point of view.

So we will have a chance to review this in greater detail down the road, but I appreciate your being here today.

Senator SIMPSON. Thank you very much, both of you. Thank you very much, Mr. Roberts and Mr. Krieger.

We will go now to the third panel, Michael J. Feuer, Director of the Board of Testing and Assessment of the National Academy of Sciences, and Bert Green, professor of psychology at Johns Hopkins University. We welcome you both to the panel and will appreciate very much having your views in the order on the witness list. Mr. Feuer.

PANEL CONSISTING OF MICHAEL J. FEUER, DIRECTOR, BOARD ON TESTING AND ASSESSMENT, NATIONAL ACADEMY OF SCIENCES, WASHINGTON, DC; AND BERT F. GREEN, JR., PROFESSOR OF PSYCHOLOGY, JOHNS HOPKINS UNIVERSITY, BALTIMORE, MD

STATEMENT OF MICHAEL J. FEUER

Mr. FEUER. Good morning, Mr. Chairman and members of the subcommittee. It is a pleasure to be here to testify on issues related to testing of immigrants seeking naturalization.

I am the Director of the Board on Testing and Assessment of the National Academy of Sciences and an adjunct professor of public policy at Georgetown University, but I testify today as a private citizen with some expertise on testing and my remarks do not represent the opinions or positions of either the academy or Georgetown University.

Section 312 of the Immigration Act codifies in law a notion that is, at least on its surface, logical and understandable. New Americans should be able to demonstrate an understanding of the English language, including an ability to read, write and speak words in ordinary usage. Who would argue, indeed, that any American, naturalized or not, doesn't need a basic command of English?

But in making this a condition for naturalization, the act legitimates the use of "reasonable tests of literacy," as screening devices, and provides legal backing to the principle that performance on a test can open or close the gate to naturalization. Given the significance of the consequences associated with passing or failing such a test, the granting or denying of American citizenship, surely one of the most cherished credentials on Earth, it is particularly important to assure that the tests, if they are used for this purpose, meet the highest professional and ethical standards in their design, validation, administration, and interpretation.

I am going to summarize my testimony briefly. I have the written version to be submitted for the record and I would like to just focus on three main points. First, some general principles of sound testing practices; second, some specific problems associated with literacy testing; and, third, some caution flags about section 312 in the light of these considerations.

On the general question of testing, to try to summarize that in a minute would undoubtedly blaspheme 100 years' worth of very important scientific and technological research in the field, but I will give it a shot. Testing is a tool of information. It provides estimates of human performance. Good tests are developed in a lengthy, time-consuming, technologically sophisticated and costly process, from design to item development to pilot testing, validation, administration, scoring, standard-setting, and ultimately the drawing of fair and valid inferences based on test results.

Even under the best of circumstances, though, test results come with some error. Some people who pass should have failed, some who fail should have passed. In light of the fact that tests can produce estimates with that kind of error, it is all the more important that the purpose of the testing be well-defined and well understood, and that the tests probably never be used as the sole basis for making important decisions about individuals or groups or institutions.

The second point is that literacy testing raises some particularly complicated questions. Intuitively, we may all feel that knowing basic English usage is a worthy objective. Translating that concept into criteria that would inform the development of tests and test items and inferences from those tests is another matter. We have many different kinds of literacy tests in use currently for many different purposes. Some are oriented toward early diagnosis of reading difficulties in young children and can be useful in developing instructional strategies. Others are meant to provide broader information on the literacy of the whole population or of groups within the population, such as the National Adult Literacy Survey and other such instruments.

Others yet define literacy with respect to specific academic or work-related competencies, such as verbal skill required for liberal arts studies, let us say, or quantitative skill required, for example, to go into accounting. Many colleges and universities use established tests of English that are designed for students whose native language is not English, and they use them as one of many criteria to determine qualification for admission; I emphasize the word "many."

In a word, then, the definition of literacy is complicated. It is often controversial, and the assessment becomes even more complicated and complex because of the need to determine the specific purposes, the criteria, the sample population issues, standards, the meaningfulness of scores, the enforcement of sound and ethical principles of test use.

About section 312, it certainly does open the door for what people in the testing world would refer to as very high-stakes testing, and there are several important and related values that I think need to be safeguarded in the application of this statute. Let me suggest three directions, and this can be the basis perhaps for some further discussion later.

Point No. 1 is that we need to define the purpose very clearly. Is the testing indeed intended to screen out and deny citizenship to individuals who meet other criteria, but fail to demonstrate basic English language proficiency? If so, then the test is indeed intended for use in very high-stakes decisions, and the procedures for content specification, pilot testing, validation, scoring, and cut-score determination need to meet the most exacting standards.

It would be ironic indeed if procedures used to admit newcomers to the American way of life, which stands for fairness and justice, were themselves unfair or unjust. It was mentioned earlier that perhaps we need some kind of national, broad dialog, a consensus process about exactly what kinds of standards of proficiency we have in mind.

Two other points. I see the red light is on, so I will cut them down to one sentence each. Conduct extensive experimentation with various tests to make sure that they actually serve the purposes for which they are intended. And point No. 3, make sure that we regulate this entire process effectively so that the integrity of naturalization, as well as the integrity of testing, can be safeguarded.

Senator SIMPSON. Thank you very much.

[The prepared statement of Mr. Feuer follows:]

PREPARED STATEMENT OF MICHAEL J. FEUER, PH.D.

Good morning Mr. Chairman and members of the subcommittee. My name is Michael Feuer. I am pleased to have this opportunity to testify on issues related to testing of immigrants seeking naturalization as American citizens. I am currently the Director of the Board on Testing and Assessment of the National Academy of Sciences, and adjunct professor of Public Policy at Georgetown University. I have been involved in research and analysis on matters of testing and public policy for over a decade. Prior to joining the Academy I was a senior analyst and project director at the Office of Technology Assessment, where I specialized in education, psychological testing, and human resources policy.

My testimony today does not necessarily represent the opinions or positions of the Academy, the National Research Council, or the Board.

Section 312 of the Immigration Act codifies in law a notion that is, on its surface, logical and understandable: new Americans should be able to demonstrate "an understanding of the English language, including an ability to read, write, and speak words in ordinary usage." Indeed, who would argue that any American, naturalized or not, does not need a basic command of English? In making this a condition for naturalization, however, the Act legitimates the use of "reasonable test[s] of *** literacy" as screening devices, and provides legal backing to the principle that performance on a test can open or close the gate to naturalization. Given the significance of the consequences associated with passing or failing such tests (the granting or denying of American citizenship, surely one of the most cherished credentials on earth), it is particularly important to assure that the tests meet the highest professional and ethical standards in their design, validation, administration, and interpretation.

In my testimony I will focus on the following issues:

General principles for sound testing practice;

Specific issues related to tests of English literacy; and

Implications for the uses of tests in naturalization proceedings.

I. PRINCIPLES OF GOOD TEST PRACTICE: AN ABRIDGED GUIDE

Tests are tools of information. They provide estimates of knowledge or behavior based on samples, and scores are therefore always subject to error.

The first step in the process of making a test, therefore, is to determine the purposes to which the test information will be put. Who will use the data? What decisions will rest on the scores? This is a seemingly simple and straightforward principle (that definition of purpose should precede and inform design and application), but it is often overlooked by test makers, test users, and policy makers who rely on test data. In the US we do a great deal of testing, for many different purposes. For example, we use tests to:

Diagnose various types of learning processes in students, and give teachers, parents, and learners useful information to guide practice;

Check annual progress (achievement) of individual students or groups of students in mastering subject matter;

Evaluate the condition of education in the nation as a whole, using such instruments as the National Assessment of Educational Progress;

Provide information to employers about the expected performance capabilities of job applicants;

Select new recruits into various training and specialty programs in the armed services;

Provide information to college admissions officers about the expected academic performance of student applicants;

Determine eligibility for special programs in education and training institutions (e.g., special education, gifted and talented, firm-sponsored training, etc.); and

Assess the literacy-verbal, numeric, scientific, historical-of the population at large or of specific groups within the population.

The importance of understanding the purpose of testing as a prerequisite to design and administration cannot be overemphasized.

The next step is to develop items (short-answer questions, essays, performance tasks) that provide a reasonable measure of the broader domain of interest. For example, if we are interested in testing an individual's mastery of the high school science curriculum in a particular school district, we might design a test that includes different kinds of items (short answer questions to establish basic factual knowledge, longer and performance-based items to establish scientific reasoning processes), which together give a reliable measure of the test-taker's competence in the defined subject area. Defining the subject area and linking test items to content are ingredients in what testing professionals call "validity," i.e., the extent to which a test measures what it is supposed to measure. Obviously, a test that is intended to measure the skills necessary to be a good science writer would be quite different from a test intended to measure an individual's skill in performing chemical experiments in the laboratory, even though they both, ostensibly, would measure something about science.

Writing test items that appear to cover the intended domain is just the beginning of a crucial and often time-consuming process of "test validation," which refers to the gathering of empirical evidence linking test performance to the broader domain that the test is intended to estimate. Returning to the science example, this stage in the design process involves careful scrutiny of test items by content experts, so that, for example, questions about the molecular structure of cells are drawn from (and provide an indicator of) accepted scientific knowledge; administering the test items to samples of students to explore empirically the ways in which items are interpreted by potential test-takers; checking the relationship of their scores to other independent sources of data about their science competency; comparing the accuracy of different types of items; and, overall, developing a sound basis for using the test beyond the experimental sample and as a basis for inferences about students' mastery of the defined material. These are not simple or trivial exercises, and they require a delicate blending of content expertise, psychological theory, statistical analysis, and editorial finesse.

The next question becomes how to interpret the scores. Suppose we have a science test with items that are well-suited to estimating mastery of a broader curriculum. What does a score of 75 mean? Is 80 good enough? Obviously, a score is meaningless unless it is understood to relate to something: if my body temperature is 103, that is meaningful only because I know that average human body temperature is around 98. In educational and psychological testing, too, scores cannot be understood in a vacuum. Test developers must therefore develop norms, or standards against which to understand scores obtained by individual test takers. Some norms are established by giving a test to a sample population that is representative of the population that will later be given the real test; then, individuals who are tested can interpret their scores in relation to the distribution of scores obtained in the representative population. Although these so-called "norm-referenced tests" are sometimes criticized as reinforcing competitive ranking and sorting, they are still widely used in many contexts; perhaps the most familiar is the nationally-normed educational achievement test, results of which are presented to enable individual students to judge their performance relative to a large nationally-representative sample of students in their age and grade group. Alternatively, a test can be normed against a defined body of knowledge, so that the score is not just a ranking of individuals compared to one another but an estimate of one's knowledge relative to the ideal or total knowledge one might theoretically have in a given subject. For such a test, a score of 75 suggests something about how much of the curriculum the test taker has mastered.
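[The norm-referenced interpretation described above can be illustrated with a brief sketch. The sample scores below are hypothetical and for illustration only; they are not drawn from any actual norming study.]

```python
def percentile_rank(raw_score, norm_sample):
    """Percentile rank: percentage of the norming sample scoring
    at or below the given raw score."""
    at_or_below = sum(1 for s in norm_sample if s <= raw_score)
    return 100.0 * at_or_below / len(norm_sample)

# Hypothetical norming sample of raw scores on a 0-100 scale.
norm_sample = [52, 58, 61, 63, 66, 70, 72, 75, 78, 81, 84, 88, 91, 94]

# A raw score of 75 is meaningless in isolation; against this norm
# group it places the test-taker at roughly the 57th percentile.
print(round(percentile_rank(75, norm_sample), 1))  # prints 57.1
```

[The same raw score of 75 would carry a different meaning against a different norm group, which is the point made above about scores not being understood in a vacuum.]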

Without belaboring the details, suffice to say that norming is, itself, an essential part of the test design process, and an undertaking of substantial statistical complexity that is both time-consuming and costly when done properly. Based on norming studies and the precise definition of how test scores are indicators of future performance, an important next step for tests used in screening, selection, and some system-monitoring contexts is the determination of cutoff scores. This step essentially involves asking the question: how good is good enough? Standard-setting and the application of cutoff scores is, itself, a major subtopic in test theory and practice. It is crucial to remember that test scores are estimates, and that no matter where one puts the cutoff, some individuals who should have passed will fail and others who should have failed will pass.
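[The claim above, that any cutoff placed on a fallible score produces both false passes and false failures, can be demonstrated with a simple simulation. The ability values, cutoff, and error magnitude below are hypothetical and chosen only for illustration.]

```python
import random

def classify_with_error(true_abilities, cutoff, noise_sd, seed=0):
    """Apply a cutoff to noisy observed scores and count how many
    test-takers are misclassified relative to their true standing."""
    rng = random.Random(seed)
    false_pass = false_fail = 0
    for ability in true_abilities:
        observed = ability + rng.gauss(0, noise_sd)  # measurement error
        truly_passes = ability >= cutoff
        observed_passes = observed >= cutoff
        if observed_passes and not truly_passes:
            false_pass += 1
        elif truly_passes and not observed_passes:
            false_fail += 1
    return false_pass, false_fail

# 100 hypothetical test-takers with true abilities spread from 60 to 99.6,
# a cutoff of 75, and a measurement error of 5 score points.
abilities = [60 + i * 0.4 for i in range(100)]
fp, ff = classify_with_error(abilities, cutoff=75, noise_sd=5)
print(fp, ff)  # both counts are nonzero: some pass who should fail, and vice versa
```

[Moving the cutoff up or down only trades one kind of error for the other; it cannot eliminate both, which is why the determination of cutoff scores is treated as a major subtopic in its own right.]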

Assuming that test content and norms have been obtained, the next step is to assure proper administration. To assure fairness and test validity, it is obviously im
