You might call this the war room of Edison Elementary. Third-grade teachers huddle around the table, flipping through binders of student statistics and old tests marked in red pen. The walls are papered with a dizzying array of numbers from different tests, highlighted by grade level. The teachers sip water, snack on bagels and try to figure it out: How many kids will it take to beat No Child Left Behind?

The federal government demands that more and more kids at Edison score “proficient” on state tests every year. It sounds good to politicians and the press, but the teachers know it isn’t that simple.

The problem: The tests don’t track how much each child improves. Instead, they measure how each group of children scores compared to the last group. So Edison is actually trying to get a whole new set of third graders to do better than the last crop of third graders.

The system can handicap a school like Edison in City Heights, where almost all students are poor and most are learning English. If Edison is unlucky and gets a group of third graders who come in woefully behind, the school could get dinged with lower scores — even if kids improve dramatically during the year.

Murmuring over their worksheets, the teachers piece together the data: One teacher discovers that 84 percent of her students scored well on state tests last year. Another finds only 47 percent of her students did well. Tina Zubrod smiles determinedly at even lower numbers.

“I know my reality,” the special education teacher says of her students’ scores.

Principal Tavga Bustani knows that reality, too. She wants teachers to set realistic goals: some improvement for every child, no matter where each one starts. So the teachers pore over the data and calculate growth goals for the kids in each classroom.

But the feds pressure schools to push children to proficiency, not just to improve. Under No Child Left Behind, kids who advance dramatically but still fall below that bar, or are already high above it, just don’t matter.

So the teachers also figure out how many kids — and which ones — need to be proficient to meet the federal goals. Each teacher chooses five or six students who could make it to proficiency with a little extra nudge. Edison calls them the “Focus Kids.”

The choices feel like educational triage. Bustani is emphatic that her teachers still help every student. But it is the Focus Kids whose smiling faces fill the big book on her desk. And under No Child Left Behind, their success will define whether Edison is labeled good or bad, improving or not.

♦♦♦

This is the system that Edison and other schools have been forced to live with: Being judged not by how much they help kids improve, but by how many kids score above a certain bar.

That system is also deeply unfair. Schools in wealthy areas almost always do better on standardized tests because their students come to school better prepped than poorer students. So a school that does little to help children who already excel can easily get better scores than a school that helps struggling students improve dramatically.

There is another way to measure achievement, however, and it could revolutionize how schools like Edison are evaluated.

The idea is simple and potent: Measure how much each child improves over time, instead of simply how high they score.

One camp of education reformers wants to take the idea even further. They want to use test scores to rate teachers and schools by estimating how much each one affects students' scores. Backers argue that the idea, known as value added, is a powerful way to examine what educators do because it accounts for where kids start, instead of simply seeing who scores high.

It is deeply controversial. Statisticians debate whether it is technically possible to tease out how a teacher impacts a student from a multitude of other factors. Teachers unions say tests are a shoddy way to measure teaching in the first place.

The idea has some powerful backers, though. As President Obama has pushed schools to use the method to evaluate teachers and decide their pay, the debate that used to be limited to academics and education wonks has exploded into headlines.

The Los Angeles Times used value-added data this summer in an explosive series, showing that seemingly stellar schools were actually letting kids slip behind, while lower-ranked schools pushed them ahead. California doesn't measure what schools do, the paper wrote; it measures the advantages their students have.

The Times also published ratings for individual teachers based on test scores, igniting a larger debate about whether the sensitive information should be used for teacher evaluation — or be public at all. The teachers union decried the ratings as an attack on their profession. Proponents called it a public service to reveal data ignored by the school district, saying it could empower parents.

The San Diego Unified school board, which is strongly backed by the teachers union, has panned the idea of rating teachers with test scores, saying it reduces teaching to test prep. The district doesn’t calculate scores for teachers. It turned away from Race to the Top, an Obama administration competition for federal stimulus money that emphasized tying test score growth to teacher evaluation.

“If you focus exclusively on test scores, your kids are going to be taught to be test takers,” said Craig Leedham, executive director of the San Diego teachers union. “Tests are overemphasized because they’re the easiest thing to measure.”

Yet San Diego Unified has embraced the idea of using similar information to help schools improve. It wants to measure schools by how much they help each child grow. Some schools are already crunching test scores to measure growth. They don’t use the calculations to rate teachers; they use them to study what gets good results.

School board member John de Beck said anyone who cares about kids would want to look at how students are improving, not just how they compare to other children. Yet he is fiercely opposed to using that information to score teachers, calling it simplistic. De Beck is walking the same line as San Diego Unified itself: embracing the data to help teachers, but not to judge them.

♦♦♦

In City Heights, Edison tracks how students in each classroom perform on school district exams throughout the year. Teachers get together and talk about the results. They can see whether one teacher nudged her students to improve dramatically and another didn’t.

But the scores are just a starting point. A weak teacher could be behind bad results, but so could something else. For example, a disruptive child might throw off other students no matter what the teacher does. Edison teachers embrace examining scores as a way to figure out what works and what doesn't.

Principal Bustani even videotapes classes with especially good results to show other teachers how they taught a lesson. This gentler take on data seems to be working. Scores at Edison have soared.

“They hold themselves accountable. They’ll say, ‘Wow, your kids did so well on this — what did you do?’” Bustani said. She rarely steps in. “I don’t think it’s helpful to say, ‘Your class scored low.’”

Miles away in the San Carlos area, the same philosophy helps Pershing Middle School Principal Sarah Sullivan tell if children from wealthier areas are thriving or just coasting. Sullivan brings teachers together at each grade level to compare how well their children learned different skills.

“Data is a tool. Not an end,” Sullivan said. “A lot of people think you succeed by focusing on the data. You don't. You get it by focusing on what's happening in the classroom.”

Not all schools are doing this. It takes trust to convince teachers to compare the test scores from their classes with those from other classes. Though test scores are not officially part of teacher evaluations, teachers fear principals may unfairly blame them for low scores beyond their control. Many principals said they use detailed data to diagnose which skills students need help with, but they shy away from comparing classrooms because it can unnerve teachers.

So while San Diego Unified has steered clear of using test scores the Obama way, it wants more schools to choose the Edison or Pershing way: having teachers bond over the data.

“If we say, ‘Let's put all our data on the table and use it to see if we can help students learn,’ that's nonthreatening,” said Richard Barrera, president of the school board. “The minute you say, ‘We'll use data to evaluate what you're doing as a teacher,’ the whole conversation becomes defensive. It gets in the way.”

♦♦♦

Even statisticians who relish using data to measure growth caution against jumping to conclusions based on test scores. As the head of the economics department of the University of California, San Diego, Julian Betts regularly uses value-added data to evaluate how different programs work in schools.

Yet Betts warns that tests are riddled with statistical noise. A student can sleep poorly before the test. A dog could be barking outside during it. The same student can do very well on a test one day and badly the next. Add all that noise together and it can lead you astray, especially in one small classroom.

For instance, Betts and another researcher once calculated scores for teachers using three years of tests. Then they added a new year of scores and dropped the oldest. The results were unsettling: Nearly one-third of the teachers who initially ranked lowest suddenly moved into the top 40 percent.

If a school system had used the results to decide which teachers to fire, it might have already shown them the door. Betts called it “dreadfully close to being completely random.”

Wary of such weaknesses, the National Research Council warned the Obama administration that despite the promise of value-added data, it could not be used to objectively compare teachers — at least not without more study.

Proponents say that while value-added data is imperfect, its flaws are no worse than those of the existing system. Principals also make mistakes when they observe teachers in class, the main way teachers are judged. And teachers get raises for academic degrees that may or may not make them better.

“There is no perfect measure,” said David Plank, executive director of Policy Analysis for California Education, an independent education research center based in the Bay Area and Los Angeles. Value-added data “shouldn’t be the sole or even the primary factor. But it ought to be in the mix.”

Many teachers argue that the tests are too narrow a measure of schooling in the first place. So in its quest to measure student growth in San Diego Unified, the school board is trying to include other factors besides tests, marking whether students can think critically and creatively, even whether they’re mature and socially skilled.

While the debates rage on elsewhere, teachers sit around the table at Edison, shuffling spreadsheets, trying to figure out what the data are telling them. In this room papered with numbers, it is hard to believe that data have become such a battleground in school reform.

Bustani, the principal, says she doesn’t need to threaten teachers to get them to use data and improve. They already want to improve.

But Edison is still an exception in San Diego Unified. The school district believes that more teachers can be coaxed to examine data about their own stumbles and successes, the way teachers do at Edison. Principals regularly tour Edison to learn how to do it. But because it is up to each school to start dicing data, many have shied away, fearful that teachers will see it as an attack.

At Lewis Middle School in Allied Gardens, Principal Brad Callahan and his teachers have used test data to tailor lessons and decide which kids need extra help after school. The school's scores have grown year after year.

Still, Callahan has hesitated to compare test results from different classrooms at Lewis. He is already at odds with some veteran teachers who want him to evaluate them less often. Gauging growth in different classrooms, he said, would be a touchy step to take, even if it isn't linked to evaluations.

“It’s sensitive,” Callahan said.