Editing-taolu

Some tips on paper editing, mostly small mistakes that I tend to make when editing.

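Sometimes it helps to switch roles: if I were acting as a reviewer or giving guidance to someone else, would I have the confidence to say that something should be done one way and not another? Being able to explain things clearly starts with the common editing mistakes, so I record them here and hope that I can explain them clearly when similar problems come up.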

Attitude and patience

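Attitude and patience are important; we may need to polish a paper again and again to make it readable. It is easy to get annoyed and think that the paper can just be submitted as it is, and this is a bad attitude. As a quantitative target: once the overall structure of the paper has taken shape, read it carefully and completely from beginning to end and revise it at least 5 times. At least 2 of those passes should be done on a printed copy: read through the paper from beginning to end, mark the errors and the places that need changes, and then fix them all together; this counts as one pass. This kind of end-to-end pass is critical. At first, one careful pass usually takes most of a day; the passes then become faster and faster until one takes, say, one or two hours, and by that point the paper is in fairly good shape.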

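Reading your own paper sometimes takes courage. You have to overcome some inner irritation, because you know that what you wrote is not very good, that some parts are full of holes, and that there may even be many stupid questions. In short, be humble when editing a paper: read it one more time, go through it once more, and calm down. You can always find some additional errors, and it is never too late to polish the paper further before showing it to others. Put bluntly, let the paper annoy you before it annoys anyone else; this is the responsible attitude toward a paper. At the same time, take other people's opinions into account and be mentally prepared: no matter how well you write, others will still have comments. Normally, finding problems yourself and addressing other people's comments go on in parallel, and after at least 3 iterations the paper is likely to be better. Do not expect everyone to say the paper is good after you email it to them; that is unrealistic. Be ready to receive criticism.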

the, a

I have never quite found the key to using “the” and “a”. Here are some guidelines I have summarized.

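Check whether the noun is already certain by itself. If it is, neither “the” nor “a” is needed. Examples in a paper are Table 1, Figure 1, Equation 1, or Section 1: these work like names and identify a unique item without ambiguity, so no article is needed in front of them.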

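The name of a collection or an abstract noun also does not take “the”. This may depend on the audience; see this explanation of shared knowledge for details. If a concept, a term of the field, or a thing is understood by everyone, like an axiom (the table, the chair, the room, the tiger), use “the”. If it is a new concept that still has to be explained in detail, do not use “the”; treat it as an abstract concept.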

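That explanation also lists several cases where no article is used. Some of them rarely come up in paper writing, but they are still worth noting, so I quote them directly: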

1. to talk about plural and uncountable nouns or when talking about things in general:

I'm terrified of heights
I'm into drum and bass.
I hate cheese.

2. before countries, towns, streets, languages and single mountains:

I'm from China.
I've climbed Mount Everest.
She speaks French.

3. Before some places and with some forms of transport:

I live at home with my parents.
I came here by car.
He goes to work by bus.

4. In exclamations with what + uncountable noun:

What beautiful weather!
What loud music!
What disgusting food!

The case that needs particular attention is plural and uncountable nouns. One example related to my current research topic is in situ processing. Here “processing” is an uncountable noun used in the general sense, so “the” is not needed in front of it.

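This resource describes how definite and indefinite articles are used with uncountable nouns and with plural countable nouns. A plural countable noun also takes “the” when it refers to a specific group; in the general case “the” is not needed.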

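The remaining cases split between “the” and “a”, which is usually easy to tell apart. If the noun was mentioned before and this is the second mention, use “the”. If it is the first mention and nothing specific is meant, use “a”; it can be any one of its kind, and of course it must be singular. Superlatives take “the”, and so do nouns with a restricting clause after them, such as “the country where I was born”.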

singular plural

A few special cases of singular and plural need some attention.

Third person

Check this one for more details. After a modal verb, do not use the third-person singular form of the verb: write “it can improve”, not “it can improves”.

Some words can be treated as either singular or plural.

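One example is “data”. When it refers to data being transferred, treating it as plural usually makes more sense. It is best to stay consistent throughout the paper, or to write “data blocks” to avoid ambiguity.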

Singular or plural when A and B are joined by “and”

http://www.gaosan.com/gaokao/332134.html

Decide from the meaning: check whether the two items refer to a single concept.

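Some special words, such as “data”, seem acceptable as either singular or plural; in that case keep the usage consistent across the whole paper. When “data” refers to data blocks, the plural form is generally the better choice.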

Punctuation after equations

https://blog.csdn.net/weixin_46309254/article/details/122541026

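Note that an equation is part of the paragraph, so end it with a period or a comma as the sentence requires. Use a colon to introduce an equation after “as follows”; other lead-ins such as “expressed as” or “described as” generally do not take a colon.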
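The punctuation rule can be illustrated with a small LaTeX sketch. The symbols (T, T_sim, T_ana, S) and the sentences are hypothetical and chosen only for illustration; the point is the comma after the first equation, the colon after “as follows”, and the period after the second equation.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% "expressed as" takes no colon; the equation ends with a comma
% because the sentence continues with "where".
The total time is expressed as
\begin{equation}
  T = T_{\mathrm{sim}} + T_{\mathrm{ana}},
\end{equation}
where $T_{\mathrm{sim}}$ is the simulation time and $T_{\mathrm{ana}}$
is the analysis time.
% "as follows" takes a colon; the equation ends the sentence,
% so it is followed by a period.
The speedup is defined as follows:
\begin{equation}
  S = \frac{T_{1}}{T_{p}}.
\end{equation}
\end{document}
```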

Tense

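For tense, I follow these rules:

Use the simple present tense by default.
In related work, use the past tense to describe work that others have already completed.
In the final summary paragraph, use the past tense or the present perfect.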

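More examples can be found in the article “天文物理类英文科技论文写作的常见问题” (common problems in writing English scientific papers in astrophysics), listed in the Reference section.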

Logic

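What I most often mix up are the general-to-specific logic and the cause-and-effect logic. The general-to-specific pattern is typical at the start of a section. For example, the section introduces the design, and then: “In particular, we discuss sub-case 1, sub-case 2, and sub-case 3. Each sub-case describes a particular aspect of the design.” The cause-and-effect logic has an explicit reason and conclusion: what the phenomenon is, why it happens, and what conclusion follows.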

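A common misuse is giving the conclusion too early, which makes the statement less convincing. Pay particular attention here: when describing why something is done one way rather than another, set the stage first. Even a sentence such as “we can continue to discuss how a is influenced by b” is an effective lead-in; try to avoid stating a conclusion such as “a is larger than b” directly. The same holds when talking with people: aim for an unhurried, step-by-step way of telling things.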

Shrink the paper

A paper is like a sponge full of water: as long as you keep squeezing, you can always make it drier and tighter. Common methods are as follows:

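\setlength{\textfloatsep}{0pt} reduces the distance between figures and text. If some gaps are still large, use \vspace{-10pt}, typically between a figure and its caption, where there is usually some space that can be reduced.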

Then check the ends of paragraphs for dangling sentences or dangling words; rewording them frees up some extra space.

Next, look at the related work: items that are not necessary can be dropped. Some reference entries also carry useless content, such as notes, which can be deleted as well.

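After these steps the page limit is usually met, since the overflow is typically only about half a page. If it is still not met, the paper really does contain redundant content; review the whole paper again, merge paragraphs, and simply drop information that is optional.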
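The two spacing commands mentioned in this section can be sketched in a minimal document as follows. The image name example-image is a placeholder that assumes the mwe package is installed, and the caption text is made up; only \setlength{\textfloatsep}{0pt} and \vspace{-10pt} come from the notes above.

```latex
\documentclass{article}
\usepackage{graphicx}
% Shrink the space between floats at the top or bottom of a page
% and the surrounding text (value taken from the notes above).
\setlength{\textfloatsep}{0pt}
\begin{document}
Some text before the figure.
\begin{figure}[t]
  \centering
  % example-image is a placeholder picture (assumes the mwe package is installed).
  \includegraphics[width=0.8\linewidth]{example-image}
  \vspace{-10pt} % pull the caption closer to the figure
  \caption{A placeholder caption.}
\end{figure}
Some text after the figure.
\end{document}
```

Negative \vspace values are best tuned figure by figure, since the same value can make a caption overlap a figure that has little built-in margin.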

Indent

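In thesis editing, the first line after a title or subsection heading is often not indented, which seemed odd to me; some templates indent every paragraph by default, while others leave the first paragraph unindented. After reading this discussion it suddenly made sense. The purpose of an indent is to separate one paragraph from the next; since the first paragraph directly follows the heading, no separation is needed. It also looks cleaner.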
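As a small illustration, the standard LaTeX article class already follows this convention, and the indentfirst package switches to indenting every paragraph. The section title and sentences below are made up for the sketch.

```latex
\documentclass{article}
% Uncomment the next line to indent the first paragraph after a heading too.
% \usepackage{indentfirst}
\begin{document}
\section{Design}
This first paragraph directly follows the heading, so the standard
classes leave it unindented.

This second paragraph is indented, which separates it from the first one.
\end{document}
```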

Remove the “there be” or “it is”

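These patterns are redundant on the one hand and have unclear referents on the other (sometimes I am not even sure what they refer to, which confuses the reader even more). In paper writing, force yourself to use fewer pronouns such as “this”, so that sentences become more concise. Stay aware of the subject. The “there be” pattern in particular tends to come to mind first; when it does, think again about what the actual verb and subject are, and see whether “there be” can be replaced.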

Here are several examples:

it is necessary to use A to do sth -> using A to do sth is necessary

when there is a fixed number of processes -> when the number of processes is fixed

when there are arbitrary combinations of A and B -> when the combination of A and B is arbitrary

Online vs Real time

I often used these two words interchangeably, but they emphasize different aspects. This answer explains it well.

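Online is the broader modifier. The simplest way to understand it is a phone call: if the other party is answering or is on the line, they are online. An online service is therefore a service that keeps responding: you send a request and get a response. If there is no response, it is an offline service. Online here emphasizes an interactive process.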

Real time puts more emphasis on the delay of data processing. This article explains it well. The related terms mostly emphasize different levels of delay; a common usage is “real-time processing”.

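So when using “real time” or “online”, pay close attention to the context: are you saying whether a service is interactive, or are you emphasizing the data processing pipeline? The two words emphasize different aspects.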

Another scenario is model learning or parameter prediction, where we use “online prediction” or “offline prediction” to emphasize the approach. Refer to this for more ideas:

https://www.mathworks.com/help/ident/ug/what-is-online-estimation.html

and this

https://www.mathworks.com/help/ident/ug/how-online-estimation-differs-from-offline-estimation.html

Tools

The professional version of Grammarly can catch most typos and general expression problems.

https://ludwig.guru/ can help to find how a particular phrase is used in other related articles.

If you are not sure how to express a particular meaning accurately, just use Google Translate. Or, if you are not sure whether an English paragraph expresses things accurately, use Google Translate to translate it into your native language and check that everything is correct. Sometimes I translate my English into Chinese with Google Translate and find that the meaning is not what I wanted to express.

Other concrete examples and tips

logic flow

In one paper that discussed elasticity, we did not provide enough information at the beginning before going into the details of elasticity. The reviewer suggested a good logic flow for reorganizing it:

Write one sentence explaining what elasticity is, then one sentence with the fact that it was identified as a key research challenge, and finally one sentence to explain what the present work focuses on.

Originally, we started with a sentence such as “our work focuses on A, B, and C” and then used several sentences to explain A, B, and C in detail.

When listing the structure of a framework, we tend to give the overarching method first and then the specific points. But when arguing for something, do not rush to state a specific conclusion directly. We need to get to that point step by step: explain the reason clearly and then move to the results. If we explain things in the inverse order, readers might not catch the main point.

the position of adv

Try to put the adv after the verb as much as possible. By convention the adverb goes after the verb, especially words like “only” and “further”, which is roughly the reverse of the Chinese word order.

finally vs at last

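When describing a section, we often use expressions such as first, second, and third. I used to think that “at last” and “finally” were interchangeable, until someone pointed out that “at last” carries a somewhat negative meaning (a sense of discomfort, as in “after waiting for several hours, we bought the ticket at last”). “Finally” is therefore more appropriate in a formal paper. “Finally” and “lastly” are interchangeable; this article discusses the details.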

from vs in

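There was an expression like “Z is a PhD student from CS department”, which my teacher later changed to “Z is a PhD student in CS department”. I used to be vague about whether to use “in” or “from” here. This article lists several common uses of “from”: it expresses origin only when followed by a noun for a specific place, generally a place name. CS department is the name of an organization, so “in” is more suitable, since it says that you are inside that organization.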

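Other commonly confused cases include the difference between on, in, and at, which this video explains well. It is mainly a matter of granularity, plus some idiomatic usages.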

number of processes vs process number

“The process number” sounds more like the process rank. To express how many processes there are in total, “the number of processes” is probably more suitable.

left side figure

It is good to use “the figure on the left” or “the figure on the right”, or simply “the left figure” or “the right figure”.

others

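Some words have better replacements. For example, “main research area is” can be polished to “primary research area is”, and “is supposed to do sth” can be polished to “is expected to do sth”. “Expected to do” has a softer tone, expressing a hope that something gets done, whereas “supposed to do” carries some of the tone of “should do sth”, implying an obligation to complete something.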

show things that make sense

A paper commonly goes through multiple rounds of iteration. When showing work in progress to other people, remember to show them things that make sense. If some content comes from old data or otherwise does not make sense, comment it out and do not show it. This can save you a lot of energy explaining things.

Terminology of in situ

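In formally published books, in situ is written without a hyphen, and the term is set in italics wherever it appears. A similar case is in prior. These terms come up often in actual editing, so watch for them. Online articles quite often miswrite in situ as in-situ, which also deserves attention.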
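One way to keep the spelling and the italics consistent across a paper is to define the term once as a LaTeX macro. This is only a sketch: the macro name \insitu is hypothetical, the sentence is made up, and the xspace package is used so that a space after the macro is preserved.

```latex
\documentclass{article}
\usepackage{xspace}
% Hypothetical macro: italic, no hyphen; \xspace restores the
% following space unless punctuation comes next.
\newcommand{\insitu}{\textit{in situ}\xspace}
\begin{document}
We couple the simulation with \insitu processing, so the analysis
runs \insitu.
\end{document}
```

If a venue later asks for a different style, only the macro definition has to change.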

Online vs Offline

For parameter estimation of a model, there are an online approach and an offline approach. This article explains many details. From the paper’s perspective, do not insist too much on the online method: from the research perspective, the accuracy of the model is more important and has more research value. You can always adapt the model later to an online scenario that recursively updates the model parameters.

If the system or the source data has some certainty, you can always start with the offline model estimation approach (which is easier in most cases). Just collect a bunch of input data and output results, and then train the model on these data. That’s how it works in general.

The online approach might only be suitable for reinforcement learning scenarios or for cases where you cannot store the large amount of data.

This is also a good insight for choosing between the in situ and the post-processing approach. In most cases, the real-time manner (in situ processing) is just one way to do things, not a necessary one. If you focus too much on this specific way of doing things, you may lose some key points. In ML, for instance, the key point is to develop a model that works as expected, not whether the parameters are estimated in an online or offline manner.
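The difference between offline and online estimation can be seen in the simplest possible model, a constant whose single parameter is the mean of the observations. This toy example and its symbols (observations y_1, ..., y_N and estimate \hat{\mu}) are assumptions made for illustration, not taken from the referenced articles.

```latex
% Offline (batch): store all N observations, then estimate once.
\begin{equation}
  \hat{\mu}_{N} = \frac{1}{N} \sum_{i=1}^{N} y_{i}.
\end{equation}
% Online (recursive): update the estimate when observation y_t
% arrives, without storing earlier data; start from \hat{\mu}_0 = 0.
\begin{equation}
  \hat{\mu}_{t} = \hat{\mu}_{t-1} + \frac{1}{t}\left( y_{t} - \hat{\mu}_{t-1} \right),
  \qquad t = 1, \dots, N.
\end{equation}
```

Both forms give the same estimate after N observations, which matches the point above: the model is the same, and online versus offline is only the way its parameters are computed.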

Reply to reviewer’s feedback

Reviewers prefer that the feedback be answered point by point. This way, they can make sure that the issues they raised are fixed properly. Instead of organizing the revision letter by your own logic and then adding tags such as (to R1, to R2), it is better to state clearly which issue from which reviewer each modification addresses.

Some tools to do the revision

Grammarly can help us to catch simple typos.

https://ludwig.guru/ can help us to check how a specific expression is used in all kinds of online documents. If you are not sure about a specific phrase, it is always good to check it here.

Another important tool is Google Translate. Let Google Translate translate the English paragraph into your native language; if everything looks correct and it is what you want to express, your paragraph is in good shape. This is a really helpful tool.

Important principles

I attended a full day of writing lectures by Professor George Gopen, which changed how I think about writing scientific articles. I note down some important keywords here; these key ideas should become habits when revising an article. Just like reviewing a merge request, reading or writing an article with these ideas in mind makes the process quite different (I am trying to cultivate this habit when reviewing articles). The keywords include:

“reader’s expectation (convention)”,

“topic sentences (context)”,

“stress position”,

“avoid backward link”,

“subject and verb need to be placed next to each other” (I sometimes have a bad writing habit: I like to write sentences such as “For sth, it is …”. This form is redundant; jumping directly to the subject-verb form makes the writing more compact). By forcing yourself to consider how to put the subject and verb together, you may reduce the overuse of verbs such as “have” and “is” and of the “there be” form.

“flow naturally” etc.

One good article that summarizes all of these ideas can be found here (this one is really good, especially on the topic position and the stress position; five stars).

One point that really applies to me: I tend to put the conclusion first and then explain the reason with “this is because…”. This is not a good habit, since it breaks the rule that we start with the context or topic and put the important thing last. At the very least, we should go from the general to the details.

Many videos from Professor George Gopen are available online, so I will not go through the details. I just list several examples here to see how these concepts and principles are applied.

Example 1:

It takes some attention to tell topic sentences from stress sentences in scientific writing. For example, “we can explain the results in two ways” is a topic sentence, which is good at the start of a paragraph. “A is less likely than B, because …” is the stress point and should be put at the end of the paragraph as the conclusion.

TODO: list the examples from the lecture tutorial

Reference

This is a really good one for Chinese students

https://arxiv.org/pdf/1011.5973.pdf

Some writing style references

https://www.computer.org/publications/author-resources/peer-review/magazines#writing
